Add scan flow MVP and local Axiom skill workspace

This snapshot establishes the camera-to-result recognition flow and related tests while checking in the project skill/docs assets required for the configured local tooling.
2026-04-19 21:11:32 +02:00
parent 577214d474
commit a60a76b797
679 changed files with 138964 additions and 73 deletions
--- a/.claude/skills/axiom-ios-vision/.openskills.json
+++ b/.claude/skills/axiom-ios-vision/.openskills.json
@@ -0,0 +1,7 @@
+{
+  "source": "CharlesWiltgen/Axiom",
+  "sourceType": "git",
+  "repoUrl": "https://github.com/CharlesWiltgen/Axiom",
+  "subpath": ".claude-plugin/plugins/axiom/skills/axiom-ios-vision",
+  "installedAt": "2026-04-12T08:05:35.627Z"
+}
--- a/.claude/skills/axiom-ios-vision/SKILL.md
+++ b/.claude/skills/axiom-ios-vision/SKILL.md
@@ -0,0 +1,151 @@
+---
+name: axiom-ios-vision
+description: Use when implementing ANY computer vision feature - image analysis, object detection, pose detection, person segmentation, subject lifting, hand/body pose tracking.
+license: MIT
+---
+
+# iOS Computer Vision Router
+
+**You MUST use this skill for ANY computer vision work using the Vision framework.**
+
+## When to Use
+
+Use this router when:
+- Analyzing images or video
+- Detecting objects, faces, or people
+- Tracking hand or body pose
+- Segmenting people or subjects
+- Lifting subjects from backgrounds
+- Recognizing text in images (OCR)
+- Detecting barcodes or QR codes
+- Scanning documents
+- Using VisionKit or DataScannerViewController
+- Integrating with Visual Intelligence (iOS 26+ system camera feature)
+
+## Routing Logic
+
+### Vision Work
+
+**Implementation patterns** → `/skill axiom-vision`
+- Subject segmentation (VisionKit)
+- Hand pose detection (21 landmarks)
+- Body pose detection (2D/3D)
+- Person segmentation
+- Face detection
+- Isolating objects while excluding hands
+- Text recognition (VNRecognizeTextRequest)
+- Barcode/QR detection (VNDetectBarcodesRequest)
+- Document scanning (VNDocumentCameraViewController)
+- Live scanning (DataScannerViewController)
+- Structured document extraction (RecognizeDocumentsRequest, iOS 26+)
+
+**API reference** → `/skill axiom-vision-ref`
+- Complete Vision framework API
+- VNDetectHumanHandPoseRequest
+- VNDetectHumanBodyPoseRequest
+- VNGenerateForegroundInstanceMaskRequest
+- VNRecognizeTextRequest (fast/accurate modes)
+- VNDetectBarcodesRequest (symbologies)
+- DataScannerViewController delegates
+- RecognizeDocumentsRequest (iOS 26+)
+- Coordinate conversion patterns
+
+**Visual Intelligence integration** → `/skill axiom-vision-ref` (see Visual Intelligence Integration section)
+- Making app content discoverable to Visual Intelligence camera
+- `IntentValueQuery` and `SemanticContentDescriptor`
+- Deep linking from Visual Intelligence results
+
+**Diagnostics** → `/skill axiom-vision-diag`
+- Subject not detected
+- Hand pose missing landmarks
+- Low confidence observations
+- Performance issues
+- Coordinate conversion bugs
+- Text not recognized or wrong characters
+- Barcodes not detected
+- DataScanner showing blank or no items
+- Document edges not detected
+
+## Decision Tree
+
+1. Implementing (pose, segmentation, OCR, barcodes, documents, live scanning)? → `axiom-vision`
+2. Visual Intelligence system integration (camera feature, iOS 26+)? → `axiom-vision-ref` (Visual Intelligence Integration section)
+3. Need API reference / code examples? → `axiom-vision-ref`
+4. Debugging issues (detection failures, confidence, coordinates)? → `axiom-vision-diag`
+
+## Anti-Rationalization
+
+| Thought | Reality |
+|---------|---------|
+| "Vision framework is just a request/handler pattern" | Vision has coordinate conversion, confidence thresholds, and performance gotchas. `axiom-vision` covers them. |
+| "I'll handle text recognition without the skill" | VNRecognizeTextRequest has fast/accurate modes and language-specific settings. `axiom-vision` has the patterns. |
+| "Subject segmentation is straightforward" | Instance masks have HDR compositing and hand-exclusion patterns. `axiom-vision` covers complex scenarios. |
+| "Visual Intelligence is just the camera API" | Visual Intelligence is a system-level feature requiring IntentValueQuery and SemanticContentDescriptor. `axiom-vision-ref` has the integration section. |
+
+## Critical Patterns
+
+**axiom-vision**:
+- Subject segmentation with VisionKit
+- Hand pose detection (21 landmarks)
+- Body pose detection (2D/3D, up to 4 people)
+- Isolating objects while excluding hands
+- CoreImage HDR compositing
+- Text recognition (fast vs accurate modes)
+- Barcode detection (symbology selection)
+- Document scanning with perspective correction
+- Live scanning with DataScannerViewController
+- Structured document extraction (iOS 26+)
+
+**axiom-vision-diag**:
+- Subject detection failures
+- Landmark tracking issues
+- Performance optimization
+- Observation confidence thresholds
+- Text recognition failures (language, contrast)
+- Barcode detection issues (symbology, distance)
+- DataScanner troubleshooting
+- Document edge detection problems
+
+## Example Invocations
+
+User: "How do I detect hand pose in an image?"
+→ Invoke: `/skill axiom-vision`
+
+User: "Isolate a subject but exclude the user's hands"
+→ Invoke: `/skill axiom-vision`
+
+User: "How do I read text from an image?"
+→ Invoke: `/skill axiom-vision`
+
+User: "Scan QR codes with the camera"
+→ Invoke: `/skill axiom-vision`
+
+User: "How do I implement document scanning?"
+→ Invoke: `/skill axiom-vision`
+
+User: "Use DataScannerViewController for live text"
+→ Invoke: `/skill axiom-vision`
+
+User: "Subject detection isn't working"
+→ Invoke: `/skill axiom-vision-diag`
+
+User: "Text recognition returns wrong characters"
+→ Invoke: `/skill axiom-vision-diag`
+
+User: "Barcode not being detected"
+→ Invoke: `/skill axiom-vision-diag`
+
+User: "Show me VNDetectHumanBodyPoseRequest examples"
+→ Invoke: `/skill axiom-vision-ref`
+
+User: "What symbologies does VNDetectBarcodesRequest support?"
+→ Invoke: `/skill axiom-vision-ref`
+
+User: "RecognizeDocumentsRequest API reference"
+→ Invoke: `/skill axiom-vision-ref`
+
+User: "How do I make my app work with Visual Intelligence?"
+→ Invoke: `/skill axiom-vision-ref`
+
+User: "How do users discover my app content through the camera?"
+→ Invoke: `/skill axiom-vision-ref`
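
Reviewer note (not part of the committed file): the Decision Tree in the SKILL.md above reads as an ordered, first-match-wins check. The sketch below models it that way in Python to make the precedence explicit. The function name `route`, the boolean flags, and the first-match-wins interpretation are assumptions made for illustration; the skill file itself only lists the four questions and their target skills.

```python
from enum import Enum
from typing import Optional


class Skill(Enum):
    """Target skills named in the router's Routing Logic section."""
    VISION = "axiom-vision"
    VISION_REF = "axiom-vision-ref"
    VISION_DIAG = "axiom-vision-diag"


def route(
    implementing: bool = False,
    visual_intelligence: bool = False,
    needs_reference: bool = False,
    debugging: bool = False,
) -> Optional[Skill]:
    """Hypothetical helper: walk the four Decision Tree questions in order.

    Assumption: the numbered list is evaluated top to bottom and the
    first question answered "yes" decides the skill.
    """
    if implementing:            # 1. pose, segmentation, OCR, barcodes, ...
        return Skill.VISION
    if visual_intelligence:     # 2. Visual Intelligence system integration
        return Skill.VISION_REF
    if needs_reference:         # 3. API reference / code examples
        return Skill.VISION_REF
    if debugging:               # 4. detection failures, confidence, coordinates
        return Skill.VISION_DIAG
    return None                 # no question matched; the router does not apply


def invocation(skill: Skill) -> str:
    """Format the invocation string the way the Example Invocations show it."""
    return f"/skill {skill.value}"
```

Under this reading, a request that is both an implementation task and a debugging task resolves to `axiom-vision`, because question 1 is checked first. If the intended behavior is different (for example, debugging should win when something is already broken), the order of the checks is the only thing that would need to change.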