Add scan flow MVP and local Axiom skill workspace
This snapshot establishes the camera-to-result recognition flow and related tests while checking in the project skill/docs assets required for the configured local tooling.

.claude/skills/axiom-ios-vision/.openskills.json (Normal file, +7 lines)
@@ -0,0 +1,7 @@
{
  "source": "CharlesWiltgen/Axiom",
  "sourceType": "git",
  "repoUrl": "https://github.com/CharlesWiltgen/Axiom",
  "subpath": ".claude-plugin/plugins/axiom/skills/axiom-ios-vision",
  "installedAt": "2026-04-12T08:05:35.627Z"
}

.claude/skills/axiom-ios-vision/SKILL.md (Normal file, +151 lines)
@@ -0,0 +1,151 @@
---
name: axiom-ios-vision
description: Use when implementing ANY computer vision feature - image analysis, object detection, pose detection, person segmentation, subject lifting, hand/body pose tracking.
license: MIT
---

# iOS Computer Vision Router

**You MUST use this skill for ANY computer vision work using the Vision framework.**

## When to Use

Use this router when:
- Analyzing images or video
- Detecting objects, faces, or people
- Tracking hand or body pose
- Segmenting people or subjects
- Lifting subjects from backgrounds
- Recognizing text in images (OCR)
- Detecting barcodes or QR codes
- Scanning documents
- Using VisionKit or DataScannerViewController
- Integrating with Visual Intelligence (iOS 26+ system camera feature)

## Routing Logic

### Vision Work

**Implementation patterns** → `/skill axiom-vision`
- Subject segmentation (VisionKit)
- Hand pose detection (21 landmarks)
- Body pose detection (2D/3D)
- Person segmentation
- Face detection
- Isolating objects while excluding hands
- Text recognition (VNRecognizeTextRequest)
- Barcode/QR detection (VNDetectBarcodesRequest)
- Document scanning (VNDocumentCameraViewController)
- Live scanning (DataScannerViewController)
- Structured document extraction (RecognizeDocumentsRequest, iOS 26+)

**API reference** → `/skill axiom-vision-ref`
- Complete Vision framework API
- VNDetectHumanHandPoseRequest
- VNDetectHumanBodyPoseRequest
- VNGenerateForegroundInstanceMaskRequest
- VNRecognizeTextRequest (fast/accurate modes)
- VNDetectBarcodesRequest (symbologies)
- DataScannerViewController delegates
- RecognizeDocumentsRequest (iOS 26+)
- Coordinate conversion patterns
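
Coordinate conversion gets its own reference entry because Vision reports bounding boxes in normalized coordinates (0 to 1) with the origin at the lower-left corner, whereas UIKit views and most pixel buffers put the origin at the top-left. The sketch below shows the arithmetic in plain Swift. `NormalizedRect`, `PixelRect`, and `topLeftPixelRect(from:imageWidth:imageHeight:)` are hypothetical names standing in for `CGRect` so the code runs without Apple frameworks, and the sketch assumes the image was handed to Vision in `.up` orientation. On Apple platforms, `VNImageRectForNormalizedRect` covers the scaling step but does not flip the Y axis.

```swift
// Hypothetical stand-in for a CGRect taken from an observation's boundingBox.
// Values are normalized to 0...1 with the origin at the bottom-left.
struct NormalizedRect {
    var x: Double, y: Double, width: Double, height: Double
}

// Hypothetical pixel-space rect with the origin at the top-left (UIKit style).
struct PixelRect: Equatable {
    var x: Double, y: Double, width: Double, height: Double
}

/// Scales a normalized, bottom-left-origin rect to pixels and flips it to a
/// top-left origin. Assumes the image was analyzed in `.up` orientation.
func topLeftPixelRect(from box: NormalizedRect,
                      imageWidth: Double,
                      imageHeight: Double) -> PixelRect {
    let width = box.width * imageWidth
    let height = box.height * imageHeight
    let x = box.x * imageWidth
    // In bottom-left space the top edge is at (y + height); measure it from the top instead.
    let y = (1.0 - box.y - box.height) * imageHeight
    return PixelRect(x: x, y: y, width: width, height: height)
}

// A box covering the middle half horizontally, in the upper-middle band vertically.
let box = NormalizedRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let rect = topLeftPixelRect(from: box, imageWidth: 400, imageHeight: 200)
print(rect)
```

For this box the bottom-left-origin rect starts 100 pixels above the bottom of a 200-pixel-tall image and is 50 pixels tall, so its top edge is 150 pixels up, which is 50 pixels down from the top in UIKit terms.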

**Visual Intelligence integration** → `/skill axiom-vision-ref` (see Visual Intelligence Integration section)
- Making app content discoverable to Visual Intelligence camera
- `IntentValueQuery` and `SemanticContentDescriptor`
- Deep linking from Visual Intelligence results

**Diagnostics** → `/skill axiom-vision-diag`
- Subject not detected
- Hand pose missing landmarks
- Low confidence observations
- Performance issues
- Coordinate conversion bugs
- Text not recognized or wrong characters
- Barcodes not detected
- DataScanner showing blank or no items
- Document edges not detected

## Decision Tree

1. Implementing (pose, segmentation, OCR, barcodes, documents, live scanning)? → vision
2. Visual Intelligence system integration (camera feature, iOS 26+)? → vision-ref (Visual Intelligence section)
3. Need API reference / code examples? → vision-ref
4. Debugging issues (detection failures, confidence, coordinates)? → vision-diag
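
The decision tree amounts to a small routing function. A sketch in Swift, where the `VisionTask` and `VisionSkill` enums and the `route(_:)` function are hypothetical names introduced only for illustration; the raw values are the skill names listed under Routing Logic.

```swift
/// Hypothetical classification of an incoming request (illustration only).
enum VisionTask {
    case implement(String)    // pose, segmentation, OCR, barcodes, documents, live scanning
    case visualIntelligence   // system camera integration, iOS 26+
    case apiReference         // API reference or code examples
    case debug(String)        // detection failures, confidence, coordinates
}

/// Hypothetical enum of the three target skills named in the router.
enum VisionSkill: String {
    case vision = "axiom-vision"
    case visionRef = "axiom-vision-ref"
    case visionDiag = "axiom-vision-diag"
}

/// Mirrors the four numbered branches of the decision tree.
func route(_ task: VisionTask) -> VisionSkill {
    switch task {
    case .implement:
        return .vision
    case .visualIntelligence:
        return .visionRef   // Visual Intelligence Integration section
    case .apiReference:
        return .visionRef
    case .debug:
        return .visionDiag
    }
}

let skill = route(.debug("barcode not detected"))
print("/skill " + skill.rawValue)
```

One difference from the prose version: the numbered list is read top to bottom, so a request that fits several branches goes to the first match, whereas the enum makes the caller commit to a single category before routing.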

## Anti-Rationalization

| Thought | Reality |
|---------|---------|
| "Vision framework is just a request/handler pattern" | Vision has coordinate conversion, confidence thresholds, and performance gotchas. vision covers them. |
| "I'll handle text recognition without the skill" | VNRecognizeTextRequest has fast/accurate modes and language-specific settings. vision has the patterns. |
| "Subject segmentation is straightforward" | Instance masks have HDR compositing and hand-exclusion patterns. vision covers complex scenarios. |
| "Visual Intelligence is just the camera API" | Visual Intelligence is a system-level feature requiring IntentValueQuery and SemanticContentDescriptor. vision-ref has the integration section. |
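
The first two rows of the table point at confidence thresholds and at the fast/accurate trade-off in `VNRecognizeTextRequest`, which is chosen through its `recognitionLevel` (`.fast` or `.accurate`). In either mode, each `VNRecognizedTextObservation` exposes ranked candidates through `topCandidates(_:)`, and each candidate carries a `string` and a `confidence`. A minimal sketch of threshold filtering in plain Swift, assuming a hypothetical `TextCandidate` struct in place of `VNRecognizedText` and an arbitrary default cutoff of 0.5 chosen only for illustration:

```swift
/// Hypothetical stand-in for VNRecognizedText: one candidate string and its
/// confidence in 0...1 (Vision's VNConfidence is a Float).
struct TextCandidate {
    let string: String
    let confidence: Float
}

/// Keeps each line's best candidate only if it clears `minimumConfidence`.
/// `lines` models the result of calling topCandidates(1) on every observation;
/// the 0.5 default is an illustrative assumption, not a Vision default.
func acceptedText(_ lines: [[TextCandidate]], minimumConfidence: Float = 0.5) -> [String] {
    return lines.compactMap { candidates -> String? in
        guard let best = candidates.first, best.confidence >= minimumConfidence else {
            return nil  // no candidate, or the best one is too uncertain
        }
        return best.string
    }
}

let lines: [[TextCandidate]] = [
    [TextCandidate(string: "INVOICE 2031", confidence: 1.0)],
    [TextCandidate(string: "l0t #4", confidence: 0.3)],
    []
]
let text = acceptedText(lines)
print(text)
```

Raising the cutoff trades recall for precision; a suitable value depends on the content and the recognition level, so it is worth tuning against real captures rather than reusing 0.5.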

## Critical Patterns

**vision**:
- Subject segmentation with VisionKit
- Hand pose detection (21 landmarks)
- Body pose detection (2D/3D, up to 4 people)
- Isolating objects while excluding hands
- CoreImage HDR compositing
- Text recognition (fast vs accurate modes)
- Barcode detection (symbology selection)
- Document scanning with perspective correction
- Live scanning with DataScannerViewController
- Structured document extraction (iOS 26+)
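
The "21 landmarks" figure for hand pose is the wrist plus four joints on each of the five fingers. The sketch below lists them in plain Swift. The case names mirror the joint names Vision exposes on `VNHumanHandPoseObservation.JointName` (`.wrist`, `.thumbCMC` through `.littleTip`), but the `HandJoint` enum and the `joints(forFinger:)` helper are hypothetical stand-ins so the code runs without the Vision framework.

```swift
/// Hypothetical mirror of VNHumanHandPoseObservation.JointName (illustration only).
enum HandJoint: String, CaseIterable {
    case wrist
    case thumbCMC, thumbMP, thumbIP, thumbTip
    case indexMCP, indexPIP, indexDIP, indexTip
    case middleMCP, middlePIP, middleDIP, middleTip
    case ringMCP, ringPIP, ringDIP, ringTip
    case littleMCP, littlePIP, littleDIP, littleTip
}

/// Returns one finger's four joints, matched by the name prefix ("thumb", "index", ...).
func joints(forFinger finger: String) -> [HandJoint] {
    return HandJoint.allCases.filter { $0.rawValue.hasPrefix(finger) }
}

print(HandJoint.allCases.count)
print(joints(forFinger: "index").map { $0.rawValue })
```

Grouping by finger like this is a common first step when a missing landmark needs to be traced to a specific finger, which is one of the diagnostics cases listed above.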

**vision-diag**:
- Subject detection failures
- Landmark tracking issues
- Performance optimization
- Observation confidence thresholds
- Text recognition failures (language, contrast)
- Barcode detection issues (symbology, distance)
- DataScanner troubleshooting
- Document edge detection problems

## Example Invocations

User: "How do I detect hand pose in an image?"
→ Invoke: `/skill axiom-vision`

User: "Isolate a subject but exclude the user's hands"
→ Invoke: `/skill axiom-vision`

User: "How do I read text from an image?"
→ Invoke: `/skill axiom-vision`

User: "Scan QR codes with the camera"
→ Invoke: `/skill axiom-vision`

User: "How do I implement document scanning?"
→ Invoke: `/skill axiom-vision`

User: "Use DataScannerViewController for live text"
→ Invoke: `/skill axiom-vision`

User: "Subject detection isn't working"
→ Invoke: `/skill axiom-vision-diag`

User: "Text recognition returns wrong characters"
→ Invoke: `/skill axiom-vision-diag`

User: "Barcode not being detected"
→ Invoke: `/skill axiom-vision-diag`

User: "Show me VNDetectHumanBodyPoseRequest examples"
→ Invoke: `/skill axiom-vision-ref`

User: "What symbologies does VNDetectBarcodesRequest support?"
→ Invoke: `/skill axiom-vision-ref`

User: "RecognizeDocumentsRequest API reference"
→ Invoke: `/skill axiom-vision-ref`

User: "How do I make my app work with Visual Intelligence?"
→ Invoke: `/skill axiom-vision-ref`

User: "How do users discover my app content through the camera?"
→ Invoke: `/skill axiom-vision-ref`