Add scan flow MVP and local Axiom skill workspace
This snapshot establishes the camera-to-result recognition flow and related tests while checking in the project skill/docs assets required for the configured local tooling.
7
.claude/skills/axiom-ios-ml/.openskills.json
Normal file
@@ -0,0 +1,7 @@
{
  "source": "CharlesWiltgen/Axiom",
  "sourceType": "git",
  "repoUrl": "https://github.com/CharlesWiltgen/Axiom",
  "subpath": ".claude-plugin/plugins/axiom/skills/axiom-ios-ml",
  "installedAt": "2026-04-12T08:05:35.624Z"
}
136
.claude/skills/axiom-ios-ml/SKILL.md
Normal file
@@ -0,0 +1,136 @@
---
name: axiom-ios-ml
description: Use when deploying ANY machine learning model on-device, converting models to CoreML, compressing models, or implementing speech-to-text. Covers CoreML conversion, MLTensor, model compression (quantization/palettization/pruning), stateful models, KV-cache, multi-function models, async prediction, SpeechAnalyzer, SpeechTranscriber.
license: MIT
---

# iOS Machine Learning Router

**You MUST use this skill for ANY on-device machine learning or speech-to-text work.**

## When to Use

Use this router when:
- Converting PyTorch/TensorFlow models to CoreML
- Deploying ML models on-device
- Compressing models (quantization, palettization, pruning)
- Working with large language models (LLMs)
- Implementing KV-cache for transformers
- Using MLTensor for model stitching
- Building speech-to-text features
- Transcribing audio (live or recorded)

## Boundary with ios-ai

**ios-ml vs ios-ai - know the difference:**

| Developer Intent | Router |
|-----------------|--------|
| "Use Apple Intelligence / Foundation Models" | **ios-ai** - Apple's on-device LLM |
| "Run my own ML model on device" | **ios-ml** - CoreML conversion + deployment |
| "Add text generation with @Generable" | **ios-ai** - Foundation Models structured output |
| "Deploy a custom LLM with KV-cache" | **ios-ml** - Custom model optimization |
| "Use Vision framework for image analysis" | **ios-vision** - Not ML deployment |
| "Use pre-trained Apple NLP models" | **ios-ai** - Apple's models, not custom |

**Rule of thumb**: If the developer is converting/compressing/deploying their own model → ios-ml. If they're using Apple's built-in AI → ios-ai. If they're doing computer vision → ios-vision.

## Routing Logic

### CoreML Work

**Implementation patterns** → `/skill coreml`
- Model conversion workflow
- MLTensor for model stitching
- Stateful models with KV-cache
- Multi-function models (adapters/LoRA)
- Async prediction patterns
- Compute unit selection

**API reference** → `/skill coreml-ref`
- CoreML Tools Python API
- MLModel lifecycle
- MLTensor operations
- MLComputeDevice availability
- State management APIs
- Performance reports

**Diagnostics** → `/skill coreml-diag`
- Model won't load
- Slow inference
- Memory issues
- Compression accuracy loss
- Compute unit problems

### Speech Work

**Implementation patterns** → `/skill speech`
- SpeechAnalyzer setup (iOS 26+)
- SpeechTranscriber configuration
- Live transcription
- File transcription
- Volatile vs finalized results
- Model asset management

## Decision Tree

1. Implementing / converting ML models? → coreml
2. CoreML API reference? → coreml-ref
3. Debugging ML issues (load, inference, compression)? → coreml-diag
4. Speech-to-text / transcription? → speech
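
The decision tree above could be approximated in code roughly as follows. This is only an illustrative sketch: the keyword lists and the `route` helper are hypothetical, diagnostics are checked first so that symptom words win over topic words, and a real router would rely on judgment rather than substring matching.

```python
# Hypothetical keyword lists; not part of the skill itself.
ROUTES = [
    ("coreml-diag", ("won't load", "slow", "memory", "accuracy", "crash")),
    ("coreml-ref", ("api reference", "what's", "signature")),
    ("speech", ("speech", "transcri")),
    ("coreml", ("convert", "compress", "kv-cache", "deploy", "quantiz", "palettiz", "prun")),
]


def route(request: str) -> str:
    """Return the sub-skill name for a user request (first matching rule wins)."""
    text = request.lower()
    for skill, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return skill
    return "coreml"  # default: general implementation patterns


print(route("Model loads slowly on first launch"))
print(route("Add live transcription to my app"))
```

Ordering the rules from most to least specific keeps a request such as "My compressed model has bad accuracy" in diagnostics even though it also mentions compression.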

## Anti-Rationalization

| Thought | Reality |
|---------|---------|
| "CoreML is just load and predict" | CoreML has compression, stateful models, compute unit selection, and async prediction. coreml covers all. |
| "My model is small, no optimization needed" | Even small models benefit from compute unit selection and async prediction. coreml has the patterns. |
| "I'll just use SFSpeechRecognizer" | iOS 26 has SpeechAnalyzer with better accuracy and offline support. speech skill covers the modern API. |

## Critical Patterns

**coreml**:
- Model conversion (PyTorch → CoreML)
- Compression (palettization, quantization, pruning)
- Stateful KV-cache for LLMs
- Multi-function models for adapters
- MLTensor for pipeline stitching
- Async concurrent prediction

**coreml-diag**:
- Load failures and caching
- Inference performance issues
- Memory pressure from models
- Accuracy degradation from compression

**speech**:
- SpeechAnalyzer + SpeechTranscriber setup
- AssetInventory model management
- Live transcription with volatile results
- Audio format conversion

## Example Invocations

User: "How do I convert a PyTorch model to CoreML?"
→ Invoke: `/skill coreml`

User: "Compress my model to fit on iPhone"
→ Invoke: `/skill coreml`

User: "Implement KV-cache for my language model"
→ Invoke: `/skill coreml`

User: "Model loads slowly on first launch"
→ Invoke: `/skill coreml-diag`

User: "My compressed model has bad accuracy"
→ Invoke: `/skill coreml-diag`

User: "Add live transcription to my app"
→ Invoke: `/skill speech`

User: "Transcribe audio files with SpeechAnalyzer"
→ Invoke: `/skill speech`

User: "What's MLTensor and how do I use it?"
→ Invoke: `/skill coreml-ref`
473
.claude/skills/axiom-ios-ml/coreml-diag/SKILL.md
Normal file
@@ -0,0 +1,473 @@
---
name: coreml-diag
description: CoreML diagnostics - model load failures, slow inference, memory issues, compression accuracy loss, compute unit problems, conversion errors.
license: MIT
version: 1.0.0
---

# CoreML Diagnostics

## Quick Reference

| Symptom | First Check | Pattern |
|---------|-------------|---------|
| Model won't load | Deployment target | 1a-1c |
| Slow first load | Cache miss | 2a |
| Slow inference | Compute units | 2b-2c |
| High memory | Concurrent predictions | 3a-3b |
| Bad accuracy after compression | Granularity | 4a-4c |
| Conversion fails | Operation support | 5a-5b |

## Decision Tree

```
CoreML issue
├─ Load failure?
│  ├─ "Unsupported model version" → 1a
│  ├─ "Failed to create compute plan" → 1b
│  └─ Other load error → 1c
├─ Performance issue?
│  ├─ First load slow, subsequent fast? → 2a
│  ├─ All predictions slow? → 2b
│  └─ Slow only on specific device? → 2c
├─ Memory issue?
│  ├─ Memory grows during predictions? → 3a
│  └─ Out of memory on load? → 3b
├─ Accuracy degraded?
│  ├─ After palettization? → 4a
│  ├─ After quantization? → 4b
│  └─ After pruning? → 4c
└─ Conversion issue?
   ├─ Operation not supported? → 5a
   └─ Wrong output? → 5b
```

---

## Pattern 1a - "Unsupported model version"

**Symptom**: Model fails to load with version error.

**Cause**: The model was converted for a newer OS than the device runs.

**Diagnosis**:
```python
# Check model's minimum deployment target
import coremltools as ct
model = ct.models.MLModel("Model.mlpackage")
print(model.get_spec().specificationVersion)
```

| Spec Version | Minimum iOS |
|--------------|-------------|
| 4 | iOS 13 |
| 5 | iOS 14 |
| 6 | iOS 15 |
| 7 | iOS 16 |
| 8 | iOS 17 |
| 9 | iOS 18 |
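
As a rough sketch of how the table can be applied, the lookup below checks a spec version against a device's iOS major version. The `can_load` helper is hypothetical (it is not a coremltools or Core ML API); the mapping is copied from the table.

```python
# Mapping copied from the spec-version table above.
SPEC_TO_MIN_IOS = {4: 13, 5: 14, 6: 15, 7: 16, 8: 17, 9: 18}


def can_load(spec_version: int, device_ios_major: int) -> bool:
    """Return True if a model with this spec version should load on the device."""
    min_ios = SPEC_TO_MIN_IOS.get(spec_version)
    if min_ios is None:
        raise ValueError(f"Spec version {spec_version} is not in the table")
    return device_ios_major >= min_ios


print(can_load(spec_version=8, device_ios_major=16))  # spec 8 needs iOS 17
print(can_load(spec_version=7, device_ios_major=16))
```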

**Fix**: Re-convert with lower deployment target:
```python
mlmodel = ct.convert(
    traced,
    minimum_deployment_target=ct.target.iOS16  # Lower target
)
```

**Tradeoff**: Loses newer optimizations (SDPA fusion, per-block quantization, MLTensor).

---

## Pattern 1b - "Failed to create compute plan"

**Symptom**: Model loads on some devices but not others.

**Cause**: Unsupported operations for target compute unit.

**Diagnosis**:
1. Open model in Xcode
2. Create Performance Report
3. Check "Unsupported" operations
4. Hover for hints

**Fix**:
```swift
// Force CPU-only to bypass unsupported GPU/NE operations
let config = MLModelConfiguration()
config.computeUnits = .cpuOnly
let model = try MLModel(contentsOf: url, configuration: config)
```

**Better fix**: Update model precision or operations during conversion:
```python
# Float16 often better supported
mlmodel = ct.convert(traced, compute_precision=ct.precision.FLOAT16)
```

---

## Pattern 1c - General Load Failures

**Symptom**: Model fails to load with unclear error.

**Checklist**:
1. Check file exists and is readable
2. Check compiled vs source model (runtime needs `.mlmodelc`)
3. Check available disk space (cache needs room)
4. Check model isn't corrupted (re-convert)

```swift
// Debug logging: surface the underlying load error
do {
    _ = try MLModel(contentsOf: url, configuration: MLModelConfiguration())
} catch {
    print("Model load failed: \(error)")
}
```

---

## Pattern 2a - Slow First Load (Cache Miss)

**Symptom**: First prediction after install/update is slow, subsequent ones are fast.

**Cause**: Device specialization not cached.

**Diagnosis**:
1. Profile with Core ML Instrument
2. Look at Load event label:
   - "prepare and cache" = cache miss (slow)
   - "cached" = cache hit (fast)

**Why cache misses**:
- First launch after install
- System update invalidated cache
- Low disk space cleared cache
- Model file was modified

**Mitigation**:
```swift
// Warm cache in background at app launch
Task.detached(priority: .background) {
    _ = try? await MLModel.load(contentsOf: modelURL)
}
```

**Note**: Cache is tied to (model path + configuration + device). Different configs = different cache entries.

---

## Pattern 2b - All Predictions Slow

**Symptom**: Predictions consistently slow, not just first one.

**Diagnosis**:
1. Create Xcode Performance Report
2. Check compute unit distribution
3. Look for high-cost operations

**Common causes**:

| Cause | Fix |
|-------|-----|
| Running on CPU when GPU/NE available | Check `computeUnits` config |
| Model too large for Neural Engine | Compress model |
| Frequent CPU↔GPU↔NE transfers | Adjust segmentation |
| Dynamic shapes recompiling | Use fixed/enumerated shapes |

**Profile compute unit usage**:
```swift
// MLComputePlan (iOS 17.4+); this walk assumes an ML Program with a "main" function
let plan = try await MLComputePlan.load(contentsOf: modelURL, configuration: MLModelConfiguration())
if case .program(let program) = plan.modelStructure,
   let main = program.functions["main"] {
    for op in main.block.operations {
        guard let usage = plan.deviceUsage(for: op) else { continue }
        print("\(op.operatorName): \(usage.preferred)")
    }
}
```

---

## Pattern 2c - Slow on Specific Device

**Symptom**: Fast on Mac, slow on iPhone (or vice versa).

**Cause**: Different hardware characteristics.

**Diagnosis**:
```swift
// Check available compute
let devices = MLModel.availableComputeDevices
print(devices)  // Different per device
```

**Common issues**:

| Scenario | Cause | Fix |
|----------|-------|-----|
| Fast on M-series Mac, slow on iPhone | Model optimized for GPU | Use palettization (Neural Engine) |
| Fast on iPhone, slow on Intel Mac | No Neural Engine | Use quantization (GPU) |
| Slow on older devices | Less compute power | Use more aggressive compression |

**Recommendation**: Profile on target devices, not just development Mac.

---

## Pattern 3a - Memory Grows During Predictions

**Symptom**: Memory increases with each prediction, doesn't release.

**Cause**: Input/output buffers accumulating from concurrent predictions.

**Diagnosis**:
```
Instruments → Allocations + Core ML template
Look for: Many concurrent prediction intervals
Check: MLMultiArray allocations growing
```

**Fix**: Limit concurrent predictions:
```swift
actor PredictionLimiter {
    private let maxConcurrent = 2
    private var inFlight = 0

    func predict(_ model: MLModel, input: MLFeatureProvider) async throws -> MLFeatureProvider {
        // Cooperative wait until a slot frees up
        while inFlight >= maxConcurrent {
            await Task.yield()
        }
        inFlight += 1
        defer { inFlight -= 1 }
        return try await model.prediction(from: input)
    }
}
```
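
The same bounding idea can be sketched outside Swift. The Python example below is an analogy only: it uses `asyncio.Semaphore`, and `fake_prediction` is a hypothetical stand-in for inference, not a Core ML call. It shows that a gate of size 2 keeps at most two jobs (and so at most two sets of buffers) alive at a time.

```python
import asyncio


async def run_limited(jobs, max_concurrent=2):
    """Run coroutine factories with at most max_concurrent in flight."""
    gate = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with gate:  # waits here while the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_prediction():
    await asyncio.sleep(0.01)  # hypothetical stand-in for model inference
    return "ok"


results, peak = asyncio.run(run_limited([fake_prediction] * 6))
print(f"{len(results)} predictions finished, peak concurrency {peak}")
```

A semaphore suspends waiting callers instead of repeatedly yielding, which avoids spinning while the limiter is saturated.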

---

## Pattern 3b - Out of Memory on Load

**Symptom**: App crashes or model fails to load on memory-constrained devices.

**Cause**: Model too large for device memory.

**Diagnosis**:
```bash
# Check model size
ls -lh Model.mlpackage/Data/com.apple.CoreML/weights/
```

**Fix options**:

| Approach | Size vs Float16 | Memory Impact |
|----------|-----------------|---------------|
| 8-bit palettization | 2x smaller | 2x less memory |
| 4-bit palettization | 4x smaller | 4x less memory |
| Pruning (50%) | ~2x smaller | ~2x less memory |

**Note**: Compressed weights are decompressed just-in-time (iOS 17+), so smaller on-disk = smaller in memory.
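
The ratios in the table follow from simple arithmetic on bits per weight. The sketch below assumes a Float16 baseline and a hypothetical 100M-parameter model, and deliberately ignores lookup tables, scales, and sparse-index overhead, so real files will be somewhat larger.

```python
def estimated_weight_mib(num_params: int, bits_per_weight: float, sparsity: float = 0.0) -> float:
    """Rough weight size in MiB; ignores lookup-table, scale and sparse-index overhead."""
    kept = num_params * (1.0 - sparsity)  # weights that are actually stored
    return kept * bits_per_weight / 8 / (1024 ** 2)


params = 100_000_000  # hypothetical 100M-parameter model
baseline = estimated_weight_mib(params, 16)  # Float16 baseline

for label, size in [
    ("float16", baseline),
    ("8-bit palettized", estimated_weight_mib(params, 8)),
    ("4-bit palettized", estimated_weight_mib(params, 4)),
    ("float16, 50% pruned", estimated_weight_mib(params, 16, sparsity=0.5)),
]:
    print(f"{label}: {size:.1f} MiB ({baseline / size:.1f}x smaller)")
```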

---

## Pattern 4a - Bad Accuracy After Palettization

**Symptom**: Model output degraded after palettization.

**Diagnosis**:
1. What bit depth? (2-bit most likely to fail)
2. What granularity? (per-tensor loses more than per-grouped-channel)

**Fix progression**:

```python
# Step 1: Try grouped channels (iOS 18+)
from coremltools.optimize.coreml import OpPalettizerConfig

config = OpPalettizerConfig(
    nbits=4,
    granularity="per_grouped_channel",
    group_size=16
)

# Step 2: If still bad, try more bits
config = OpPalettizerConfig(nbits=6, ...)

# Step 3: If 4-bit is still required, use training-time palettization
from coremltools.optimize.torch.palettization import DKMPalettizer
# ... training-time compression
```

**Key insight**: 4-bit per-tensor has only 16 clusters for entire weight matrix. Grouped channels = 16 clusters per 16 channels = much better granularity.
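
To make the cluster arithmetic concrete, the sketch below counts lookup tables and centroids for one weight matrix. The `palette_stats` helper and the 1024-channel layer are hypothetical and only illustrate the counting, not how coremltools lays out its tables.

```python
import math


def palette_stats(out_channels: int, nbits: int, group_size=None):
    """Return (lookup tables, total centroids) for one weight matrix.

    group_size=None models per-tensor granularity, i.e. a single table.
    """
    centroids_per_table = 2 ** nbits
    tables = 1 if group_size is None else math.ceil(out_channels / group_size)
    return tables, tables * centroids_per_table


per_tensor = palette_stats(out_channels=1024, nbits=4)
grouped = palette_stats(out_channels=1024, nbits=4, group_size=16)
print("per-tensor:", per_tensor)
print("per-grouped-channel:", grouped)
```

With 1024 output channels, grouping by 16 gives 64 tables instead of one, so each group of channels gets its own 16 centroids.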

---

## Pattern 4b - Bad Accuracy After Quantization

**Symptom**: Model output degraded after INT8/INT4 quantization.

**Diagnosis**:
1. What bit depth?
2. What granularity?

**Fix progression**:

```python
# Step 1: Use per-block (iOS 18+)
from coremltools.optimize.coreml import OpLinearQuantizerConfig

config = OpLinearQuantizerConfig(
    dtype="int4",
    granularity="per_block",
    block_size=32
)

# Step 2: Use calibration data (GPTQ through LayerwiseCompressor;
# check the config keys against your coremltools version)
from coremltools.optimize.torch.layerwise_compression import (
    LayerwiseCompressor,
    LayerwiseCompressorConfig,
)

gptq_config = LayerwiseCompressorConfig.from_dict({
    "global_config": {"algorithm": "gptq", "weight_dtype": 4},
    "calibration_nsamples": 128,
})
compressor = LayerwiseCompressor(model, gptq_config)
quantized = compressor.compress(calibration_loader)
```

**Note**: INT4 quantization works best on Mac GPU. For Neural Engine, prefer palettization.
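
Why per-block granularity helps can be seen with a small pure-Python sketch of symmetric INT4 quantization. This is an illustration under simplifying assumptions (symmetric scales, toy weights with one large-valued block), not the coremltools implementation.

```python
def quantize_symmetric(weights, nbits=4, block_size=None):
    """Quantize then dequantize with one scale per block (per tensor if block_size is None)."""
    qmax = 2 ** (nbits - 1) - 1  # 7 for int4
    size = block_size or len(weights)
    restored = []
    for start in range(0, len(weights), size):
        block = weights[start:start + size]
        scale = max(abs(w) for w in block) / qmax or 1.0  # avoid a zero scale
        for w in block:
            q = max(-qmax - 1, min(qmax, round(w / scale)))  # clamp to the int range
            restored.append(q * scale)
    return restored


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy weights: four small values followed by four large ones.
weights = [0.01, -0.02, 0.03, 0.015, 5.0, -4.0, 3.0, 2.0]
per_tensor_err = mean_abs_error(weights, quantize_symmetric(weights))
per_block_err = mean_abs_error(weights, quantize_symmetric(weights, block_size=4))
print(f"per-tensor error: {per_tensor_err:.4f}, per-block error: {per_block_err:.4f}")
```

With a single scale, the large weights force a coarse step and the small weights all collapse to zero; a separate scale per block preserves them.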

---

## Pattern 4c - Bad Accuracy After Pruning

**Symptom**: Model output degraded after weight pruning.

**Diagnosis**:
1. What sparsity level?
2. Post-training or training-time?

**Thresholds** (model-dependent):
- 0-30% sparsity: Usually safe
- 30-50% sparsity: May need calibration
- 50%+ sparsity: Usually needs training-time pruning

**Fix**:
```python
# Use calibration-based pruning (SparseGPT through LayerwiseCompressor;
# check the config keys against your coremltools version)
from coremltools.optimize.torch.layerwise_compression import (
    LayerwiseCompressor,
    LayerwiseCompressorConfig,
)

config = LayerwiseCompressorConfig.from_dict({
    "global_config": {"algorithm": "sparse_gpt", "target_sparsity": 0.4},
    "calibration_nsamples": 128,
})
compressor = LayerwiseCompressor(model, config)
sparse = compressor.compress(calibration_loader)
```
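
For intuition about what a sparsity level means, here is a minimal sketch of plain magnitude pruning in pure Python. It is the simple data-free variant, not the calibration-based method shown in the fix, and the weights are made up for illustration.

```python
def magnitude_prune(weights, target_sparsity):
    """Zero the smallest-magnitude weights until target_sparsity is reached."""
    n_prune = int(len(weights) * target_sparsity)
    # Indices ordered from smallest to largest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.03, 0.6, 0.15, -0.3]
pruned = magnitude_prune(weights, target_sparsity=0.4)
sparsity = pruned.count(0.0) / len(pruned)
print(pruned)
print(f"sparsity: {sparsity:.0%}")
```

At low sparsity only near-zero weights disappear, which is why the lower threshold band is usually safe; higher levels start removing weights that matter and need calibration or training to recover.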

---

## Pattern 5a - Operation Not Supported

**Symptom**: Conversion fails with unsupported operation error.

**Diagnosis**:
```
Error: "Op 'custom_op' is not supported for conversion"
```

**Options**:

1. **Check if op is in coremltools**: May need newer version
   ```bash
   pip install --upgrade coremltools
   ```

2. **Use composite ops**: Split into supported primitives
   ```python
   # Instead of custom_op(x)
   # Use: supported_op1(supported_op2(x))
   ```

3. **Register custom op**: Advanced, requires MIL programming
   ```python
   from coremltools.converters.mil import Builder as mb
   from coremltools.converters.mil import register_torch_op

   @register_torch_op
   def custom_op(context, node):
       # Map to MIL operations
       ...
   ```

---

## Pattern 5b - Conversion Succeeds but Wrong Output

**Symptom**: Model converts but predictions differ from PyTorch.

**Diagnosis checklist**:

1. **Input normalization**: Ensure preprocessing matches
   ```python
   # PyTorch often uses ImageNet normalization
   # CoreML may need explicit preprocessing
   ```

2. **Shape ordering**: PyTorch (NCHW) vs CoreML (NHWC for some ops)
   ```python
   # Check shapes in conversion
   ct.convert(..., inputs=[ct.ImageType(shape=(1, 3, 224, 224))])
   ```

3. **Precision differences**: Float16 may differ from Float32
   ```python
   # Force Float32 to match PyTorch
   ct.convert(..., compute_precision=ct.precision.FLOAT32)
   ```

4. **Random ops**: Dropout, random initialization differ
   ```python
   # Ensure eval mode
   model.eval()
   ```

**Debug**:
```python
# Compare final outputs (the "input"/"output" names must match the converted model)
import numpy as np

torch_output = model(input).detach().numpy()
coreml_output = mlmodel.predict({"input": input.numpy()})["output"]

print(f"Max diff: {np.max(np.abs(torch_output - coreml_output))}")
```
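
How large a Float16 difference is plausible can be estimated without any model. The sketch below rounds a few made-up reference values to half precision with the standard library and reports the maximum absolute difference, the same metric as the debug snippet above; the values are illustrative, not real model outputs.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


reference = [0.1, 1.0, 3.14159, 1000.123]  # made-up stand-ins for full-precision outputs
half = [to_float16(v) for v in reference]   # what Float16 can represent
diff = max_abs_diff(reference, half)

# The largest gap comes from 1000.123: half precision steps by 0.5 in that range.
print(f"Max diff after Float16 rounding: {diff}")
```

Because Float16 spacing grows with magnitude, an absolute tolerance that suits small outputs can be too strict for large ones, so a relative tolerance is often the fairer check.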

---

## Pressure Scenario - "Model works on simulator but not device"

**Wrong approach**: Assume simulator bug, ignore.

**Right approach**:
1. Check model spec version vs device iOS version (Pattern 1a)
2. Check compute unit availability (Pattern 2c)
3. Profile on actual device, not simulator
4. Remember that the simulator uses the host Mac's GPU/CPU, not the device Neural Engine

---

## Pressure Scenario - "Ship now, optimize later"

**Wrong approach**: Compress to smallest possible size without testing.

**Right approach**:
1. Ship Float16 baseline first
2. Profile on target devices
3. Apply compression incrementally with accuracy testing
4. Document compression settings for future optimization

---

## Diagnostic Checklist

When CoreML isn't working:

- [ ] Check deployment target matches device iOS
- [ ] Check model file is compiled (.mlmodelc)
- [ ] Profile load: cached vs uncached
- [ ] Profile prediction: which compute units
- [ ] Check memory: concurrent predictions limited
- [ ] For compression issues: try finer granularity
- [ ] For conversion issues: check op support, precision

## Resources

**WWDC**: 2023-10047, 2023-10049, 2024-10159, 2024-10161

**Docs**: /coreml, /coreml/mlmodel

**Skills**: coreml, coreml-ref
467
.claude/skills/axiom-ios-ml/coreml-ref/SKILL.md
Normal file
@@ -0,0 +1,467 @@
---
name: coreml-ref
description: CoreML API reference - MLModel lifecycle, MLTensor operations, coremltools conversion, compression APIs, state management, compute device availability, performance profiling.
license: MIT
version: 1.0.0
---

# CoreML API Reference

## Part 1 - Model Lifecycle

### MLModel Loading

```swift
// Synchronous load (blocks thread)
let model = try MLModel(contentsOf: compiledModelURL)

// Async load (preferred)
let model = try await MLModel.load(contentsOf: compiledModelURL)

// With configuration
let config = MLModelConfiguration()
config.computeUnits = .all  // .cpuOnly, .cpuAndGPU, .cpuAndNeuralEngine
let model = try await MLModel.load(contentsOf: url, configuration: config)
```

### Model Asset Types

| Type | Extension | Purpose |
|------|-----------|---------|
| Source | `.mlmodel`, `.mlpackage` | Development, editing |
| Compiled | `.mlmodelc` | Runtime execution |

**Note**: Xcode compiles source models automatically. At runtime, use compiled models.

### Caching Behavior

First load triggers device specialization (can be slow). Subsequent loads use cache.

```
Load flow:
├─ Check cache for (model path + configuration + device)
│  ├─ Found → Cached load (fast)
│  └─ Not found → Device specialization
│     ├─ Parse model
│     ├─ Optimize operations
│     ├─ Segment for compute units
│     ├─ Compile for each unit
│     └─ Cache result
```

Cache invalidated by: system updates, low disk space, model modification.

### Multi-Function Models

```swift
// Load specific function
let config = MLModelConfiguration()
config.functionName = "sticker"  // Function name from model

let model = try MLModel(contentsOf: url, configuration: config)
```

---

## Part 2 - Compute Availability

### MLComputeDevice (iOS 17+)

```swift
// Check available compute devices
let devices = MLModel.availableComputeDevices

// Check for Neural Engine
let hasNeuralEngine = devices.contains { device in
    if case .neuralEngine = device { return true }
    return false
}

// Enumerate devices and print details
for device in devices {
    switch device {
    case .cpu:
        print("CPU available")
    case .gpu(let gpu):
        print("GPU: \(gpu.metalDevice.name)")
    case .neuralEngine(let ne):
        print("Neural Engine: \(ne.totalCoreCount) cores")
    @unknown default:
        break
    }
}
```

### MLModelConfiguration.ComputeUnits

| Value | Behavior |
|-------|----------|
| `.all` | Best performance (default) |
| `.cpuOnly` | CPU only |
| `.cpuAndGPU` | Exclude Neural Engine |
| `.cpuAndNeuralEngine` | Exclude GPU |

---

## Part 3 - Prediction APIs

### Synchronous Prediction

```swift
// Single prediction (NOT thread-safe)
let output = try model.prediction(from: input)

// Batch prediction
let outputs = try model.predictions(fromBatch: batch)
```

### Async Prediction (iOS 17+)

```swift
// Single prediction (thread-safe, supports concurrency)
let output = try await model.prediction(from: input)

// With cancellation
let output = try await withTaskCancellationHandler {
    try await model.prediction(from: input)
} onCancel: {
    // Prediction will be cancelled
}
```

### State-Based Prediction (iOS 18+)

```swift
// Create state from model
let state = model.makeState()

// Prediction with state (state updated in-place)
let output = try model.prediction(from: input, using: state)

// Async with state
let output = try await model.prediction(from: input, using: state)
```

---

## Part 4 - MLTensor (iOS 18+)

### Creating Tensors

```swift
import CoreML

// From MLShapedArray
let shapedArray = MLShapedArray<Float>(scalars: [1, 2, 3, 4], shape: [2, 2])
let tensor = MLTensor(shapedArray)

// From nested collections
let tensor = MLTensor([[1.0, 2.0], [3.0, 4.0]])

// Zeros/ones
let zeros = MLTensor(zeros: [3, 3], scalarType: Float.self)
```

### Math Operations

```swift
// Element-wise
let sum = tensor1 + tensor2
let product = tensor1 * tensor2
let scaled = tensor * 2.0

// Reductions
let mean = tensor.mean()
let total = tensor.sum()
let max = tensor.max()

// Comparison
let mask = tensor .> mean  // Boolean mask

// Softmax
let probs = tensor.softmax()
```

### Indexing and Reshaping

```swift
// Slicing (Python-like syntax)
let row = tensor[0]  // First row
let col = tensor[.all, 0]  // First column
let slice = tensor[0..<2, 1..<3]

// Reshaping
let reshaped = tensor.reshaped(to: [4])
let expanded = tensor.expandingShape(at: 0)
```

### Materialization

**Critical**: Tensor operations are dispatched asynchronously. You must materialize a tensor to access its data.

```swift
// Materialize to MLShapedArray (suspends until the computation completes)
let array = await tensor.shapedArray(of: Float.self)

// Access scalars
let values = array.scalars
```

---

## Part 5 - Core ML Tools (Python)

### Basic Conversion

```python
import coremltools as ct
import torch

# Trace PyTorch model
model.eval()
traced = torch.jit.trace(model, example_input)

# Convert
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=example_input.shape)],
    outputs=[ct.TensorType(name="output")],
    minimum_deployment_target=ct.target.iOS18
)

mlmodel.save("Model.mlpackage")
```

### Dynamic Shapes

```python
# Fixed shape
ct.TensorType(shape=(1, 3, 224, 224))

# Range dimension
ct.TensorType(shape=(1, ct.RangeDim(1, 2048)))

# Enumerated shapes
ct.TensorType(shape=ct.EnumeratedShapes(shapes=[(1, 256), (1, 512), (1, 1024)]))
```
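
With enumerated shapes, the caller has to supply an input that matches one of the listed shapes. One possible approach, sketched below, is to pad a token sequence to the smallest enumerated length that fits; the `pad_to_enumerated` helper and the padding id are assumptions for illustration, not part of coremltools.

```python
ENUMERATED_LENGTHS = [256, 512, 1024]  # second dimension of the enumerated shapes above


def pad_to_enumerated(tokens, pad_id=0):
    """Pad tokens to the smallest enumerated length that fits them."""
    for length in ENUMERATED_LENGTHS:
        if len(tokens) <= length:
            return tokens + [pad_id] * (length - len(tokens))
    raise ValueError(
        f"{len(tokens)} tokens exceed the largest shape {ENUMERATED_LENGTHS[-1]}"
    )


padded = pad_to_enumerated(list(range(1, 301)))  # 300 tokens
print(len(padded))
```

Padding costs some wasted compute, but it lets every request reuse one of a few known shapes, which is the point of preferring fixed or enumerated shapes.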

### State Types

```python
# For stateful models (KV-cache)
states = [
    ct.StateType(
        name="keyCache",
        wrapped_type=ct.TensorType(shape=(1, 32, 2048, 128))
    ),
    ct.StateType(
        name="valueCache",
        wrapped_type=ct.TensorType(shape=(1, 32, 2048, 128))
    )
]

mlmodel = ct.convert(traced, inputs=inputs, states=states, ...)
```
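
State tensors stay resident between predictions, so it helps to estimate their footprint up front. The sketch below works through the arithmetic for the shape used above, assuming Float16 elements (2 bytes each); the `state_bytes` helper is illustrative only.

```python
from math import prod


def state_bytes(shape, bytes_per_element=2):
    """Bytes needed for one state tensor; 2 bytes per element assumes Float16."""
    return prod(shape) * bytes_per_element


cache_shape = (1, 32, 2048, 128)
per_state = state_bytes(cache_shape)
total = 2 * per_state  # keyCache + valueCache
print(f"{per_state / 2**20:.0f} MiB per state, {total / 2**20:.0f} MiB for both")
```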

---

## Part 6 - Compression APIs (coremltools.optimize)

### Post-Training Palettization

```python
from coremltools.optimize.coreml import (
    OpPalettizerConfig,
    OptimizationConfig,
    palettize_weights
)

# Per-tensor (iOS 17+)
config = OpPalettizerConfig(mode="kmeans", nbits=4)

# Per-grouped-channel (iOS 18+, better accuracy)
config = OpPalettizerConfig(
    mode="kmeans",
    nbits=4,
    granularity="per_grouped_channel",
    group_size=16
)

opt_config = OptimizationConfig(global_config=config)
compressed = palettize_weights(model, opt_config)
```

### Post-Training Quantization

```python
from coremltools.optimize.coreml import (
    OpLinearQuantizerConfig,
    OptimizationConfig,
    linear_quantize_weights
)

# INT8 per-channel (iOS 17+)
config = OpLinearQuantizerConfig(mode="linear", dtype="int8")

# INT4 per-block (iOS 18+)
config = OpLinearQuantizerConfig(
    mode="linear",
    dtype="int4",
    granularity="per_block",
    block_size=32
)

opt_config = OptimizationConfig(global_config=config)
compressed = linear_quantize_weights(model, opt_config)
```

### Post-Training Pruning

```python
from coremltools.optimize.coreml import (
    OpMagnitudePrunerConfig,
    OptimizationConfig,
    prune_weights
)

config = OpMagnitudePrunerConfig(target_sparsity=0.5)
opt_config = OptimizationConfig(global_config=config)
sparse = prune_weights(model, opt_config)
```

### Training-Time Palettization (PyTorch)

```python
from coremltools.optimize.torch.palettization import (
    DKMPalettizerConfig,
    DKMPalettizer
)

config = DKMPalettizerConfig(global_config={"n_bits": 4})
palettizer = DKMPalettizer(model, config)

# Prepare (inserts palettization layers)
prepared = palettizer.prepare()

# Training loop
for epoch in range(epochs):
    train_one_epoch(prepared, data_loader)
    palettizer.step()

# Finalize
final = palettizer.finalize()
```

### Calibration-Based Compression

```python
# SparseGPT pruning through LayerwiseCompressor;
# check the config keys against your coremltools version
from coremltools.optimize.torch.layerwise_compression import (
    LayerwiseCompressor,
    LayerwiseCompressorConfig,
)

config = LayerwiseCompressorConfig.from_dict({
    "global_config": {"algorithm": "sparse_gpt", "target_sparsity": 0.4},
    "calibration_nsamples": 128,
})

compressor = LayerwiseCompressor(model, config)
compressed = compressor.compress(calibration_loader)
```

---

## Part 7 - Multi-Function Models
|
||||
|
||||
### Merging Models
|
||||
|
||||
```python
|
||||
from coremltools.models import MultiFunctionDescriptor
|
||||
from coremltools.models.utils import save_multifunction
|
||||
|
||||
# Create descriptor
|
||||
desc = MultiFunctionDescriptor()
|
||||
desc.add_function("function_a", "model_a.mlpackage")
|
||||
desc.add_function("function_b", "model_b.mlpackage")
|
||||
|
||||
# Merge (deduplicates shared weights)
|
||||
save_multifunction(desc, "merged.mlpackage")
|
||||
```
|
||||
|
||||
### Inspecting Functions (Xcode)
|
||||
|
||||
Open the model in Xcode and select the Predictions tab; the available functions are listed above the inputs.
|
||||
|
||||
---
|
||||
|
||||
## Part 8 - Performance Profiling
|
||||
|
||||
### MLComputePlan (iOS 17.4+)
|
||||
|
||||
```swift
let plan = try await MLComputePlan.load(
    contentsOf: modelURL,
    configuration: MLModelConfiguration()
)

// Inspect operations (ML Program models only)
if case let .program(program) = plan.modelStructure,
   let main = program.functions["main"] {
    for op in main.block.operations {
        let usage = plan.deviceUsage(for: op)
        let cost = plan.estimatedCost(of: op)
        print("Op: \(op.operatorName)")
        print("  Preferred: \(String(describing: usage?.preferred))")
        print("  Estimated cost: \(String(describing: cost?.weight))")
    }
}
```
|
||||
|
||||
### Xcode Performance Reports
|
||||
|
||||
1. Open model in Xcode
|
||||
2. Select Performance tab
|
||||
3. Click + to create report
|
||||
4. Select device and compute units
|
||||
5. Click "Run Test"
|
||||
|
||||
**New in iOS 18**: The report shows the estimated time per operation and hints about which compute devices support each operation.
|
||||
|
||||
### Core ML Instrument
|
||||
|
||||
```
|
||||
Instruments → Core ML template
|
||||
├─ Load events: "cached" vs "prepare and cache"
|
||||
├─ Prediction intervals
|
||||
├─ Compute unit usage
|
||||
└─ Neural Engine activity
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Part 9 - Deployment Targets
|
||||
|
||||
| Target | Key Features |
|
||||
|--------|--------------|
|
||||
| iOS 16 | Weight compression (palettization, quantization, pruning) |
|
||||
| iOS 17 | Async prediction, MLComputeDevice, activation quantization |
|
||||
| iOS 18 | MLTensor, State, SDPA fusion, per-block quantization, multi-function |
|
||||
|
||||
**Recommendation**: Always set `minimum_deployment_target=ct.target.iOS18` for best optimizations.
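
If an app still has to run on older OS versions, the table can also be read as a lookup from feature to minimum target. The sketch below is only an illustration of that reading: the feature labels and the `required_ios` helper are hypothetical names, and the version numbers are transcribed from the table.

```python
# Minimum iOS version per feature, transcribed from the table above.
# The dictionary keys are illustrative labels, not coremltools identifiers.
MIN_IOS = {
    "weight_compression": 16,
    "async_prediction": 17,
    "activation_quantization": 17,
    "mltensor": 18,
    "state": 18,
    "per_block_quantization": 18,
    "multi_function": 18,
}


def required_ios(features):
    """Return the lowest deployment target that supports every listed feature."""
    return max(MIN_IOS[name] for name in features)


needed = required_ios(["weight_compression", "async_prediction"])
print(f"Minimum deployment target: iOS {needed}")
```

A model that only needs weight compression and async prediction could therefore target iOS 17, at the cost of the iOS 18 optimizations.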
|
||||
|
||||
---
|
||||
|
||||
## Part 10 - Conversion Pass Pipelines
|
||||
|
||||
```python
|
||||
# Default pipeline
|
||||
mlmodel = ct.convert(traced, ...)
|
||||
|
||||
# With palettization support
|
||||
mlmodel = ct.convert(
|
||||
traced,
|
||||
pass_pipeline=ct.PassPipeline.DEFAULT_PALETTIZATION,
|
||||
...
|
||||
)
|
||||
```
|
||||
|
||||
## Resources
|
||||
|
||||
**WWDC**: 2023-10047, 2023-10049, 2024-10159, 2024-10161
|
||||
|
||||
**Docs**: /coreml, /coreml/mlmodel, /coreml/mltensor, /documentation/coremltools
|
||||
|
||||
**Skills**: coreml, coreml-diag
|
||||
468
.claude/skills/axiom-ios-ml/coreml/SKILL.md
Normal file
@@ -0,0 +1,468 @@
|
||||
---
|
||||
name: coreml
|
||||
description: Use when deploying custom ML models on-device, converting PyTorch models, compressing models, implementing LLM inference, or optimizing CoreML performance. Covers model conversion, compression, stateful models, KV-cache, multi-function models, MLTensor.
|
||||
license: MIT
|
||||
version: 1.0.0
|
||||
---
|
||||
|
||||
# CoreML On-Device Machine Learning
|
||||
|
||||
## Overview
|
||||
|
||||
CoreML enables on-device machine learning inference across all Apple platforms. It abstracts hardware details while leveraging Apple Silicon's CPU, GPU, and Neural Engine for high-performance, private, and efficient execution.
|
||||
|
||||
**Key principle**: Start with the simplest approach, then optimize based on profiling. Don't over-engineer compression or caching until you have real performance data.
|
||||
|
||||
## Decision Tree - CoreML vs Foundation Models
|
||||
|
||||
```
|
||||
Need on-device ML?
|
||||
├─ Text generation (LLM)?
|
||||
│ ├─ Simple prompts, structured output? → Foundation Models (ios-ai skill)
|
||||
│ └─ Custom model, fine-tuned, specific architecture? → CoreML
|
||||
├─ Custom trained model?
|
||||
│ └─ Yes → CoreML
|
||||
├─ Image/audio/sensor processing?
|
||||
│ └─ Yes → CoreML
|
||||
└─ Apple's built-in intelligence?
|
||||
└─ Yes → Foundation Models (ios-ai skill)
|
||||
```
|
||||
|
||||
## Red Flags
|
||||
|
||||
Use this skill when you see:
|
||||
- "Convert PyTorch model to CoreML"
|
||||
- "Model too large for device"
|
||||
- "Slow inference performance"
|
||||
- "LLM on-device"
|
||||
- "KV-cache" or "stateful model"
|
||||
- "Model compression" or "quantization"
|
||||
- MLModel, MLTensor, or coremltools in context
|
||||
|
||||
## Pattern 1 - Basic Model Conversion
|
||||
|
||||
The standard PyTorch → CoreML workflow.
|
||||
|
||||
```python
|
||||
import coremltools as ct
|
||||
import torch
|
||||
|
||||
# Trace the model
|
||||
model.eval()
|
||||
traced_model = torch.jit.trace(model, example_input)
|
||||
|
||||
# Convert to CoreML
|
||||
mlmodel = ct.convert(
|
||||
traced_model,
|
||||
inputs=[ct.TensorType(shape=example_input.shape)],
|
||||
minimum_deployment_target=ct.target.iOS18
|
||||
)
|
||||
|
||||
# Save
|
||||
mlmodel.save("MyModel.mlpackage")
|
||||
```
|
||||
|
||||
**Critical**: Always set `minimum_deployment_target` to enable latest optimizations.
|
||||
|
||||
## Pattern 2 - Model Compression (Post-Training)
|
||||
|
||||
Three techniques, each with different tradeoffs:
|
||||
|
||||
### Palettization (Best for Neural Engine)
|
||||
|
||||
Clusters weights into lookup tables. Use per-grouped-channel for better accuracy.
|
||||
|
||||
```python
|
||||
from coremltools.optimize.coreml import (
|
||||
OpPalettizerConfig,
|
||||
OptimizationConfig,
|
||||
palettize_weights
|
||||
)
|
||||
|
||||
# 4-bit with grouped channels (iOS 18+)
|
||||
op_config = OpPalettizerConfig(
|
||||
mode="kmeans",
|
||||
nbits=4,
|
||||
granularity="per_grouped_channel",
|
||||
group_size=16
|
||||
)
|
||||
|
||||
config = OptimizationConfig(global_config=op_config)
|
||||
compressed_model = palettize_weights(model, config)
|
||||
```
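
To make "clusters weights into lookup tables" concrete, here is a minimal pure-Python sketch of the idea on a flat list of weights. It is a toy 1-D k-means, not the coremltools implementation; the `palettize` function, its evenly spaced initialization, and the sample weights are all assumptions made for illustration.

```python
def nearest(palette, w):
    """Index of the palette entry closest to weight w."""
    return min(range(len(palette)), key=lambda c: abs(w - palette[c]))


def palettize(weights, nbits, iters=20):
    """Toy 1-D k-means: returns (palette, indices) with 2**nbits palette entries."""
    k = 2 ** nbits
    lo, hi = min(weights), max(weights)
    # Deterministic init: centroids evenly spaced across the weight range.
    palette = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        # Assignment step: each weight picks its nearest centroid.
        indices = [nearest(palette, w) for w in weights]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [w for w, i in zip(weights, indices) if i == c]
            if members:
                palette[c] = sum(members) / len(members)
    indices = [nearest(palette, w) for w in weights]
    return palette, indices


weights = [-0.51, -0.49, -0.02, 0.01, 0.03, 0.48, 0.50, 0.52]
palette, indices = palettize(weights, nbits=2)

# The stored model keeps only the small palette plus one index per weight.
restored = [palette[i] for i in indices]
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(palette, indices, max_error)
```

Grouped-channel palettization applies the same idea with a separate palette per group of channels, which is why it tends to preserve accuracy better at low bit widths.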
|
||||
|
||||
| Bits | Compression | Accuracy Impact |
|
||||
|------|-------------|-----------------|
|
||||
| 8-bit | 2x | Minimal |
|
||||
| 6-bit | 2.7x | Low |
|
||||
| 4-bit | 4x | Moderate (use grouped channels) |
|
||||
| 2-bit | 8x | High (requires training-time) |
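
The compression column follows from simple arithmetic if the baseline is Float16 (16 bits per weight). The sketch below assumes that baseline and ignores the small overhead of the lookup tables themselves; `compression_ratio` is a hypothetical helper name.

```python
def compression_ratio(nbits, baseline_bits=16):
    """Ratio of baseline weight storage to palettized index storage."""
    return baseline_bits / nbits


for nbits in (8, 6, 4, 2):
    print(f"{nbits}-bit: {compression_ratio(nbits):.1f}x")
```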
|
||||
|
||||
### Quantization (Best for GPU on Mac)
|
||||
|
||||
Linear mapping to INT8/INT4. Use per-block for better accuracy.
|
||||
|
||||
```python
|
||||
from coremltools.optimize.coreml import (
|
||||
OpLinearQuantizerConfig,
|
||||
OptimizationConfig,
|
||||
linear_quantize_weights
|
||||
)
|
||||
|
||||
# INT4 per-block quantization (iOS 18+)
|
||||
op_config = OpLinearQuantizerConfig(
|
||||
mode="linear",
|
||||
dtype="int4",
|
||||
granularity="per_block",
|
||||
block_size=32
|
||||
)
|
||||
|
||||
config = OptimizationConfig(global_config=op_config)
|
||||
compressed_model = linear_quantize_weights(model, config)
|
||||
```
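
Why per-block granularity helps can be seen with a small pure-Python experiment. This is a sketch of symmetric linear quantization under stated assumptions (one shared scale per group, signed integer range, made-up weights), not the coremltools algorithm; all function names are hypothetical.

```python
def quantize_dequantize(values, nbits):
    """Symmetric linear quantization of one group sharing a single scale."""
    qmax = 2 ** (nbits - 1) - 1  # 7 for INT4
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    return [round(v / scale) * scale for v in values]


def per_block(values, nbits, block_size):
    """Quantize each consecutive block with its own scale."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(quantize_dequantize(values[start:start + block_size], nbits))
    return out


def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Small weights followed by large ones: a single scale is dominated by the outliers.
weights = [0.01, -0.02, 0.03, -0.01, 7.0, -3.0, 1.0, 5.0]
per_tensor_error = max_abs_error(weights, quantize_dequantize(weights, nbits=4))
per_block_error = max_abs_error(weights, per_block(weights, nbits=4, block_size=4))
print(per_tensor_error, per_block_error)
```

With a single scale the small weights all collapse to zero, while a per-block scale keeps them distinguishable; that is the accuracy benefit `granularity="per_block"` is after.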
|
||||
|
||||
### Pruning (Combine with other techniques)
|
||||
|
||||
Sets weights to zero for sparse representation. Can combine with palettization.
|
||||
|
||||
```python
|
||||
from coremltools.optimize.coreml import (
|
||||
OpMagnitudePrunerConfig,
|
||||
OptimizationConfig,
|
||||
prune_weights
|
||||
)
|
||||
|
||||
op_config = OpMagnitudePrunerConfig(
|
||||
target_sparsity=0.4 # 40% zeros
|
||||
)
|
||||
|
||||
config = OptimizationConfig(global_config=op_config)
|
||||
sparse_model = prune_weights(model, config)
|
||||
```
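
As a rough illustration of what magnitude pruning does to the weights, the following pure-Python sketch zeroes the smallest-magnitude 40% of a list. It is a simplified stand-in for the real pass; `magnitude_prune` and the sample weights are hypothetical.

```python
def magnitude_prune(weights, target_sparsity):
    """Zero out the fraction of weights with the smallest absolute values."""
    n_zero = int(len(weights) * target_sparsity)
    # Indices of the n_zero smallest-magnitude weights.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:n_zero])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.4, -0.03, 0.6, -0.2]
pruned = magnitude_prune(weights, target_sparsity=0.4)
print(pruned)
```

The zeros can then be stored in a sparse representation, and the surviving weights can still be palettized or quantized.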
|
||||
|
||||
## Pattern 3 - Training-Time Compression
|
||||
|
||||
When post-training compression loses too much accuracy, fine-tune with compression.
|
||||
|
||||
```python
from coremltools.optimize.torch.palettization import (
    DKMPalettizerConfig,
    DKMPalettizer
)

# Configure 4-bit palettization
config = DKMPalettizerConfig(global_config={"n_bits": 4})

# Prepare model
palettizer = DKMPalettizer(model, config)
prepared_model = palettizer.prepare()

# Fine-tune (your training loop)
for epoch in range(num_epochs):
    train_epoch(prepared_model, data_loader)
    palettizer.step()

# Finalize
final_model = palettizer.finalize()
```
|
||||
|
||||
**Tradeoff**: Better accuracy than post-training, but requires training data and time.
|
||||
|
||||
## Pattern 4 - Calibration-Based Compression (iOS 18+)
|
||||
|
||||
Middle ground: uses calibration data without full training.
|
||||
|
||||
```python
|
||||
from coremltools.optimize.torch.pruning import (
|
||||
MagnitudePrunerConfig,
|
||||
LayerwiseCompressor
|
||||
)
|
||||
|
||||
# Configure
|
||||
config = MagnitudePrunerConfig(
|
||||
target_sparsity=0.4,
|
||||
n_samples=128 # Calibration samples
|
||||
)
|
||||
|
||||
# Create pruner
|
||||
pruner = LayerwiseCompressor(model, config)
|
||||
|
||||
# Calibrate
|
||||
sparse_model = pruner.compress(calibration_data_loader)
|
||||
```
|
||||
|
||||
## Pattern 5 - Stateful Models (KV-Cache for LLMs)
|
||||
|
||||
For transformer models, use state to avoid recomputing key/value vectors.
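
To see what the cache saves, the toy count below compares how many key/value projections a decoder performs with and without one. It is plain Python with no model involved; the function names and the choice of eight tokens are assumptions for illustration.

```python
def projections_without_cache(num_tokens):
    """Each decoding step re-projects keys/values for every token seen so far."""
    return sum(range(1, num_tokens + 1))


def projections_with_cache(num_tokens):
    """Each decoding step projects only the new token and appends it to the cache."""
    cache = []  # stands in for keyCache / valueCache
    projections = 0
    for token in range(num_tokens):
        cache.append(token)
        projections += 1
    return projections


tokens = 8
print(projections_without_cache(tokens), projections_with_cache(tokens))
```

The uncached count grows quadratically with sequence length and the cached count linearly, which is why the cache is kept as model state rather than recomputed.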
|
||||
|
||||
### PyTorch Model with State
|
||||
|
||||
```python
import torch
import torch.nn as nn

class StatefulLLM(nn.Module):
    def __init__(self, batch, heads, seq_len, dim):
        super().__init__()
        # Register state buffers
        self.register_buffer("keyCache", torch.zeros(batch, heads, seq_len, dim))
        self.register_buffer("valueCache", torch.zeros(batch, heads, seq_len, dim))

    def forward(self, input_ids, causal_mask):
        # Update caches in-place during forward
        # ... attention with KV-cache ...
        return logits
```
|
||||
|
||||
### Conversion with State
|
||||
|
||||
```python
|
||||
import coremltools as ct
|
||||
|
||||
mlmodel = ct.convert(
|
||||
traced_model,
|
||||
inputs=[
|
||||
ct.TensorType(name="input_ids", shape=(1, ct.RangeDim(1, 2048))),
|
||||
ct.TensorType(name="causal_mask", shape=(1, 1, ct.RangeDim(1, 2048), ct.RangeDim(1, 2048)))
|
||||
],
|
||||
states=[
|
||||
ct.StateType(name="keyCache", ...),
|
||||
ct.StateType(name="valueCache", ...)
|
||||
],
|
||||
minimum_deployment_target=ct.target.iOS18
|
||||
)
|
||||
```
|
||||
|
||||
### Using State at Runtime
|
||||
|
||||
```swift
|
||||
// Create state from model
|
||||
let state = model.makeState()
|
||||
|
||||
// Run prediction with state (updated in-place)
|
||||
let output = try model.prediction(from: input, using: state)
|
||||
```
|
||||
|
||||
**Performance**: 1.6x speedup on Mistral-7B (M3 Max) compared to manual KV-cache I/O.
|
||||
|
||||
## Pattern 6 - Multi-Function Models (Adapters/LoRA)
|
||||
|
||||
Deploy multiple adapters in a single model, sharing base weights.
|
||||
|
||||
```python
|
||||
from coremltools.models import MultiFunctionDescriptor
|
||||
from coremltools.models.utils import save_multifunction
|
||||
|
||||
# Convert individual models
|
||||
sticker_model = ct.convert(sticker_adapter_model, ...)
|
||||
storybook_model = ct.convert(storybook_adapter_model, ...)
|
||||
|
||||
# Save individually
|
||||
sticker_model.save("sticker.mlpackage")
|
||||
storybook_model.save("storybook.mlpackage")
|
||||
|
||||
# Merge with shared weights
|
||||
desc = MultiFunctionDescriptor()
|
||||
desc.add_function("sticker", "sticker.mlpackage")
|
||||
desc.add_function("storybook", "storybook.mlpackage")
|
||||
|
||||
save_multifunction(desc, "MultiAdapter.mlpackage")
|
||||
```
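
The idea behind sharing base weights can be sketched as content-addressed storage: identical weight blobs are stored once and referenced by every function that uses them. This is only a conceptual illustration in plain Python, not how coremltools implements `save_multifunction`; `merge_functions` and the byte strings are hypothetical.

```python
import hashlib


def merge_functions(functions):
    """functions: {function_name: {weight_name: bytes}} -> (store, manifest)."""
    store = {}     # content hash -> weight bytes, stored once
    manifest = {}  # function name -> {weight name: content hash}
    for fn_name, weights in functions.items():
        manifest[fn_name] = {}
        for w_name, blob in weights.items():
            digest = hashlib.sha256(blob).hexdigest()
            store.setdefault(digest, blob)
            manifest[fn_name][w_name] = digest
    return store, manifest


base = bytes(1000)  # placeholder for the shared base-model weights
functions = {
    "sticker": {"base": base, "adapter": b"sticker-lora"},
    "storybook": {"base": base, "adapter": b"storybook-lora"},
}
store, manifest = merge_functions(functions)
print(len(store))
```

Two functions with one shared base and two distinct adapters end up as three stored blobs instead of four.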
|
||||
|
||||
### Loading Specific Function
|
||||
|
||||
```swift
|
||||
let config = MLModelConfiguration()
|
||||
config.functionName = "sticker" // or "storybook"
|
||||
|
||||
let model = try MLModel(contentsOf: modelURL, configuration: config)
|
||||
```
|
||||
|
||||
## Pattern 7 - MLTensor for Pipeline Stitching (iOS 18+)
|
||||
|
||||
Simplifies computation between models (decoding, post-processing).
|
||||
|
||||
```swift
|
||||
import CoreML
|
||||
|
||||
// Create tensors
|
||||
let scores = MLTensor(shape: [1, vocab_size], scalars: logits)
|
||||
|
||||
// Operations (executed asynchronously on Apple Silicon)
|
||||
let topK = scores.topK(k: 10)
|
||||
let probs = (topK.values / temperature).softmax()
|
||||
|
||||
// Sample from distribution
|
||||
let sampled = probs.multinomial(numSamples: 1)
|
||||
|
||||
// Materialize to access data (blocks until complete)
|
||||
let shapedArray = await sampled.shapedArray(of: Int32.self)
|
||||
```
|
||||
|
||||
**Key insight**: MLTensor operations are async. Call `shapedArray()` to materialize results.
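
For readers who want the decoding math spelled out, the same top-k, temperature, softmax, and sample sequence can be sketched in plain Python. This is a reference for the algorithm only, not MLTensor code; `sample_top_k` and the example logits are assumptions.

```python
import math
import random


def sample_top_k(logits, k, temperature, rng):
    """Sample one token id from the k highest-scoring logits."""
    # Keep the k highest-scoring token ids.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    # Numerically stable softmax over the k candidates.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token id according to those probabilities.
    choice = rng.choices(top, weights=probs, k=1)[0]
    return choice, top, probs


logits = [0.1, 2.5, -1.0, 3.0, 0.7, 1.9]
token, candidates, probs = sample_top_k(
    logits, k=3, temperature=0.8, rng=random.Random(0)
)
print(token, candidates, probs)
```

Lower temperatures sharpen the distribution toward the top candidate, and higher temperatures flatten it.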
|
||||
|
||||
## Pattern 8 - Async Prediction for Concurrency
|
||||
|
||||
Thread-safe concurrent predictions for throughput.
|
||||
|
||||
```swift
|
||||
class ImageProcessor {
|
||||
let model: MLModel
|
||||
|
||||
func processImages(_ images: [CGImage]) async throws -> [Output] {
|
||||
try await withThrowingTaskGroup(of: Output.self) { group in
|
||||
for image in images {
|
||||
group.addTask {
|
||||
// Check cancellation before expensive work
|
||||
try Task.checkCancellation()
|
||||
|
||||
let input = try self.prepareInput(image)
|
||||
// Async prediction - thread safe!
|
||||
return try await self.model.prediction(from: input)
|
||||
}
|
||||
}
|
||||
|
||||
return try await group.reduce(into: []) { $0.append($1) }
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Warning**: Limit concurrent predictions to avoid memory pressure from multiple input/output buffers.
|
||||
|
||||
```swift
|
||||
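// Limit concurrency (AsyncSemaphore is not part of the Swift standard library; this assumes a third-party or custom async semaphore)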
|
||||
let semaphore = AsyncSemaphore(value: 2)
|
||||
|
||||
for image in images {
|
||||
group.addTask {
|
||||
await semaphore.wait()
|
||||
defer { semaphore.signal() }
|
||||
return try await process(image)
|
||||
}
|
||||
}
|
||||
```
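
The bounding pattern itself is language-independent. As a runnable illustration, here is the same idea with Python's `asyncio.Semaphore`; the `fake_prediction` worker and the limit of 2 are stand-ins for a real prediction call and a tuned limit.

```python
import asyncio


async def run_limited(items, limit, worker):
    """Run worker(item) for every item, with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:  # released automatically, even on error
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))


stats = {"in_flight": 0, "peak": 0}


async def fake_prediction(image_id):
    # Track how many "predictions" overlap at any moment.
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)
    stats["in_flight"] -= 1
    return image_id * 2


results = asyncio.run(run_limited(range(6), limit=2, worker=fake_prediction))
print(results, stats["peak"])
```

The peak number of overlapping calls never exceeds the limit, which is what keeps the number of live input and output buffers bounded.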
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
### Don't - Load models on main thread at launch
|
||||
|
||||
```swift
|
||||
// BAD - blocks UI
|
||||
class AppDelegate {
|
||||
let model = try! MLModel(contentsOf: url) // Blocks!
|
||||
}
|
||||
|
||||
// GOOD - lazy async loading
|
||||
class ModelManager {
|
||||
private var model: MLModel?
|
||||
|
||||
func getModel() async throws -> MLModel {
|
||||
if let model { return model }
|
||||
model = try await Task.detached {
|
||||
try MLModel(contentsOf: url)
|
||||
}.value
|
||||
return model!
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Don't - Reload model for each prediction
|
||||
|
||||
```swift
|
||||
// BAD - reloads every time
|
||||
func predict(_ input: Input) throws -> Output {
|
||||
let model = try MLModel(contentsOf: url) // Expensive!
|
||||
return try model.prediction(from: input)
|
||||
}
|
||||
|
||||
// GOOD - keep model loaded
|
||||
class Predictor {
|
||||
private let model: MLModel
|
||||
|
||||
func predict(_ input: Input) throws -> Output {
|
||||
try model.prediction(from: input)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Don't - Compress without profiling first
|
||||
|
||||
```python
# BAD - blind compression
compressed = palettize_weights(model, two_bit_config)  # May break accuracy!

# GOOD - profile, then compress iteratively
# 1. Profile Float16 baseline
# 2. Try 8-bit → check accuracy
# 3. Try 6-bit → check accuracy
# 4. Try 4-bit with grouped channels → check accuracy
# 5. Only use 2-bit with training-time compression
```
|
||||
|
||||
### Don't - Ignore deployment target
|
||||
|
||||
```python
|
||||
# BAD - misses optimizations
|
||||
mlmodel = ct.convert(traced_model, inputs=[...])
|
||||
|
||||
# GOOD - enables SDPA fusion, per-block quantization, etc.
|
||||
mlmodel = ct.convert(
|
||||
traced_model,
|
||||
inputs=[...],
|
||||
minimum_deployment_target=ct.target.iOS18
|
||||
)
|
||||
```
|
||||
|
||||
## Pressure Scenarios
|
||||
|
||||
### Scenario 1 - "Model is 5GB, need it under 2GB for iPhone"
|
||||
|
||||
**Wrong approach**: Jump straight to 2-bit palettization.
|
||||
|
||||
**Right approach**:
|
||||
1. Start with 8-bit palettization → check accuracy
|
||||
2. Try 6-bit → check accuracy
|
||||
3. Try 4-bit with `per_grouped_channel` → check accuracy
|
||||
4. If still too large, use calibration-based compression
|
||||
5. If still losing accuracy, use training-time compression
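
The first three steps can be sketched as a small search loop. The sketch assumes the 5 GB figure is the Float16 size, ignores lookup-table overhead, and uses a hypothetical `accuracy_ok` callback in place of a real evaluation run.

```python
def choose_bit_width(float16_size_gb, budget_gb, accuracy_ok):
    """Return (nbits, size_gb) for the first width that fits, or None to escalate."""
    for nbits in (8, 6, 4):
        size_gb = float16_size_gb * nbits / 16
        if not accuracy_ok(nbits):
            # Accuracy already unacceptable: move on to calibration-based
            # or training-time compression instead of going lower.
            return None
        if size_gb <= budget_gb:
            return nbits, size_gb
    return None


# Hypothetical evaluation result: accuracy holds down to 6 bits.
result = choose_bit_width(5.0, 2.0, accuracy_ok=lambda nbits: nbits >= 6)
print(result)
```

Under these assumptions 8-bit yields 2.5 GB (still too large) and 6-bit yields 1.875 GB, so the search stops before 4-bit is ever tried.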
|
||||
|
||||
### Scenario 2 - "LLM inference is too slow"
|
||||
|
||||
**Wrong approach**: Try different compute units randomly.
|
||||
|
||||
**Right approach**:
|
||||
1. Profile with Core ML Instrument
|
||||
2. Check if load is cached (look for "cached" vs "prepare and cache")
|
||||
3. Enable stateful KV-cache
|
||||
4. Check SDPA optimization is enabled (iOS 18+ deployment target)
|
||||
5. Consider INT4 quantization for GPU on Mac
|
||||
|
||||
### Scenario 3 - "Need multiple LoRA adapters in one app"
|
||||
|
||||
**Wrong approach**: Ship separate models for each adapter.
|
||||
|
||||
**Right approach**:
|
||||
1. Convert each adapter model separately
|
||||
2. Use `MultiFunctionDescriptor` to merge with shared base
|
||||
3. Load specific function via `config.functionName`
|
||||
4. Weights are deduplicated automatically
|
||||
|
||||
## Checklist
|
||||
|
||||
Before deploying a CoreML model:
|
||||
|
||||
- [ ] Set `minimum_deployment_target` to latest supported iOS
|
||||
- [ ] Profile baseline Float16 performance
|
||||
- [ ] Check if model load is cached
|
||||
- [ ] Consider compression only if size/performance requires it
|
||||
- [ ] Test accuracy after each compression step
|
||||
- [ ] Use async prediction for concurrent workloads
|
||||
- [ ] Limit concurrent predictions to manage memory
|
||||
- [ ] Use state for transformer KV-cache
|
||||
- [ ] Use multi-function for adapter variants
|
||||
|
||||
## Resources
|
||||
|
||||
**WWDC**: 2023-10047, 2023-10049, 2024-10159, 2024-10161
|
||||
|
||||
**Docs**: /coreml, /coreml/mlmodel, /coreml/mltensor
|
||||
|
||||
**Skills**: coreml-ref, coreml-diag, axiom-ios-ai (Foundation Models)
|
||||
496
.claude/skills/axiom-ios-ml/speech/SKILL.md
Normal file
@@ -0,0 +1,496 @@
|
||||
---
|
||||
name: speech
|
||||
description: Use when implementing speech-to-text, live transcription, or audio transcription. Covers SpeechAnalyzer (iOS 26+), SpeechTranscriber, volatile/finalized results, AssetInventory model management, audio format handling.
|
||||
license: MIT
|
||||
version: 1.0.0
|
||||
---
|
||||
|
||||
# Speech-to-Text with SpeechAnalyzer
|
||||
|
||||
## Overview
|
||||
|
||||
SpeechAnalyzer is Apple's new speech-to-text API introduced in iOS 26. It powers Notes, Voice Memos, Journal, and Call Summarization. The on-device model is faster, more accurate, and better for long-form/distant audio than SFSpeechRecognizer.
|
||||
|
||||
**Key principle**: SpeechAnalyzer is modular: you add transcription modules to an analysis session, and results stream asynchronously using Swift's AsyncSequence.
|
||||
|
||||
## Decision Tree - SpeechAnalyzer vs SFSpeechRecognizer
|
||||
|
||||
```
|
||||
Need speech-to-text?
|
||||
├─ iOS 26+ only?
|
||||
│ └─ Yes → SpeechAnalyzer (preferred)
|
||||
├─ Need to support iOS versions before 26?
│ └─ Yes → SFSpeechRecognizer (DictationTranscriber also requires iOS 26+)
|
||||
├─ Long-form audio (meetings, lectures)?
|
||||
│ └─ Yes → SpeechAnalyzer
|
||||
├─ Distant audio (across room)?
|
||||
│ └─ Yes → SpeechAnalyzer
|
||||
└─ Short dictation commands?
|
||||
└─ Either works
|
||||
```
|
||||
|
||||
**SpeechAnalyzer advantages**:
|
||||
- Better for long-form and conversational audio
|
||||
- Works well with distant speakers (meetings)
|
||||
- On-device, private
|
||||
- Model managed by system (no app size increase)
|
||||
- Powers Notes, Voice Memos, Journal
|
||||
|
||||
**DictationTranscriber** (iOS 26+): Supports the same languages as SFSpeechRecognizer, but does not require the user to enable Siri or dictation in Settings.
|
||||
|
||||
## Red Flags
|
||||
|
||||
Use this skill when you see:
|
||||
- "Live transcription"
|
||||
- "Transcribe audio"
|
||||
- "Speech-to-text"
|
||||
- "SpeechAnalyzer" or "SpeechTranscriber"
|
||||
- "Volatile results"
|
||||
- Building Notes-like or Voice Memos-like features
|
||||
|
||||
## Pattern 1 - File Transcription (Simplest)
|
||||
|
||||
Transcribe an audio file to text in one function.
|
||||
|
||||
```swift
|
||||
import Speech
|
||||
|
||||
func transcribe(file: URL, locale: Locale) async throws -> AttributedString {
|
||||
// Set up transcriber
|
||||
let transcriber = SpeechTranscriber(
|
||||
locale: locale,
|
||||
preset: .offlineTranscription
|
||||
)
|
||||
|
||||
// Collect results asynchronously
|
||||
async let transcriptionFuture = try transcriber.results
|
||||
.reduce(AttributedString()) { str, result in
|
||||
str + result.text
|
||||
}
|
||||
|
||||
// Set up analyzer with transcriber module
|
||||
let analyzer = SpeechAnalyzer(modules: [transcriber])
|
||||
|
||||
// Analyze the file
|
||||
if let lastSample = try await analyzer.analyzeSequence(from: file) {
|
||||
try await analyzer.finalizeAndFinish(through: lastSample)
|
||||
} else {
|
||||
await analyzer.cancelAndFinishNow()
|
||||
}
|
||||
|
||||
return try await transcriptionFuture
|
||||
}
|
||||
```
|
||||
|
||||
**Key points**:
|
||||
- `analyzeSequence(from:)` reads file and feeds audio to analyzer
|
||||
- `finalizeAndFinish(through:)` ensures all results are finalized
|
||||
- Results are `AttributedString` with timing metadata
|
||||
|
||||
## Pattern 2 - Live Transcription Setup
|
||||
|
||||
For real-time transcription from microphone.
|
||||
|
||||
### Step 1 - Configure SpeechTranscriber
|
||||
|
||||
```swift
|
||||
import Speech
|
||||
|
||||
class TranscriptionManager: ObservableObject {
|
||||
private var transcriber: SpeechTranscriber?
|
||||
private var analyzer: SpeechAnalyzer?
|
||||
private var analyzerFormat: AudioFormatDescription?
|
||||
private var inputBuilder: AsyncStream<AnalyzerInput>.Continuation?
|
||||
|
||||
@Published var finalizedTranscript = AttributedString()
|
||||
@Published var volatileTranscript = AttributedString()
|
||||
|
||||
func setUp() async throws {
|
||||
// Create transcriber with options
|
||||
transcriber = SpeechTranscriber(
|
||||
locale: Locale.current,
|
||||
transcriptionOptions: [],
|
||||
reportingOptions: [.volatileResults], // Enable real-time updates
|
||||
attributeOptions: [.audioTimeRange] // Include timing
|
||||
)
|
||||
|
||||
guard let transcriber else { throw TranscriptionError.setupFailed }
|
||||
|
||||
// Create analyzer with transcriber module
|
||||
analyzer = SpeechAnalyzer(modules: [transcriber])
|
||||
|
||||
// Get required audio format
|
||||
analyzerFormat = await SpeechAnalyzer.bestAvailableAudioFormat(
|
||||
compatibleWith: [transcriber]
|
||||
)
|
||||
|
||||
// Ensure model is available
|
||||
try await ensureModel(for: transcriber)
|
||||
|
||||
// Create input stream
|
||||
let (stream, builder) = AsyncStream<AnalyzerInput>.makeStream()
|
||||
inputBuilder = builder
|
||||
|
||||
// Start analyzer
|
||||
try await analyzer?.start(inputSequence: stream)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Step 2 - Ensure Model Availability
|
||||
|
||||
```swift
|
||||
func ensureModel(for transcriber: SpeechTranscriber) async throws {
|
||||
let locale = Locale.current
|
||||
|
||||
// Check if language is supported
|
||||
let supported = await SpeechTranscriber.supportedLocales
|
||||
guard supported.contains(where: {
|
||||
$0.identifier(.bcp47) == locale.identifier(.bcp47)
|
||||
}) else {
|
||||
throw TranscriptionError.localeNotSupported
|
||||
}
|
||||
|
||||
// Check if model is installed
|
||||
let installed = await SpeechTranscriber.installedLocales
|
||||
if installed.contains(where: {
|
||||
$0.identifier(.bcp47) == locale.identifier(.bcp47)
|
||||
}) {
|
||||
return // Already installed
|
||||
}
|
||||
|
||||
// Download model
|
||||
if let downloader = try await AssetInventory.assetInstallationRequest(
|
||||
supporting: [transcriber]
|
||||
) {
|
||||
// Track progress if needed
|
||||
let progress = downloader.progress
|
||||
try await downloader.downloadAndInstall()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Note**: Models are stored in system storage, not app storage. Only a limited number of languages can be allocated at once.
|
||||
|
||||
### Step 3 - Handle Results
|
||||
|
||||
```swift
|
||||
func startResultHandling() {
|
||||
Task {
|
||||
guard let transcriber else { return }
|
||||
|
||||
do {
|
||||
for try await result in transcriber.results {
|
||||
let text = result.text
|
||||
|
||||
if result.isFinal {
|
||||
// Finalized result - won't change
|
||||
finalizedTranscript += text
|
||||
volatileTranscript = AttributedString()
|
||||
|
||||
// Access timing info
|
||||
for run in text.runs {
|
||||
if let timeRange = run.audioTimeRange {
|
||||
print("Time: \(timeRange)")
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Volatile result - will be replaced
|
||||
volatileTranscript = text
|
||||
}
|
||||
}
|
||||
} catch {
|
||||
print("Transcription failed: \(error)")
|
||||
}
|
||||
}
|
||||
}
|
||||
```
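
The bookkeeping in the handler above boils down to a small state machine: a volatile result replaces the previous volatile text, and a final result is appended and clears it. The plain-Python sketch below models only that logic; the `Transcript` class and the sample results are hypothetical and no Speech framework types are involved.

```python
class Transcript:
    """Tracks finalized text plus the current volatile (provisional) tail."""

    def __init__(self):
        self.finalized = ""
        self.volatile = ""

    def apply(self, text, is_final):
        if is_final:
            # Finalized text never changes again; drop the provisional tail.
            self.finalized += text
            self.volatile = ""
        else:
            # Each volatile result supersedes the previous one.
            self.volatile = text

    @property
    def display(self):
        return self.finalized + self.volatile


transcript = Transcript()
results = [
    ("Hel", False),
    ("Hello wor", False),
    ("Hello world. ", True),
    ("How", False),
]
for text, is_final in results:
    transcript.apply(text, is_final)
print(transcript.display)
```

Rendering `finalized` and `volatile` separately also makes it easy to style the provisional text differently in the UI.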
|
||||
|
||||
## Pattern 3 - Audio Recording and Streaming
|
||||
|
||||
Connect AVAudioEngine to SpeechAnalyzer.
|
||||
|
||||
```swift
|
||||
import AVFoundation
|
||||
|
||||
class AudioRecorder {
|
||||
private let audioEngine = AVAudioEngine()
|
||||
private var outputContinuation: AsyncStream<AVAudioPCMBuffer>.Continuation?
|
||||
private let transcriptionManager: TranscriptionManager
|
||||
|
||||
func startRecording() async throws {
|
||||
// Request permission
|
||||
guard await AVAudioApplication.requestRecordPermission() else {
|
||||
throw RecordingError.permissionDenied
|
||||
}
|
||||
|
||||
// Configure audio session (iOS)
|
||||
#if os(iOS)
|
||||
let session = AVAudioSession.sharedInstance()
|
||||
try session.setCategory(.playAndRecord, mode: .spokenAudio)
|
||||
try session.setActive(true, options: .notifyOthersOnDeactivation)
|
||||
#endif
|
||||
|
||||
// Set up transcriber
|
||||
try await transcriptionManager.setUp()
|
||||
transcriptionManager.startResultHandling()
|
||||
|
||||
// Stream audio to transcriber
|
||||
for await buffer in try audioStream() {
|
||||
try await transcriptionManager.streamAudio(buffer)
|
||||
}
|
||||
}
|
||||
|
||||
private func audioStream() throws -> AsyncStream<AVAudioPCMBuffer> {
|
||||
let inputNode = audioEngine.inputNode
|
||||
let format = inputNode.outputFormat(forBus: 0)
|
||||
|
||||
inputNode.installTap(
|
||||
onBus: 0,
|
||||
bufferSize: 4096,
|
||||
format: format
|
||||
) { [weak self] buffer, time in
|
||||
self?.outputContinuation?.yield(buffer)
|
||||
}
|
||||
|
||||
audioEngine.prepare()
|
||||
try audioEngine.start()
|
||||
|
||||
return AsyncStream { continuation in
|
||||
outputContinuation = continuation
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Stream Audio with Format Conversion
|
||||
|
||||
```swift
|
||||
extension TranscriptionManager {
|
||||
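// `converter` must be a stored property declared in the TranscriptionManager class body (for example `private var converter: AVAudioConverter?`), because Swift extensions cannot contain stored properties.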
|
||||
|
||||
func streamAudio(_ buffer: AVAudioPCMBuffer) async throws {
|
||||
guard let inputBuilder, let analyzerFormat else {
|
||||
throw TranscriptionError.notSetUp
|
||||
}
|
||||
|
||||
// Convert to analyzer's required format
|
||||
let converted = try convertBuffer(buffer, to: analyzerFormat)
|
||||
|
||||
// Send to analyzer
|
||||
let input = AnalyzerInput(buffer: converted)
|
||||
inputBuilder.yield(input)
|
||||
}
|
||||
|
||||
private func convertBuffer(
|
||||
_ buffer: AVAudioPCMBuffer,
|
||||
to format: AudioFormatDescription
|
||||
) throws -> AVAudioPCMBuffer {
|
||||
// Lazy initialize converter
|
||||
if converter == nil {
|
||||
let sourceFormat = buffer.format
|
||||
let destFormat = AVAudioFormat(cmAudioFormatDescription: format)!
|
||||
converter = AVAudioConverter(from: sourceFormat, to: destFormat)
|
||||
}
|
||||
|
||||
guard let converter else {
|
||||
throw TranscriptionError.conversionFailed
|
||||
}
|
||||
|
||||
let outputBuffer = AVAudioPCMBuffer(
|
||||
pcmFormat: converter.outputFormat,
|
||||
frameCapacity: buffer.frameLength
|
||||
)!
|
||||
|
||||
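// Note: convert(to:from:) does not perform sample rate conversion; if the rates differ, use convert(to:error:withInputFrom:) and size the output buffer for the new rate
try converter.convert(to: outputBuffer, from: buffer)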
|
||||
return outputBuffer
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Pattern 4 - Stopping Transcription
|
||||
|
||||
Properly finalize to get remaining volatile results as finalized.
|
||||
|
||||
```swift
|
||||
func stopRecording() async {
|
||||
// Stop audio
|
||||
audioEngine.stop()
|
||||
audioEngine.inputNode.removeTap(onBus: 0)
|
||||
outputContinuation?.finish()
|
||||
|
||||
// Finalize transcription (converts remaining volatile to final)
|
||||
try? await analyzer?.finalizeAndFinishThroughEndOfInput()
|
||||
|
||||
// Cancel any pending tasks
|
||||
recognizerTask?.cancel()
|
||||
}
|
||||
```
|
||||
|
||||
**Critical**: Always call `finalizeAndFinishThroughEndOfInput()` to ensure volatile results are finalized.
|
||||
|
||||
## Pattern 5 - Model Asset Management
|
||||
|
||||
### Check Supported Languages
|
||||
|
||||
```swift
|
||||
// Languages the API supports
|
||||
let supported = await SpeechTranscriber.supportedLocales
|
||||
|
||||
// Languages currently installed on device
|
||||
let installed = await SpeechTranscriber.installedLocales
|
||||
```
|
||||
|
||||
### Deallocate Languages
|
||||
|
||||
Only a limited number of languages can be allocated at a time, so deallocate the ones you no longer need.
|
||||
|
||||
```swift
|
||||
func deallocateLanguages() async {
|
||||
let allocated = await AssetInventory.allocatedLocales
|
||||
for locale in allocated {
|
||||
await AssetInventory.deallocate(locale: locale)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Pattern 6 - Displaying Results with Timing
|
||||
|
||||
Highlight text during audio playback using timing metadata.
|
||||
|
||||
```swift
import SwiftUI
import CoreMedia
import Speech

struct TranscriptView: View {
    let transcript: AttributedString
    @Binding var playbackTime: CMTime

    var body: some View {
        Text(highlightedTranscript)
    }

    var highlightedTranscript: AttributedString {
        var result = transcript

        // Each run exposes its attributes and its range in the string
        for run in transcript.runs {
            guard let timeRange = run.audioTimeRange else { continue }

            if timeRange.containsTime(playbackTime) {
                result[run.range].backgroundColor = .yellow
            }
        }

        return result
    }
}
```
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
### Don't - Forget to finalize
|
||||
|
||||
```swift
|
||||
// BAD - volatile results lost
|
||||
func stopRecording() {
|
||||
audioEngine.stop()
|
||||
// Missing finalize!
|
||||
}
|
||||
|
||||
// GOOD - volatile results become finalized
|
||||
func stopRecording() async {
|
||||
audioEngine.stop()
|
||||
try? await analyzer?.finalizeAndFinishThroughEndOfInput()
|
||||
}
|
||||
```
|
||||
|
||||
### Don't - Ignore format conversion
|
||||
|
||||
```swift
|
||||
// BAD - format mismatch may fail silently
|
||||
inputBuilder.yield(AnalyzerInput(buffer: rawBuffer))
|
||||
|
||||
// GOOD - convert to analyzer's format
|
||||
let format = await SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith: [transcriber])
|
||||
let converted = try convertBuffer(rawBuffer, to: format)
|
||||
inputBuilder.yield(AnalyzerInput(buffer: converted))
|
||||
```
|
||||
|
||||
### Don't - Skip model availability check
|
||||
|
||||
```swift
|
||||
// BAD - may crash if model not installed
|
||||
let transcriber = SpeechTranscriber(locale: locale, ...)
|
||||
// Start using immediately
|
||||
|
||||
// GOOD - ensure model is ready
|
||||
let transcriber = SpeechTranscriber(locale: locale, ...)
|
||||
try await ensureModel(for: transcriber)
|
||||
// Now safe to use
|
||||
```
|
||||
|
||||
## Presets Reference
|
||||
|
||||
| Preset | Use Case |
|
||||
|--------|----------|
|
||||
| `.offlineTranscription` | File transcription, no real-time feedback needed |
|
||||
| `.progressiveLiveTranscription` | Live transcription with volatile updates |
|
||||
|
||||
## Options Reference
|
||||
|
||||
### TranscriptionOptions
|
||||
- Default: None (standard transcription)
|
||||
|
||||
### ReportingOptions
|
||||
- `.volatileResults`: Enable real-time approximate results
|
||||
|
||||
### AttributeOptions
|
||||
- `.audioTimeRange`: Include CMTimeRange for each text segment
|
||||
|
||||
## Platform Availability
|
||||
|
||||
| Platform | SpeechTranscriber | DictationTranscriber |
|
||||
|----------|-------------------|---------------------|
|
||||
| iOS 26+ | Yes | Yes |
|
||||
| macOS 26 (Tahoe)+ | Yes | Yes |
|
||||
| watchOS 26+ | No | Yes |
|
||||
| tvOS 26+ | TBD | TBD |
|
||||
|
||||
**Hardware requirements**: Varies by device. Use `supportedLocales` to check.
|
||||
|
||||
## Integration with Apple Intelligence
|
||||
|
||||
Combine with Foundation Models for summarization:
|
||||
|
||||
```swift
|
||||
import FoundationModels
|
||||
|
||||
func generateTitle(for transcript: String) async throws -> String {
|
||||
let session = LanguageModelSession()
|
||||
let prompt = "Generate a short, clever title for this story: \(transcript)"
|
||||
let response = try await session.respond(to: prompt)
|
||||
return response.content
|
||||
}
|
||||
```
|
||||
|
||||
See `axiom-ios-ai` skill for Foundation Models details.
|
||||
|
||||
## Checklist
|
||||
|
||||
Before shipping speech-to-text:
|
||||
|
||||
- [ ] Check locale support with `supportedLocales`
|
||||
- [ ] Ensure model with `AssetInventory.assetInstallationRequest`
|
||||
- [ ] Handle download progress for user feedback
|
||||
- [ ] Convert audio to `bestAvailableAudioFormat`
|
||||
- [ ] Enable `.volatileResults` for live transcription
|
||||
- [ ] Call `finalizeAndFinishThroughEndOfInput()` on stop
|
||||
- [ ] Handle timing with `.audioTimeRange` if needed
|
||||
- [ ] Clear volatile results when finalized result arrives
|
||||
- [ ] Request microphone permission before recording
|
||||
|
||||
## Resources
|
||||
|
||||
**WWDC**: 2025-277
|
||||
|
||||
**Docs**: /speech, /speech/speechanalyzer, /speech/speechtranscriber
|
||||
|
||||
**Skills**: coreml (on-device ML), axiom-ios-ai (Foundation Models)
|
||||