This snapshot establishes the camera-to-result recognition flow and related tests while checking in the project skill/docs assets required for the configured local tooling.
479 lines
15 KiB
Markdown
479 lines
15 KiB
Markdown
---
|
|
name: axiom-hang-diagnostics
|
|
description: Use when app freezes, UI unresponsive, main thread blocked, watchdog termination, or diagnosing hang reports from Xcode Organizer or MetricKit
|
|
license: MIT
|
|
---
|
|
|
|
# Hang Diagnostics
|
|
|
|
Systematic diagnosis and resolution of app hangs. A hang occurs when the main thread is blocked for more than 1 second, making the app unresponsive to user input.
|
|
|
|
## Red Flags — Check This Skill When
|
|
|
|
| Symptom | This Skill Applies |
|
|
|---------|-------------------|
|
|
| App freezes briefly during use | Yes — likely hang |
|
|
| UI doesn't respond to touches | Yes — main thread blocked |
|
|
| "App not responding" system dialog | Yes — severe hang |
|
|
| Xcode Organizer shows hang diagnostics | Yes — field hang reports |
|
|
| MetricKit MXHangDiagnostic received | Yes — aggregated hang data |
|
|
| Animations stutter or skip | Maybe — could be hitch, not hang |
|
|
| App feels slow but responsive | No — performance issue, not hang |
|
|
|
|
## What Is a Hang
|
|
|
|
A **hang** is when the main runloop cannot process events for more than 1 second. The user taps, but nothing happens.
|
|
|
|
```
|
|
User taps → Main thread busy/blocked → Event queued → 1+ second delay → HANG
|
|
```
|
|
|
|
**Key distinction**: The main thread handles ALL user input. If it's busy or blocked, the entire UI freezes.
|
|
|
|
### Hang vs Hitch vs Lag
|
|
|
|
| Issue | Duration | User Experience | Tool |
|
|
|-------|----------|-----------------|------|
|
|
| **Hang** | >1 second | App frozen, unresponsive | Time Profiler, System Trace |
|
|
| **Hitch** | 1-3 frames (16-50ms) | Animation stutters | Animation Hitches instrument |
|
|
| **Lag** | 100-500ms | Feels slow but responsive | Time Profiler |
|
|
|
|
**This skill covers hangs.** For hitches, see `axiom-swiftui-performance`. For general lag, see `axiom-performance-profiling`.
|
|
|
|
## The Two Causes of Hangs
|
|
|
|
Every hang has one of two root causes:
|
|
|
|
### 1. Main Thread Busy
|
|
|
|
The main thread is doing work instead of processing events.
|
|
|
|
**Subcategories**:
|
|
|
|
| Type | Example | Fix |
|
|
|------|---------|-----|
|
|
| **Proactive work** | Pre-computing data user hasn't requested | Lazy initialization, compute on demand |
|
|
| **Irrelevant work** | Processing all notifications, not just relevant ones | Filter notifications, targeted observers |
|
|
| **Suboptimal API** | Using blocking API when async exists | Switch to async API |
|
|
|
|
### 2. Main Thread Blocked
|
|
|
|
The main thread is waiting for something else.
|
|
|
|
**Subcategories**:
|
|
|
|
| Type | Example | Fix |
|
|
|------|---------|-----|
|
|
| **Synchronous IPC** | Calling system service synchronously | Use async API variant |
|
|
| **File I/O** | `Data(contentsOf:)` on main thread | Move to background queue |
|
|
| **Network** | Synchronous URL request | Use URLSession async |
|
|
| **Lock contention** | Waiting for lock held by background thread | Reduce critical section, use actors |
|
|
| **Semaphore/dispatch_sync** | Blocking on background work | Restructure to async completion |
|
|
|
|
## Decision Tree — Diagnosing Hangs
|
|
|
|
```
|
|
START: App hangs reported
|
|
│
|
|
├─→ Do you have hang diagnostics from Organizer or MetricKit?
|
|
│ │
|
|
│ ├─→ YES: Examine stack trace
|
|
│ │ │
|
|
│ │ ├─→ Stack shows your code running
|
|
│ │ │ → BUSY: Main thread doing work
|
|
│ │ │ → Profile with Time Profiler
|
|
│ │ │
|
|
│ │ └─→ Stack shows waiting (semaphore, lock, dispatch_sync)
|
|
│ │ → BLOCKED: Main thread waiting
|
|
│ │ → Profile with System Trace
|
|
│ │
|
|
│ └─→ NO: Can you reproduce?
|
|
│ │
|
|
│ ├─→ YES: Profile with Time Profiler first
|
|
│ │ │
|
|
│ │ ├─→ High CPU on main thread
|
|
│ │ │ → BUSY: Optimize the work
|
|
│ │ │
|
|
│ │ └─→ Low CPU, thread blocked
|
|
│ │ → Use System Trace to find what's blocking
|
|
│ │
|
|
│ └─→ NO: Enable MetricKit in app
|
|
│ → Wait for field reports
|
|
│ → Check Organizer > Hangs
|
|
```
|
|
|
|
## Tool Selection
|
|
|
|
| Scenario | Primary Tool | Why |
|
|
|----------|-------------|-----|
|
|
| **Reproduces locally** | Time Profiler | See exactly what main thread is doing |
|
|
| **Blocked thread suspected** | System Trace | Shows thread state, lock contention |
|
|
| **Field reports only** | Xcode Organizer | Aggregated hang diagnostics |
|
|
| **Want in-app data** | MetricKit | MXHangDiagnostic with call stacks |
|
|
| **Need precise timing** | System Trace | Nanosecond-level thread analysis |
|
|
|
|
## Time Profiler Workflow for Hangs
|
|
|
|
1. **Launch Instruments** → Select Time Profiler template
|
|
2. **Record during hang** → Reproduce the freeze
|
|
3. **Stop recording** → Find the hang period in timeline
|
|
4. **Select hang region** → Drag to select frozen timespan
|
|
5. **Examine call tree** → Look for main thread work
|
|
|
|
**What to look for**:
|
|
- Functions with high "Self Time" on main thread
|
|
- Unexpectedly deep call stacks
|
|
- System calls that shouldn't be on main thread
|
|
|
|
## System Trace Workflow for Blocked Hangs
|
|
|
|
1. **Launch Instruments** → Select System Trace template
|
|
2. **Record during hang** → Capture thread states
|
|
3. **Find main thread** → Filter to main thread
|
|
4. **Look for red/orange** → Blocked states
|
|
5. **Examine blocking reason** → Lock, semaphore, IPC
|
|
|
|
**Thread states**:
|
|
- **Running (blue)**: Executing code
|
|
- **Preempted (orange)**: Runnable but not scheduled
|
|
- **Blocked (red)**: Waiting for resource
|
|
|
|
## Common Hang Patterns and Fixes
|
|
|
|
### Pattern 1: Synchronous File I/O
|
|
|
|
**Before (hangs)**:
|
|
```swift
|
|
// Main thread blocks on file read
|
|
func loadUserData() {
|
|
let data = try! Data(contentsOf: largeFileURL) // BLOCKS
|
|
processData(data)
|
|
}
|
|
```
|
|
|
|
**After (async)**:
|
|
```swift
|
|
func loadUserData() {
|
|
Task.detached {
|
|
let data = try Data(contentsOf: largeFileURL)
|
|
await MainActor.run {
|
|
self.processData(data)
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
### Pattern 2: Unfiltered Notification Observer
|
|
|
|
**Before (processes all)**:
|
|
```swift
|
|
NotificationCenter.default.addObserver(
|
|
self,
|
|
selector: #selector(handleChange),
|
|
name: .NSManagedObjectContextObjectsDidChange,
|
|
object: nil // Receives ALL contexts
|
|
)
|
|
```
|
|
|
|
**After (filtered)**:
|
|
```swift
|
|
NotificationCenter.default.addObserver(
|
|
self,
|
|
selector: #selector(handleChange),
|
|
name: .NSManagedObjectContextObjectsDidChange,
|
|
object: relevantContext // Only this context
|
|
)
|
|
```
|
|
|
|
### Pattern 3: Expensive Formatter Creation
|
|
|
|
**Before (creates each time)**:
|
|
```swift
|
|
func formatDate(_ date: Date) -> String {
|
|
let formatter = DateFormatter() // EXPENSIVE
|
|
formatter.dateStyle = .medium
|
|
return formatter.string(from: date)
|
|
}
|
|
```
|
|
|
|
**After (cached)**:
|
|
```swift
|
|
private static let dateFormatter: DateFormatter = {
|
|
let formatter = DateFormatter()
|
|
formatter.dateStyle = .medium
|
|
return formatter
|
|
}()
|
|
|
|
func formatDate(_ date: Date) -> String {
|
|
Self.dateFormatter.string(from: date)
|
|
}
|
|
```
|
|
|
|
### Pattern 4: dispatch_sync to Main Thread
|
|
|
|
**Before (deadlock risk)**:
|
|
```swift
|
|
// From background thread
|
|
DispatchQueue.main.sync { // BLOCKS if main is blocked
|
|
updateUI()
|
|
}
|
|
```
|
|
|
|
**After (async)**:
|
|
```swift
|
|
DispatchQueue.main.async {
|
|
self.updateUI()
|
|
}
|
|
```
|
|
|
|
### Pattern 5: Semaphore for Async Result
|
|
|
|
**Before (blocks main thread)**:
|
|
```swift
|
|
func fetchDataSync() -> Data {
|
|
let semaphore = DispatchSemaphore(value: 0)
|
|
var result: Data?
|
|
|
|
URLSession.shared.dataTask(with: url) { data, _, _ in
|
|
result = data
|
|
semaphore.signal()
|
|
}.resume()
|
|
|
|
semaphore.wait() // BLOCKS MAIN THREAD
|
|
return result!
|
|
}
|
|
```
|
|
|
|
**After (async/await)**:
|
|
```swift
|
|
func fetchData() async throws -> Data {
|
|
let (data, _) = try await URLSession.shared.data(from: url)
|
|
return data
|
|
}
|
|
```
|
|
|
|
### Pattern 6: Lock Contention
|
|
|
|
**Before (shared lock)**:
|
|
```swift
|
|
class DataManager {
|
|
private let lock = NSLock()
|
|
private var cache: [String: Data] = [:]
|
|
|
|
func getData(for key: String) -> Data? {
|
|
lock.lock() // Main thread waits for background
|
|
defer { lock.unlock() }
|
|
return cache[key]
|
|
}
|
|
}
|
|
```
|
|
|
|
**After (actor)**:
|
|
```swift
|
|
actor DataManager {
|
|
private var cache: [String: Data] = [:]
|
|
|
|
func getData(for key: String) -> Data? {
|
|
cache[key] // Actor serializes access safely
|
|
}
|
|
}
|
|
```
|
|
|
|
### Pattern 7: App Launch Hang (Watchdog)
|
|
|
|
**Before (too much work)**:
|
|
```swift
|
|
func application(_ application: UIApplication,
|
|
didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
|
|
loadAllUserData() // Expensive
|
|
setupAnalytics() // Network calls
|
|
precomputeLayouts() // CPU intensive
|
|
return true
|
|
}
|
|
```
|
|
|
|
**After (deferred)**:
|
|
```swift
|
|
func application(_ application: UIApplication,
|
|
didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
|
|
// Only essential setup
|
|
setupMinimalUI()
|
|
return true
|
|
}
|
|
|
|
func applicationDidBecomeActive(_ application: UIApplication) {
|
|
// Defer non-essential work
|
|
Task {
|
|
await loadUserDataInBackground()
|
|
}
|
|
}
|
|
```
|
|
|
|
### Pattern 8: Image Processing on Main Thread
|
|
|
|
**Before (blocks UI)**:
|
|
```swift
|
|
func processImage(_ image: UIImage) {
|
|
let filtered = applyExpensiveFilter(image) // BLOCKS
|
|
imageView.image = filtered
|
|
}
|
|
```
|
|
|
|
**After (background processing)**:
|
|
```swift
|
|
func processImage(_ image: UIImage) {
|
|
imageView.image = placeholder
|
|
|
|
Task.detached(priority: .userInitiated) {
|
|
let filtered = applyExpensiveFilter(image)
|
|
await MainActor.run {
|
|
self.imageView.image = filtered
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
## Xcode Organizer Hang Diagnostics
|
|
|
|
**Window > Organizer > Select App > Hangs**
|
|
|
|
The Organizer shows aggregated hang data from users who opted into sharing diagnostics.
|
|
|
|
**Reading the report**:
|
|
1. **Hang Rate**: Hangs per day per device
|
|
2. **Call Stack**: Where the hang occurred
|
|
3. **Device/OS breakdown**: Which configurations affected
|
|
|
|
**Interpreting call stacks**:
|
|
- **Your code at top**: Main thread busy with your work
|
|
- **System API at top**: You called blocking API on main thread
|
|
- **pthread_mutex/semaphore**: Lock contention or explicit waiting
|
|
|
|
## MetricKit Hang Diagnostics
|
|
|
|
Adopt MetricKit to receive hang diagnostics in your app:
|
|
|
|
```swift
|
|
import MetricKit
|
|
|
|
class MetricsSubscriber: NSObject, MXMetricManagerSubscriber {
|
|
func didReceive(_ payloads: [MXDiagnosticPayload]) {
|
|
for payload in payloads {
|
|
if let hangDiagnostics = payload.hangDiagnostics {
|
|
for diagnostic in hangDiagnostics {
|
|
analyzeHang(diagnostic)
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
private func analyzeHang(_ diagnostic: MXHangDiagnostic) {
|
|
// Duration of the hang
|
|
let duration = diagnostic.hangDuration
|
|
|
|
// Call stack tree (needs symbolication)
|
|
let callStack = diagnostic.callStackTree
|
|
|
|
// Send to your analytics
|
|
uploadHangDiagnostic(duration: duration, callStack: callStack)
|
|
}
|
|
}
|
|
```
|
|
|
|
**Key MXHangDiagnostic properties**:
|
|
- `hangDuration`: How long the hang lasted
|
|
- `callStackTree`: MXCallStackTree with frames
|
|
- `signatureIdentifier`: For grouping similar hangs
|
|
|
|
## Watchdog Terminations
|
|
|
|
The watchdog kills apps that hang during key transitions:
|
|
|
|
| Transition | Time Limit | Consequence |
|
|
|------------|-----------|-------------|
|
|
| **App launch** | ~20 seconds | App killed, crash logged |
|
|
| **Background transition** | ~5 seconds | App killed |
|
|
| **Foreground transition** | ~10 seconds | App killed |
|
|
|
|
**Watchdog disabled in**:
|
|
- Simulator
|
|
- Debugger attached
|
|
- Development builds (sometimes)
|
|
|
|
**Watchdog kills are logged as crashes** with exception type `EXC_CRASH (SIGKILL)` and termination reason `Namespace RUNNINGBOARD, Code 3735883980` (hex `0xDEAD10CC` — indicates app held a file lock or SQLite database lock while being suspended).
|
|
|
|
## Pressure Scenarios
|
|
|
|
### Scenario 1: Manager Says "Just Add a Loading Spinner"
|
|
|
|
**Situation**: App hangs during data load. Manager suggests adding spinner to "fix" it.
|
|
|
|
**Why this fails**: Adding a spinner doesn't prevent the hang—the UI still freezes, the spinner won't animate, and the app remains unresponsive.
|
|
|
|
**Correct response**: "A spinner won't animate during a hang because the main thread is blocked. We need to move this work off the main thread so the spinner can actually spin and the app stays responsive."
|
|
|
|
### Scenario 2: "It Works Fine in Testing"
|
|
|
|
**Situation**: QA can't reproduce the hang. Logs show it happens in production.
|
|
|
|
**Analysis**:
|
|
1. Field devices have different data sizes
|
|
2. Network conditions vary (slow connection = longer sync)
|
|
3. Background apps consume memory/CPU
|
|
4. Watchdog is disabled in debug builds
|
|
|
|
**Action**:
|
|
- Add MetricKit to capture field diagnostics
|
|
- Test with production-sized datasets
|
|
- Test without debugger attached
|
|
- Check Organizer for hang reports
|
|
|
|
### Scenario 3: "We've Always Done It This Way"
|
|
|
|
**Situation**: Legacy code calls synchronous API on main thread. Refactoring is "too risky."
|
|
|
|
**Why it matters**: Even if it worked before:
|
|
- Data may have grown larger
|
|
- OS updates may have changed timing
|
|
- New devices have different characteristics
|
|
- Users notice more as apps get faster
|
|
|
|
**Approach**:
|
|
1. Add metrics to measure current hang rate
|
|
2. Refactor incrementally with feature flags
|
|
3. A/B test to show improvement
|
|
4. Document risk of not fixing
|
|
|
|
## Anti-Patterns to Avoid
|
|
|
|
| Anti-Pattern | Why It's Wrong | Instead |
|
|
|--------------|----------------|---------|
|
|
| `DispatchQueue.main.sync` from background | Can deadlock, always blocks | Use `.async` |
|
|
| Semaphore to convert async to sync | Blocks calling thread | Stay async with completion/await |
|
|
| File I/O on main thread | Unpredictable latency | Background queue |
|
|
| Unfiltered notification observer | Processes irrelevant events | Filter by object/name |
|
|
| Creating formatters in loops | Expensive initialization | Cache and reuse |
|
|
| Synchronous network request | Blocks on network latency | URLSession async |
|
|
|
|
## Hang Prevention Checklist
|
|
|
|
Before shipping, verify:
|
|
|
|
- [ ] No `Data(contentsOf:)` or file reads on main thread
|
|
- [ ] No `DispatchQueue.main.sync` from background threads
|
|
- [ ] No semaphore.wait() on main thread
|
|
- [ ] Formatters (DateFormatter, NumberFormatter) are cached
|
|
- [ ] Notification observers filter appropriately
|
|
- [ ] Launch work is minimized (defer non-essential)
|
|
- [ ] Image processing happens off main thread
|
|
- [ ] Database queries don't run on main thread
|
|
- [ ] MetricKit adopted for field diagnostics
|
|
|
|
## Resources
|
|
|
|
**WWDC**: 2021-10258, 2022-10082
|
|
|
|
**Docs**: /xcode/analyzing-responsiveness-issues-in-your-shipping-app, /metrickit/mxhangdiagnostic
|
|
|
|
**Skills**: axiom-metrickit-ref, axiom-performance-profiling, axiom-swift-concurrency, axiom-lldb (interactive thread inspection at freeze point)
|