Image Pipeline GPU/WASM/WebGL Design
Goal
Migrate the current CPU pixel-loop image pipeline to an incremental hybrid backend (WebGL first, optional WASM kernels, guarded WebGPU experiments) without breaking the existing worker fallback and cancellation semantics.
Current Branch Context
- This branch already reduced duplicate preview work (shared in-flight preview requests, preview coalescing, opt-in histogram behavior).
- lib/image-pipeline/worker-client.ts is already the stable orchestration seam for worker-first execution with main-thread fallback.
- lib/image-pipeline/preview-renderer.ts and lib/image-pipeline/bridge.ts currently depend on CPU ImageData mutation, so backend switching must preserve existing return contracts (ImageData for preview, encoded Blob for full render).
Codebase Mapping Findings
lib/image-pipeline/render-core.ts
- Contains all effect kernels as CPU loops over Uint8ClampedArray (curves, color-adjust, light-adjust, detail-adjust).
- applyPipelineStep is a strong integration seam: per-step dispatch already exists and can route by backend capability.
- Abort checks are periodic and lightweight; migration must preserve AbortError behavior.
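To make the abort contract concrete, here is a minimal sketch of a CPU kernel that checks for cancellation at a fixed row interval. The helper names, the 64-row interval, and the invert kernel itself are illustrative assumptions, not code from render-core.ts; the point is only that any backend has to surface cancellation as an error named AbortError.

```typescript
// Hypothetical sketch: names and the check interval are assumptions,
// not taken from render-core.ts.
const ABORT_CHECK_INTERVAL_ROWS = 64;

function throwIfAborted(signal?: AbortSignal): void {
  if (signal?.aborted) {
    const error = new Error("Pipeline step aborted");
    error.name = "AbortError"; // callers are assumed to match on the error name
    throw error;
  }
}

// Example kernel: invert RGB in place over RGBA pixels, leaving alpha untouched.
function invertRgb(
  pixels: Uint8ClampedArray,
  width: number,
  height: number,
  signal?: AbortSignal,
): void {
  for (let y = 0; y < height; y++) {
    // Periodic check: cheap enough to keep, frequent enough to cancel quickly.
    if (y % ABORT_CHECK_INTERVAL_ROWS === 0) throwIfAborted(signal);
    const rowStart = y * width * 4;
    for (let x = 0; x < width; x++) {
      const i = rowStart + x * 4;
      pixels[i] = 255 - pixels[i];
      pixels[i + 1] = 255 - pixels[i + 1];
      pixels[i + 2] = 255 - pixels[i + 2];
    }
  }
}
```

A GPU or WASM backend would not check per row, but it should still check the signal before and after each dispatched step and throw the same error shape.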
lib/image-pipeline/preview-renderer.ts
- Flow today: decode source bitmap -> draw 2D canvas -> getImageData -> apply step loops -> optional histogram.
- This file owns preview sizing and output shape; it should remain the public API while delegating kernel execution to a backend adapter.
- The yield loop (requestAnimationFrame/setTimeout) is CPU-friendly but should become backend-aware (GPU path should avoid unnecessary per-step yielding).
lib/image-pipeline/worker-client.ts
- Central execution coordinator with worker transport, request dedupe, cancel fan-out, and main-thread fallback.
- Best place to inject backend telemetry and backend selection hints (for rollout and safety).
- Existing fallback logic (worker unavailable -> main thread) maps directly to backend fallback ladder and kill-switch controls.
Architecture Choice
Adopt a Backend Adapter + Capability Router architecture:
- Keep existing preview/full public APIs and message formats stable.
- Introduce an internal backend interface for step execution and pixel transport.
- Select backend at runtime by capability + feature flag + step support matrix.
- Preserve CPU backend as correctness source and terminal fallback.
Integration Seam
- New seam: executePipelineSteps() in a backend-router module called by both preview-renderer.ts and bridge.ts.
- render-core.ts becomes the CPU backend implementation instead of the global default path.
- Worker protocol receives optional diagnostics metadata (selected backend, fallback reason), not required for correctness.
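The seam described above could look roughly like the following sketch. Only the name executePipelineSteps() and the four step types come from this document; the PipelineBackend interface, PixelBuffer, and the diagnostics shape are assumptions made for illustration. For simplicity the sketch routes a whole step stack to one backend, whereas the real router may dispatch per step.

```typescript
// Hypothetical types: only the step names and executePipelineSteps() appear in the design.
type StepType = "curves" | "color-adjust" | "light-adjust" | "detail-adjust";
type BackendId = "webgl2" | "wasm-simd" | "cpu";

interface PipelineStep {
  type: StepType;
  params: Record<string, number>;
}

interface PixelBuffer {
  data: Uint8ClampedArray; // RGBA, mutated in place
  width: number;
  height: number;
}

interface PipelineBackend {
  id: BackendId;
  isAvailable(): boolean;
  supportsStep(type: StepType): boolean;
  runStep(pixels: PixelBuffer, step: PipelineStep): void;
}

interface ExecutionDiagnostics {
  backend: BackendId; // optional metadata for telemetry, not needed for correctness
}

// Picks the first backend (in priority order) that is available and supports
// every step in the stack, then runs the steps in order on that backend.
function executePipelineSteps(
  pixels: PixelBuffer,
  steps: PipelineStep[],
  backendsInPriorityOrder: PipelineBackend[],
): ExecutionDiagnostics {
  const chosen = backendsInPriorityOrder.find(
    (backend) =>
      backend.isAvailable() && steps.every((step) => backend.supportsStep(step.type)),
  );
  if (!chosen) {
    throw new Error("No backend supports this step stack; the CPU backend should be last in the list");
  }
  for (const step of steps) {
    chosen.runStep(pixels, step);
  }
  return { backend: chosen.id };
}
```

Because the CPU backend supports every step, placing it last in the list guarantees a match, which is what keeps it the terminal fallback.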
Fallback Ladder (authoritative order)
- WebGL2 worker backend (preferred for preview/full when supported and flag-enabled)
- WASM SIMD kernel backend (for unsupported GPU steps or devices with poor GPU availability)
- CPU worker backend (render-core.ts current path)
- CPU main-thread fallback (existing worker-client.ts behavior)
Each downgrade records a structured reason (unsupported_api, shader_compile_failed, capability_mismatch, flag_disabled, runtime_error) so rollout decisions are data-driven.
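One way to walk the ladder while recording a structured reason per downgrade is sketched below. The reason strings are the ones listed above; the rung identifiers, the RungProbe interface, and selectRung() are hypothetical names introduced for illustration.

```typescript
// Reason codes come from the design; everything else here is an assumed shape.
type FallbackReason =
  | "unsupported_api"
  | "shader_compile_failed"
  | "capability_mismatch"
  | "flag_disabled"
  | "runtime_error";

type Rung = "webgl2-worker" | "wasm-simd-worker" | "cpu-worker" | "cpu-main-thread";

interface RungProbe {
  rung: Rung;
  // Returns null when the rung is usable, otherwise the reason it is skipped.
  check(): FallbackReason | null;
}

interface Downgrade {
  from: Rung;
  reason: FallbackReason;
}

// Walks the ladder in order and returns the first usable rung plus every
// downgrade that happened on the way, ready to be attached to telemetry.
function selectRung(ladder: RungProbe[]): { rung: Rung; downgrades: Downgrade[] } {
  const downgrades: Downgrade[] = [];
  for (const probe of ladder) {
    const reason = probe.check();
    if (reason === null) {
      return { rung: probe.rung, downgrades };
    }
    downgrades.push({ from: probe.rung, reason });
  }
  throw new Error("Fallback ladder exhausted; the terminal CPU rung should always be usable");
}
```

Keeping the downgrade list separate from the selected rung means telemetry can report both where a request ended up and why each higher rung was skipped.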
Context7 Guidance Buckets
Safe now
- WebGL2 shader pipeline for color/curve/light/detail kernels with runtime feature detection.
- OffscreenCanvas in worker where available, with same-origin + secure-context guardrails.
- WASM SIMD (single-threaded) behind feature detection (WebAssembly.validate on SIMD module or runtime probe).
- Worker-first execution with transferable buffers (already used in worker client).
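As a sketch of the WebAssembly.validate approach mentioned above, the probe below validates a tiny hand-encoded module whose only function returns a v128 value. The function name detectWasmSimd and the constant name are assumptions; the module bytes encode i32.const 0, i8x16.splat, i8x16.popcnt, which only validates on engines that implement fixed-width SIMD.

```typescript
// Minimal module: one function of type () -> v128 with body
// i32.const 0; i8x16.splat; i8x16.popcnt; end.
const SIMD_PROBE_MODULE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, // magic "\0asm" + version 1
  1, 5, 1, 96, 0, 1, 123, // type section: () -> v128
  3, 2, 1, 0, // function section: one function using type 0
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11, // code section with SIMD opcodes
]);

// Hypothetical helper name. Returns false instead of throwing when
// WebAssembly is missing or validation fails for any reason.
function detectWasmSimd(): boolean {
  const wasm = (globalThis as unknown as {
    WebAssembly?: { validate?: (bytes: Uint8Array) => boolean };
  }).WebAssembly;
  if (!wasm || typeof wasm.validate !== "function") {
    return false;
  }
  try {
    return wasm.validate(SIMD_PROBE_MODULE);
  } catch {
    return false;
  }
}
```

The result is stable for the lifetime of the worker, so it can be computed once at startup and cached alongside the WebGL2 capability check.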
Experimental
- WebGPU compute path (navigator.gpu / WorkerNavigator.gpu): gate behind an explicit experiment flag and per-browser allowlist.
- Step fusion and mixed GPU+WASM scheduling: only after baseline correctness/perf metrics are stable.
- Advanced WebGL packing optimizations (multi-pass fusion, half-float intermediates) after parity suite exists.
Not now
- WASM threads + SharedArrayBuffer requirement: requires cross-origin isolation (COOP/COEP) and can impact app embedding/3rd-party integrations.
- WebGPU-first default backend: browser/feature variability still too high for this repository's reliability target.
- Full rewrite of decode/encode stack around custom codecs: exceeds scope; current drawImage + canvas/blob path remains.
Phased Rollout
Phase 0: Foundation + Instrumentation (no behavior change)
- Add backend router with CPU-only implementation.
- Add telemetry points for preview/full latency, fallback reason, and error class.
- Add golden-image parity harness for all 4 current step types.
Phase 1: WebGL PoC (preview only)
- Implement WebGL2 backend for curves and color-adjust first.
- Enable by dev flag on preview pipeline only.
- Validate output parity against CPU baseline under tolerance thresholds.
Phase 2: WebGL expansion + guarded production rollout
- Add light-adjust and detail-adjust kernels.
- Enable progressive rollout (1% -> 10% -> 25% -> 50% -> 100%) with kill switch.
- Keep full render on CPU backend until preview parity and crash rate are stable.
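A progressive rollout with a kill switch is often implemented with deterministic bucketing, sketched below. This is an assumption about mechanism, not something the plan specifies: the clientId input, the hash choice (an FNV-1a style hash over UTF-16 code units), and the function names are all hypothetical.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units. Deterministic, so a given
// client lands in the same bucket on every session.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Returns true when this client should get the WebGL backend at the current
// rollout percentage. The kill switch overrides everything.
function isInRollout(clientId: string, rolloutPercent: number, killSwitch: boolean): boolean {
  if (killSwitch) {
    return false;
  }
  const bucket = fnv1a32(clientId) % 100; // 0..99
  return bucket < rolloutPercent;
}
```

Because the bucket is fixed per client, raising the percentage from 1 to 10 to 25 only ever adds clients to the cohort, which keeps before/after latency comparisons meaningful.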
Phase 3: WASM SIMD fallback kernels
- Add WASM SIMD implementation for hot CPU kernels as middle rung between WebGL and CPU.
- Use when WebGL unavailable or disabled, still inside worker.
Phase 4: Full-render backend adoption
- Route full render through same backend router.
- Keep output encode path unchanged.
- Roll out in smaller increments than preview due to export-critical path.
Phase 5: WebGPU experiment track
- Optional experiment branch with strict allowlist and observability.
- No default enablement in this plan.
Risk Controls
- Feature flags:
  - imagePipeline.backend.webgl.enabled
  - imagePipeline.backend.wasm.enabled
  - imagePipeline.backend.webgpu.experiment
  - imagePipeline.backend.forceCpu
- Auto-disable circuit breaker when fallback/error thresholds exceed SLO windows.
- Golden-image regression suite enforced in CI before any rollout increase.
- Keep worker protocol backward compatible until migration fully lands.
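The auto-disable circuit breaker could be sketched as follows. The class name, option names, and the count-based sliding window are assumptions; the plan speaks of SLO windows, which in practice may be time-based rather than count-based.

```typescript
// Hypothetical options; real thresholds would come from the SLO definition.
interface BreakerOptions {
  windowSize: number; // number of most recent outcomes to keep
  maxFailureRate: number; // trip when failures / samples exceeds this
  minSamples: number; // do not evaluate until this many outcomes exist
}

class BackendCircuitBreaker {
  private readonly options: BreakerOptions;
  private outcomes: boolean[] = []; // true = failure or forced fallback
  private tripped = false;

  constructor(options: BreakerOptions) {
    this.options = options;
  }

  // Records one render outcome and trips the breaker if the failure rate
  // in the sliding window exceeds the configured threshold.
  record(failed: boolean): void {
    this.outcomes.push(failed);
    if (this.outcomes.length > this.options.windowSize) {
      this.outcomes.shift();
    }
    if (this.outcomes.length >= this.options.minSamples) {
      const failures = this.outcomes.filter((outcome) => outcome).length;
      if (failures / this.outcomes.length > this.options.maxFailureRate) {
        this.tripped = true;
      }
    }
  }

  // When open, the router should skip this backend and use the next rung.
  isOpen(): boolean {
    return this.tripped;
  }

  // Manual reset, for example after a fixed build ships.
  reset(): void {
    this.outcomes = [];
    this.tripped = false;
  }
}
```

The breaker stays open until it is explicitly reset, so a flapping backend cannot repeatedly re-enable itself within a session.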
Metrics and Success Criteria
- Preview latency (p50, p95) for representative step stacks and image sizes.
- Full render latency (p50, p95) for default export sizes.
- Main-thread long task count during rapid slider edits.
- Backend selection distribution (% webgl/wasm/cpu/main-thread).
- Fallback and runtime error rate per backend.
- Pixel parity drift (max absolute channel delta and histogram similarity).
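The two parity measures named in the last bullet can be computed as in the sketch below. The function names are assumptions, and histogram intersection is only one possible similarity measure; the plan does not say which one the parity harness uses.

```typescript
// Largest absolute per-channel difference between two RGBA buffers of the same size.
function maxAbsChannelDelta(a: Uint8ClampedArray, b: Uint8ClampedArray): number {
  if (a.length !== b.length) {
    throw new Error("Buffers must have identical dimensions");
  }
  let max = 0;
  for (let i = 0; i < a.length; i++) {
    const delta = Math.abs(a[i] - b[i]);
    if (delta > max) max = delta;
  }
  return max;
}

// Histogram intersection for one channel (0 = R, 1 = G, 2 = B), normalized to
// [0, 1]. A value of 1 means both images have identical histograms for that channel.
function histogramIntersection(
  a: Uint8ClampedArray,
  b: Uint8ClampedArray,
  channelOffset: 0 | 1 | 2,
): number {
  if (a.length !== b.length) {
    throw new Error("Buffers must have identical dimensions");
  }
  const histA = new Uint32Array(256);
  const histB = new Uint32Array(256);
  for (let i = channelOffset; i < a.length; i += 4) {
    histA[a[i]]++;
    histB[b[i]]++;
  }
  let shared = 0;
  for (let value = 0; value < 256; value++) {
    shared += Math.min(histA[value], histB[value]);
  }
  const pixelCount = a.length / 4;
  return pixelCount === 0 ? 1 : shared / pixelCount;
}
```

A golden-image test would typically assert that maxAbsChannelDelta stays at or below a small tolerance, since GPU and CPU rounding rarely match bit for bit.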
Success target for initial migration:
- >=30% preview p95 improvement on supported devices.
- No increase in user-visible render failures.
- <=1% forced downgrade due to runtime backend errors after rollout stabilization.
Out of Scope
- New visual adjustment features.
- Server-side/offline rendering architecture changes.
- COOP/COEP rollout for SAB-threaded WASM.