9.6 KiB
id, title, status, assignee, created_date, updated_date, labels, dependencies, priority, ordinal
| id | title | status | assignee | created_date | updated_date | labels | dependencies | priority | ordinal |
|---|---|---|---|---|---|---|---|---|---|
| TASK-30 | Externalisiere die persönliche Audit-Pipeline | Done | 2026-06-06 18:44 | 2026-06-10 19:27 | high | 32000 |
Description
Baue die Pipeline für audit.matthias-meister-webdesign.de so um, dass ressourcenintensive Website-Erfassung über externe API-Services statt Playwright läuft, während die Codebase später SaaS-fähig bleibt.
Acceptance Criteria
- #1 Neue Audit-Pipeline nutzt Jina/ScreenshotOne/PageSpeed/OpenRouter über serverseitige Managed-Konfiguration und schreibt bestehende Audit-Artefakte weiter.
- #2 Usage- und Kostenereignisse werden pro Lauf/Provider persistiert und im Settings-/Readiness-Kontext sichtbar gemacht.
- #3 Die v3-Skill-Registry wird geparst und in Audit-Generierung sowie Tests über das neue Finding-Schema genutzt.
- #4 Outreach bleibt persönlicher SMTP-Dogfood-Kanal; bestehende Freigabe-Gates bleiben intakt und SaaS-Mailbox-Onboarding wird nicht eingeführt.
- #5 Bestehende Tests plus neue TDD-Tests für Service-Adapter, Usage-Logging und Skill-Registry laufen erfolgreich.
Implementation Plan
- Baseline und Arbeitsbranch sichern
- Service-Adapter und Usage-Logging TDD implementieren
- v3-Skill-Registry und Audit-Schema TDD implementieren
- Pipeline-Orchestrierung auf externe Services umstellen
- Settings/Readiness und Dokumentation aktualisieren
- Reviews, Integration und vollständige Verifikation
Implementation Notes
Worker B: Start TDD-Slice fuer v3 Skill-Registry und Finding-Schemas. Write-Set: lib/skills-registry.ts, lib/ai/schemas.ts, Skill-/Schema-Tests.
Baseline vor Umsetzung: pnpm test grün mit 307/307 Tests auf Branch codex/pipeline-first-external-services. Drei parallele Worker gestartet: Service-Adapter/Usage, v3-Skill-Registry/Schema, Operations-Readiness/Doku.
Worker B: GREEN fuer v3 Registry/Schemas. parseSkillsRegistry erkennt v3 YAML-Metablocks aus v2_elemente/skills.md und bleibt legacy-kompatibel; AI-Schemas enthalten v3 Finding-Items plus Audit-Aggregate. Gezielte Worker-B-Tests: 17/17 gruen. Gesamtes pnpm test weiterhin durch parallele fremde Tests blockiert (external-audit-services, operational-readiness).
Worker B: Final fokussierte Verifikation nach Mischformat-Test: 18/18 gruen fuer audit-skill-registry-v3, ai-schemas und skills-registry.
Worker B Quality Review: Start TDD-Fix fuer strengere v3 Audit-Schemas und keine heuristischen v3-Kategorien.
Worker B Quality Review: GREEN. v3 Audit-Schema rejected blank text/empty arrays, ctaType auf anruf|termin|rueckruf begrenzt; v3 Registry gibt ohne explizite Kategorie keine category mehr aus. Fokussierte Tests: 21/21 gruen.
Worker B Quality Review: Erweiterte fokussierte Verifikation inkl. audit-evidence: 27/27 gruen.
Grundslices reviewt: A Service-Adapter/Usage approved, B v3 Skill-Registry/Schemas approved, C Operations-Readiness/Doku approved. Reviewer-Verifikation: C pnpm test 321/321; B fokussiert 21/21; A fokussiert 7/7.
Worker D: GREEN fuer Convex Usage-/Kostenpersistenz-Slice. Added usageEvents schema with provider/operation/runId/leadId/auditId/estimatedCostUsd/tokens/callCounts/createdAt, bounded indexes, internal recordUsageEvent mutation, and bounded usage queries by latest/run/lead/audit/provider. RED confirmed via failing usage-events-source contract before implementation; final verification pnpm test -- tests/usage-events-source.test.ts passed with tsc and 332/332 tests. Task intentionally remains In Progress pending orchestrator/user confirmation.
Worker D Quality Review: GREEN fuer UsageEvents numeric guardrails. RED bestaetigt durch neuen Source-Contract fuer assertValidUsageEventNumbers vor ctx.db.insert. recordUsageEvent validiert jetzt estimatedCostUsd als finite non-negative number und alle token/callCounts-Felder als finite non-negative integers, um negative Werte, NaN, Infinity und Bruchwerte vor Persistenz zu blockieren. Final verification pnpm test -- tests/usage-events-source.test.ts passed with tsc and 334/334 tests. Task bleibt In Progress.
UsageEvents-Slice approved: schema/module/tests mit Guardrails fuer finite non-negative Kosten und integer Tokens/CallCounts; D Spec+Quality approved.
Worker E: RED/GREEN fuer externe Audit-Orchestrierung abgeschlossen. RED bestaetigt mit neuem tests/external-audit-pipeline-source.test.ts: fehlende externe Helper, UsageEvents und Jina-Markdown-Anbindung. GREEN: auditGenerationAction bereitet ScreenshotOne/Jina-Capture aus started.lead.websiteUrl/websiteDomain vor, guardet ScreenshotOne ueber SCREENSHOTONE_API_KEY, nutzt optional JINA_API_KEY, persistiert erfolgreiche ScreenshotOne-Bilder via ctx.storage.store + internal.auditGeneration.persistExternalCaptureScreenshot in websiteCrawlScreenshots, gibt Jina-Markdown in buildAuditEvidenceInput/Prompts und protokolliert usageEvents fuer screenshotone/jina audit_capture sowie openrouter audit_generation. Fokussierte Verifikation: pnpm test -- tests/external-audit-pipeline-source.test.ts gruen mit 335/335 Tests.
Worker E Quality Review: RED/GREEN fuer drei Review-Issues abgeschlossen. RED: tests/external-audit-pipeline-source.test.ts fiel auf fehlende Capture-Timeouts/Body-Limits, unsichere Error-Pfade und fehlende German-Copy-Usage-Aggregation. GREEN: auditGenerationAction nutzt EXTERNAL_CAPTURE_TIMEOUT_MS mit AbortController, MAX_SCREENSHOT_BYTES, MAX_JINA_MARKDOWN_BYTES und MAX_JINA_MARKDOWN_CHARS; Screenshot/Jina Bodies werden stream-basiert begrenzt statt response.blob()/response.text(); messageFromError sanitizt ueber sanitizeSecretCandidates inkl. SCREENSHOTONE_API_KEY/JINA_API_KEY und alle Error-Pfade nutzen safeErrorSummary; German-Copy UsageEvent aggregiert alle sechs OpenRouter-Aufrufe der Stufe. Verifikation: pnpm test -- tests/external-audit-pipeline-source.test.ts gruen mit 341/341 Tests.
Orchestrator final verification: AC #1 checked after external Capture/Generation pipeline uses ScreenshotOne/Jina/PageSpeed/OpenRouter server-side configuration, persists screenshots to existing websiteCrawlScreenshots/artifacts, and records provider usage. AC #4 checked because outreach remains the personal SMTP dogfood flow with existing review gates; no SaaS mailbox onboarding was introduced. Final review found no P0/P1 blockers. Task remains In Progress pending Matthias manual confirmation before Done.
2026-06-07: Investigating user report that audit runs fail and Convex table rows mention Azure. Repository search found no azure/Azure/AZURE string in code or backlog, so initial hypothesis is that Azure comes from an external provider/model error surfaced through OpenRouter/AI SDK or persisted raw error details from a live Convex run, not from application code.
2026-06-07: Root cause for failed auditGenerations confirmed from live error: OpenRouter routed an OpenAI-compatible request through an Azure-backed provider path using strict structured outputs. AI SDK 6/OpenAI strictJsonSchema rejects response_format JSON schemas where an object property exists but is omitted from required; Zod .optional() generated exactly that for auditClassificationSchema.usedSkills. Classification failed before any audit could complete. Applied TDD fix: changed generated-output schemas used by generateObject from optional top-level fields to nullable fields for auditClassificationSchema.usedSkills, followUpDraftSchema.followInDays/goals, and qualityReviewSchema.notes; updated prompt/action null handling. RED confirmed focused schema test failed on missing usedSkills; GREEN verification passed: focused ai-schemas test 11/11, pnpm test 365/365, pnpm exec tsc -p tsconfig.json --pretty false, pnpm lint 0 errors with two pre-existing BetterAuth generated warnings, pnpm exec tsc -p convex/tsconfig.json --pretty false. Convex SaaS typecheck could not be completed because sandbox network failed and escalation was rejected due external code/metadata upload risk; user approval is required for that exact command.
2026-06-07 follow-up live Convex investigation for run j97d4ytrzccqcx3vc05dre30rh886wz4 on dev deployment different-caterpillar-213: Azure schema blocker is resolved; classification/multimodal/germanCopy succeeded. Current hard failure is qualityReview. Convex auditGenerations quality parsedJson shows LLM QA isValid=false for subjective copy notes (langatmig/redundant), plus German-Copy-Guard issues. Local reproduction of the live German copy showed deterministic guard false positives: emailBody missed observation/suggestion because observed text used "festgestellt" outside the narrow token pattern, and callScript.closeLine incorrectly required Ich-form for a collaborative closing line. Implemented TDD fix: German guard now recognizes festgestellt/feststellen/feststellbar and noun-form "Vorschlag"; call-script close lines no longer require Ich-form. Audit action now hard-blocks only deterministic German-Copy-Guard failures; subjective LLM QA false is persisted/logged as warning while allowing the audit to continue. Added regression tests for the live copy and source contract. Verification passed: pnpm test 366/366, pnpm exec tsc -p tsconfig.json --pretty false, pnpm lint 0 errors with two existing BetterAuth generated warnings, pnpm exec tsc -p convex/tsconfig.json --pretty false. Attempted Convex dev deployment was rejected by approval reviewer because it changes shared Dev behavior and user has not explicitly approved deployment.
Final Summary
Closed per explicit user request while switching project tracking to pitchfast.