Fix audit generation and enrichment fallback

2026-06-07 23:03:57 +02:00
parent e9463e8ef2
commit 470fb0f348
10 changed files with 2190 additions and 138 deletions
--- a/Externalisiere-die-persönliche-Audit-Pipeline.md
+++ b/Externalisiere-die-persönliche-Audit-Pipeline.md
@@ -0,0 +1,76 @@
+---
+id: TASK-30
+title: Externalisiere die persönliche Audit-Pipeline
+status: In Progress
+assignee: []
+created_date: '2026-06-06 18:44'
+updated_date: '2026-06-07 20:27'
+labels: []
+dependencies: []
+priority: high
+ordinal: 32000
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Baue die Pipeline für audit.matthias-meister-webdesign.de so um, dass ressourcenintensive Website-Erfassung über externe API-Services statt Playwright läuft, während die Codebase später SaaS-fähig bleibt.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [x] #1 Neue Audit-Pipeline nutzt Jina/ScreenshotOne/PageSpeed/OpenRouter über serverseitige Managed-Konfiguration und schreibt bestehende Audit-Artefakte weiter.
+- [x] #2 Usage- und Kostenereignisse werden pro Lauf/Provider persistiert und im Settings-/Readiness-Kontext sichtbar gemacht.
+- [x] #3 Die v3-Skill-Registry wird geparst und in Audit-Generierung sowie Tests über das neue Finding-Schema genutzt.
+- [x] #4 Outreach bleibt persönlicher SMTP-Dogfood-Kanal; bestehende Freigabe-Gates bleiben intakt und SaaS-Mailbox-Onboarding wird nicht eingeführt.
+- [x] #5 Bestehende Tests plus neue TDD-Tests für Service-Adapter, Usage-Logging und Skill-Registry laufen erfolgreich.
+<!-- AC:END -->
+
+## Implementation Plan
+
+<!-- SECTION:PLAN:BEGIN -->
+1. Baseline und Arbeitsbranch sichern
+2. Service-Adapter und Usage-Logging TDD implementieren
+3. v3-Skill-Registry und Audit-Schema TDD implementieren
+4. Pipeline-Orchestrierung auf externe Services umstellen
+5. Settings/Readiness und Dokumentation aktualisieren
+6. Reviews, Integration und vollständige Verifikation
+<!-- SECTION:PLAN:END -->
+
+## Implementation Notes
+
+<!-- SECTION:NOTES:BEGIN -->
+Worker B: Start TDD-Slice fuer v3 Skill-Registry und Finding-Schemas. Write-Set: lib/skills-registry.ts, lib/ai/schemas.ts, Skill-/Schema-Tests.
+
+Baseline vor Umsetzung: `pnpm test` grün mit 307/307 Tests auf Branch `codex/pipeline-first-external-services`. Drei parallele Worker gestartet: Service-Adapter/Usage, v3-Skill-Registry/Schema, Operations-Readiness/Doku.
+
+Worker B: GREEN fuer v3 Registry/Schemas. parseSkillsRegistry erkennt v3 YAML-Metablocks aus v2_elemente/skills.md und bleibt legacy-kompatibel; AI-Schemas enthalten v3 Finding-Items plus Audit-Aggregate. Gezielte Worker-B-Tests: 17/17 gruen. Gesamtes pnpm test weiterhin durch parallele fremde Tests blockiert (external-audit-services, operational-readiness).
+
+Worker B: Final fokussierte Verifikation nach Mischformat-Test: 18/18 gruen fuer audit-skill-registry-v3, ai-schemas und skills-registry.
+
+Worker B Quality Review: Start TDD-Fix fuer strengere v3 Audit-Schemas und keine heuristischen v3-Kategorien.
+
+Worker B Quality Review: GREEN. v3 Audit-Schema rejected blank text/empty arrays, ctaType auf anruf|termin|rueckruf begrenzt; v3 Registry gibt ohne explizite Kategorie keine category mehr aus. Fokussierte Tests: 21/21 gruen.
+
+Worker B Quality Review: Erweiterte fokussierte Verifikation inkl. audit-evidence: 27/27 gruen.
+
+Grundslices reviewt: A Service-Adapter/Usage approved, B v3 Skill-Registry/Schemas approved, C Operations-Readiness/Doku approved. Reviewer-Verifikation: C `pnpm test` 321/321; B fokussiert 21/21; A fokussiert 7/7.
+
+Worker D: GREEN fuer Convex Usage-/Kostenpersistenz-Slice. Added usageEvents schema with provider/operation/runId/leadId/auditId/estimatedCostUsd/tokens/callCounts/createdAt, bounded indexes, internal recordUsageEvent mutation, and bounded usage queries by latest/run/lead/audit/provider. RED confirmed via failing usage-events-source contract before implementation; final verification `pnpm test -- tests/usage-events-source.test.ts` passed with tsc and 332/332 tests. Task intentionally remains In Progress pending orchestrator/user confirmation.
+
+Worker D Quality Review: GREEN fuer UsageEvents numeric guardrails. RED bestaetigt durch neuen Source-Contract fuer assertValidUsageEventNumbers vor ctx.db.insert. recordUsageEvent validiert jetzt estimatedCostUsd als finite non-negative number und alle token/callCounts-Felder als finite non-negative integers, um negative Werte, NaN, Infinity und Bruchwerte vor Persistenz zu blockieren. Final verification `pnpm test -- tests/usage-events-source.test.ts` passed with tsc and 334/334 tests. Task bleibt In Progress.
+
+UsageEvents-Slice approved: schema/module/tests mit Guardrails fuer finite non-negative Kosten und integer Tokens/CallCounts; D Spec+Quality approved.
+
+Worker E: RED/GREEN fuer externe Audit-Orchestrierung abgeschlossen. RED bestaetigt mit neuem tests/external-audit-pipeline-source.test.ts: fehlende externe Helper, UsageEvents und Jina-Markdown-Anbindung. GREEN: auditGenerationAction bereitet ScreenshotOne/Jina-Capture aus started.lead.websiteUrl/websiteDomain vor, guardet ScreenshotOne ueber SCREENSHOTONE_API_KEY, nutzt optional JINA_API_KEY, persistiert erfolgreiche ScreenshotOne-Bilder via ctx.storage.store + internal.auditGeneration.persistExternalCaptureScreenshot in websiteCrawlScreenshots, gibt Jina-Markdown in buildAuditEvidenceInput/Prompts und protokolliert usageEvents fuer screenshotone/jina audit_capture sowie openrouter audit_generation. Fokussierte Verifikation: pnpm test -- tests/external-audit-pipeline-source.test.ts gruen mit 335/335 Tests.
+
+Worker E Quality Review: RED/GREEN fuer drei Review-Issues abgeschlossen. RED: tests/external-audit-pipeline-source.test.ts fiel auf fehlende Capture-Timeouts/Body-Limits, unsichere Error-Pfade und fehlende German-Copy-Usage-Aggregation. GREEN: auditGenerationAction nutzt EXTERNAL_CAPTURE_TIMEOUT_MS mit AbortController, MAX_SCREENSHOT_BYTES, MAX_JINA_MARKDOWN_BYTES und MAX_JINA_MARKDOWN_CHARS; Screenshot/Jina Bodies werden stream-basiert begrenzt statt response.blob()/response.text(); messageFromError sanitizt ueber sanitizeSecretCandidates inkl. SCREENSHOTONE_API_KEY/JINA_API_KEY und alle Error-Pfade nutzen safeErrorSummary; German-Copy UsageEvent aggregiert alle sechs OpenRouter-Aufrufe der Stufe. Verifikation: pnpm test -- tests/external-audit-pipeline-source.test.ts gruen mit 341/341 Tests.
+
+Orchestrator final verification: AC #1 checked after external Capture/Generation pipeline uses ScreenshotOne/Jina/PageSpeed/OpenRouter server-side configuration, persists screenshots to existing websiteCrawlScreenshots/artifacts, and records provider usage. AC #4 checked because outreach remains the personal SMTP dogfood flow with existing review gates; no SaaS mailbox onboarding was introduced. Final review found no P0/P1 blockers. Task remains In Progress pending Matthias manual confirmation before Done.
+
+2026-06-07: Investigating user report that audit runs fail and Convex table rows mention Azure. Repository search found no azure/Azure/AZURE string in code or backlog, so initial hypothesis is that Azure comes from an external provider/model error surfaced through OpenRouter/AI SDK or persisted raw error details from a live Convex run, not from application code.
+
+2026-06-07: Root cause for failed auditGenerations confirmed from live error: OpenRouter routed an OpenAI-compatible request through an Azure-backed provider path using strict structured outputs. AI SDK 6/OpenAI strictJsonSchema rejects response_format JSON schemas where an object property exists but is omitted from required; Zod .optional() generated exactly that for auditClassificationSchema.usedSkills. Classification failed before any audit could complete. Applied TDD fix: changed generated-output schemas used by generateObject from optional top-level fields to nullable fields for auditClassificationSchema.usedSkills, followUpDraftSchema.followInDays/goals, and qualityReviewSchema.notes; updated prompt/action null handling. RED confirmed focused schema test failed on missing usedSkills; GREEN verification passed: focused ai-schemas test 11/11, pnpm test 365/365, pnpm exec tsc -p tsconfig.json --pretty false, pnpm lint 0 errors with two pre-existing BetterAuth generated warnings, pnpm exec tsc -p convex/tsconfig.json --pretty false. Convex SaaS typecheck could not be completed because sandbox network failed and escalation was rejected due external code/metadata upload risk; user approval is required for that exact command.
+
+2026-06-07 follow-up live Convex investigation for run j97d4ytrzccqcx3vc05dre30rh886wz4 on dev deployment different-caterpillar-213: Azure schema blocker is resolved; classification/multimodal/germanCopy succeeded. Current hard failure is qualityReview. Convex auditGenerations quality parsedJson shows LLM QA isValid=false for subjective copy notes (langatmig/redundant), plus German-Copy-Guard issues. Local reproduction of the live German copy showed deterministic guard false positives: emailBody missed observation/suggestion because observed text used "festgestellt" outside the narrow token pattern, and callScript.closeLine incorrectly required Ich-form for a collaborative closing line. Implemented TDD fix: German guard now recognizes festgestellt/feststellen/feststellbar and noun-form "Vorschlag"; call-script close lines no longer require Ich-form. Audit action now hard-blocks only deterministic German-Copy-Guard failures; subjective LLM QA false is persisted/logged as warning while allowing the audit to continue. Added regression tests for the live copy and source contract. Verification passed: pnpm test 366/366, pnpm exec tsc -p tsconfig.json --pretty false, pnpm lint 0 errors with two existing BetterAuth generated warnings, pnpm exec tsc -p convex/tsconfig.json --pretty false. Attempted Convex dev deployment was rejected by approval reviewer because it changes shared Dev behavior and user has not explicitly approved deployment.
+<!-- SECTION:NOTES:END -->
--- a/Stabilisiere-Website-Enrichment-ohne-Playwright-Abbruch.md
+++ b/Stabilisiere-Website-Enrichment-ohne-Playwright-Abbruch.md
@@ -0,0 +1,50 @@
+---
+id: TASK-43
+title: Stabilisiere Website-Enrichment ohne Playwright-Abbruch
+status: In Progress
+assignee: []
+created_date: '2026-06-07 19:40'
+updated_date: '2026-06-07 20:57'
+labels: []
+dependencies: []
+priority: high
+ordinal: 45000
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Investigate and fix the Convex websiteEnrichmentAction crash where Playwright/Chromium closes during lead enrichment after a new lead is created. The action should not fail the lead pipeline when browser-based enrichment crashes.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [x] #1 The root cause and affected call path are documented in task notes
+- [x] #2 Lead enrichment degrades gracefully when browser/page/context is closed
+- [x] #3 Regression tests cover the browser-closed failure path or removal of Playwright dependency
+- [x] #4 Relevant verification commands pass
+<!-- AC:END -->
+
+## Implementation Plan
+
+<!-- SECTION:PLAN:BEGIN -->
+1. Reproduce and trace the browser-closed failure path in websiteEnrichmentAction
+2. Compare with existing graceful-failure paths and Convex action constraints
+3. Add a RED regression test for page/context/browser closed during page capture
+4. Delegate a minimal fix that degrades enrichment instead of crashing
+5. Run focused and full verification; leave task In Progress until Matthias confirms Done
+<!-- SECTION:PLAN:END -->
+
+## Implementation Notes
+
+<!-- SECTION:NOTES:BEGIN -->
+Root-cause investigation: The reported Convex log is from internal action websiteEnrichmentAction:processLeadEnrichment, not auditGenerationAction. The action still launches Playwright/Chromium for legacy lead website enrichment. The log shows navigation reached the target page multiple times, then Playwright threw `Target page, context or browser has been closed`. Current code has an outer catch, but the outer finally closes desktopContext/mobileContext/browser without protection; if a resource is already closed, cleanup can throw after the catch and surface as Convex Uncaught Error. Helper-level page.close() calls are also unprotected and can obscure the original browser failure. Hypothesis: cleanup must be best-effort and browser/page instability should finish the run as failed/degraded, queue PageSpeed if possible, and patch lead reason instead of crashing the action runtime.
+
+TASK-43 Worker update: Website-Enrichment-only fix. RED test added in tests/website-enrichment-action.test.ts for best-effort Playwright cleanup; initial focused run failed on missing isPlaywrightTargetClosedError/closePlaywrightResourceSafely contract. Minimal fix in convex/websiteEnrichmentAction.ts adds isPlaywrightTargetClosedError and closePlaywrightResourceSafely; page.close(), desktopContext.close(), mobileContext.close(), and browser.close() now run through the safe helper. Target/page/context/browser closed cleanup errors are swallowed so the existing action catch/failure path can persist failed runs, queue PageSpeed when possible, and patch lead reason. Unexpected cleanup close failures are swallowed with console.warn. No AuditGeneration, ScreenshotOne, or Jina slices touched by this TASK-43 change. Verification: pnpm test -- tests/website-enrichment-action.test.ts passed after RED/GREEN (386 pass, 0 fail); pnpm exec tsc --noEmit passed; pnpm lint passed with 2 existing generated-file warnings in convex/betterAuth/_generated; pnpm test passed (364 pass, 0 fail); git diff --check passed.
+
+Live follow-up 2026-06-07 22:34 CEST: Audit generation now succeeds, but website_enrichment still fails before useful extraction when TASK8_BROWSER_ASSET_URL / Chromium source is not configured. New objective for this task slice: remove the Chromium/Playwright hard requirement by adding a no-browser enrichment path, or otherwise prevent the website_enrichment run from failing solely because no browser asset is configured.
+
+Follow-up fix: The live Convex run j9737mz0tkgdbg6mzjxjd1w7018878b1 failed because processLeadEnrichment still treated missing TASK8_BROWSER_ASSET_URL / Chromium source as a fatal Playwright bootstrap error. Added a browserless fetch fallback in convex/websiteEnrichmentAction.ts: when no Chromium source is configured, the action records a warning, fetches homepage/relevant static subpages directly with bounded response reads, extracts metadata/links/contact candidates via the existing website-crawler helpers, persists websiteCrawlPages/websiteCrawlLinks/websiteEmailCandidates/websiteTechnicalChecks with screenshots=[], patches the lead, queues PageSpeed, and finishes website_enrichment as succeeded if direct crawl succeeds. Existing Playwright path remains available when Chromium is configured. Regression source tests now cover the no-Chromium branch and browserless persistence. Verification: pnpm test -- tests/website-enrichment-action.test.ts passed; pnpm exec tsc -p convex/tsconfig.json --pretty false passed; pnpm exec tsc -p tsconfig.json --pretty false passed; pnpm test passed (368/368); pnpm lint passed with 2 existing generated BetterAuth warnings; git diff --check passed.
+
+Final verification after robustness cleanup: pnpm test -- tests/website-enrichment-action.test.ts passed (392/392 in focused harness); pnpm exec tsc -p convex/tsconfig.json --pretty false passed; pnpm exec tsc -p tsconfig.json --pretty false passed; git diff --check passed; pnpm test passed (368/368); pnpm lint passed with the same two generated BetterAuth unused-disable warnings and 0 errors.
+<!-- SECTION:NOTES:END -->