Improve audit pipeline and outreach review

2026-06-08 22:16:32 +02:00
parent ff18fc202e
commit 1695110e0a
34 changed files with 2792 additions and 238 deletions
--- a/Add-Convex-specialist-fan-out-audit-pipeline.md
+++ b/Add-Convex-specialist-fan-out-audit-pipeline.md
@@ -0,0 +1,48 @@
+---
+id: TASK-46
+title: Add Convex specialist fan-out audit pipeline
+status: In Progress
+assignee: []
+created_date: '2026-06-08 09:04'
+updated_date: '2026-06-08 09:19'
+labels: []
+dependencies: []
+priority: high
+ordinal: 48000
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Implement an evidence-first specialist fan-out/fan-in audit generation pipeline in Convex so audits produce verified, reviewable findings before German copy and publication.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [x] #1 Specialist audit stages run after evidence collection and before German copy
+- [x] #2 Specialist findings include typed evidence refs and unsupported claims are rejected
+- [x] #3 Verified findings are persisted separately and surfaced on audit detail pages
+- [x] #4 Quality review blocks when either model QA or German copy guard fails
+- [x] #5 Skill summaries use real registry purpose or instructions
+- [x] #6 Schema, evidence, action-source, persistence, quality gate, and UI tests pass
+<!-- AC:END -->
+
+## Implementation Plan
+
+<!-- SECTION:PLAN:BEGIN -->
+1. Add RED tests for specialist schemas, evidence IDs, action ordering, persistence, QA gates, and UI rendering
+2. Implement schema validators and evidence ledger helpers
+3. Add auditFindings persistence and detail query joins
+4. Wire specialist fan-out stages and evidence verifier before German copy
+5. Make an invalid qualityReview model verdict blocking and improve skill summaries
+6. Update audit detail UI to render findings with evidence chips
+7. Run focused tests, typecheck, and full test suite where feasible
+<!-- SECTION:PLAN:END -->
+
+## Implementation Notes
+
+<!-- SECTION:NOTES:BEGIN -->
+RED: `pnpm exec tsc -p tsconfig.test.json` fails because `AuditEvidenceInput` has no `evidenceLedger` and `lib/ai/schemas` exports no specialist/verifier schemas yet. This is the expected missing-feature failure.
+
+GREEN: Focused audit fan-out/source/UI tests passed 67/67. Full `pnpm test` passed 384/384. Implemented specialist fan-out stages, the evidence ledger, `auditFindings` persistence, blocking model+guard QA, real skill summaries, and a findings-first audit detail UI.
+<!-- SECTION:NOTES:END -->
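The evidence gate in TASK-46's criterion #2 (typed evidence refs, unsupported claims rejected) can be sketched as a deterministic ledger lookup. This is a minimal TypeScript sketch, not the committed code: the evidence kinds and the names `EvidenceEntry`, `EvidenceRef`, `SpecialistFinding`, and `verifyAgainstLedger` are hypothetical.

```typescript
// Hypothetical evidence kinds; the real ledger categories are not shown in this commit.
type EvidenceKind = "screenshot" | "html" | "lighthouse" | "copy";

interface EvidenceEntry {
  id: string;
  kind: EvidenceKind;
  sourceUrl: string;
}

interface EvidenceRef {
  evidenceId: string;
  kind: EvidenceKind;
}

interface SpecialistFinding {
  id: string;
  claim: string;
  evidenceRefs: EvidenceRef[];
}

interface LedgerVerdict {
  accepted: SpecialistFinding[];
  rejected: { findingId: string; reason: string }[];
}

// Accept a finding only if it cites at least one ref and every ref resolves to a
// ledger entry of the same kind; everything else is rejected with a reason.
function verifyAgainstLedger(
  findings: SpecialistFinding[],
  ledger: EvidenceEntry[],
): LedgerVerdict {
  const byId = new Map(ledger.map((entry) => [entry.id, entry] as const));
  const verdict: LedgerVerdict = { accepted: [], rejected: [] };

  for (const finding of findings) {
    if (finding.evidenceRefs.length === 0) {
      verdict.rejected.push({ findingId: finding.id, reason: "no evidence refs" });
      continue;
    }
    // A missing entry yields undefined, which never equals ref.kind.
    const bad = finding.evidenceRefs.find(
      (ref) => byId.get(ref.evidenceId)?.kind !== ref.kind,
    );
    if (bad) {
      verdict.rejected.push({
        findingId: finding.id,
        reason: `unknown or mistyped evidence ${bad.evidenceId}`,
      });
    } else {
      verdict.accepted.push(finding);
    }
  }
  return verdict;
}
```

Keeping this check deterministic means a finding that cites a hallucinated or mistyped evidence ID is dropped before any model-generated German copy can reference it.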
--- a/Fix-evidence-verifier-audit-generation-failure.md
+++ b/Fix-evidence-verifier-audit-generation-failure.md
@@ -0,0 +1,48 @@
+---
+id: TASK-47
+title: Fix evidence verifier audit generation failure
+status: In Progress
+assignee: []
+created_date: '2026-06-08 09:35'
+updated_date: '2026-06-08 10:07'
+labels: []
+dependencies: []
+priority: high
+ordinal: 49000
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Diagnose and fix the evidenceVerifier stage failure in the Convex specialist fan-out audit pipeline so live audit generation can complete or fail with actionable verifier diagnostics.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [x] #1 Root cause is identified from persisted run or generation evidence
+- [x] #2 Evidence verifier schema or prompt no longer fails on valid specialist outputs
+- [x] #3 Audit generation preserves strict evidence gates without schema-induced false failures
+- [x] #4 Focused and full regression tests pass
+<!-- AC:END -->
+
+## Implementation Plan
+
+<!-- SECTION:PLAN:BEGIN -->
+1. Pull the failing evidenceVerifier error details from Convex run/generation records
+2. Add a RED regression test for the root cause
+3. Fix the verifier schema/prompt or fallback behavior at the source
+4. Run focused fan-out tests and full pnpm test
+5. Record verification notes and keep the task In Progress until the user confirms the live audit works
+<!-- SECTION:PLAN:END -->
+
+## Implementation Notes
+
+<!-- SECTION:NOTES:BEGIN -->
+Root cause (from Convex `auditGenerations`/`agentRunEvents`): all specialist structured-output calls failed before content generation because Azure rejected the `response_format` schema. The shared `evidenceRef` object declared `sourceUrl` as an optional property, but Azure/OpenAI strict structured outputs require every declared property to be listed in `required`. The verifier then received an empty findings array and failed on the same schema issue.
+Fix: made the Specialist/Verifier output schemas strict-output compatible by listing `sourceUrl` and the array fields in `required`, added explicit prompt guidance for `sourceUrl`/`status`/`findings`/`notes`, and replaced `rejectedFindings` with a narrow rejection schema so unknown or generic rejected claims do not have to pass the publishable finding schema.
+Verification: the RED test reproduced `schema.findings[].evidenceRefs[].sourceUrl` missing from `required`; focused schema tests now pass; fan-out/persistence/UI tests pass; `pnpm test` passes 386/386; `git diff --check` passes; ESLint on touched source/test files passes.
+
+Root cause of the second live failure: after the strict schema fix, specialist stages succeeded, but `evidenceVerifier` failed with "No object generated: could not parse the response." The persisted verifier prompt contained about 10 full specialist findings, and the verifier schema required echoing full `verifiedFindings` objects back. With the classification profile capped at 1200 output tokens, this made the verifier output too large and too fragile to parse reliably. Context7 AI SDK docs confirmed that AI SDK 6 uses strict OpenAI JSON schema behavior by default, so the remaining issue was output shape and size, not schema rejection.
+Fix: changed the `evidenceVerifier` output to compact `verifiedFindingIds` plus small rejection decisions, then deterministically mapped accepted IDs back to the original specialist findings in the action. This preserves strict evidence gates while preventing the verifier from echoing or mutating findings.
+Verification: added a RED schema regression for compact verifier IDs and many findings; focused schema/action tests pass; adjacent audit persistence/schema/UI/evidence tests pass; `pnpm test` passes 387/387; `git diff --check` passes; ESLint on touched files passes; `npx convex dev --once` synced the fix to the dev deployment.
+<!-- SECTION:NOTES:END -->
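The two TASK-47 root causes suggest two small deterministic checks. This sketch assumes a plain JSON Schema object and a verifier result made of `verifiedFindingIds` plus a `rejected` list; apart from `verifiedFindingIds`, every name here (`JsonSchema`, `findNonRequiredProperties`, `applyVerifierDecision`, the `rejected` field) is hypothetical. The first function reports properties missing from `required`, mirroring the RED test; the second maps accepted IDs back to the original findings instead of trusting echoed objects.

```typescript
// Hypothetical, minimal JSON Schema shape: only the keys this check needs.
interface JsonSchema {
  type?: string | string[];
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
}

// Strict structured outputs (per the notes) require every declared property to be
// in `required`. Walk objects and array items and report offending paths.
function findNonRequiredProperties(schema: JsonSchema, path = "schema"): string[] {
  const problems: string[] = [];
  const required = new Set(schema.required ?? []);
  for (const [name, child] of Object.entries(schema.properties ?? {})) {
    const childPath = `${path}.${name}`;
    if (!required.has(name)) problems.push(childPath);
    problems.push(...findNonRequiredProperties(child, childPath));
  }
  if (schema.items) {
    problems.push(...findNonRequiredProperties(schema.items, `${path}[]`));
  }
  return problems;
}

// Hypothetical finding and compact verifier output shapes.
interface Finding {
  id: string;
  title: string;
}

interface CompactVerifierOutput {
  verifiedFindingIds: string[];
  rejected: { findingId: string; reason: string }[];
}

// The verifier returns IDs only; accepted findings are the original specialist
// objects, so the model can neither rewrite nor invent a finding.
function applyVerifierDecision(findings: Finding[], output: CompactVerifierOutput): Finding[] {
  const byId = new Map(findings.map((f) => [f.id, f] as const));
  const rejected = new Set(output.rejected.map((r) => r.findingId));
  const seen = new Set<string>();
  const accepted: Finding[] = [];
  for (const id of output.verifiedFindingIds) {
    const finding = byId.get(id);
    // Drop unknown IDs, duplicates, and IDs the verifier also rejected.
    if (!finding || rejected.has(id) || seen.has(id)) continue;
    seen.add(id);
    accepted.push(finding);
  }
  return accepted;
}
```

Because the output is a list of short IDs, its size grows only slightly with the number of findings, which is the property the 1200-token cap needed.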
--- a/Integrate-impeccable-critique-into-audit-pipeline.md
+++ b/Integrate-impeccable-critique-into-audit-pipeline.md
@@ -0,0 +1,43 @@
+---
+id: TASK-48
+title: Integrate impeccable critique into audit pipeline
+status: In Progress
+assignee: []
+created_date: '2026-06-08 12:02'
+updated_date: '2026-06-08 12:10'
+labels: []
+dependencies: []
+priority: high
+ordinal: 50000
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Extend the evidence-first audit pipeline with impeccable-style design critique (visual and UX evaluation), centered on the critique skill, while keeping verified findings evidence-linked and customer-safe.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [x] #1 Critique/impeccable skill guidance is inspected and translated into bounded audit stages or skill prompts
+- [x] #2 New critique findings stay evidence-linked and flow through the compact evidence verifier
+- [x] #3 German copy synthesis consumes only verified critique findings, not raw skill output
+- [x] #4 Audit UI exposes critique findings with evidence chips and actual skill purpose text
+- [x] #5 Focused and full regression tests cover the new critique integration
+<!-- AC:END -->
+
+## Implementation Plan
+
+<!-- SECTION:PLAN:BEGIN -->
+1. Inspect impeccable/critique skill guidance and current audit pipeline shape
+2. Define a compact critique/impeccable stage that maps skill guidance into evidence-backed audit findings
+3. Add schemas/prompts or stage wiring without expanding verifier output size
+4. Update UI/tests so critique findings are visible with evidence and real skill purpose
+5. Run focused and full regression tests, deploy to Convex dev, and keep the task In Progress pending live confirmation
+<!-- SECTION:PLAN:END -->
+
+## Implementation Notes
+
+<!-- SECTION:NOTES:BEGIN -->
+Implemented the impeccable/critique integration as an evidence-bound audit extension. Inspected the local impeccable and critique skills; no project-specific `.impeccable.md` was present, so the product guidance was translated into bounded audit behavior instead of broad design-taste claims. Added the V3 skill registry entry `impeccable-critique`, prioritized it among the selected local audit skills, and wired a new Convex `critiqueSpecialist` stage between visual trust and performance/accessibility. The stage is instructed to produce only evidence-linked findings tagged with skillId `impeccable-critique`; the existing compact verifier and German synthesis path remain the gate, so raw specialist output is never customer-facing. UI tests continue to cover evidence chips and real registry purpose text. Verification: focused specialist/evidence tests 45/45 passed; skill/UI tests 15/15 passed; full `pnpm test` 388/388 passed; `git diff --check` passed; targeted ESLint passed; `npx convex dev --once` synced successfully.
+<!-- SECTION:NOTES:END -->
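The TASK-48 wiring (a `critiqueSpecialist` stage placed after visual trust, emitting only evidence-linked findings tagged `impeccable-critique`) can be sketched as below. Only `critiqueSpecialist` and `impeccable-critique` come from the notes; the stage-list representation, the other stage names, the `CritiqueFinding` shape, and both helper names are assumptions for illustration.

```typescript
// Skill ID from the notes; everything else in this sketch is hypothetical.
const CRITIQUE_SKILL_ID = "impeccable-critique";

interface CritiqueFinding {
  skillId: string;
  title: string;
  evidenceRefs: string[];
}

// Insert `stage` immediately after `anchor`. Throwing on a missing anchor means a
// renamed stage cannot silently drop the critique pass.
function insertStageAfter(stages: readonly string[], anchor: string, stage: string): string[] {
  const index = stages.indexOf(anchor);
  if (index === -1) throw new Error(`unknown stage: ${anchor}`);
  return [...stages.slice(0, index + 1), stage, ...stages.slice(index + 1)];
}

// Keep only critique findings tagged with the registry skill ID and linked to
// evidence; these still go through the compact verifier before German copy.
function admissibleCritiqueFindings(findings: CritiqueFinding[]): CritiqueFinding[] {
  return findings.filter(
    (f) => f.skillId === CRITIQUE_SKILL_ID && f.evidenceRefs.length > 0,
  );
}
```

Filtering here is only a pre-check; the verifier remains the actual gate described in the notes.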
--- a/Improve-audit-outreach-email-tone.md
+++ b/Improve-audit-outreach-email-tone.md
@@ -0,0 +1,43 @@
+---
+id: TASK-49
+title: Improve audit outreach email tone
+status: In Progress
+assignee: []
+created_date: '2026-06-08 19:30'
+updated_date: '2026-06-08 19:48'
+labels: []
+dependencies: []
+priority: high
+ordinal: 51000
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Add evidence-first, collegial-direct tone guidelines for generated outreach emails, wire them into the existing German copy stage without extra AI calls, and hard-block unnatural email copy before it reaches `outreach_ready`.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [x] #1 Shared customer tone guidelines capture the selected collegial-direct email style and banned patterns
+- [x] #2 German copy prompts use the tone guidelines, explicit lead context, at most two verified findings, and no extra AI stage or model call
+- [x] #3 Deterministic German copy guard blocks unnatural email subjects and bodies while keeping public audit tone checks limited to existing rules
+- [x] #4 Quality review applies the same first-contact email rubric
+- [x] #5 Focused and full regression tests cover natural email pass cases, unnatural email failures, source wiring, and no new generation stage
+<!-- AC:END -->
+
+## Implementation Plan
+
+<!-- SECTION:PLAN:BEGIN -->
+1. Add failing tests for natural vs. formulaic outreach email tone
+2. Add shared collegial-direct tone guideline module
+3. Add deterministic hard guard for email subject/body tone
+4. Wire guidelines into German copy and quality review prompts without a new AI stage
+5. Run focused tests, full regression, lint, diff check, and Convex dev sync
+<!-- SECTION:PLAN:END -->
+
+## Implementation Notes
+
+<!-- SECTION:NOTES:BEGIN -->
+Implemented the evidence-first outreach email tone pass. Added `lib/ai/customer-tone-guidelines.ts` with the selected collegial-direct sender posture, short first-contact email constraints, banned phrases, and prompt helper. Updated German copy generation to remove the old Ich-Ich instruction, include the shared tone section, pass normalized evidence context, and keep the existing generation call structure. Added hard deterministic email tone checks for subject length/pitch patterns, email length, sentence/paragraph count, formulaic Ich-habe/Ich-schlage-vor patterns, brochure language, mini-audit structure, informal address, and missing low-friction asks. Public audit hard guard behavior remains limited to the existing rules. Quality review now explicitly asks whether the email sounds like a real first email from Matthias, not AI sales copy, and whether concrete claims are backed by verified findings. Verification: focused tests 60/60 passed; full `pnpm test` 395/395 passed; targeted ESLint passed; `git diff --check` passed; `npx convex dev --once` synced successfully after fixing the Convex-only typecheck issue by passing `evidenceInput` instead of raw evidence.
+<!-- SECTION:NOTES:END -->
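A minimal sketch of the kind of deterministic email tone guard described in TASK-49. The thresholds, phrase list, regular expression, and names (`EmailDraft`, `checkEmailTone`) are hypothetical stand-ins, since the real limits are not shown in this commit, and several checks from the notes (sentence/paragraph count, mini-audit structure) are omitted.

```typescript
interface EmailDraft {
  subject: string;
  body: string;
}

// Hypothetical limits and phrase lists, for illustration only.
const MAX_SUBJECT_CHARS = 60;
const MAX_BODY_WORDS = 120;
const BANNED_PHRASES = ["ich habe mir erlaubt", "ich schlage vor", "ganzheitlich", "maßgeschneidert"];
// Informal German address (du/dich/dir/dein...) in a first-contact email.
const INFORMAL_ADDRESS = /\b(du|dich|dir|dein|deine|euch|euer)\b/i;

// Returns a list of issues; an empty list passes, any issue hard-blocks the draft.
function checkEmailTone(draft: EmailDraft): string[] {
  const issues: string[] = [];
  const lowerBody = draft.body.toLowerCase();

  if (draft.subject.length > MAX_SUBJECT_CHARS) issues.push("subject too long");

  const wordCount = draft.body.trim().split(/\s+/).filter(Boolean).length;
  if (wordCount > MAX_BODY_WORDS) issues.push("body too long");

  for (const phrase of BANNED_PHRASES) {
    if (lowerBody.includes(phrase)) issues.push(`banned phrase: ${phrase}`);
  }

  if (INFORMAL_ADDRESS.test(draft.body)) issues.push("informal address");

  // Crude heuristic: treat any question as the low-friction ask.
  if (!draft.body.includes("?")) issues.push("missing low-friction ask");

  return issues;
}
```

Because the guard is pure string checking, it adds no model call, which matches the "no extra AI stage" constraint in criterion #2.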
--- a/Refactor-dashboard-views-into-compact-cards.md
+++ b/Refactor-dashboard-views-into-compact-cards.md
@@ -0,0 +1,39 @@
+---
+id: TASK-50
+title: Refactor dashboard views into compact cards
+status: In Progress
+assignee: []
+created_date: '2026-06-08 19:56'
+updated_date: '2026-06-08 19:57'
+labels: []
+dependencies: []
+priority: high
+ordinal: 52000
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Implement the planned internal Ops UX refactor for Campaigns, Leads, Audits, and Review Workspace using compact shadcn-style cards, modal/detail disclosure, and accessible status feedback.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [ ] #1 Campaigns render as a responsive card grid while preserving existing campaign actions and run logs.
+- [ ] #2 Leads show compact cards and open the review form in an accessible modal from the "Mehr anzeigen" (Show more) action.
+- [ ] #3 Audits use responsive cards with detail links for audit rows and non-clickable pipeline states for generation rows.
+- [ ] #4 Review Workspace uses compact queue cards with a single selected detail editor while preserving existing save, publish, approve, and send flows.
+- [ ] #5 Relevant tests, lint, and build pass or any remaining blockers are documented.
+<!-- AC:END -->
+
+## Implementation Plan
+
+<!-- SECTION:PLAN:BEGIN -->
+1. Add failing UI/source tests for card-grid, lead modal, audit cards, and review master-detail
+2. Implement Campaigns responsive grid and accessible card semantics
+3. Move Leads inline review details into Dialog modal
+4. Replace Audits row table with responsive cards
+5. Convert Review Workspace to queue cards plus selected detail editor
+6. Run focused tests, then lint/build where feasible
+7. Record verification notes on TASK-50 without marking Done
+<!-- SECTION:PLAN:END -->
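For TASK-50's Audits criterion (detail links for audit rows, non-clickable pipeline states for generation rows), a discriminated union keeps the two row types from being confused in the card grid. This is a sketch under assumed field names and an assumed `/audits/:id` route; none of these names come from the commit.

```typescript
// Hypothetical row shapes for the Audits card grid.
type AuditListItem =
  | { kind: "audit"; id: string; title: string; status: "draft" | "published" }
  | { kind: "generation"; id: string; title: string; stage: string };

interface AuditCardModel {
  key: string;
  title: string;
  statusLabel: string;
  // Only finished audits link to a detail page; generation rows stay non-clickable.
  href: string | null;
}

function toAuditCard(item: AuditListItem): AuditCardModel {
  switch (item.kind) {
    case "audit":
      return {
        key: `audit-${item.id}`,
        title: item.title,
        statusLabel: item.status,
        href: `/audits/${item.id}`,
      };
    case "generation":
      return {
        key: `generation-${item.id}`,
        title: item.title,
        statusLabel: `Pipeline: ${item.stage}`,
        href: null,
      };
  }
}
```

A card component can then render a link only when `href` is non-null, so a generation row never appears clickable.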