feat: add lead qualification workflow

2026-06-04 16:09:47 +02:00
parent 15d8bfeb66
commit 59824b7336
19 changed files with 2833 additions and 78 deletions
--- a/Implement-Playwright-website-crawling-and-screenshot-capture.md
+++ b/Implement-Playwright-website-crawling-and-screenshot-capture.md
@@ -4,6 +4,7 @@ title: Implement Playwright website crawling and screenshot capture
 status: To Do
 assignee: []
 created_date: '2026-06-03 19:13'
+updated_date: '2026-06-04 14:08'
 labels:
  - mvp
  - audit
@@ -19,24 +20,37 @@ ordinal: 8000
 ## Description

 <!-- SECTION:DESCRIPTION:BEGIN -->
-Build the website inspection layer using Playwright. For qualified leads, the system should load the company website, inspect the homepage and a small set of relevant subpages, capture desktop/mobile screenshots, extract visible text and contact signals, and store all raw evidence in Convex.
+Build the website inspection and contact-enrichment layer using Playwright. For qualified leads, the system should load the company website, inspect the homepage and a small set of relevant subpages, capture desktop/mobile screenshots, extract visible text and contact signals, store all raw evidence in Convex, and feed any email candidates it finds back into the TASK-7 qualification rules before a lead is left in Kontakt fehlt. Google Places does not provide business email fields, so website crawl evidence is the primary MVP source for usable business email addresses.
 <!-- SECTION:DESCRIPTION:END -->
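A minimal sketch of the email-candidate extraction described above, in TypeScript to match the Playwright and Convex stack. It assumes the crawler has already read a page's visible text (for example with Playwright's `page.locator("body").innerText()`) and collected its `href` values; `EmailCandidate` and `extractEmailCandidates` are hypothetical names, not existing project code.

```typescript
// Hypothetical shape for an email address found on one crawled page.
interface EmailCandidate {
  email: string;
  sourceUrl: string;
  via: "mailto" | "text";
}

// Deliberately simple patterns; obfuscated forms such as "info [at] example.de" are not matched.
const EMAIL_IN_TEXT = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const EMAIL_EXACT = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;

function extractEmailCandidates(
  sourceUrl: string,
  visibleText: string,
  hrefs: string[],
): EmailCandidate[] {
  // Keyed by lower-cased address so the same email is recorded once per page.
  const seen = new Map<string, EmailCandidate>();
  const add = (raw: string, via: EmailCandidate["via"]): void => {
    const email = raw.trim().toLowerCase();
    if (!seen.has(email)) seen.set(email, { email, sourceUrl, via });
  };

  // mailto: links are checked first, so an address that is both linked and printed keeps via = "mailto".
  for (const href of hrefs) {
    if (!href.toLowerCase().startsWith("mailto:")) continue;
    const target = href.slice("mailto:".length).split("?")[0]; // drop ?subject=... etc.
    let decoded = target;
    try {
      decoded = decodeURIComponent(target); // e.g. %40 -> @
    } catch {
      // keep the raw value if the percent-encoding is malformed
    }
    for (const part of decoded.split(",")) {
      if (EMAIL_EXACT.test(part.trim())) add(part, "mailto");
    }
  }

  // Addresses printed in the visible page text.
  for (const match of visibleText.match(EMAIL_IN_TEXT) ?? []) {
    add(match, "text");
  }
  return Array.from(seen.values());
}
```

Keeping the source URL and `via` on each candidate preserves the evidence trail, so the dashboard can show where an address was found.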

 ## Acceptance Criteria
 <!-- AC:BEGIN -->
 - [ ] #1 Playwright captures desktop and mobile screenshots for the homepage and stores them in Convex File Storage
 - [ ] #2 Crawler visits a bounded set of relevant subpages: Kontakt, Impressum, Leistungen/Angebot, Über uns/Team when discoverable
-- [ ] #3 Crawler extracts visible text, page title, meta description, headings, links, phone numbers, email candidates, and CTA/contact-form signals
-- [ ] #4 Simple technical checks include HTTPS/final URL, missing title/meta description, visible contact path, and obvious broken internal links within the crawl limit
-- [ ] #5 Crawler failures produce useful dashboard-visible errors without blocking unrelated leads
+- [ ] #3 Crawler extracts visible text, page title, meta description, headings, links, phone numbers, email candidates, email source URLs, contact-person context, and CTA/contact-form signals
+- [ ] #4 Extracted email candidates are classified using the TASK-7 rules: generic business emails are preferred; named emails are accepted only when explicitly published as business contact addresses; addresses are never guessed
+- [ ] #5 Leads discovered via Google Places that have a website are automatically scheduled for contact enrichment before they are left in Kontakt fehlt; a usable email, when found, updates the lead's contact fields and status while preserving phone and source data
+- [ ] #6 Simple technical checks include HTTPS/final URL, missing title/meta description, visible contact path, and obvious broken internal links within the crawl limit
+- [ ] #7 Crawler failures produce useful dashboard-visible errors without blocking unrelated leads
 <!-- AC:END -->
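The technical checks in criterion #6 reduce to simple comparisons once page data is collected. This sketch assumes a hypothetical `PageSnapshot` populated from Playwright results such as `page.url()` after `page.goto()`, `page.title()`, and the `content` attribute of `meta[name="description"]`; `runTechnicalChecks` and the issue codes are illustrative, not an existing schema.

```typescript
// Hypothetical snapshot assembled from Playwright results for one page.
interface PageSnapshot {
  requestedUrl: string;
  finalUrl: string; // URL after redirects
  title: string | null;
  metaDescription: string | null;
  hasVisibleContactPath: boolean; // e.g. Kontakt link, contact form, mailto: or tel: found
  internalLinkStatuses: { url: string; status: number | null }[]; // null = request failed
}

type TechnicalIssue =
  | { code: "no_https"; finalUrl: string }
  | { code: "missing_title" }
  | { code: "missing_meta_description" }
  | { code: "no_visible_contact_path" }
  | { code: "broken_internal_link"; url: string; status: number | null };

function runTechnicalChecks(page: PageSnapshot): TechnicalIssue[] {
  const issues: TechnicalIssue[] = [];
  // Judge HTTPS on the final URL so an http -> https redirect is not flagged.
  if (new URL(page.finalUrl).protocol !== "https:") {
    issues.push({ code: "no_https", finalUrl: page.finalUrl });
  }
  // Null, empty, and whitespace-only values all count as missing.
  if (!page.title?.trim()) issues.push({ code: "missing_title" });
  if (!page.metaDescription?.trim()) issues.push({ code: "missing_meta_description" });
  if (!page.hasVisibleContactPath) issues.push({ code: "no_visible_contact_path" });
  // Only links already requested within the crawl limit are checked.
  for (const link of page.internalLinkStatuses) {
    if (link.status === null || link.status >= 400) {
      issues.push({ code: "broken_internal_link", url: link.url, status: link.status });
    }
  }
  return issues;
}
```

Returning issues as data, rather than throwing, lets them be stored alongside the other raw evidence for the lead.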

+
+
 ## Implementation Plan

 <!-- SECTION:PLAN:BEGIN -->
 1. Add Playwright runtime setup compatible with local development and Coolify container deployment.
 2. Define crawl limits, viewports, timeout behavior, and allowed same-domain URL rules.
-3. Capture homepage desktop/mobile screenshots and upload to Convex storage.
+3. Capture homepage desktop/mobile screenshots and upload them to Convex storage.
 4. Discover and inspect relevant subpages with bounded depth.
-5. Persist extracted text, metadata, contact candidates, technical checks, screenshots, and errors.
+5. Extract visible text, metadata, links, phone numbers, email candidates, contact-person context, CTA/contact-form signals, and source URLs.
+6. Normalize and score email candidates, then call the existing TASK-7 lead review/contact qualification path so usable emails update lead contact fields and unqualified named emails do not.
+7. Add contact-enrichment run state and dashboard-visible run events/errors for leads that still need manual contact research.
+8. Persist extracted raw evidence, technical checks, screenshots, and crawler errors in Convex.
 <!-- SECTION:PLAN:END -->
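Plan steps 2 and 4 could be sketched as a pure URL filter that runs on the homepage's links before any subpage is loaded. The limit values, viewport sizes, and path keywords below are assumptions for illustration; the viewport objects use the `{ width, height }` shape that Playwright's `browser.newContext({ viewport })` accepts. `CRAWL_LIMITS`, `selectSubpages`, and the helpers are hypothetical names.

```typescript
// Assumed limits for illustration only; the plan leaves the concrete values open.
const CRAWL_LIMITS = {
  maxSubpages: 4,
  navigationTimeoutMs: 15_000,
  viewports: {
    desktop: { width: 1440, height: 900 },
    mobile: { width: 390, height: 844 },
  },
} as const;

// Assumed path keywords for Kontakt, Impressum, Leistungen/Angebot, Über uns/Team.
const RELEVANT_PATH = /(kontakt|impressum|leistung|angebot|ueber-uns|uber-uns|about|team)/i;

// Treats "www.example.de" and "example.de" as the same site.
function hostKey(url: URL): string {
  return url.hostname.toLowerCase().replace(/^www\./, "");
}

function tryParseUrl(href: string, base: URL): URL | null {
  try {
    return new URL(href, base); // resolves relative links against the homepage
  } catch {
    return null; // malformed href
  }
}

function selectSubpages(
  homepageUrl: string,
  hrefs: string[],
  limit: number = CRAWL_LIMITS.maxSubpages,
): string[] {
  const home = new URL(homepageUrl);
  const picked: string[] = [];
  for (const href of hrefs) {
    const url = tryParseUrl(href, home);
    if (!url) continue;
    if (url.protocol !== "http:" && url.protocol !== "https:") continue; // skips mailto:, tel:, javascript:
    if (hostKey(url) !== hostKey(home)) continue; // same-domain rule
    url.hash = ""; // "/kontakt#formular" and "/kontakt" are the same page
    if (!RELEVANT_PATH.test(url.pathname)) continue;
    const normalized = url.toString();
    if (normalized === home.toString() || picked.includes(normalized)) continue;
    picked.push(normalized);
    if (picked.length >= limit) break; // bounded crawl
  }
  return picked;
}
```

Treating `www.` and the bare domain as one host is a design choice in this sketch; a stricter rule could compare full hostnames.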
+
+## Implementation Notes
+
+<!-- SECTION:NOTES:BEGIN -->
+Expanded TASK-8 to cover website-based contact enrichment because Google Places does not provide business email fields. This keeps email handling evidence-based and reuses TASK-7 qualification rules instead of guessing addresses.
+<!-- SECTION:NOTES:END -->
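The TASK-7 email rules summarized in criterion #4 could be expressed as a small pure function that only ranks addresses actually found on the site. In this sketch the generic local-part list, the `publishedAsBusinessContact` flag, and all names are hypothetical; the real TASK-7 rules may be stricter.

```typescript
// Hypothetical input: an address actually found on the site plus the
// contact-person context the crawler captured around it.
interface FoundEmail {
  email: string;
  sourceUrl: string;
  publishedAsBusinessContact: boolean; // e.g. listed under "Ansprechpartner" on Kontakt
}

type EmailClass = "generic" | "named_accepted" | "named_rejected";

// Assumed role-style local parts; the real TASK-7 list may differ.
const GENERIC_LOCAL_PARTS = new Set([
  "info", "kontakt", "contact", "office", "mail", "hallo", "hello", "service", "anfrage", "buero",
]);

function classifyEmail(found: FoundEmail): EmailClass {
  const localPart = found.email.split("@")[0].toLowerCase();
  if (GENERIC_LOCAL_PARTS.has(localPart)) return "generic";
  return found.publishedAsBusinessContact ? "named_accepted" : "named_rejected";
}

// Prefers a generic address, falls back to an explicitly published named one,
// and otherwise returns undefined. It only ranks found addresses and never
// builds new ones (no "vorname.nachname@domain" guessing).
function pickUsableEmail(candidates: FoundEmail[]): FoundEmail | undefined {
  return (
    candidates.find((c) => classifyEmail(c) === "generic") ??
    candidates.find((c) => classifyEmail(c) === "named_accepted")
  );
}
```

If `pickUsableEmail` returns `undefined`, the lead would stay in Kontakt fehlt for manual contact research.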