feat: add lead qualification workflow

This commit is contained in:
2026-06-04 16:09:47 +02:00
parent 15d8bfeb66
commit 59824b7336
19 changed files with 2833 additions and 78 deletions

View File

@@ -4,6 +4,7 @@ title: Implement Playwright website crawling and screenshot capture
status: To Do
assignee: []
created_date: '2026-06-03 19:13'
updated_date: '2026-06-04 14:08'
labels:
- mvp
- audit
@@ -19,24 +20,37 @@ ordinal: 8000
## Description
<!-- SECTION:DESCRIPTION:BEGIN -->
Build the website inspection layer using Playwright. For qualified leads, the system should load the company website, inspect the homepage and a small set of relevant subpages, capture desktop/mobile screenshots, extract visible text and contact signals, and store all raw evidence in Convex.
Build the website inspection and contact-enrichment layer using Playwright. For qualified leads, the system should load the company website, inspect the homepage and a small set of relevant subpages, capture desktop/mobile screenshots, extract visible text and contact signals, store all raw evidence in Convex, and feed found email candidates back into the TASK-7 qualification rules before a lead remains in Kontakt fehlt. Google Places does not provide business email fields, so website crawl evidence is the primary MVP source for usable business email addresses.
<!-- SECTION:DESCRIPTION:END -->
## Acceptance Criteria
<!-- AC:BEGIN -->
- [ ] #1 Playwright captures desktop and mobile screenshots for the homepage and stores them in Convex File Storage
- [ ] #2 Crawler visits a bounded set of relevant subpages: Kontakt, Impressum, Leistungen/Angebot, Über uns/Team when discoverable
- [ ] #3 Crawler extracts visible text, page title, meta description, headings, links, phone numbers, email candidates, and CTA/contact-form signals
- [ ] #4 Simple technical checks include HTTPS/final URL, missing title/meta description, visible contact path, and obvious broken internal links within the crawl limit
- [ ] #5 Crawler failures produce useful dashboard-visible errors without blocking unrelated leads
- [ ] #3 Crawler extracts visible text, page title, meta description, headings, links, phone numbers, email candidates, email source URLs, contact-person context, and CTA/contact-form signals
- [ ] #4 Extracted email candidates are classified through the TASK-7 rules: generic business emails are preferred; named emails are accepted only when explicitly published as business contact addresses; no guessed addresses are generated
- [ ] #5 Leads discovered by Google Places with a website are automatically scheduled for contact enrichment before they remain in Kontakt fehlt; found usable email updates the lead contact fields and status while preserving phone and source data
- [ ] #6 Simple technical checks include HTTPS/final URL, missing title/meta description, visible contact path, and obvious broken internal links within the crawl limit
- [ ] #7 Crawler failures produce useful dashboard-visible errors without blocking unrelated leads
<!-- AC:END -->
## Implementation Plan
<!-- SECTION:PLAN:BEGIN -->
1. Add Playwright runtime setup compatible with local development and Coolify container deployment.
2. Define crawl limits, viewports, timeout behavior, and allowed same-domain URL rules.
3. Capture homepage desktop/mobile screenshots and upload to Convex storage.
3. Capture homepage desktop/mobile screenshots and upload them to Convex storage.
4. Discover and inspect relevant subpages with bounded depth.
5. Persist extracted text, metadata, contact candidates, technical checks, screenshots, and errors.
5. Extract visible text, metadata, links, phone numbers, email candidates, contact-person context, CTA/contact-form signals, and source URLs.
6. Normalize and score email candidates, then call the existing TASK-7 lead review/contact qualification path so usable emails update lead contact fields and unqualified named emails do not.
7. Add contact-enrichment run state and dashboard-visible run events/errors for leads that still need manual contact research.
8. Persist extracted raw evidence, technical checks, screenshots, and crawler errors in Convex.
<!-- SECTION:PLAN:END -->
## Implementation Notes
<!-- SECTION:NOTES:BEGIN -->
Expanded TASK-8 to cover website-based contact enrichment because Google Places does not provide business email fields. This keeps email handling evidence-based and reuses TASK-7 qualification rules instead of guessing addresses.
<!-- SECTION:NOTES:END -->