Implement Playwright website crawling and screenshot capture
To Do
2026-06-03 19:13
2026-06-04 14:08
mvp
audit
playwright
TASK-7
PRD.md
high
8000
Description
Build the website inspection and contact-enrichment layer using Playwright. For qualified leads, the system should load the company website, inspect the homepage and a small set of relevant subpages, capture desktop and mobile screenshots, extract visible text and contact signals, and store all raw evidence in Convex. Any email candidates found are fed back into the TASK-7 qualification rules before a lead is left in Kontakt fehlt ("contact missing"). Google Places does not provide business email fields, so website crawl evidence is the primary MVP source for usable business email addresses.
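The contact-signal extraction described above could look roughly like the following. This is a minimal sketch, not the planned implementation: the `EmailCandidate` shape, the function name, and the regular expression are assumptions for illustration. It takes text and link targets that the crawler has already read from a page and only returns addresses that literally appear there, so nothing is guessed.

```typescript
// Hypothetical evidence record for one address found on a crawled page.
interface EmailCandidate {
  email: string;      // normalized to lowercase
  sourceUrl: string;  // page the address was found on
  viaMailto: boolean; // true when it came from a mailto: link
}

// Deliberately simple patterns (an assumption, not a full RFC 5322 parser).
const EMAIL_PATTERN = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi;
const WHOLE_EMAIL = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i;

function extractEmailCandidates(
  visibleText: string,
  hrefs: string[],
  sourceUrl: string,
): EmailCandidate[] {
  const found = new Map<string, EmailCandidate>();

  // mailto: links first, since they are explicit contact signals.
  for (const href of hrefs) {
    if (!href.toLowerCase().startsWith("mailto:")) continue;
    // Drop the scheme and any "?subject=..." query part.
    const address = href.slice("mailto:".length).split("?")[0].trim().toLowerCase();
    if (WHOLE_EMAIL.test(address)) {
      found.set(address, { email: address, sourceUrl, viaMailto: true });
    }
  }

  // Then addresses written out in the visible text, skipping duplicates.
  for (const match of visibleText.match(EMAIL_PATTERN) ?? []) {
    const address = match.toLowerCase();
    if (!found.has(address)) {
      found.set(address, { email: address, sourceUrl, viaMailto: false });
    }
  }

  return Array.from(found.values());
}
```

Keeping `sourceUrl` and `viaMailto` on every candidate means the stored evidence can later show where an address was published, which the qualification step needs for named addresses.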
Acceptance Criteria
#1 Playwright captures desktop and mobile screenshots for the homepage and stores them in Convex File Storage
#2 Crawler visits a bounded set of relevant subpages: Kontakt, Impressum, Leistungen/Angebot, Über uns/Team when discoverable
#4 Extracted email candidates are classified through the TASK-7 rules: generic business emails are preferred; named emails are accepted only when explicitly published as business contact addresses; no guessed addresses are generated
#5 Leads discovered by Google Places with a website are automatically scheduled for contact enrichment before they are left in Kontakt fehlt; a usable email, once found, updates the lead's contact fields and status while preserving phone and source data
#6 Simple technical checks include HTTPS/final URL, missing title/meta description, visible contact path, and obvious broken internal links within the crawl limit
#7 Crawler failures produce useful dashboard-visible errors without blocking unrelated leads
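The bounded subpage selection in criterion #2 can be sketched as a pure function over the links found on the homepage. The keyword lists, the `DiscoveredLink` shape, and the default limit of four pages are assumptions for illustration; the ticket only names the page categories.

```typescript
// Hypothetical shape for a link read from the homepage.
interface DiscoveredLink {
  href: string; // raw href attribute, possibly relative
  text: string; // visible link text
}

// Assumed keywords, one group per page category from criterion #2.
const SUBPAGE_KEYWORDS: ReadonlyArray<{ category: string; keywords: string[] }> = [
  { category: "kontakt", keywords: ["kontakt", "contact"] },
  { category: "impressum", keywords: ["impressum", "imprint"] },
  { category: "leistungen", keywords: ["leistungen", "angebot"] },
  { category: "ueberUns", keywords: ["über uns", "ueber-uns", "team"] },
];

function selectSubpages(
  homepageUrl: string,
  links: DiscoveredLink[],
  maxPages = 4,
): string[] {
  const base = new URL(homepageUrl);
  const selected: string[] = [];

  for (const { keywords } of SUBPAGE_KEYWORDS) {
    if (selected.length >= maxPages) break; // hard crawl bound
    for (const link of links) {
      let url: URL;
      try {
        url = new URL(link.href, base); // resolves relative hrefs
      } catch {
        continue; // skip malformed hrefs
      }
      if (url.origin !== base.origin) continue; // stay on the company site
      url.hash = ""; // "/impressum#top" and "/impressum" are the same page
      const normalized = url.toString();
      const haystack = `${link.text} ${url.pathname}`.toLowerCase();
      const matches = keywords.some((k) => haystack.includes(k));
      if (matches && normalized !== base.toString() && selected.indexOf(normalized) === -1) {
        selected.push(normalized); // at most one page per category
        break;
      }
    }
  }
  return selected;
}
```

Picking at most one page per category keeps the crawl small and predictable; categories with no matching link are simply skipped, which matches the "when discoverable" wording.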
Implementation Plan
Add Playwright runtime setup compatible with local development and Coolify container deployment.
Normalize and score email candidates, then call the existing TASK-7 lead review/contact qualification path so usable emails update lead contact fields and unqualified named emails do not.
Add contact-enrichment run state and dashboard-visible run events/errors for leads that still need manual contact research.
Persist extracted raw evidence, technical checks, screenshots, and crawler errors in Convex.
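The normalize-and-score step in the plan above might be sketched as follows. The actual TASK-7 rules are not reproduced in this ticket, so the generic local-part list, the type names, and the `publishedAsBusinessContact` flag are all hypothetical stand-ins for whatever that qualification path defines.

```typescript
type EmailClass = "generic" | "named_published" | "named_unqualified";

interface ScoredEmail {
  email: string;
  classification: EmailClass;
  usable: boolean; // only usable emails may update lead contact fields
}

// Assumed list of generic mailbox names; the real list belongs to TASK-7.
const GENERIC_LOCAL_PARTS = [
  "info", "kontakt", "contact", "office", "mail", "hallo", "service", "anfrage",
];

function classifyEmail(
  rawEmail: string,
  publishedAsBusinessContact: boolean, // hypothetical flag set by the crawler
): ScoredEmail {
  const email = rawEmail.trim().toLowerCase();
  const localPart = email.split("@")[0];

  if (GENERIC_LOCAL_PARTS.indexOf(localPart) !== -1) {
    return { email, classification: "generic", usable: true };
  }
  if (publishedAsBusinessContact) {
    return { email, classification: "named_published", usable: true };
  }
  // Named address without explicit publication: keep as evidence, do not use.
  return { email, classification: "named_unqualified", usable: false };
}

// Prefer a generic address; otherwise fall back to the first usable named one.
function pickBestEmail(candidates: ScoredEmail[]): ScoredEmail | undefined {
  const usable = candidates.filter((c) => c.usable);
  return usable.find((c) => c.classification === "generic") ?? usable[0];
}
```

When `pickBestEmail` returns `undefined`, the lead has no usable address and would stay in Kontakt fehlt for manual contact research.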
Implementation Notes
Expanded TASK-8 to cover website-based contact enrichment because Google Places does not provide business email fields. This keeps email handling evidence-based and reuses TASK-7 qualification rules instead of guessing addresses.