---
id: TASK-8
title: Implement Playwright website crawling and screenshot capture
status: To Do
assignee: []
created_date: '2026-06-03 19:13'
labels:
  - mvp
  - audit
  - playwright
dependencies:
  - TASK-7
references:
  - PRD.md
priority: high
ordinal: 8000
---

## Description

Build the website inspection layer using Playwright. For qualified leads, the system should load the company website, inspect the homepage and a small set of relevant subpages, capture desktop/mobile screenshots, extract visible text and contact signals, and store all raw evidence in Convex.

## Acceptance Criteria

- [ ] #1 Playwright captures desktop and mobile screenshots for the homepage and stores them in Convex File Storage
- [ ] #2 Crawler visits a bounded set of relevant subpages: Kontakt, Impressum, Leistungen/Angebot, Über uns/Team when discoverable
- [ ] #3 Crawler extracts visible text, page title, meta description, headings, links, phone numbers, email candidates, and CTA/contact-form signals
- [ ] #4 Simple technical checks include HTTPS/final URL, missing title/meta description, visible contact path, and obvious broken internal links within the crawl limit
- [ ] #5 Crawler failures produce useful dashboard-visible errors without blocking unrelated leads

## Implementation Plan

1. Add Playwright runtime setup compatible with local development and Coolify container deployment.
2. Define crawl limits, viewports, timeout behavior, and allowed same-domain URL rules.
3. Capture homepage desktop/mobile screenshots and upload to Convex storage.
4. Discover and inspect relevant subpages with bounded depth.
5. Persist extracted text, metadata, contact candidates, technical checks, screenshots, and errors.
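
As a starting point for plan steps 2 and 4 and acceptance criteria #2 and #3, the following is a minimal sketch of the pure logic only: crawl limits, the same-domain rule, bounded subpage selection, and contact-candidate extraction. It deliberately leaves out the Playwright and Convex calls. Every name and value here (`CrawlConfig`, `selectSubpages`, `extractContactCandidates`, the viewport sizes, the 15-second timeout, the limit of four subpages, the keyword lists, and the regular expressions) is an assumption for illustration and is not taken from PRD.md.

```typescript
// Hypothetical crawl settings; the task does not fix concrete values.
interface CrawlConfig {
  maxSubpages: number;
  navigationTimeoutMs: number;
  viewports: { name: "desktop" | "mobile"; width: number; height: number }[];
}

const crawlConfig: CrawlConfig = {
  maxSubpages: 4,
  navigationTimeoutMs: 15_000,
  viewports: [
    { name: "desktop", width: 1440, height: 900 },
    { name: "mobile", width: 390, height: 844 },
  ],
};

// Assumed keywords for the subpage categories named in acceptance criterion #2.
// Matching is by plain substring, so false positives are possible.
const SUBPAGE_KEYWORDS: { category: string; keywords: string[] }[] = [
  { category: "contact", keywords: ["kontakt"] },
  { category: "imprint", keywords: ["impressum"] },
  { category: "services", keywords: ["leistungen", "angebot"] },
  { category: "about", keywords: ["über uns", "ueber uns", "team"] },
];

interface Link { href: string; text: string }
interface SelectedSubpage { category: string; url: string }

const stripWww = (host: string): string => host.replace(/^www\./, "");

// Resolve a link against the homepage; return null unless it is an
// http(s) URL on the same host (ignoring a leading "www.").
function resolveSameDomain(href: string, baseUrl: string): URL | null {
  try {
    const url = new URL(href, baseUrl);
    const base = new URL(baseUrl);
    const isHttp = url.protocol === "http:" || url.protocol === "https:";
    if (!isHttp || stripWww(url.hostname) !== stripWww(base.hostname)) return null;
    url.hash = ""; // fragments never identify a separate page
    return url;
  } catch {
    return null; // unparsable href
  }
}

// Lowercase link text plus path, with "-", "_" and "/" treated as spaces.
function toHaystack(text: string, pathname: string): string {
  let path = pathname;
  try { path = decodeURIComponent(pathname); } catch { /* keep raw path */ }
  return `${text} ${path}`.toLowerCase().replace(/[-_\/]+/g, " ");
}

// Pick at most one URL per category, up to config.maxSubpages in total.
function selectSubpages(homepageUrl: string, links: Link[], config: CrawlConfig): SelectedSubpage[] {
  const selected: SelectedSubpage[] = [];
  const homepageHref = new URL(homepageUrl).href;
  for (const link of links) {
    if (selected.length >= config.maxSubpages) break;
    const url = resolveSameDomain(link.href, homepageUrl);
    if (!url || url.href === homepageHref) continue;
    const haystack = toHaystack(link.text, url.pathname);
    const match = SUBPAGE_KEYWORDS.find(
      (entry) =>
        !selected.some((s) => s.category === entry.category) &&
        entry.keywords.some((kw) => haystack.includes(kw)),
    );
    if (match) selected.push({ category: match.category, url: url.href });
  }
  return selected;
}

// Rough candidate patterns; results are candidates, not validated contacts.
const EMAIL_RE = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi;
const PHONE_RE = /(?:\+49|0)[\d\s\/()-]{6,}\d/g;

function extractContactCandidates(visibleText: string): { emails: string[]; phones: string[] } {
  const emails = (visibleText.match(EMAIL_RE) ?? []).map((e) => e.toLowerCase());
  const phones = (visibleText.match(PHONE_RE) ?? []).map((p) => p.trim());
  return { emails: Array.from(new Set(emails)), phones: Array.from(new Set(phones)) };
}
```

In a crawler built along these lines, the `links` array would come from the anchors found on the loaded homepage, and `visibleText` from the rendered page body. Keeping this logic free of browser calls means the crawl limits and URL rules can be unit-tested without launching Playwright.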