webdev-pipeline/backlog/tasks/task-8 - Implement-Playwright-website-crawling-and-screenshot-capture.md
2026-06-03 21:18:36 +02:00

id: TASK-8
title: Implement Playwright website crawling and screenshot capture
status: To Do
assignee: (none)
created_date: 2026-06-03 19:13
labels: mvp, audit, playwright
dependencies: TASK-7
references: PRD.md
priority: high
ordinal: 8000

Description

Build the website inspection layer using Playwright. For qualified leads, the system should load the company website, inspect the homepage and a small set of relevant subpages, capture desktop/mobile screenshots, extract visible text and contact signals, and store all raw evidence in Convex.
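A minimal TypeScript sketch of the homepage screenshot step, assuming the Playwright `Page` is passed in and the Convex upload is injected as a function. `ScreenshotPage`, `StoreFile`, `captureHomepageScreenshots`, and the viewport sizes are hypothetical; `ScreenshotPage` only lists the three Playwright `Page` methods the sketch calls (`setViewportSize`, `goto`, `screenshot`), so a fake page can stand in during tests.

```typescript
// Minimal subset of the Playwright Page API used by this sketch.
// A real Playwright Page should satisfy this shape; tests can pass a fake.
interface ScreenshotPage {
  setViewportSize(size: { width: number; height: number }): Promise<void>;
  goto(
    url: string,
    options?: { waitUntil?: "load" | "domcontentloaded" | "networkidle"; timeout?: number },
  ): Promise<unknown>;
  screenshot(options?: { fullPage?: boolean }): Promise<Uint8Array>;
}

type ViewportName = "desktop" | "mobile";

// Assumed viewport sizes (hypothetical, tune as needed).
const VIEWPORTS: Record<ViewportName, { width: number; height: number }> = {
  desktop: { width: 1440, height: 900 },
  mobile: { width: 390, height: 844 },
};

// Hypothetical uploader that returns a storage id (e.g. a wrapper around Convex file storage).
type StoreFile = (bytes: Uint8Array, contentType: string) => Promise<string>;

interface ScreenshotRecord {
  viewport: ViewportName;
  storageId: string;
}

async function captureHomepageScreenshots(
  page: ScreenshotPage,
  homepageUrl: string,
  storeFile: StoreFile,
  timeoutMs = 30_000,
): Promise<ScreenshotRecord[]> {
  const records: ScreenshotRecord[] = [];
  const viewports: ViewportName[] = ["desktop", "mobile"];
  for (const viewport of viewports) {
    // Resize first, then (re)load so responsive layouts render at this width.
    await page.setViewportSize(VIEWPORTS[viewport]);
    await page.goto(homepageUrl, { waitUntil: "load", timeout: timeoutMs });
    // Playwright screenshots default to PNG.
    const png = await page.screenshot({ fullPage: true });
    records.push({ viewport, storageId: await storeFile(png, "image/png") });
  }
  return records;
}
```

Resizing a single page only changes the layout width. A closer mobile capture would use a separate browser context created from a Playwright device descriptor such as `devices["iPhone 13"]`, which also sets the user agent and touch emulation. Inside a Convex action, `storeFile` could wrap `ctx.storage.store(...)` with a `Blob` built from the bytes.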

Acceptance Criteria

  • #1 Playwright captures desktop and mobile screenshots for the homepage and stores them in Convex File Storage
  • #2 Crawler visits a bounded set of relevant subpages when discoverable: Kontakt (contact), Impressum (legal notice), Leistungen/Angebot (services/offer), and Über uns/Team (about us/team)
  • #3 Crawler extracts visible text, page title, meta description, headings, links, phone numbers, email candidates, and CTA/contact-form signals
  • #4 Simple technical checks cover HTTPS usage and the final URL after redirects, a missing title or meta description, a visible contact path, and obvious broken internal links within the crawl limit
  • #5 Crawler failures produce descriptive errors visible in the dashboard and do not block processing of unrelated leads
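The extraction in #3 and the checks in #4 can be kept as pure functions over a per-page snapshot, which makes them testable without a browser. This is a sketch under assumptions: `PageSnapshot`, `extractContactSignals`, `runTechnicalChecks`, the regex patterns, and the "visible contact path" rule are hypothetical, and the snapshot fields would be filled from Playwright (for example `page.title()` and `page.innerText("body")`).

```typescript
// Hypothetical snapshot of one crawled page, filled from Playwright in practice.
interface PageSnapshot {
  requestedUrl: string;
  finalUrl: string; // URL after redirects
  title: string;
  metaDescription: string | null;
  visibleText: string;
  // Internal links with their HTTP status; null means "not checked within the crawl limit".
  internalLinks: { href: string; status: number | null }[];
  hasContactForm: boolean;
}

interface ContactSignals {
  emails: string[];
  phones: string[];
}

// Deliberately loose patterns: they collect candidates for review, not verified contacts.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// Numbers starting with "+<country code>" or a leading 0, e.g. "+49 30 1234567", "030 / 123 45 67".
const PHONE_RE = /(?:\+\d{1,3}|\b0)[\d ()\/-]{6,}\d/g;

function extractContactSignals(text: string): ContactSignals {
  const emails = new Set((text.match(EMAIL_RE) ?? []).map((e) => e.toLowerCase()));
  const phones = new Set(
    (text.match(PHONE_RE) ?? []).filter((p) => {
      const digits = p.replace(/\D/g, "").length;
      return digits >= 7 && digits <= 15; // E.164 numbers have at most 15 digits
    }),
  );
  return { emails: Array.from(emails), phones: Array.from(phones) };
}

interface TechnicalChecks {
  usesHttps: boolean;
  redirected: boolean;
  missingTitle: boolean;
  missingMetaDescription: boolean;
  visibleContactPath: boolean;
  brokenInternalLinks: string[];
}

function runTechnicalChecks(page: PageSnapshot, signals: ContactSignals): TechnicalChecks {
  return {
    usesHttps: page.finalUrl.toLowerCase().startsWith("https://"),
    redirected: page.finalUrl !== page.requestedUrl,
    missingTitle: page.title.trim() === "",
    missingMetaDescription: (page.metaDescription ?? "").trim() === "",
    // Assumed rule: a contact form, any email/phone candidate, or a contact-like link counts.
    visibleContactPath:
      page.hasContactForm ||
      signals.emails.length > 0 ||
      signals.phones.length > 0 ||
      page.internalLinks.some((link) => /kontakt|contact/i.test(link.href)),
    // Only links that were actually checked count; 4xx/5xx responses are treated as broken.
    brokenInternalLinks: page.internalLinks
      .filter((link) => link.status !== null && link.status >= 400)
      .map((link) => link.href),
  };
}
```

Keeping the patterns loose and storing candidates (rather than a single "best" phone or email) fits the goal of persisting raw evidence; filtering can happen later in the audit.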

Implementation Plan

  1. Add Playwright runtime setup compatible with local development and Coolify container deployment.
  2. Define crawl limits, viewports, timeout behavior, and same-domain rules for which URLs may be crawled.
  3. Capture homepage desktop/mobile screenshots and upload to Convex storage.
  4. Discover and inspect relevant subpages with bounded depth.
  5. Persist extracted text, metadata, contact candidates, technical checks, screenshots, and errors.
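Steps 2, 4, and 5 can be sketched as a bounded subpage selector with a same-domain rule, plus a batch wrapper that records per-lead errors instead of failing the whole batch. `MAX_SUBPAGES`, `SUBPAGE_PATTERNS`, `selectSubpages`, `crawlLeads`, and the keyword lists are assumptions for illustration, not settled decisions; the selector only follows links found on the homepage (depth 1).

```typescript
// Assumed cap on subpages per lead (hypothetical value).
const MAX_SUBPAGES = 4;

// Target subpages from criterion #2, matched against link text or URL path.
const SUBPAGE_PATTERNS: Array<[string, RegExp]> = [
  ["kontakt", /kontakt|contact/i],
  ["impressum", /impressum|imprint/i],
  ["leistungen", /leistungen|angebot|services/i],
  ["ueberUns", /über[-\s]?uns|ueber[-\s]?uns|team|about/i],
];

interface FoundLink {
  href: string; // raw href as found on the homepage
  text: string; // visible link text
}

function tryResolve(href: string, base: URL): URL | null {
  try {
    return new URL(href, base); // resolves relative hrefs against the homepage
  } catch {
    return null; // malformed href
  }
}

function safeDecode(path: string): string {
  try {
    return decodeURIComponent(path);
  } catch {
    return path; // keep invalid percent-encoding as-is
  }
}

// Same-domain rule (assumption): http(s) only, and "www." is ignored when comparing hosts.
function isSameSite(candidate: URL, home: URL): boolean {
  const host = (u: URL) => u.hostname.toLowerCase().replace(/^www\./, "");
  return (candidate.protocol === "http:" || candidate.protocol === "https:") && host(candidate) === host(home);
}

// Depth-1 selection: at most one homepage link per category, at most `limit` subpages overall.
function selectSubpages(homepageUrl: string, links: FoundLink[], limit = MAX_SUBPAGES): Map<string, string> {
  const home = new URL(homepageUrl);
  const picked = new Map<string, string>(); // category -> absolute URL
  for (const [category, pattern] of SUBPAGE_PATTERNS) {
    if (picked.size >= limit) break;
    for (const link of links) {
      const url = tryResolve(link.href, home);
      if (url === null || !isSameSite(url, home)) continue;
      url.hash = ""; // "/kontakt#form" and "/kontakt" are the same page
      if (pattern.test(link.text) || pattern.test(safeDecode(url.pathname))) {
        picked.set(category, url.toString());
        break;
      }
    }
  }
  return picked;
}

type LeadOutcome =
  | { leadId: string; ok: true; pagesVisited: number }
  | { leadId: string; ok: false; error: string };

// Runs one hypothetical crawl per lead; a failure becomes an error record instead of rejecting the batch.
async function crawlLeads(leadIds: string[], crawlOne: (leadId: string) => Promise<number>): Promise<LeadOutcome[]> {
  // Promise.resolve().then(...) also turns a synchronous throw in crawlOne into a rejection.
  const settled = await Promise.allSettled(leadIds.map((id) => Promise.resolve().then(() => crawlOne(id))));
  return settled.map((result, i): LeadOutcome =>
    result.status === "fulfilled"
      ? { leadId: leadIds[i], ok: true, pagesVisited: result.value }
      : { leadId: leadIds[i], ok: false, error: result.reason instanceof Error ? result.reason.message : String(result.reason) },
  );
}
```

`Promise.allSettled` keeps one lead's timeout or navigation error from rejecting the batch, and the `error` string is what a dashboard row could show. A real run would also cap concurrency so a large batch does not open too many browser pages at once.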