02 — Technical case study
Lead-gen
engine.
An eight-stage ETL pipeline that turns a geographic region into a stream of hyper-targeted B2B cold emails — with custom AI-generated HTML previews and LSSI-CE compliance baked in.
- Role
- Solo engineer
- Year
- 2026
- Status
- Live · production
- Scope
- Pipeline · CLI · Compliance
- Stack
- Python · Playwright · Claude
- Sector
- Sales · Outreach · NDA
01
Pipeline.
Numbered stages · CSV between
Each stage is a standalone script with a single contract — read CSV, write CSV. Re-runnable, atomic, step-by-step debuggable.
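The read-CSV, write-CSV contract can be sketched as follows. This is a minimal illustration, not the project's actual code: `run_stage` and its `transform` callback are hypothetical names, and it uses the standard-library `csv` module instead of pandas to stay dependency-free. The output goes to a temporary file that is then renamed into place, which is one common way to get the atomic behaviour described above.

```python
import csv
import os
import tempfile
from typing import Callable, Dict, List

Row = Dict[str, str]


def run_stage(src: str, dst: str, transform: Callable[[List[Row]], List[Row]]) -> int:
    """Read src CSV, apply transform, write dst CSV atomically. Returns rows written."""
    with open(src, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        in_fields = list(reader.fieldnames or [])

    out = transform(rows)
    # Use the output columns if there are rows, else fall back to the input header.
    fieldnames = list(out[0].keys()) if out else in_fields

    # Write next to the destination so the final rename stays on one filesystem.
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(out)
        os.replace(tmp_path, dst)  # readers see the old file or the new one, never a partial
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return len(out)
```

A stage then reduces to a pure function over rows, for example `run_stage("data/leads_raw.csv", "data/leads_qualified.csv", qualify)`, where `qualify` is whatever filter that stage implements.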
- 01
Scrape Maps
Iterate over N localities × M categories through Google Places API (New). Emits data/leads_raw.csv with place_id, declared website, geo and metadata. Idempotent: safe to re-run without duplicates.
- 02
Qualify
Drops leads that don't meet the "poor web" criteria: no site at all, non-https, redirect to social platforms (linktr.ee, instagram, facebook), PageSpeed mobile < 50, or load > 5s. Reason for rejection logged on every row.
- 02.5
Enrich emails (cascade)
Cascading lookup until an email is found: (1) scrape their own site → (2) Google Custom Search snippets → (3) Instagram bio → (4) Facebook "About". Stamps enrichment_status with the exact source or not_found.
- 03
Generate previews
Claude writes a per-lead HTML page (hero, sections, CTA) from a deterministic style brief. Playwright opens headless Chromium at a 1440×2000 viewport and captures the PNG. Files land in previews/<place_id>/.
- 04
Send
Delivery through Gmail SMTP with an app password. The preview is embedded inline via cid: for maximum visual impact. Throttle of 75s ± 25% between sends, max 50/run, a blacklist and a send log in data/sent_log.csv for idempotency.
- 05+
Multi-channel queue
For leads without a discovered email, secondary scripts feed alternative queues: 05_whatsapp_queue.py + 06_open_whatsapp.py open WhatsApp Web pre-filled, 07_manual_queue.py exports a CSV for manual handling.
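Stages 02 and 02.5 lend themselves to two small functions, sketched here under assumptions. The names `poor_web_reason` and `enrich_email`, the reason strings, and the shape of the lookup callables are all hypothetical; only the thresholds (mobile score below 50, load above 5 s), the social hosts and the `not_found` status come from the stage descriptions above. The lookups themselves (site scrape, CSE, Instagram, Facebook) are passed in as plain callables so the cascade logic stays independent of any network code.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple
from urllib.parse import urlparse

# Hosts treated as "redirect to a social platform" (assumed matching rule: host or subdomain).
SOCIAL_HOSTS = ("linktr.ee", "instagram.com", "facebook.com")


def poor_web_reason(website: str, final_url: str,
                    mobile_score: Optional[float],
                    load_seconds: Optional[float]) -> Optional[str]:
    """Return the first 'poor web' criterion the lead matches, or None if the site is fine."""
    if not website:
        return "no_site"
    final = urlparse(final_url or website)
    host = (final.hostname or "").lower()
    if any(host == h or host.endswith("." + h) for h in SOCIAL_HOSTS):
        return "social_redirect"
    if final.scheme != "https":
        return "non_https"
    if mobile_score is not None and mobile_score < 50:
        return "pagespeed_mobile_lt_50"
    if load_seconds is not None and load_seconds > 5:
        return "load_gt_5s"
    return None  # site is good enough, so the lead is dropped


Lookup = Callable[[Dict[str, str]], Optional[str]]


def enrich_email(lead: Dict[str, str],
                 sources: Iterable[Tuple[str, Lookup]]) -> Tuple[Optional[str], str]:
    """Try each source in order; return (email, enrichment_status)."""
    for name, lookup in sources:
        email = lookup(lead)
        if email:
            return email, name  # stop at the first hit; later sources are never called
    return None, "not_found"
```

Because the cascade stops at the first hit, cheaper sources can be listed first and the more fragile ones (social profiles) are only queried when needed.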
02
Flow.
CSV-driven · idempotent
// pipeline · stage flow
config.yaml
│
▼
┌──────────────┐ leads_raw.csv
│ 01 scrape │ ─────────────────▶ Places API
└──────┬───────┘
▼
┌──────────────┐ leads_qualified.csv
│ 02 qualify │ ─────────────────▶ PageSpeed API
└──────┬───────┘ (filter: poor web)
▼
┌──────────────┐ + email column
│ 02.5 enrich │ ─────────────────▶ web · CSE · IG · FB
└──────┬───────┘ (cascade fallback)
▼
┌──────────────┐ previews/<id>/index.html
│ 03 previews │ ─────────────────▶ Claude API
└──────┬───────┘ Playwright PNG
▼
┌──────────────┐ sent_log.csv
│ 04 send │ ─────────────────▶ Gmail SMTP
└──────────────┘ inline preview · throttle
03
Legal.
Compliance baked-in
Serious B2B cold email — not a growth hack. LSSI-CE compliance encoded into the pipeline itself, not into good intentions.
- Legal framework
- LSSI-CE art. 21 — Spain / EU
- Audience
- Legal entities only (B2B). Personal emails are dropped
- Identification
- Sender name, tax ID and physical address on every email
- Opt-out
- Working unsubscribe link · auto-reply adds the sender to the blacklist
- Throttle
- 75 s ± 25% between sends · 50 emails/run · progressive warm-up
- Idempotency
- sent_log.csv prevents re-sends · place_id as the unique key
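The throttle and idempotency rules above can be expressed in a few lines. This is a sketch under assumptions: the function names are hypothetical, `sent_log.csv` is assumed to carry a `place_id` column (the case study only says place_id is the unique key), and the blacklist is assumed to be a set of lowercase email addresses. The progressive warm-up is not modelled here.

```python
import csv
import os
import random
from typing import Dict, Iterable, List, Optional, Set

BASE_DELAY_S = 75.0   # base pause between sends
JITTER = 0.25         # ± 25 %
MAX_PER_RUN = 50      # hard cap per run


def next_delay(rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before the next send: 75 s scaled by a uniform ±25 % jitter."""
    if rng is None:
        rng = random.Random()
    return BASE_DELAY_S * (1 + rng.uniform(-JITTER, JITTER))


def load_sent_ids(path: str) -> Set[str]:
    """place_ids already present in the send log; empty if the log does not exist yet."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as fh:
        return {row["place_id"] for row in csv.DictReader(fh)}


def select_batch(leads: Iterable[Dict[str, str]], sent_ids: Set[str],
                 blacklist: Set[str], limit: int = MAX_PER_RUN) -> List[Dict[str, str]]:
    """Skip already-sent and blacklisted leads, then cap the batch at `limit`."""
    batch: List[Dict[str, str]] = []
    for lead in leads:
        if lead["place_id"] in sent_ids or lead["email"].lower() in blacklist:
            continue
        batch.append(lead)
        if len(batch) >= limit:
            break
    return batch
```

With these values every pause falls between 56.25 s and 93.75 s, so a full run of 50 emails takes roughly an hour.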
04
Integrations.
8 external integrations orchestrated
- Lead source
- Google Places API (New) — geo-segmented searches with configurable radius
- Quality signal
- Google PageSpeed Insights API — mobile score, LCP, blocking time
- Email enrichment
- Cascade — web scraping + Google CSE + Instagram bio + Facebook About
- Content generation
- Anthropic Claude (Sonnet) with a deterministic style prompt
- Visual capture
- Playwright Chromium headless · 1440×2000 viewport · PNG screenshot
- Color sampling
- ColorThief + Pillow to extract the visual palette from the lead's logo/site
- Email rendering
- Jinja2 templates · Premailer for inline CSS · cid: for embedded images
- Delivery
- Gmail SMTP with app password · optional BCC · dry-run mode
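The inline `cid:` embedding listed under email rendering can be illustrated with the standard library alone. This is a sketch, not the project's template code: `build_email` is a hypothetical name, the HTML is assembled with an f-string rather than Jinja2 and Premailer, and the `preview.local` domain used for the Content-ID is an arbitrary placeholder.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_email(sender: str, recipient: str, subject: str,
                body_html: str, png_bytes: bytes) -> EmailMessage:
    """Build a multipart email whose HTML body shows the PNG inline via a cid: URL."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("This message contains an inline preview image.")  # plain-text fallback

    # make_msgid returns "<id@domain>"; the cid: URL uses it without the angle brackets.
    cid = make_msgid(domain="preview.local")
    html = (
        "<html><body>"
        f"{body_html}"
        f'<img src="cid:{cid[1:-1]}" alt="Site preview">'
        "</body></html>"
    )
    msg.add_alternative(html, subtype="html")

    # Attach the PNG as a related part of the HTML alternative, keyed by the same Content-ID.
    html_part = msg.get_payload()[1]
    html_part.add_related(png_bytes, maintype="image", subtype="png", cid=cid)
    return msg
```

The resulting message could then be passed to `smtplib.SMTP_SSL("smtp.gmail.com", 465)` via `login(...)` and `send_message(msg)`; in a dry-run mode one would presumably build the message and skip that last step.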
05
Stack.
Production dependencies
Core
- Python 3.10+
- pandas
- PyYAML
- python-dotenv
- Jinja2
- requests
AI · Visual
- anthropic 0.39+
- Playwright 1.45+ (Chromium)
- Pillow
- ColorThief
- Premailer
Web
- Flask 3.0 (preview server)
- tldextract
Pipeline
- 8 numbered stages
- Idempotent re-runs
- Single-lead shortcut
- Dry-run mode