Live · Case study 002

02 — Technical case study

Lead-gen
engine.

An eight-stage ETL pipeline that turns a geographic region into a stream of hyper-targeted B2B cold emails — with custom AI-generated HTML previews and LSSI-CE compliance baked in.

Role
Solo engineer
Year
2026
Status
Live · production
Scope
Pipeline · CLI · Compliance
Stack
Python · Playwright · Claude
Sector
Sales · Outreach · NDA

01

Pipeline.

Numbered stages · CSV between

Each stage is a standalone script with a single contract — read CSV, write CSV. Re-runnable, atomic, step-by-step debuggable.
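One way to read that contract is as a small shared helper: load the input CSV, apply a stage-specific transform, and write the output through a temporary file so a crash never leaves a half-written CSV for the next stage. The sketch below is illustrative only. `run_stage`, its parameters and the temp-file approach are assumptions, not the project's actual code, and it uses the standard-library `csv` module rather than pandas to stay dependency-free.

```python
import csv
import os
import tempfile
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def run_stage(src: str, dst: str,
              transform: Callable[[List[Row]], Iterable[Row]]) -> int:
    """Read src CSV, apply transform, atomically write dst CSV. Hypothetical helper."""
    with open(src, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        fieldnames = list(reader.fieldnames or [])

    out_rows = list(transform(rows))
    # Keep input columns first, then append any columns the stage added.
    for row in out_rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    # Write next to the target, then rename over it. os.replace is atomic
    # when source and destination are on the same filesystem.
    out_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(out_rows)
        os.replace(tmp_path, dst)
    except BaseException:
        # Clean up the partial temp file and leave any previous dst untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return len(out_rows)
```

Because the output only appears once it is complete, re-running a failed stage simply overwrites the previous result.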

  1. 01

    Scrape Maps

    Iterates over N localities × M categories through the Google Places API (New). Emits data/leads_raw.csv with place_id, declared website, geo and metadata. Idempotent: safe to re-run without creating duplicates.

  2. 02

    Qualify

    Drops leads that don't meet the "poor web" criteria: no site at all, a non-HTTPS site, a redirect to a social platform (linktr.ee, instagram, facebook), a PageSpeed mobile score < 50, or a load time > 5 s. The reason for rejection is logged on every row.

  3. 02.5

    Enrich emails (cascade)

    Cascading lookup until an email is found: (1) scrape their own site → (2) Google Custom Search snippets → (3) Instagram bio → (4) Facebook "About". Stamps enrichment_status with the exact source or not_found.

  4. 03

    Generate previews

    Claude writes a per-lead HTML page (hero, sections, CTA) from a deterministic style brief. Playwright opens headless Chromium with a 1440×2000 viewport and captures a PNG. Files land in previews/<place_id>/.

  5. 04

    Send

    Delivery through Gmail SMTP with an app password. The preview is embedded inline via a cid: reference for maximum visual impact. Sends are throttled to 75 s ± 25% apart with a maximum of 50 per run, checked against a blacklist, and recorded in data/sent_log.csv for idempotency.

  6. 05+

    Multi-channel queue

    For leads without a discovered email, secondary scripts feed alternative queues: 05_whatsapp_queue.py + 06_open_whatsapp.py open WhatsApp Web pre-filled, 07_manual_queue.py exports a CSV for manual handling.
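The qualification rules in stage 02 can be sketched as a single pure function that returns the first "poor web" criterion a site matches. This is a hedged illustration: the function name, its parameters and the reason labels are hypothetical, the hostnames instagram.com and facebook.com are assumed from the platform names listed above, and URLs are assumed to be absolute. The thresholds (mobile score below 50, load above 5 s) come from the stage description.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed hostnames for the social platforms named in stage 02.
SOCIAL_HOSTS = ("linktr.ee", "instagram.com", "facebook.com")


def poor_web_reason(website: str, final_url: str,
                    mobile_score: Optional[float],
                    load_seconds: Optional[float]) -> Optional[str]:
    """Return the first matching "poor web" criterion, or None for a good site."""
    if not website.strip():
        return "no_site"
    # final_url is where the declared website ends up after redirects.
    final = urlparse(final_url or website)
    if final.scheme != "https":
        return "non_https"
    host = (final.hostname or "").lower()
    if any(host == h or host.endswith("." + h) for h in SOCIAL_HOSTS):
        return "social_redirect"
    if mobile_score is not None and mobile_score < 50:
        return "pagespeed_mobile_below_50"
    if load_seconds is not None and load_seconds > 5:
        return "load_over_5s"
    return None
```

A lead whose result is None has a perfectly good site and would be dropped. Any other value marks it as a prospect and can be written to the row as the logged reason.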

02

Flow.

CSV-driven · idempotent

// pipeline · stage flow

  config.yaml
       │
       ▼
  ┌──────────────┐    leads_raw.csv
  │ 01 scrape    │ ─────────────────▶ Places API
  └──────┬───────┘
         ▼
  ┌──────────────┐    leads_qualified.csv
  │ 02 qualify   │ ─────────────────▶ PageSpeed API
  └──────┬───────┘    (filter: poor web)
         ▼
  ┌──────────────┐    + email column
  │ 02.5 enrich  │ ─────────────────▶ web · CSE · IG · FB
  └──────┬───────┘    (cascade fallback)
         ▼
  ┌──────────────┐    previews/<id>/index.html
  │ 03 previews  │ ─────────────────▶ Claude API
  └──────┬───────┘    Playwright PNG
         ▼
  ┌──────────────┐    sent_log.csv
  │ 04 send      │ ─────────────────▶ Gmail SMTP
  └──────────────┘    inline preview · throttle
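The cascade fallback at stage 02.5 amounts to trying an ordered list of lookups and stopping at the first hit. A minimal sketch, assuming each source is wrapped in a callable that returns an email or None: the `enrich_email` name and the source labels used in the example are hypothetical, and only the `enrichment_status` column and the `not_found` value come from the stage description.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Lead = Dict[str, str]
Lookup = Callable[[Lead], Optional[str]]


def enrich_email(lead: Lead, cascade: Sequence[Tuple[str, Lookup]]) -> Lead:
    """Try each source in order and stop at the first one that yields an email."""
    enriched = dict(lead)
    enriched["email"] = ""
    enriched["enrichment_status"] = "not_found"
    for source, lookup in cascade:
        try:
            email = lookup(lead)
        except Exception:
            # A failing source should not abort the cascade; try the next one.
            continue
        if email:
            enriched["email"] = email.strip().lower()
            enriched["enrichment_status"] = source  # record the exact source
            break
    return enriched
```

In use, the cascade would be a list such as `[("website", scrape_site), ("google_cse", search_snippets), ("instagram_bio", read_bio), ("facebook_about", read_about)]`, where every callable is a placeholder for the real lookup. Ordering the list from cheapest to most expensive source keeps later lookups from running unless earlier ones miss.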

03

Legal.

Compliance baked-in

Serious B2B cold email — not a growth hack. LSSI-CE compliance encoded into the pipeline itself, not into good intentions.

Legal framework
LSSI-CE art. 21 — Spain / EU
Audience
Legal entities only (B2B). Personal emails are dropped
Identification
Sender name, tax ID and physical address on every email
Opt-out
Working unsubscribe link · an opt-out reply automatically adds the recipient's address to the blacklist
Throttle
75 s ± 25% between sends · 50 emails/run · progressive warm-up
Idempotency
sent_log.csv prevents re-sends · place_id as the unique key
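The throttle and idempotency rules above can be expressed in a few lines. The sketch below is an assumption-laden illustration, not the production code: `next_delay` and `select_batch` are hypothetical names, the jitter is assumed to be uniform, and the sent log and blacklist are assumed to be already loaded into sets.

```python
import random
from typing import Dict, Iterable, List, Set

BASE_DELAY_S = 75.0   # base gap between sends
JITTER = 0.25         # ± 25%
MAX_PER_RUN = 50      # emails per run


def next_delay(rng: random.Random) -> float:
    """Seconds to wait before the next send, jittered within ± 25% of 75 s."""
    return BASE_DELAY_S * rng.uniform(1 - JITTER, 1 + JITTER)


def select_batch(leads: Iterable[Dict[str, str]], sent_ids: Set[str],
                 blacklist: Set[str],
                 limit: int = MAX_PER_RUN) -> List[Dict[str, str]]:
    """Pick up to `limit` leads that were never sent to and are not blacklisted."""
    seen = set(sent_ids)  # place_id is the unique key
    batch: List[Dict[str, str]] = []
    for lead in leads:
        if len(batch) >= limit:
            break
        email = lead.get("email", "").strip().lower()
        if not email or email in blacklist:
            continue
        if lead["place_id"] in seen:
            continue
        seen.add(lead["place_id"])
        batch.append(lead)
    return batch
```

A send loop would call `time.sleep(next_delay(rng))` between messages and append each place_id to sent_log.csv only after the SMTP call succeeds, so an interrupted run can resume without re-sending.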

04

Integrations.

8 external APIs orchestrated

Lead source
Google Places API (New) — geo-segmented searches with configurable radius
Quality signal
Google PageSpeed Insights API — mobile score, LCP, blocking time
Email enrichment
Cascade — web scraping + Google CSE + Instagram bio + Facebook About
Content generation
Anthropic Claude (Sonnet) with a deterministic style prompt
Visual capture
Playwright Chromium headless · 1440×2000 viewport · PNG screenshot
Color sampling
ColorThief + Pillow to extract the visual palette from the lead's logo/site
Email rendering
Jinja2 templates · Premailer for inline CSS · cid: for embedded images
Delivery
Gmail SMTP with app password · optional BCC · dry-run mode
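To illustrate the cid: embedding mentioned under Email rendering, here is a minimal standard-library sketch that attaches a PNG as a related part of the HTML body. It is not the project's renderer: `build_email`, the `{{ preview_cid }}` placeholder and the `preview.local` Message-ID domain are assumptions, and a plain `str.replace` stands in for the Jinja2 and Premailer steps.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_email(sender: str, recipient: str, subject: str,
                text_body: str, html_template: str,
                png_bytes: bytes) -> EmailMessage:
    """Build a multipart email whose HTML part shows the preview PNG inline."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text_body)  # plain-text fallback

    # make_msgid returns "<id@domain>"; the HTML needs it without the brackets.
    cid = make_msgid(domain="preview.local")  # hypothetical domain
    html = html_template.replace("{{ preview_cid }}", cid[1:-1])
    msg.add_alternative(html, subtype="html")

    # Attach the image to the HTML part so clients render it inline
    # instead of listing it as a separate attachment.
    html_part = msg.get_payload()[1]
    html_part.add_related(png_bytes, "image", "png", cid=cid)
    return msg
```

The template references the image as `<img src="cid:{{ preview_cid }}">`. The resulting message could then be handed to `smtplib.SMTP_SSL(...).send_message(msg)`, or only printed when running in dry-run mode.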

05

Stack.

Production dependencies

Core

  • Python 3.10+
  • pandas
  • PyYAML
  • python-dotenv
  • Jinja2
  • requests

AI · Visual

  • anthropic 0.39+
  • Playwright 1.45+ (Chromium)
  • Pillow
  • ColorThief
  • Premailer

Web

  • Flask 3.0 (preview server)
  • tldextract

Pipeline

  • 8 numbered stages
  • Idempotent re-runs
  • Single-lead shortcut
  • Dry-run mode
