
Posts tagged "Web-Scraping"

12 posts

Ecommerce · 10 min read

Schema design patterns for e-commerce extraction

Battle-tested schema patterns for product pages, category pages, reviews, and inventory. Edge cases, type choices, and the fields people forget.

SEO · 9 min read

Scraping Google SERP results in 2026: what works and what doesn't

Direct Google scraping is a losing battle in 2026. Here's the realistic landscape, the alternatives that work, and how to extract structured data from search results.

Lead-Generation · 10 min read

Lead generation from public web data: a builder's guide

How to extract qualified leads from company websites, public directories, and structured registries without violating terms of service or privacy law.

Legal · 9 min read

Is web scraping legal in 2026? A practical guide for builders

What courts, regulators, and contracts actually say about scraping public web data, with the case law that shaped the current landscape and a working playbook.

Engineering · 8 min read

Scraping JavaScript-heavy SPAs: Next.js, Nuxt, and React in 2026

Why plain HTTP fetching returns empty pages on modern frontends, what render targets work, and how to recover server-shipped data without a headless browser.

Guide · 9 min read

The complete guide to web scraping APIs in 2026

What a modern web scraping API actually does, how to evaluate one, and where each category (proxies, browsers, extractors) fits into a real pipeline.

Comparison · 8 min read

Firecrawl vs Apify vs Runo: which scraping API to pick in 2026

An honest, side-by-side look at three popular scraping APIs. What each is built for, where each shines, and where each costs you time and money.

Engineering · 7 min read

How to scrape Cloudflare-protected sites without getting blocked

A practical, layered approach to defeating Cloudflare's bot challenges in 2026. TLS fingerprints, hardened headless, cookie persistence, and when to escalate.

Engineering · 7 min read

LLM extraction vs CSS selectors: why selector-based scraping is dead at scale

Selectors break when sites redesign. LLMs extract by semantic meaning. Here's why the tradeoff has flipped, with cost numbers from real workloads.

Guide · 8 min read

Extracting structured JSON from any HTML: a developer's guide

How to turn arbitrary web pages into typed JSON shaped to your schema. Covers schema design, type coercion, null handling, and edge cases.

AI-Agents · 8 min read

Web scraping for AI agents: building the data layer for LLM apps

How to architect the scraping stack behind autonomous agents. Schema-typed data, low-latency tool calls, cost control, and error semantics that don't break the loop.

Guide · 9 min read

How to monitor competitor prices with a scraping API

A practical guide to building a competitor price monitoring pipeline. Schema design, change detection, alerting, and the legal and operational pitfalls.