Is web scraping legal in 2026? A practical guide for builders

This is not legal advice. This is the lay of the land for builders who want to ship scraping-backed products without an angry letter. Talk to a lawyer for your specific situation; use this to know what to ask them.

The short version: scraping public web data is generally legal in the US and most of Europe in 2026, with caveats around terms of service, copyright, personal data, and circumvention. Below is what the major rulings actually held, the patterns that get people sued, and the safe-by-default operating posture.

What changed since 2022#

Three things shifted the legal landscape in the early 2020s that are still load-bearing in 2026:

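  1. hiQ Labs v. LinkedIn (Ninth Circuit 2019, reaffirmed 2022; settled later in 2022): scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act. The CFAA's "without authorization" requires bypassing an actual access barrier (login, paywall), not violating a public page's terms of service. hiQ still lost on LinkedIn's breach-of-contract claim before the settlement, so read the case as a CFAA win, not a blanket green light.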
  2. Van Buren v. United States (2021): SCOTUS narrowed CFAA's "exceeds authorized access" to mean accessing parts of a system you weren't permitted to access at all, not misusing parts you were permitted to access.
  3. Meta v. Bright Data (2024): A federal district court held that scraping public Facebook/Instagram pages without logging in didn't breach Meta's terms of service. The court read those terms as governing logged-in use, so scraping from behind a login is a different story: the click-wrap acceptance at signup binds the logged-in user.

The pattern across all three: public data is fair game; access barriers and contract acceptance are what create liability.

Web scraping liability comes from four buckets, not one:

| Vector | What triggers it | Worst-case outcome |
| --- | --- | --- |
| Computer Fraud and Abuse Act (CFAA) | Bypassing technical access controls (login, IP block, captcha) | Federal criminal liability + civil damages |
| Breach of contract (ToS) | Logged-in scraping that violates click-wrapped terms | Civil damages, injunction |
| Copyright infringement | Republishing protected content beyond fair use | Statutory damages ($750-$30K per work, up to $150K per work if willful) |
| Personal data laws (GDPR, CCPA, etc.) | Collecting personal data without a legal basis | Fines up to €20M or 4% of global annual revenue, whichever is higher (GDPR) |

Each vector is independent. You can be CFAA-clean and copyright-clean and still get nailed under GDPR. Plan for all four.

The CFAA layer: don't bypass barriers#

After hiQ and Van Buren, the bright line is technical access controls:

  • Scraping a public page while logged out: not a CFAA issue.
  • Scraping after creating fake accounts to log in: probably a CFAA issue.
  • Bypassing rate limits via rotating IPs: contested, but increasingly viewed as not a CFAA issue if no login is involved.
  • Bypassing captchas: contested; some courts treat it as bypassing an access barrier.

The defensive posture: don't log in unless your business requires it. If you can extract the same data from logged-out public pages, do that. The legal exposure drops by an order of magnitude.

If you must log in (e.g., scraping LinkedIn for lead-gen):

  • Use real accounts you control, not bought or bot-created accounts
  • Read the platform's terms; assume you're bound by them
  • Have a legal opinion on the specific use case before scaling

The ToS layer: the click matters#

Terms of service are contracts. Like any contract, they require offer + acceptance + consideration. The acceptance part is where the law has clarified:

  • Browsewrap (a footer link that says "by using this site you agree..."): courts increasingly hold these unenforceable. Mere site visits don't constitute acceptance.
  • Clickwrap (an explicit "I agree" button at signup): clearly enforceable. You're bound.
  • Sign-in-wrap (terms shown at login): generally enforceable.

For scraping, this means:

  • Logged-out scraping of a public site: no clickwrap, browsewrap is weak, contractual liability is low
  • Logged-in scraping: clickwrap accepted at signup; you're bound

This is why the Meta v. Bright Data ruling matters. Bright Data wasn't logged in when it scraped, and the court read Meta's terms as governing only logged-in use. The court found no breach.

The copyright layer#

You're allowed to read copyrighted material. You're allowed to index and analyze it (this is what Google, archive.org, and every search engine rely on). You're not generally allowed to republish substantial portions without a license.

For scraping use cases, the relevant distinctions:

Likely fair use:

  • Extracting facts and structured data (prices, ratings, addresses, dates). Facts aren't copyrightable.
  • Building search indexes that link back to the original
  • Building training datasets where the model output isn't a substantial reproduction
  • Analytics and aggregation across many sources

Likely infringement:

  • Republishing full articles or substantial excerpts on your own site
  • Hosting full images you didn't license, especially for commercial use
  • Reproducing the "expressive selection and arrangement" of a database (think: a competitor's entire product catalog with their exact descriptions and photos)

The Authors Guild v. Google Books decisions (2013-2015) are the clearest precedent: scraping and indexing for transformative purposes is fair use, even when the underlying works are copyrighted. The line is "transformative" and "non-substitutive" (your use doesn't replace consuming the original).

If you're using LLM extraction (which Runo does, see LLM extraction vs CSS selectors), one helpful pattern: extract structured facts, not prose. A schema like {title, price, sku, availability} returns facts; it doesn't reproduce the seller's marketing copy. The output is materially different from the input, which strengthens a fair-use argument.
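A minimal sketch of the "facts, not prose" pattern, assuming an extractor that returns a plain dictionary per page. The `ProductFacts` class, the `to_facts` helper, and the sample record are hypothetical names for illustration, not part of any particular API. The idea is simply that anything outside the schema never reaches storage.

```python
from dataclasses import dataclass, asdict
from typing import Any, Optional


@dataclass(frozen=True)
class ProductFacts:
    """Facts only: no description, no marketing copy, no images."""
    title: str
    price: Optional[float]
    sku: Optional[str]
    availability: Optional[str]


def to_facts(raw: dict[str, Any]) -> ProductFacts:
    """Keep the schema fields and drop every other key in the raw record."""
    price = raw.get("price")
    return ProductFacts(
        title=str(raw.get("title", "")).strip(),
        price=float(price) if price is not None else None,
        sku=raw.get("sku"),
        availability=raw.get("availability"),
    )


# Hypothetical extractor output that still carries the seller's prose.
raw_record = {
    "title": " Trail Runner 2 ",
    "price": "89.50",
    "sku": "TR2-BLK-42",
    "availability": "in_stock",
    "description": "Our most breathable shoe yet, engineered for ...",
}

facts = to_facts(raw_record)
print(asdict(facts))  # the "description" key is not part of the output
```

Dropping fields at the schema boundary, rather than filtering later, means the prose is never persisted in the first place.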

The personal data layer: GDPR is the heavyweight#

If your scraping touches data about identifiable EU residents, GDPR applies regardless of where you're based. The relevant provisions:

  • Lawful basis required: you need one of six lawful bases (consent, contract, legal obligation, vital interests, public task, legitimate interests). For scraping, "legitimate interests" is the usual claim, which requires a balancing test.
  • Data subject rights: people you have data on can request access, correction, deletion. You need a process for handling those requests.
  • Sensitive categories (health, ethnicity, political opinions, sexual orientation, biometric, etc.) require explicit consent or one of a narrow set of exceptions. Scraping these is much riskier.
  • Cross-border transfers: if you store EU personal data outside the EU, you need transfer safeguards (SCCs, adequacy decision).

The CCPA (California) and similar state laws (Colorado, Virginia, Connecticut, Utah) have similar but less stringent rules. The trend across the US is toward more state privacy laws, not fewer.

The defensive posture for personal data scraping:

  • Don't scrape personal data unless you need it. The cleanest GDPR posture is "we don't collect personal data."
  • If you must, document a legitimate-interests assessment (LIA) per the European Data Protection Board's guidance.
  • Build a deletion pipeline. When a data subject requests deletion, you need to actually delete from your systems without undue delay, and by default within one month of the request.
  • Don't scrape sensitive categories without explicit consent. The damages math is brutal.
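The deletion pipeline in the list above can be sketched roughly as follows. This is an in-memory illustration under stated assumptions: `RecordStore` stands in for every system that holds scraped personal data, `subject_id` is assumed to be an opaque internal key rather than an email address, and the 30-day window is a working assumption, not a statement of what any regulator requires in your case.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Assumption: a 30-day working deadline for completing a deletion request.
DELETION_WINDOW = timedelta(days=30)


@dataclass
class RecordStore:
    """In-memory stand-in for the systems holding scraped personal data."""
    records: dict[str, list[dict]] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def add(self, subject_id: str, record: dict) -> None:
        self.records.setdefault(subject_id, []).append(record)

    def handle_deletion_request(self, subject_id: str, received: date, today: date) -> dict:
        """Delete everything held on one data subject and log the outcome."""
        removed = len(self.records.pop(subject_id, []))
        due = received + DELETION_WINDOW
        entry = {
            "subject_id": subject_id,  # opaque key only, no personal data
            "received": received.isoformat(),
            "due": due.isoformat(),
            "completed": today.isoformat(),
            "records_removed": removed,
            "on_time": today <= due,
        }
        self.audit_log.append(entry)
        return entry


store = RecordStore()
store.add("subj-123", {"source": "example.com/team", "name": "A. Person"})
result = store.handle_deletion_request(
    "subj-123", received=date(2026, 3, 1), today=date(2026, 3, 10)
)
print(result["records_removed"], result["due"], result["on_time"])
```

A real pipeline would also have to reach backups, caches, and downstream exports. The audit entry is what lets you show later that the request was handled and when.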

The patterns that get people sued#

Empirically, these are the patterns that draw cease-and-desists and lawsuits:

  1. Republishing content verbatim. Aggregators that copy article text or product images.
  2. Hammering with no rate limit. Sites notice when one IP makes 10K requests/hour. Even if scraping is legal, the abuse case in front of a judge is unsympathetic.
  3. Logged-in scraping at scale. LinkedIn, Facebook, Twitter, OnlyFans have all sued aggressively for this.
  4. Selling the scraped data back to the source's competitors (especially if the source can show market harm).
  5. Bypassing technical controls (captcha, IP block, paywall) at scale.
  6. Scraping personal data without a clean GDPR posture (this is the one most likely to bite you in 2026).

The patterns that don't usually draw lawsuits:

  1. Logged-out scraping of public pages, with polite rate limits, for analytical or transformative purposes.
  2. Building indexes and search products that link back to the source.
  3. Extracting structured facts (prices, ratings, public stats) without reproducing prose.
  4. Scraping with robots.txt respect and reasonable per-host backoff.
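As an illustration of "polite rate limits" and "per-host backoff", here is one possible scheduler. It is a sketch, not a reference implementation: `PoliteScheduler`, its default delays, and the choice to treat HTTP 429 and 503 as "slow down" signals are all assumptions made for this example.

```python
import random
import time
from collections import defaultdict
from urllib.parse import urlsplit


class PoliteScheduler:
    """Tracks, per host, the earliest time the next request may be sent."""

    def __init__(self, base_delay=1.0, max_delay=60.0, jitter=0.5,
                 rng=None, clock=time.monotonic):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.jitter = jitter
        self.rng = rng or random.Random()
        self.clock = clock
        self.failures = defaultdict(int)   # host -> consecutive throttled responses
        self.next_ok = defaultdict(float)  # host -> earliest allowed send time

    def delay_for(self, host):
        """Capped exponential backoff on failures, plus random jitter."""
        backoff = min(self.max_delay, self.base_delay * 2 ** self.failures[host])
        return backoff + self.rng.uniform(0, self.jitter)

    def wait_time(self, url):
        """Seconds to sleep before requesting this URL (0 if the host is ready)."""
        host = urlsplit(url).netloc
        return max(0.0, self.next_ok[host] - self.clock())

    def record(self, url, status):
        """Update the host's state after a response arrives."""
        host = urlsplit(url).netloc
        if status in (429, 503):
            self.failures[host] += 1
        else:
            self.failures[host] = 0
        self.next_ok[host] = self.clock() + self.delay_for(host)


# Fixed clock and zero jitter so the example is deterministic.
sched = PoliteScheduler(jitter=0.0, clock=lambda: 100.0)
sched.record("https://example.com/a", 200)
print(sched.wait_time("https://example.com/b"))    # 1.0: same host, base delay
sched.record("https://example.com/a", 429)
print(sched.wait_time("https://example.com/b"))    # 2.0: backoff doubled
print(sched.wait_time("https://other.example/x"))  # 0.0: a different host
```

In a crawler loop you would call `time.sleep(sched.wait_time(url))` before each request and `sched.record(url, status)` after it. Keying the state by host is what keeps one slow site from throttling the rest of the crawl.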

A safe-by-default operating posture#

If you're building a scraping product or feature, the defaults that minimize legal exposure:

  1. Stay logged out. Only log in when the business case clearly requires it, with legal review.
  2. Respect robots.txt. It's not legally binding everywhere, but it's the cheapest way to demonstrate good faith.
  3. Polite rate limits. Per-host jitter and backoff. Don't hammer.
  4. Identify your bot in User-Agent when you can. Hiding makes you look worse if challenged.
  5. Extract facts, not prose. Schema-typed structured data has a stronger fair-use story than raw HTML republication.
  6. Avoid personal data unless necessary. If necessary, do a GDPR/CCPA assessment first.
  7. Don't bypass captchas at scale. Especially if the site clearly intends them as access controls.
  8. Document everything. A legitimate-interests assessment, a robots.txt compliance log, a data subject request process. The paper trail matters when challenged.
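Items 2, 4, and 8 in the list above (respect robots.txt, identify your bot, keep a paper trail) can be combined in a few lines using Python's standard `urllib.robotparser`. The bot name, contact URL, and log format below are hypothetical, and the robots.txt content is passed in as a string so the sketch runs without network access.

```python
import json
from datetime import datetime, timezone
from urllib.robotparser import RobotFileParser

# Hypothetical bot identity; point the URL at a page you actually control.
USER_AGENT = "ExampleResearchBot/1.0 (+https://example.com/bot-info)"


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def check_and_log(parser: RobotFileParser, url: str, log: list) -> bool:
    """Return whether the URL may be fetched, and record the decision."""
    allowed = parser.can_fetch(USER_AGENT, url)
    log.append(json.dumps({
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "user_agent": USER_AGENT,
        "url": url,
        "allowed": allowed,
    }))
    return allowed


robots = load_robots("User-agent: *\nDisallow: /private/\n")
compliance_log = []
print(check_and_log(robots, "https://example.com/products/1", compliance_log))  # True
print(check_and_log(robots, "https://example.com/private/x", compliance_log))   # False
```

Sending the same `USER_AGENT` string in the request headers keeps the identity you checked against robots.txt consistent with the identity the site sees. The JSON lines in `compliance_log` are the kind of record that is cheap to keep and useful if the crawl is ever challenged.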

What Runo handles vs what you handle#

Using a scraping API doesn't transfer legal liability. The customer is the one extracting and using the data; the API is a tool, like a web browser is a tool. That said, Runo handles the technical-civility side automatically:

  • Per-host adaptive backoff
  • robots.txt respect by default
  • No login support (we don't accept credentials, on purpose)
  • Extracts structured fields, not prose

What you handle:

  • Whether to scrape a particular site at all
  • Whether your use case has a copyright fair-use story
  • Whether the data you collect triggers GDPR or CCPA
  • The data subject rights process if it does

When to talk to a lawyer#

Cheap heuristics for "this needs legal review":

  • You're scraping logged-in pages or data behind a login wall
  • You're touching personal data, especially EU resident data
  • You're republishing scraped content on your own surface
  • You've received a cease-and-desist letter (always)
  • You're scraping a direct competitor in a way that could be characterized as market harm
  • You're scraping a site whose terms explicitly prohibit scraping, and your business is incorporated in the same jurisdiction as the target (for example, both in the US)
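The heuristics above can be encoded as a pre-flight check that runs before a new scraping job is approved. This is only a sketch of the list as written: `ScrapePlan` and `review_triggers` are hypothetical names, and an empty result means "none of these heuristics fired", not "this job is legally safe".

```python
from dataclasses import dataclass


@dataclass
class ScrapePlan:
    """Hypothetical description of a planned scraping job."""
    behind_login: bool = False
    personal_data: bool = False
    eu_residents: bool = False
    republishes_content: bool = False
    cease_and_desist_received: bool = False
    targets_direct_competitor: bool = False
    terms_prohibit_scraping: bool = False
    same_jurisdiction_as_target: bool = False


def review_triggers(plan: ScrapePlan) -> list[str]:
    """Return the reasons, if any, that this plan should go to a lawyer first."""
    checks = [
        (plan.cease_and_desist_received, "cease-and-desist received"),
        (plan.behind_login, "scrapes pages behind a login"),
        (plan.personal_data and plan.eu_residents, "touches EU residents' personal data"),
        (plan.personal_data and not plan.eu_residents, "touches personal data"),
        (plan.republishes_content, "republishes scraped content"),
        (plan.targets_direct_competitor, "targets a direct competitor"),
        (plan.terms_prohibit_scraping and plan.same_jurisdiction_as_target,
         "target's terms prohibit scraping and you share its jurisdiction"),
    ]
    return [reason for triggered, reason in checks if triggered]


typical = ScrapePlan()
risky = ScrapePlan(behind_login=True, personal_data=True, eu_residents=True)
print(review_triggers(typical))  # []
print(review_triggers(risky))
```

Returning the reasons rather than a single yes/no makes the output usable as the agenda for the conversation with counsel.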

Most builders don't actually need legal review for typical scraping use cases (logged-out, structured-data extraction, polite rate limits, transformative downstream use). Most of the cease-and-desists go to people who skipped the basics.

Jurisdiction notes#

  • United States: most permissive after hiQ + Van Buren. Federal CFAA narrowly applied; copyright fair use doctrine relatively friendly to indexing/analysis.
  • United Kingdom: post-Brexit, similar to EU but with diverging case law. Computer Misuse Act 1990 is broader than CFAA; be careful.
  • European Union: GDPR is the binding constraint for any data touching EU residents. ECJ has not been as scraper-friendly as the Ninth Circuit.
  • Germany: case law has been particularly hostile to commercial scraping of competitor sites. Higher risk for B2B scraping there.
  • France: CNIL (the data protection authority) is aggressive. Personal-data scraping draws scrutiny.
  • China: extensive data export restrictions; scraping Chinese sites for export to non-Chinese systems has additional regulatory risk.
  • Japan, Australia, Canada: generally similar to US, with their own privacy regimes layered on top.

TL;DR#

  • Scraping public, logged-out web data is generally legal in the US and most of Europe in 2026 after hiQ, Van Buren, and Meta v. Bright Data.
  • Four independent legal vectors: CFAA (don't bypass barriers), contract (don't violate click-wrapped terms when logged in), copyright (don't republish substantial portions), personal data (GDPR/CCPA require a lawful basis).
  • The patterns that get people sued: republishing content, hammering targets, logged-in scraping at scale, selling data to competitors of the source, bypassing captchas, scraping personal data without a clean posture.
  • Safe defaults: stay logged out, respect robots.txt, polite rate limits, extract facts not prose, avoid personal data unless necessary, document compliance.
  • Using an API like Runo doesn't transfer legal liability, but it ships polite defaults (rate limits, robots respect, no login support) automatically.
  • Talk to a lawyer for: logged-in scraping, personal data, content republication, competitor scraping, anything that drew a cease-and-desist.