Headless browser detection has matured into a science. In 2026, vanilla Playwright or Puppeteer is detected within milliseconds by every major anti-bot vendor (Cloudflare, Datadome, PerimeterX, Akamai, Incapsula). The good news is that the signals they check are well understood, and most of them have known patches.
This post is the technical map: which signals matter, how detection works, and which open-source projects address each one. It's the deep version of how to scrape Cloudflare-protected sites, which covered the operational ladder; this is what's happening one layer down.
The detection surface
Anti-bot vendors check signals at five layers:
- Network layer: TCP/TLS fingerprint
- HTTP layer: header order, presence, values
- Browser environment: navigator properties, screen, plugins
- JavaScript runtime: missing functions, automation flags, devtools detection
- Behavioral: mouse movement, scroll patterns, timing
Each layer can independently flag a session. A scraper that only patches layer 4 (the most-discussed online) will still fail at layer 1 or 2. Comprehensive patching is necessary.
Layer 1: TLS fingerprinting (JA3, JA4)
Before HTTP even begins, the TLS handshake reveals a fingerprint of which cipher suites, extensions, elliptic curves, and signature algorithms the client supports. This fingerprint, encoded as a JA3 or JA4 hash, is distinctive for each (browser, version, OS) combination.
Real Chrome 124 on macOS produces one specific JA4. Headless Chromium produces a slightly different one (different cipher ordering, different ALPN preferences). Anti-bot vendors maintain block lists of "browser claims to be Chrome but TLS fingerprint says it's not" pairs.
Detection: during the TLS handshake, before any HTTP request is sent. A blocked request never reaches the application.
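To make the fingerprint concrete, here is a minimal sketch of how a JA3 hash is derived: the decimal values of five ClientHello fields are joined into one string and MD5-hashed, with GREASE placeholder values dropped first. The field values in the usage lines are made up for illustration and do not correspond to any real browser.

```python
import hashlib

# GREASE values (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa. Clients insert them
# at random, so JA3 ignores them to keep the hash stable.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input: five comma-separated fields, values joined by '-'."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([str(version), join(ciphers), join(extensions),
                     join(curves), join(point_formats)])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Hypothetical ClientHello values, for illustration only.
a = ja3_hash(771, [0x0A0A, 4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
# Same cipher suites offered in a different order.
b = ja3_hash(771, [0x0A0A, 4866, 4865, 4867], [0, 23, 65281], [29, 23, 24], [0])
print(a != b)
```

Because the hash covers the order of the values as well as their presence, two clients that support the same cipher suites but list them differently end up with different JA3 hashes.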
Patch: TLS impersonation libraries.
- curl_cffi: Python; wraps curl with browser TLS profiles. requests-compatible API. The current best option for impersonation in Python.
- tls-client: Go-based; wraps utls. Used by some scraping APIs.
- undici with custom TLS: Node.js; harder, requires custom dispatcher.
Usage:
```python
from curl_cffi import requests

# impersonate selects the browser TLS + HTTP profile to present
resp = requests.get("https://example.com", impersonate="chrome124")
```
The impersonate parameter sets the full TLS + HTTP profile. Without it, you're using curl's default fingerprint, which is also blocked.
This single change defeats a lot of "we can't scrape this" cases that people incorrectly attribute to JavaScript rendering. A surprising number of sites that "require headless" actually only require correct TLS.
Layer 2: HTTP header fingerprinting
Real browsers send specific headers in a specific order with specific values. Headless browsers (and naive HTTP clients) deviate.
Real Chrome 124 on macOS sends, in order:
```
:authority: example.com
:method: GET
:path: /
:scheme: https
sec-ch-ua: "Chromium";v="124", "Google Chrome";v="124", "Not.A/Brand";v="99"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ...
accept: text/html,application/xhtml+xml,...
sec-fetch-site: none
sec-fetch-mode: navigate
sec-fetch-user: ?1
sec-fetch-dest: document
accept-encoding: gzip, deflate, br, zstd
accept-language: en-US,en;q=0.9
```
A naive httpx.get() sends a different set in a different order with no sec-ch-* or sec-fetch-* headers. Easy to detect.
Detection: post-handshake, pre-page. Server returns a 403 or a captcha page.
Patches:
- For HTTP clients: curl_cffi with impersonate= handles header order automatically.
- For Playwright: the browser sends correct headers natively. No patch needed at this layer; it's not where Playwright fails.
- Rotate user agents consistently with sec-ch-ua headers. Don't claim Chrome 124 in the user-agent and Chrome 100 in sec-ch-ua. Anti-bot vendors check coherence.
If you're rolling your own header set, the canonical reference is the browser-fingerprint repo which captures real browser headers across versions and OSes.
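If you do roll your own header set, a coherence check like the following can catch the obvious mismatches before a vendor does. This is a minimal sketch: the function names and the platform-to-token mapping are assumptions made for illustration, and real vendors check far more than these two fields.

```python
import re

# Assumed mapping from sec-ch-ua-platform values to the token expected in the UA.
PLATFORM_UA_TOKEN = {"macOS": "Macintosh", "Windows": "Windows NT", "Linux": "Linux"}


def _major(pattern, value):
    """Extract an integer major version using the given regex, or None."""
    match = re.search(pattern, value or "")
    return int(match.group(1)) if match else None


def coherence_problems(headers):
    """Return a list of mismatches between user-agent and client-hint headers."""
    problems = []
    ua = headers.get("user-agent", "")
    ua_major = _major(r"Chrome/(\d+)", ua)
    ch_major = _major(r'"Google Chrome";v="(\d+)"', headers.get("sec-ch-ua"))
    if ua_major is None or ch_major is None:
        problems.append("missing Chrome version in user-agent or sec-ch-ua")
    elif ua_major != ch_major:
        problems.append(f"version mismatch: UA {ua_major} vs sec-ch-ua {ch_major}")

    platform = headers.get("sec-ch-ua-platform", "").strip('"')
    token = PLATFORM_UA_TOKEN.get(platform)
    if token is None or token not in ua:
        problems.append(f"platform mismatch: sec-ch-ua-platform={platform!r}")
    return problems


good = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "sec-ch-ua": '"Chromium";v="124", "Google Chrome";v="124", "Not.A/Brand";v="99"',
    "sec-ch-ua-platform": '"macOS"',
}
# Same user-agent, but the client hints claim Chrome 100.
bad = dict(good, **{"sec-ch-ua": '"Chromium";v="100", "Google Chrome";v="100"'})

print(coherence_problems(good))
print(coherence_problems(bad))
```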
Layer 3: Browser environment fingerprinting
Once the page loads in a headless browser, JavaScript checks dozens of navigator and window properties for headless-specific values.
Common signals checked by anti-bot scripts:
| Property | Vanilla Headless | Real Browser | Patch |
|---|---|---|---|
| navigator.webdriver | true | false | --disable-blink-features=AutomationControlled flag, or runtime override |
| navigator.plugins.length | 0 | >0 (PDF viewer, native client) | Inject plugin objects |
| navigator.languages | [] or ['en-US'] only | Multiple languages | Set explicit list |
| navigator.permissions | Returns denied for notifications without prompting | Returns default (prompts) | Override query method |
| navigator.hardwareConcurrency | 2 (Docker default) | 4-16 typically | Set realistic value |
| screen.width / height | 800x600 (default) | User's screen | Set realistic resolution |
| screen.colorDepth | 24 always | Variable | Set per-platform realistic |
| WebGL UNMASKED_VENDOR/RENDERER | "Brian Paul" / "Mesa OffScreen" | "Apple" / "Apple M2 Pro" etc. | Spoof to platform-realistic |
| Notification.permission | denied | default | Override |
| window.chrome (object) | Missing or stub | Full object | Inject realistic object |
Each of these checks runs in JavaScript on every page load, takes microseconds, and any one of them being wrong is a flag. A scraper that patches webdriver but leaves screen.width: 800 is still detected.
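To see why partial patching fails, it helps to look at this from the detector's side. The sketch below is a toy version of such a check, written in Python for readability (real anti-bot scripts run in JavaScript in the page). The dictionary keys and thresholds are assumptions drawn from the headless column of the table, not any vendor's actual rules.

```python
def headless_flags(env):
    """Return the names of properties whose values look like vanilla headless."""
    checks = {
        "navigator.webdriver": lambda v: v is True,
        "navigator.plugins.length": lambda v: v == 0,
        "navigator.hardwareConcurrency": lambda v: v is not None and v <= 2,
        "screen": lambda v: v == (800, 600),
        "webgl.vendor": lambda v: v == "Brian Paul",
        "window.chrome": lambda v: v is None,  # missing object
    }
    return [name for name, is_suspicious in checks.items()
            if is_suspicious(env.get(name))]


vanilla = {
    "navigator.webdriver": True,
    "navigator.plugins.length": 0,
    "navigator.hardwareConcurrency": 2,
    "screen": (800, 600),
    "webgl.vendor": "Brian Paul",
    "window.chrome": None,
}
# Patching only navigator.webdriver leaves every other signal in place.
webdriver_only = dict(vanilla, **{"navigator.webdriver": False})

print(headless_flags(vanilla))
print(headless_flags(webdriver_only))
```

A single remaining entry in the returned list is enough to flag the session, which is why the bundled stealth projects below patch all of these together.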
Patches:
- puppeteer-extra-plugin-stealth (Node) and playwright-stealth (Python) bundle ~20 patches that handle most of these.
- patchright: a fork of Playwright that integrates the stealth patches into Playwright's core. Generally newer and better-maintained than playwright-stealth in 2026.
- camoufox: a stealth fork of Firefox. Different tradeoffs from Chromium-based stealth. Because Firefox is less common, it slips past some Chrome-specific signature checks more easily, but it is more conspicuous on sites that fingerprint by browser share.
Using patchright:
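```python
import asyncio

from patchright.async_api import async_playwright


async def main():
    # Same API as vanilla Playwright; the stealth patches are applied automatically
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://example.com")
        await browser.close()


asyncio.run(main())
```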
The patches are applied automatically. You write code as if it were vanilla Playwright.
Layer 4: JavaScript runtime fingerprinting
Beyond the navigator object, anti-bot scripts check for things only present in automation runtimes:
- CDP detection: Chromium with the --remote-debugging-port exposed leaves traces in the page context. Specific window properties, specific behavior of Function.prototype.toString.call(window.alert).
- Devtools detection: when devtools is open in real Chrome, window.outerWidth - window.innerWidth differs by a known amount. Headless browsers without devtools mismatch.
- Object proxying detection: stealth plugins patch some properties using JavaScript Proxies. Anti-bot scripts can detect proxied objects via toString mismatches or constructor checks.
- Function source detection: navigator.plugins.toString() returns specific strings on real browsers. Stealth patches sometimes get these wrong.
patchright and playwright-stealth address most of these, but not all. The cat-and-mouse here is constant; what worked in February 2026 may be detected by April. The stealth-plugin maintainers ship updates regularly.
Layer 5: Behavioral fingerprinting
The hardest signal to spoof, and increasingly the differentiator at the high end of anti-bot.
Real users move the mouse in non-linear paths, scroll with momentum, click after a brief hover, and exhibit reading-time pauses on text-heavy pages. Bots click coordinates instantly, scroll programmatically, and load pages back-to-back without dwell time.
Signals checked:
- Mouse movement before click (real users hover-then-click; bots click cold)
- Scroll smoothness and speed (real scrolling has acceleration curves)
- Page dwell time (real users spend seconds-to-minutes; bots fetch and leave)
- Inter-action timing (real users have ms-scale jitter; bots have predictable intervals)
- Form fill timing (real users type at variable speed; bots type uniformly or paste)
Patches: simulate human behavior. puppeteer-mouse-helper, humanize-pyppeteer, and similar libraries inject realistic mouse paths and timing. Even basic measures (random sleep between actions, mouse hover before click, scroll with sleep) defeat naive behavioral checks.
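As a rough illustration of what "realistic mouse paths and timing" means, the sketch below generates a curved path between two points using a cubic Bezier curve with randomized control points, plus jittered per-step delays. The function names, step count, spread, and timing parameters are all assumptions chosen for the example; they are not taken from any of the libraries named above.

```python
import random


def mouse_path(start, end, steps=30, spread=80.0, rng=None):
    """Points along a cubic Bezier curve from start to end with random control points."""
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    # Control points sit about one third and two thirds of the way along,
    # pushed off the straight line by a random offset.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-spread, spread)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-spread, spread)

    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points


def step_delays(count, mean=0.012, sigma=0.004, rng=None):
    """Per-step sleep times in seconds, jittered around a mean with a small floor."""
    rng = rng or random.Random()
    return [max(0.001, rng.gauss(mean, sigma)) for _ in range(count)]


rng = random.Random(7)  # seeded only to make the example repeatable
path = mouse_path((100, 200), (640, 480), rng=rng)
delays = step_delays(len(path), rng=rng)
print(len(path), path[0], path[-1])
```

With Playwright, each point could be passed to page.mouse.move(x, y) with the matching delay slept in between, so the click at the end arrives after a visible approach rather than cold.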
For sites with sophisticated behavioral checks, you may need to combine this with residential proxies (so the IP looks consumer) and pacing (so per-IP rate looks human, often <50 page loads per hour).
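The pacing arithmetic is simple: 50 page loads per hour means at least 3600 / 50 = 72 seconds between loads on a given IP. A minimal sketch follows, assuming that adding random time on top of that floor is acceptable; the function name and the jitter factor are illustrative choices.

```python
import random


def paced_delays(page_loads, max_per_hour=50, jitter=0.5, rng=None):
    """Seconds to wait before each page load so the hourly rate stays within the cap."""
    rng = rng or random.Random()
    floor = 3600.0 / max_per_hour  # 72 s at 50 loads per hour
    # Jitter only adds time on top of the floor, so it never raises the rate.
    return [floor * (1.0 + rng.uniform(0.0, jitter)) for _ in range(page_loads)]


delays = paced_delays(100, rng=random.Random(1))
print(min(delays) >= 72.0, max(delays) <= 108.0)
```

Intervals that vary between 72 and 108 seconds also avoid the perfectly regular timing that the inter-action checks above look for.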
The state of the art in 2026
For an aggressive site (Cloudflare Pro/Enterprise tier, Datadome, PerimeterX), the comprehensive recipe in 2026 looks like:
- Network: curl_cffi for plain fetch with TLS impersonation; for headless, ensure the headless build uses real Chromium TLS (not Chrome Headless mode).
- HTTP: real browser headers in real order. Coherent sec-ch-ua / user-agent / Accept-*.
- Browser env: patchright or camoufox with current stealth bundle.
- JS runtime: turn off --remote-debugging-port if possible; if not, isolate via the patchright CDP-hiding patches.
- Behavioral: random pre-click hover, randomized inter-action sleep, scroll with sleep, page dwell.
- Network identity: residential proxy from a reputable provider (IPRoyal, Bright Data) with sticky sessions matching dwell time.
- Captcha: budget for CapSolver / 2Captcha when challenges appear, ~$0.0008-$0.003 per solve.
That's the full stack. Building it yourself is a multi-week project that requires ongoing maintenance as defenders ship new detections.
What a managed scraping API handles
A hosted scraping API like Runo ships the bypass stack server-side. You don't have to think about JA3 fingerprints, sec-ch-ua coherence, navigator.webdriver, or behavioral pacing. You send a URL and a schema; you get back typed JSON. The plumbing is the product.
When you should still roll your own
Three cases:
- You're already a scraping company with an in-house team that maintains the stack. Marginal cost is low; control is high.
- You have unusual requirements (specific geographies, specific browser versions, specific TLS profiles) that scraping APIs don't expose.
- You're scraping a single hard target at extreme volume (e.g. 10M+ requests/month on one site) where amortising the build cost makes sense.
For everyone else, the buy case is overwhelming. The detection surface is wider and harder than it looks, and the maintenance burden never stops.
A quick test for "do I need stealth?"
Before reaching for the heavy stack, run this test:
```python
from curl_cffi import requests

target_url = "https://example.com"  # replace with the site you are testing

resp = requests.get(target_url, impersonate="chrome124")
print(resp.status_code, len(resp.text))
```
If that returns 200 with the real page content, you don't need headless at all. About 60-70% of "I can't scrape this site" cases I've seen turn out to be solved by curl_cffi alone.
If it returns 403 or a block page, escalate to headless with patchright. If patchright returns 403 too, you're up against captcha/Datadome and need a captcha solver plus residential proxy budget.
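The escalation ladder above can be sketched as a small loop over tiers. This is an assumption-laden illustration: the block-page markers and the extra 429 status are guesses that vary by vendor and change over time, and each tier's fetch function is a hypothetical callable returning a (status_code, body) pair. Stubs are used here so the example runs without network access.

```python
# Illustrative markers only; real block pages vary by vendor and change over time.
BLOCK_MARKERS = ("just a moment", "captcha", "access denied")


def looks_blocked(status_code, body):
    """Heuristic: a blocking status code or a known block-page phrase in the body."""
    if status_code in (403, 429):
        return True
    text = body.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


def triage(url, tiers):
    """Try each (name, fetch) tier in order; return the first name that gets through."""
    for name, fetch in tiers:
        status_code, body = fetch(url)
        if not looks_blocked(status_code, body):
            return name
    return "captcha solver + residential proxy"


# Stub fetchers standing in for the real curl_cffi and patchright calls.
tiers = [
    ("curl_cffi", lambda url: (403, "Forbidden")),
    ("patchright", lambda url: (200, "<html>real page content</html>")),
]
print(triage("https://example.com", tiers))
```

In practice the curl_cffi tier would wrap requests.get(url, impersonate="chrome124") and return (resp.status_code, resp.text), and the patchright tier would return the status and the rendered page content.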
TL;DR
- Anti-bot detection runs at five layers: TLS fingerprint, HTTP headers, browser environment, JS runtime, behavior. Patching one layer doesn't help if another is wrong.
- TLS impersonation via curl_cffi (Python) defeats roughly 60-70% of "needs scraping bypass" cases without any headless browser at all.
- For headless rendering, patchright (Chromium) and camoufox (Firefox) bundle current stealth patches; playwright-stealth is older but still works.
- The hardest layer is behavioral: real mouse paths, scroll momentum, dwell time. Sophisticated sites check this; spoofing requires deliberate humanization libraries.
- For aggressive sites (Cloudflare Pro, Datadome), full stack: TLS impersonation + stealth headless + cookie persistence + captcha solver + residential proxy + behavioral pacing.
- Building this yourself is a multi-week project with ongoing maintenance burden. Most teams should buy. Runo ships the full bypass stack server-side.
- Quick triage: try curl_cffi first. If that fails, escalate to headless. If headless fails, escalate to paid bypass tiers.