Reverse-Engineering Bot Infrastructure & NHT Mechanics

Look at your log-level data. The humans left a long time ago.

What you are actually buying is a meticulously engineered illusion powered by headless Chromium, residential proxy networks, and OS-level cryptographic spoofing.

Contents

1. You Are Buying Ghosts

2. The Execution Environment (Headless Browsers on Steroids)

3. Hardware, Cryptography, and OS-Level Spoofing

4. The Routing Layer (Residential & Proxy Engines)

5. Mimicking the Meatbag (Synthetic Telemetry)

6. Catching Prototype Pollution

7. FAQs: Bot Infrastructure & Non-Human Traffic

You Are Buying Ghosts

The game is rigged. You are buying ghosts. Your verification vendors slap a “brand safe” badge on a server farm in St. Petersburg and charge you a premium for the privilege.

Ad-tech is fundamentally complacent. We are bleeding billions of dollars annually to syndicate operators who laugh at our standard JavaScript tags. Wake up.

We aren’t fighting script kiddies in basements anymore. We are fighting heavily funded syndicates running proxy networks and headless Chrome clusters.

Look at the mechanics of the historic 3ve botnet. Its operators infected roughly 1.7 million computers and siphoned an estimated $29 million in programmatic ad spend.

Or examine the Uber vs. Phunware lawsuit. Massive attribution fraud. App installs were spoofed entirely in the dark with zero actual human interaction. Total budget extraction.

Burn your PDF dashboards. Forget those sanitized 1% IVT reports your agency hands you. You need to look directly at the log-level data (LLD).

When you actually parse the raw request telemetry, the rot is obvious. Here is what we hunt in the trenches:

Synthetic leads passing seamlessly with 0.9 reCAPTCHA v3 scores.
Sudden 400% CTR spikes from obscure ASN blocks routing straight to headless servers.
Impression latency dropping below 10ms on purported 3G mobile networks.

Physics alone tells you it is a lie. If you don’t understand exactly how fraudsters engineer these exploits, you cannot detect and neutralize them. Let’s rip open the infrastructure.

The Execution Environment (Headless Browsers on Steroids)

Dump the cURL scripts. The 2014 era of sending raw HTTP GET requests from a Python script is dead. If you track the evolution of headless browsers in ad fraud, you realize syndicates use full-fledged automation frameworks now.

They deploy custom builds of Puppeteer, Playwright, and Selenium. These aren’t simple web scrapers. They are Chromium instances executing a complete JavaScript payload.

They render the DOM. They execute third-party ad tags. They trigger your verification vendor’s viewability pixel. The system logs it as a human impression, and you pay for it.

Deconstructing the Stealth Plugin

The execution here is surprisingly methodical. Default headless Chrome is loud. It screams its automated nature via exposed API properties.

To bypass this, operators deploy stealth extensions. By deconstructing the Puppeteer stealth plugin, we see a systematic prototype pollution engine at work.

The plugin patches JavaScript prototypes on the fly. It aggressively overwrites the default navigator.webdriver = true flag. It masks the entire automation environment from naive anti-fraud tags.

Network Telemetry Deltas: Hunting the Delta

You cannot catch this in the DOM. You must drop below the application layer. Headless Chrome still leaves distinct network telemetry footprints.

Look at the exact sequence of HTTP headers. A legitimate Chrome browser running on a real consumer device sends headers in a highly specific, deterministic order.

Headless automation tools frequently shuffle this header order. Sometimes they drop headers entirely. We map these anomalies in the log-level data.

Look for these exact technical fingerprints:

Missing Sec-CH-UA-Mobile values on purported mobile traffic.
Incomplete or internally contradictory Sec-CH-UA-Platform declarations, such as an Android platform hint on a Windows User-Agent.
TLS fingerprinting (JA3) mismatches. The advertised User-Agent string rarely matches the underlying SSL/TLS negotiation signature of the headless server.
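The header checks in this list are easy to prototype. What follows is a minimal sketch, not a production rule set. It assumes your log records expose request headers as an object with lowercase keys; `findHeaderAnomalies`, `platformFromUserAgent`, and the anomaly labels are hypothetical names chosen for illustration. It also assumes Chromium-based browsers send the low-entropy client hints by default on HTTPS requests, so validate against your own known-good traffic before you block on it. Header order and JA3 are not covered here because they need capture below the HTTP layer.

```javascript
// Map a User-Agent string to the value we would expect in Sec-CH-UA-Platform.
// Android is tested first because Android UAs also contain "Linux".
function platformFromUserAgent(ua) {
  if (/Android/.test(ua)) return 'Android';
  if (/Windows NT/.test(ua)) return 'Windows';
  if (/Macintosh/.test(ua)) return 'macOS';
  if (/CrOS/.test(ua)) return 'Chrome OS';
  if (/Linux/.test(ua)) return 'Linux';
  return null;
}

// headers: plain object with lowercase header names (an assumed log shape).
function findHeaderAnomalies(headers) {
  const anomalies = [];
  const ua = headers['user-agent'] || '';
  const claimsChromium = /Chrome\/\d+/.test(ua);
  const claimsMobile = /\bMobile\b/.test(ua);

  if (!headers['accept-language']) anomalies.push('missing-accept-language');

  if (claimsChromium) {
    const mobileHint = headers['sec-ch-ua-mobile'];
    if (mobileHint === undefined) {
      anomalies.push('missing-sec-ch-ua-mobile');
    } else if ((mobileHint === '?1') !== claimsMobile) {
      anomalies.push('mobile-hint-contradicts-user-agent');
    }

    const platformHint = headers['sec-ch-ua-platform'];
    const expected = platformFromUserAgent(ua);
    if (platformHint === undefined) {
      anomalies.push('missing-sec-ch-ua-platform');
    } else if (expected && platformHint.replace(/"/g, '') !== expected) {
      anomalies.push('platform-hint-contradicts-user-agent');
    }
  }
  return anomalies;
}
```

Run it over every bid request in the LLD and count anomalies per ASN. A single hit proves little. A cluster of hits from one source is worth a closer look.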


[HTTP Header Delta Extract - Suspected Headless Node]
> GET /ad_request?placement=header HTTP/2
> Host: theauditveteran.com
> User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
> Accept-Language: [MISSING]
> Sec-CH-UA-Mobile: ?0
> JA3_Hash: 3b5074b1b5d032e5620f69f9f700ff0e [MATCH: NodeJS / Mismatched with Win10 Chrome]

Exposing Prototype Pollution

Vendors won’t save you. You have to build your own traps. Here is how you detect the stealth plugin’s sloppy tracks.

When fraudsters patch navigator.webdriver, they often do it incorrectly. They inject the property directly onto the navigator instance instead of the prototype, or they fail to spoof the native function string.

Deploy this JavaScript killswitch early in the DOM lifecycle. Execute it before the bid request ever fires.
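Here is a minimal sketch of what such a check might look like. Treat it as a starting point, not a hardened tag. The function name `inspectWebdriver` and the finding labels are hypothetical. The sketch assumes that in an unmodified Chromium build `webdriver` is a native getter on the navigator's prototype, never an own property of the instance. It takes the navigator object as a parameter so it can be exercised against mocks.

```javascript
// Returns a list of reasons the navigator.webdriver property looks tampered with.
function inspectWebdriver(nav) {
  const findings = [];
  const proto = Object.getPrototypeOf(nav);

  // 1. The honest case: automation announces itself.
  if (nav.webdriver === true) findings.push('webdriver-flag-set');

  // 2. Sloppy patch: property redefined directly on the instance.
  if (Object.getOwnPropertyDescriptor(nav, 'webdriver')) {
    findings.push('patched-on-instance');
  }

  // 3. The prototype getter was deleted or replaced with a data property.
  const desc = proto && Object.getOwnPropertyDescriptor(proto, 'webdriver');
  if (!desc || typeof desc.get !== 'function') {
    findings.push('prototype-getter-missing');
  } else {
    // 4. The getter exists but stringifies as ordinary JavaScript.
    const source = Function.prototype.toString.call(desc.get);
    if (!/\{\s*\[native code\]\s*\}\s*$/.test(source)) {
      findings.push('getter-not-native');
    }
  }
  return findings;
}
```

In a page you would call `inspectWebdriver(navigator)` and treat any non-empty result as a signal. Check 4 is only as trustworthy as `Function.prototype.toString` itself, which a careful operator can also hijack.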

Silently log the session ID. Flag the IP block. Drop the bid request immediately. Stop feeding the machine.

Hardware, Cryptography, and OS-Level Spoofing

Premium CPMs demand premium devices. Fraudsters do not get paid $15 CPMs for traffic originating from a headless Linux server. They get paid for targeting iPhone 15 Pros and high-end gaming rigs.

How do they bridge the gap? They fake the silicon. They lie directly to the browser APIs.

Faking the Silicon (WebGL & Canvas)

This is pure digital forgery. They intercept the Canvas API and inject mathematical noise into graphic rendering layers.

Anti-fraud vendors hash this canvas data to fingerprint the device. By injecting noise, the bots generate a fresh, seemingly human hash every single time.

They also attack the GPU layer. Operators spoof WebGL vendor strings via WEBGL_debug_renderer_info. A cheap cloud server will brazenly echo back “Apple GPU” or “NVIDIA RTX 4090” instead of “Mesa OffScreen”.

We expose the lie by mapping the render latency deltas. Real GPUs render complex WebGL scenes in milliseconds. CPU-bound headless servers spoofing a GPU take orders of magnitude longer.

API Hijacking and the Hardware Lie

Bot operators systematically manipulate navigator.hardwareConcurrency and navigator.deviceMemory. They scale these numbers to perfectly match the forged User-Agent.

They even hijack the Battery Status API. A headless server sitting in a Frankfurt datacenter suddenly reports a discharging battery at 68%. It is a meticulously crafted lie to validate the mobile device spoof.
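Over-eager spoofing leaves contradictions you can test for. The sketch below is illustrative only, and `findApiContradictions` and its labels are hypothetical names. It leans on two assumptions you should verify against current browser compatibility data: WebKit-based iOS browsers expose neither `navigator.deviceMemory` nor `navigator.getBattery`, and browsers that do report `deviceMemory` round it to a power of two.

```javascript
// True for 0.25, 0.5, 1, 2, 4, 8, ... using only exact halving and doubling.
function isPowerOfTwo(x) {
  if (typeof x !== 'number' || !Number.isFinite(x) || !(x > 0)) return false;
  while (x < 1) x *= 2;
  while (x > 1) x /= 2;
  return x === 1;
}

// userAgent: the claimed UA string; nav: the navigator object (or a logged copy).
function findApiContradictions(userAgent, nav) {
  const findings = [];
  const claimsIos = /iPhone|iPad|iPod/.test(userAgent);

  // Assumption: real iOS browsers do not ship these two APIs.
  if (claimsIos && 'deviceMemory' in nav) {
    findings.push('ios-claim-exposes-deviceMemory');
  }
  if (claimsIos && typeof nav.getBattery === 'function') {
    findings.push('ios-claim-exposes-battery-api');
  }

  // Assumption: reported memory is bucketed, so 6 or 12 should never appear.
  if ('deviceMemory' in nav && !isPowerOfTwo(nav.deviceMemory)) {
    findings.push('deviceMemory-not-power-of-two');
  }

  const cores = nav.hardwareConcurrency;
  if (cores !== undefined && !(Number.isInteger(cores) && cores >= 1)) {
    findings.push('invalid-hardwareConcurrency');
  }
  return findings;
}
```

A bot that bolts a battery reading onto a forged iPhone profile trips the second check. The forged reading is exactly what gives it away.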

AudioContext and Font Scrambling

Static bot fingerprints get banned quickly. Fraudsters need entropy. They must generate unique but valid device hashes per session to keep bleeding your budget.

They randomize the set of system fonts a page can enumerate. They inject dynamic, invisible audio noise into the AudioContext API. Every single bid request looks like a unique, brand-new user with a slightly different hardware configuration.

The Ultimate Disguise: OS-Level Evasion

This is the apex predator of ad fraud. Standard JavaScript tags cannot see this layer. We must drop below the browser and analyze the TLS handshake and the TCP/IP stack underneath it.

The architecture of this bypass involves TLS libraries built for impersonation, such as uTLS, or patched builds of BoringSSL. They take control of the handshake below the browser's JavaScript layer and craft the ClientHello byte for byte.

A Debian Linux server forces its ClientHello to carry the exact JA3 signature of iOS Safari. (JA3S fingerprints the server's reply, so it is not something the client can forge.)

Your verification vendor sees an iPhone. The network gateway sees an iPhone. It is a Linux box sitting in an OVH datacenter stealing your clients’ money.

Calling the Hardware Bluff

Fraudsters are lazy. They successfully spoof the User-Agent and the WebGL renderer, but they frequently forget the physical constraints of the hardware they claim to be emulating.

Deploy this WebGL fingerprinting trap. It cross-references the claimed operating system with the unmasked graphic renderer. Software-based renderers on purported premium Apple devices are an immediate red flag.
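A minimal sketch of such a trap follows, under stated assumptions. `readUnmaskedRenderer` uses the real `WEBGL_debug_renderer_info` extension. The regular expressions, `checkRendererClaim`, and the verdict labels are illustrative guesses at what a software or mismatched renderer string looks like, not an exhaustive list. The WebGL context is passed in so the logic can be tested without a browser.

```javascript
// Renderer strings commonly associated with CPU-only rendering (assumed list).
const SOFTWARE_RENDERER = /SwiftShader|llvmpipe|softpipe|Mesa OffScreen|Software Rasterizer/i;
// GPU families that should not show up behind an iPhone or iPad User-Agent (assumed list).
const NON_APPLE_GPU = /NVIDIA|GeForce|Radeon|Adreno|Mali|Intel/i;

// gl: a WebGLRenderingContext, e.g. document.createElement('canvas').getContext('webgl')
function readUnmaskedRenderer(gl) {
  const ext = gl.getExtension('WEBGL_debug_renderer_info');
  if (!ext) return null; // extension blocked or unsupported
  return gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
}

// Cross-reference the claimed device with what the graphics stack admits to.
function checkRendererClaim(userAgent, renderer) {
  if (renderer === null) return 'renderer-unavailable';
  if (SOFTWARE_RENDERER.test(renderer)) return 'software-renderer';
  const claimsIos = /iPhone|iPad/.test(userAgent);
  if (claimsIos && NON_APPLE_GPU.test(renderer)) return 'renderer-contradicts-ios-claim';
  return 'ok';
}
```

This only catches operators who forgot to spoof the renderer string or spoofed it carelessly. A string that reads "Apple GPU" still needs the render latency check, because strings are cheap to fake and frame times are not.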

Do not trust the User-Agent. Interrogate the hardware. If the silicon lies, kill the session.

The Routing Layer (Residential & Proxy Engines)

IP banning is dead. Blacklisting known datacenter ranges is a relic of 2015. You are fighting distributed infrastructure now.

Fraudsters do not buy cheap AWS servers to route ad requests anymore. They buy access to residential proxy networks. They hijack legitimate devices through “free” VPN apps, bundled SDKs, and IoT malware.

The 3ve botnet mastered this architecture. They did not just brute-force requests from a central server. They routed their synthetic traffic through over a million infected residential IPs. Your verification vendor saw a verified Comcast connection in Ohio. You paid the premium.

Mobile Proxy Farms and the CGNAT Shield

Mobile proxy traffic is the holy grail for ad fraud syndicates. Fraudsters exploit Carrier-Grade NAT (CGNAT) to create an impenetrable shield.

Under CGNAT, mobile carriers route thousands of legitimate human users through a single public IPv4 address. Fraudsters deploy malware on a handful of these devices.

If you ban that IP, you are blinding yourself. You just blocked 5,000 real Verizon customers to stop one bot. Rate limiting breaks down completely. The fraudsters simply rotate the mobile IP every 30 seconds, maintaining a pristine reputation score while bleeding your budget.

Threat Intel: Datacenter vs. ISP Proxies

Not all proxies are created equal. Datacenter IPs are cheap and dirty. They are easy to flag via their Autonomous System Number (ASN).

Fraudsters adapted rapidly. They shifted to ISP proxies—often called “sneaker proxies” in underground forums. These are hosted in datacenters but are registered under legitimate consumer ASNs like AT&T or Spectrum. They offer the blazing speed of a datacenter with the trust score of a residential home.

We hunt for anomalies in the log-level data. Look for a sudden 600% spike in impression volume from a single consumer ASN between 2 AM and 4 AM local time. Humans sleep. Proxies do not.
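That overnight check can be sketched in a few lines. This is an illustration under assumptions, not a tuned detector. The row shape (`asn`, `localHour`, `impressions`), the baseline map, and the default threshold of six times the usual volume are all hypothetical choices you would replace with your own.

```javascript
// rows:     [{ asn, localHour, impressions }] for the day under review
// baseline: Map of "asn|localHour" -> typical impressions for that hour
function flagOvernightSpikes(rows, baseline, options = {}) {
  const { startHour = 2, endHour = 4, ratio = 6 } = options;
  const flagged = [];
  for (const row of rows) {
    // Only look at the quiet window, [startHour, endHour).
    if (row.localHour < startHour || row.localHour >= endHour) continue;
    const typical = baseline.get(`${row.asn}|${row.localHour}`);
    if (!typical) continue; // no history, nothing to compare against
    const multiple = row.impressions / typical;
    if (multiple >= ratio) {
      flagged.push({ asn: row.asn, localHour: row.localHour, multiple });
    }
  }
  return flagged;
}
```

Build the baseline from several weeks of the same weekday and hour. A single noisy night should not define "normal".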

Network Detection: MTU Mismatches and TTL Drops

Application-layer JavaScript tags are blind to proxy chains. You must drop down to the TCP/IP stack to catch the routing anomalies.

We use passive TCP/IP stack fingerprinting tools like p0f. We analyze the Time-to-Live (TTL) and Maximum Transmission Unit (MTU) of the incoming network packets.

Windows devices have a default TTL of 128. Linux defaults to 64. If a User-Agent claims to be a Windows 11 desktop, but the packet arrives with a starting TTL mathematically tracing back to 64, you have a proxy. The OS is lying.

VPNs and proxy tunnels also add encapsulation overhead. This frequently forces the MTU below the standard 1500 bytes of Ethernet, which shows up as a smaller TCP MSS in the SYN packet. Mismatched MTUs and anomalous TTL values are the bleeding edge of network forensics.
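The TTL arithmetic is simple enough to sketch. The code below assumes a passive fingerprinting tool has already logged the observed TTL and the TCP MSS next to each request. The function names and the initial TTL table of 64, 128, and 255 are simplifications for illustration. The MTU helper assumes IPv4 with no IP or TCP options.

```javascript
// Each hop decrements TTL by one, so round the observed value up to the
// nearest common initial value to guess what the sender started with.
function inferInitialTtl(observedTtl) {
  for (const initial of [64, 128, 255]) {
    if (observedTtl <= initial) return initial;
  }
  return null;
}

// Initial TTL we would expect from the operating system the User-Agent claims.
function expectedInitialTtl(userAgent) {
  if (/Windows NT/.test(userAgent)) return 128;
  if (/Android|Linux|Macintosh|iPhone|iPad/.test(userAgent)) return 64;
  return null;
}

function checkTtl(userAgent, observedTtl) {
  const expected = expectedInitialTtl(userAgent);
  const inferred = inferInitialTtl(observedTtl);
  if (expected === null || inferred === null) return 'unknown';
  return expected === inferred ? 'consistent' : 'mismatch';
}

// MSS excludes the 20-byte IPv4 header and the 20-byte TCP header.
function impliedMtu(mss) {
  return mss + 40;
}
```

A Windows User-Agent arriving with a TTL of 52 comes back as a mismatch. An implied MTU under 1500 is weaker evidence on its own, since some legitimate access networks also shrink it, so score it alongside the TTL rather than alone.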

Catching Velocity Anomalies

You cannot rely on static IP blacklists. You must build velocity-based detection models directly into your data pipeline.

Deploy this PySpark logic against your LLD. It flags impossible impression velocity from specific IP and User-Agent combinations within microscopic time windows, so you can act on a device footprint instead of the shared CGNAT address.
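The core of that velocity logic is a sliding window per IP and User-Agent pair. It is sketched here in plain JavaScript so it runs without a cluster, and the same grouping and windowing would port to a PySpark job. The event shape (`ip`, `userAgent`, `timestampMs`), the one-second window, and the threshold of five impressions are assumptions for illustration.

```javascript
// events: [{ ip, userAgent, timestampMs }]
// Flags every ip|userAgent pair whose peak impression count inside any
// window of `windowMs` milliseconds exceeds `maxPerWindow`.
function flagVelocity(events, options = {}) {
  const { windowMs = 1000, maxPerWindow = 5 } = options;

  // Group timestamps by device footprint.
  const byKey = new Map();
  for (const e of events) {
    const key = `${e.ip}|${e.userAgent}`;
    if (!byKey.has(key)) byKey.set(key, []);
    byKey.get(key).push(e.timestampMs);
  }

  const flagged = [];
  for (const [key, times] of byKey) {
    times.sort((a, b) => a - b);
    let start = 0;
    let peak = 0;
    for (let end = 0; end < times.length; end++) {
      // Shrink from the left until the window spans less than windowMs.
      while (times[end] - times[start] >= windowMs) start++;
      peak = Math.max(peak, end - start + 1);
    }
    if (peak > maxPerWindow) flagged.push({ key, peak });
  }
  return flagged;
}
```

IP plus User-Agent is a coarse footprint. Behind a busy CGNAT address many real users share a common User-Agent, so add a stronger device signal to the key before you block on this.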

Isolate the velocity. Strip away the CGNAT cover. Block the specific device footprint, not the entire carrier IP. Stop targeting shadows.

Mimicking the Meatbag (Synthetic Telemetry)

Verification vendors only look for “is the mouse moving.” Fraudsters know this. Basic mousemove event listeners are a joke.

Bots do not move the cursor linearly anymore. They simulate human motor functions mathematically. They leverage Bezier curves. They emulate the exact deceleration patterns of human muscle activity. They randomize velocity.

They even insert pseudo-random delays between keystrokes to bypass biometrics. They pass your behavioral models without breaking a sweat.

Viewability Theater and Scroll Fraud

They target Made For Advertising (MFA) sites hungry for viewability metrics. Bots “scroll” through thousands of words of scraped content.

It is viewability theater. They simulate natural reading speeds.

Crucially, they pause directly over the ad slots. They trigger the Intersection Observers. They maximize Moat and Active View scores while the underlying impression is 100% synthetic. You pay for “guaranteed viewability.” You bought a ghost.

Polluting Analytics and GA

Non-human traffic avoids the obvious tell of a 100% bounce rate. Instead, it pollutes the funnel data you rely on for attribution.

They emit synthetic heartbeats. Automated pingbacks. They randomize interaction events to pollute Google Analytics and CRM systems.

They fake “dwell time” to legitimize the session, making subsequent forensic auditing exponentially harder when the lead conversion rate inevitably crashes.

The Apex Predator: Session Replay Attacks

This is the hardest variant to catch. Standard behavioral modeling fails completely against it.

It is pure telemetry hijacking. They record actual human interactions—the precise pixel coordinates of touches, scrolls, and clicks—from real users on compromised apps or SDKs.

They replay this raw data across thousands of headless sessions on the server side. Your AI detection models see perfect human physical behavior.

Exposing Impossible Scrolling

A human rarely scrolls down a page with perfect mathematical linearity. We always exhibit minor horizontal drift (X-axis variance) when moving vertically on a trackpad or smartphone screen.

Bots simulating scrolling through simple scripting often have zero variance. They are perfect. You need to identify this impossible precision in the LLD.

Blue teams do not trust reported scroll events. They analyze the telemetry stream in the logs for robotic timing. Deploy this SQL query against your log-level viewability data to flag robotic pause patterns that manipulate Intersection Observer metrics.
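The underlying test is plain variance math, sketched below in JavaScript rather than SQL so it can run anywhere. It assumes each session's log carries touch or pointer samples with an `x` coordinate in pixels and a timestamp `t` in milliseconds. The field names, the minimum sample count, and both variance floors are hypothetical and need tuning against known-human sessions.

```javascript
// Population variance of a list of numbers.
function variance(values) {
  if (values.length < 2) return 0;
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  return values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
}

// samples: [{ x, t }] ordered by time, taken during a vertical scroll gesture.
function scoreScrollSession(samples, options = {}) {
  const { minSamples = 10, xVarianceFloor = 0.5, intervalVarianceFloor = 1 } = options;
  if (samples.length < minSamples) return { verdict: 'insufficient-data' };

  // Horizontal drift: humans wobble, simple scripts do not.
  const xVariance = variance(samples.map((s) => s.x));

  // Timing jitter: gaps between consecutive samples.
  const intervals = samples.slice(1).map((s, i) => s.t - samples[i].t);
  const intervalVariance = variance(intervals);

  // Require both signals to be flat before calling the session robotic.
  const robotic = xVariance < xVarianceFloor && intervalVariance < intervalVarianceFloor;
  return { verdict: robotic ? 'robotic' : 'plausible', xVariance, intervalVariance };
}
```

Requiring both signals keeps false positives down. This will not catch replayed human sessions, which carry real drift and real jitter by construction.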

When the variance is near zero, it is a script. Flag the session. Block the ASN. Do not negotiate with ghosts.

Catching Prototype Pollution

Vendors sell black boxes. Black boxes fail. You need to own your telemetry. Stop relying on outsourced JavaScript that syndicate developers reverse-engineered three years ago.

This is where the exploit becomes truly insidious. They do not just delete the webdriver flag. They build a synthetic echo chamber.

They use JavaScript Proxy objects to intercept your verification vendor’s telemetry scripts. When a vendor tag asks for the browser’s hardware capabilities, the bot intercepts the call. It feeds a mathematically perfect, pre-calculated spoof back to the tag. They pollute the prototype chain entirely.

Look at the mechanics of the Methbot operation. They fabricated entire DOM environments inside Node.js. They mocked the exact prototype chains of Chrome on macOS. The operation pulled in an estimated $3 to $5 million a day because standard anti-fraud tags were talking to phantoms.

Fingerprinting the Phantom

When a bot operator pollutes a prototype, they have to cover their tracks. If they overwrite a native browser function, calling .toString() on that function will expose their custom JavaScript instead of returning the expected [native code].

To hide this, stealth plugins hijack Function.prototype.toString. They force the browser to lie about its own functions.

You will see the anomalies in your log-level data if you know what to look for:

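Absolute 0.00ms execution latency on complex DOM environment queries, sample after sample. Real hardware takes microseconds. Phantoms reply instantly. Browsers coarsen timer resolution, so judge the distribution rather than a single reading.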
100% viewability scores from devices that fail basic native code stringification checks.
Synthetic leads coming from browsers where Object.getOwnPropertyDescriptor returns a modified getter for basic APIs.

The Code: Exposing the Lie

To break the illusion, you have to interrogate the runtime environment itself. Do not ask the browser for its properties. Test the toString method to see if the polygraph is rigged.

Deploy this deep prototype pollution trap. It bypasses the surface-level spoofing and directly checks if the foundational stringification methods of the JavaScript engine have been hijacked.
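What follows is a minimal sketch of one way to build such a trap, not a complete defense. `inspectStringifier` and `looksNative` are hypothetical names. The checks rest on structural properties of built-in functions: a native method normally stringifies to a `[native code]` body and has no own `prototype` property, and the real `Function.prototype.toString` is named `toString` and takes zero declared arguments. Proxy-based hijacks can pass all of these, so treat a clean result as "not obviously rigged" and nothing more.

```javascript
// Grab a reference as early as possible. If a stealth script ran first,
// this reference is already compromised, which inspectStringifier may reveal.
const pristineToString = Function.prototype.toString;
const NATIVE_BODY = /\{\s*\[native code\]\s*\}\s*$/;

// Interrogate the stringifier itself before trusting anything it says.
function inspectStringifier(currentToString = Function.prototype.toString) {
  const findings = [];
  if (currentToString.name !== 'toString') findings.push('unexpected-name');
  if (currentToString.length !== 0) findings.push('unexpected-arity');
  if (Object.prototype.hasOwnProperty.call(currentToString, 'prototype')) {
    findings.push('has-own-prototype'); // ordinary function expressions carry one
  }
  let source = '';
  try {
    source = currentToString.call(currentToString);
  } catch (err) {
    findings.push('throws-on-self');
  }
  if (!NATIVE_BODY.test(source)) findings.push('source-not-native');
  return findings;
}

// For native methods and getters only. Native constructors such as Array
// legitimately own a prototype property and would fail this test.
function looksNative(fn) {
  if (typeof fn !== 'function') return false;
  const nativeBody = NATIVE_BODY.test(pristineToString.call(fn));
  const hasOwnPrototype = Object.prototype.hasOwnProperty.call(fn, 'prototype');
  return nativeBody && !hasOwnPrototype;
}
```

In a page, run `inspectStringifier()` first. Then feed `looksNative` the getters your vendor tags depend on, for example `Object.getOwnPropertyDescriptor(Navigator.prototype, 'hardwareConcurrency').get`.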

If the environment is polluted, drop the bid. Kill the session.

The ad-tech ecosystem relies on a gentleman’s agreement that the browser is telling the truth. The fraudsters broke that agreement a decade ago. It is time you started treating every impression as hostile until cryptographically proven otherwise. Protect your budget. Hunt the bots.

FAQs: Bot Infrastructure & Non-Human Traffic

Why do standard viewability tags miss headless Chrome bots?

Vendors rely on surface-level JavaScript APIs. Fraudsters use stealth plugins to pollute prototypes and spoof these exact APIs, feeding fake telemetry directly to your verification scripts.

How do botnets bypass carrier-grade NAT (CGNAT) IP blocking?

Syndicates hijack real mobile devices using malware SDKs. They route synthetic traffic through legitimate 4G/5G carrier IPs. Banning the IP blocks thousands of real humans, breaking rate limits.

Why is my stealth plugin detection code failing in production?

You are checking surface properties instead of execution context. Sophisticated operators intercept native toString methods and proxy your queries. You must validate the underlying stringification layer.

How do bots fake human mouse movements and scroll behavior?

Basic bots use linear scripts. Elite syndicates inject mathematical noise using Bezier curves to emulate muscle deceleration, or they simply replay hijacked telemetry from real human sessions at scale.

Dai Luong Ngo - Blue Team Engineer with 12+ years in programmatic ad fraud prevention. Reverse-engineering botnets and invalid traffic to protect enterprise ad spend.