Detecting VPN, Proxy, and Datacenter Traffic in 2026: A Pragmatic Guide for Affiliate Publishers

Roughly 8–14% of inbound web traffic in regulated affiliate verticals comes through some kind of intermediary — a consumer VPN, a residential proxy network, a datacenter, or a mobile carrier exit node that geo-locates wrong. For most monetization paths this is fine; for affiliate inventory specifically it has direct revenue and compliance consequences. This article covers what each type of intermediary actually represents, how to detect them at the edge without adding latency, and how to make the allow / deny / downweight decision in a way that holds up to advertiser scrutiny.

It's written for publishers and affiliate platform engineers who are tired of binary "block all VPNs" advice that costs them legitimate revenue.

The four populations you actually have to deal with

Lumping all "non-direct" traffic together is the most common mistake. There are four real populations and they behave differently:

1. Consumer VPNs (NordVPN, ExpressVPN, Surfshark, ProtonVPN, Mullvad, etc.). Real users, often in their own country, using a VPN for privacy or to access region-locked content. Geo-located IP may not match the user's actual country. Conversion rate is typically near baseline. Fraud rate is low.

2. Residential proxies (Bright Data, Oxylabs, Smartproxy, etc., resold through dozens of layers). A mix of legitimate users (who don't know their device is part of a proxy network) and bot operators paying to route requests through residential IPs. Conversion rate is highly variable — the legitimate users convert normally, the bot traffic doesn't convert at all. Hardest population to make a binary decision about.

3. Datacenter traffic (AWS, GCP, Azure, Hetzner, OVH, DigitalOcean, etc.). Almost always non-human: scrapers, monitoring services, AI training crawlers, security scanners. Conversion rate is near zero. Fraud rate per impression is high, but most of this traffic is not maliciously trying to game your affiliate program; it is just bots.

4. Mobile carrier traffic with bad GeoIP. Real users on real phones whose IP geo-locates incorrectly because the carrier's IP block is registered in a different country than the actual cell tower. Conversion rate is normal. Geo decisions need to be more forgiving here.

Lumping these together costs you money. A blanket "block all VPNs" rule strips out a meaningful slice of legitimate consumer VPN users in your target geo. A blanket "allow all" rule lets datacenter scrapers eat your impression budget.

Detection at the edge: what actually works

For each of the four populations, the detection signals are different.

Consumer VPNs

IP database lookup against a maintained VPN provider list. Some VPN providers publish their exit-node IP ranges; the rest are catalogued by commercial databases such as IP2Location, MaxMind's GeoIP2 Anonymous IP, IPQualityScore, or Spur. Match incoming IPs against a daily-refreshed list. Latency is a single in-memory range lookup, well under 1ms.
TCP/IP fingerprint checks are useful but mostly redundant once you have a good IP list.
Header inconsistencies (e.g., Accept-Language says German, IP geo-locates to a US VPN exit) are a soft signal — useful for downweighting confidence, not for hard blocks.
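
As a rough illustration of the list lookup described above, here is a minimal sketch using only the Python standard library. The sample ranges are documentation-only addresses standing in for a real feed, and `build_index` and `is_listed` are hypothetical names, not part of any vendor's API. Because feeds are typically CIDR blocks, the sketch uses a binary search over sorted ranges and handles IPv4 only.

```python
import ipaddress
from bisect import bisect_right

# Hypothetical sample ranges (documentation address space), standing in for a
# daily-refreshed VPN exit-node feed.
SAMPLE_VPN_RANGES = ["198.51.100.0/24", "203.0.113.0/25"]


def build_index(cidrs):
    """Collapse IPv4 CIDRs into sorted, non-overlapping (start, end) integers."""
    nets = ipaddress.collapse_addresses(ipaddress.ip_network(c) for c in cidrs)
    return sorted(
        (int(n.network_address), int(n.broadcast_address)) for n in nets
    )


def is_listed(ip, index):
    """Binary-search for the last range starting at or before ip, then check its end."""
    value = int(ipaddress.ip_address(ip))
    pos = bisect_right(index, (value, float("inf"))) - 1
    return pos >= 0 and index[pos][0] <= value <= index[pos][1]


VPN_INDEX = build_index(SAMPLE_VPN_RANGES)
```

Building the index once per feed refresh keeps the per-request cost to a single binary search over an in-memory list.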

Residential proxies

Reputation scoring services. This is the one population where you almost certainly need a third-party feed. IP-level reputation (Spur, IPQualityScore, Fingerprint Pro, MaxMind minFraud) gives you a confidence score per IP that's much more accurate than a static list because residential proxy networks rotate constantly.
Behavioral signals. Mouse movement, scroll patterns, time-on-page, and form-fill cadence all separate humans from bots. These are more reliable than IP signals for residential proxies, but they require you to wait for the user to actually interact before you can score them.
Per-impression dedup. If the same offer impression is served to 50 different IPs in 90 seconds with identical user-agent strings, you're looking at a scraper. Track this in a sliding window in your edge worker.
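
The per-impression dedup idea can be sketched as a sliding window keyed on offer and user-agent. This is an in-process illustration only: the class name `ImpressionWindow` and its method are assumptions, the thresholds are the 50 IPs and 90 seconds quoted above, and a real edge deployment would need shared state across workers.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 90      # window length quoted in the text
MAX_DISTINCT_IPS = 50    # distinct-IP threshold quoted in the text


class ImpressionWindow:
    """Tracks recent impressions per (offer, user-agent) pair."""

    def __init__(self, window=WINDOW_SECONDS, threshold=MAX_DISTINCT_IPS):
        self.window = window
        self.threshold = threshold
        self._events = defaultdict(deque)  # (offer_id, ua) -> deque of (ts, ip)

    def record(self, offer_id, user_agent, ip, now):
        """Record one impression; return True if the pair now looks like a scraper."""
        events = self._events[(offer_id, user_agent)]
        events.append((now, ip))
        # Evict events that have fallen out of the window.
        while events and now - events[0][0] > self.window:
            events.popleft()
        # Recounting distinct IPs on every call is simple but O(n); fine for a sketch.
        distinct_ips = {seen_ip for _, seen_ip in events}
        return len(distinct_ips) >= self.threshold
```

Passing `now` in explicitly (rather than reading the clock inside the method) keeps the window logic easy to test and replay against logged traffic.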

Datacenter traffic

ASN lookup. The simplest, fastest, most reliable signal. Resolve the incoming IP to its ASN and check it against a list of known datacenter ASNs (AWS = 16509, Google Cloud = 396982 and others, Azure = 8075 and others, etc.). MaxMind's free GeoLite2 ASN database is sufficient for this.
Reverse DNS sanity check. Datacenter IPs almost always have rDNS pointing back to the cloud provider; consumer ISP addresses resolve to the ISP's own hostnames instead.
TLS fingerprint (JA3/JA4). Most scraping libraries and HTTP clients have distinctive TLS fingerprints, as do some headless browser setups. JA4 is the modern variant; if you're behind Cloudflare, the fingerprint is available on the request with no extra round trip, though it is tied to the Bot Management product rather than included on every plan.
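
A minimal sketch of combining the ASN and reverse-DNS signals follows. It assumes the ASN has already been resolved (for example from a GeoLite2 ASN lookup) and that any PTR hostname was fetched off the hot path; both are passed in as plain arguments. The function name, the two-entry ASN table, and the hostname suffixes are illustrative assumptions, not a complete list.

```python
# ASNs quoted in the text; a production list would be far longer.
DATACENTER_ASNS = {16509: "AWS", 396982: "Google Cloud"}

# Illustrative rDNS suffixes for the same two providers (assumed, not exhaustive).
DATACENTER_RDNS_SUFFIXES = (".amazonaws.com", ".googleusercontent.com")


def classify_network(asn, rdns_hostname=None):
    """Return "datacenter" if either signal matches, else "unknown"."""
    if asn in DATACENTER_ASNS:
        return "datacenter"
    if rdns_hostname:
        # Normalise a fully qualified name such as "host.example.com."
        host = rdns_hostname.rstrip(".").lower()
        if host.endswith(DATACENTER_RDNS_SUFFIXES):
            return "datacenter"
    return "unknown"
```

Returning "unknown" rather than "consumer" is deliberate: the absence of a datacenter match says nothing about whether the address belongs to a VPN or residential proxy.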

Mobile carriers with bad GeoIP

Don't try to fix it at the IP layer. You won't.
Use higher-confidence signals at the user level. Browser locale, timezone, language preferences, and HTTP Accept-Language header give you more accurate geo than the IP for mobile users. A 2026-appropriate stack treats these as primary inputs and IP as a tie-breaker.
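
One way to read "primary inputs and IP as a tie-breaker" is sketched below. The timezone-to-country table is a tiny hypothetical sample, the function names are assumptions, and the Accept-Language parsing is deliberately simplified (it only picks out two-letter region subtags and q-values).

```python
# Hypothetical sample mapping; a real table would cover every IANA zone in use.
TIMEZONE_COUNTRY = {
    "Europe/Berlin": "DE",
    "America/New_York": "US",
    "Asia/Kolkata": "IN",
}


def parse_accept_language(header):
    """Return region codes from an Accept-Language header, highest q first."""
    weighted = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        subtags = tag.strip().split("-")
        # Treat a two-letter subtag after the language as the region.
        region = next(
            (s.upper() for s in subtags[1:] if len(s) == 2 and s.isalpha()), None
        )
        if region:
            weighted.append((q, region))
    weighted.sort(key=lambda item: item[0], reverse=True)
    return [region for _, region in weighted]


def resolve_country(ip_country, accept_language, timezone):
    """Prefer client-side signals; use the IP only to break a disagreement."""
    tz_country = TIMEZONE_COUNTRY.get(timezone)
    regions = parse_accept_language(accept_language or "")
    lang_country = regions[0] if regions else None
    if tz_country and lang_country and tz_country != lang_country:
        if ip_country in (tz_country, lang_country):
            return ip_country
    return tz_country or lang_country or ip_country
```

When the two client signals disagree and the IP matches neither, this sketch falls back to the timezone; that ordering is a design choice, not something the text prescribes.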

Allow / deny / downweight: the decision matrix

For affiliate inventory, the right policy varies by population:

Population	Default action	When to override
Consumer VPN, geo matches advertiser target	Allow	Block only if advertiser explicitly requires "no VPN" inventory
Consumer VPN, geo mismatches advertiser target	Downweight	Show offers for the detected geo if you have offers there; otherwise show generic content
Residential proxy, low reputation score	Deny	Allow if behavioral signals (real interaction) score high
Residential proxy, high reputation score	Allow with monitoring	Block specific IPs that show repeat-impression fraud patterns
Datacenter	Deny for impression billing	Allow for free / unsponsored content; many cloud-hosted browsers (e.g., remote browser products) are real users
Mobile carrier with geo mismatch	Allow with corrected geo	Use locale + timezone to override IP-based geo

The key insight is that denying is the wrong default for everything except datacenter and confirmed-fraudulent residential proxies. Most VPN traffic is monetizable. The job of the policy is to route it correctly, not to refuse it.
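
The matrix above can be expressed as a small pure function. This is a sketch under stated assumptions: the population labels, the `Action` enum, and the boolean parameter names are all hypothetical, and how each boolean is derived (reputation thresholds, behavioral scoring) is left to the detection layer.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    ALLOW_MONITORED = "allow_with_monitoring"
    ALLOW_CORRECTED_GEO = "allow_with_corrected_geo"
    DOWNWEIGHT = "downweight"
    DENY = "deny"


def decide(population, *, geo_matches=True, reputation_high=False,
           behavior_high=False, repeat_fraud=False,
           advertiser_blocks_vpn=False, sponsored=True):
    """Map a detected population plus override signals to a policy action."""
    if population == "consumer_vpn":
        if not geo_matches:
            return Action.DOWNWEIGHT
        return Action.DENY if advertiser_blocks_vpn else Action.ALLOW
    if population == "residential_proxy":
        if reputation_high:
            # Override: block IPs showing repeat-impression fraud patterns.
            return Action.DENY if repeat_fraud else Action.ALLOW_MONITORED
        # Override: real interaction can rescue a low-reputation IP.
        return Action.ALLOW if behavior_high else Action.DENY
    if population == "datacenter":
        return Action.DENY if sponsored else Action.ALLOW
    if population == "mobile_geo_mismatch":
        return Action.ALLOW_CORRECTED_GEO
    raise ValueError(f"unknown population: {population!r}")
```

Keeping the policy as a pure function makes it straightforward to log the inputs alongside the decision and to replay them later during an advertiser audit.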

For how this maps to advertiser allowlisting in AffilFinder specifically, see Configuring geo rules.

Latency budget

Edge detection has to be fast or you're trading conversion rate for fraud reduction. Practical targets in 2026:

<2ms for IP database lookups (in-memory or KV-backed). Cloudflare Workers or AWS Lambda@Edge with a binary-format database baked into the deployment artifact gets you well under this.
<15ms for third-party reputation API calls. Cache aggressively — most IPs you see today you'll see again within 24 hours, so a short TTL cache (10–60 minutes) catches >80% of repeat lookups.
<5ms for ASN lookup. Same as IP database — in-memory.
0ms for TLS fingerprint — you get this from your CDN headers, no extra round trip.

Total added latency for a fully instrumented decision: <25ms for cold IPs, <5ms for warm IPs. That's well within the budget for an offer block that has to render quickly. For more on widget performance budgets, see Core Web Vitals and affiliate overlays.
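
The short-TTL cache for reputation lookups might look like the sketch below. `TTLCache` and `get_or_fetch` are hypothetical names, the `fetch` callable stands in for whatever third-party reputation client is in use, and the sketch omits size limits and eviction that a real edge cache would need.

```python
import time


class TTLCache:
    """Caches fetched values per key for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value if still fresh, otherwise call fetch(key)."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # warm path: no API call
        value = fetch(key)           # cold path: pay the third-party latency
        self._entries[key] = (now + self.ttl, value)
        return value
```

A cache created with `TTLCache(600)` corresponds to the 10-minute end of the range mentioned above; the injectable `clock` exists only so expiry can be tested without sleeping.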

What about advertisers' own anti-fraud requirements?

Some advertisers (especially in regulated finance and iGaming) have specific contractual requirements about VPN and proxy traffic. The pattern that scales:

Advertiser-specific allow / deny lists in their AffilFinder allowlist configuration. This is the canonical place for "this advertiser doesn't accept VPN traffic" or "this advertiser requires datacenter IPs to be blocked".
Per-impression evidence in the event log. When you serve an impression, log the IP reputation, ASN, and geo confidence so the advertiser can audit later if they question a specific conversion.
Reconciliation that works with their feed. If an advertiser sends a quarterly "we suspect IPs X, Y, Z are fraudulent" list, you should be able to retroactively flag those impressions and adjust the bill.
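
A minimal sketch of the evidence record and the retroactive flagging step is shown below. The field names and the `flag_suspect_impressions` helper are assumptions for illustration; they do not describe AffilFinder's actual event log schema.

```python
from dataclasses import dataclass


@dataclass
class ImpressionEvidence:
    """Evidence captured at serve time so a conversion can be audited later."""
    impression_id: str
    advertiser_id: str
    ip: str
    asn: int
    reputation: float
    geo_confidence: float
    flagged: bool = False


def flag_suspect_impressions(log, advertiser_id, suspect_ips):
    """Mark this advertiser's impressions from suspect IPs; return their ids."""
    suspects = set(suspect_ips)
    flagged_ids = []
    for record in log:
        if record.advertiser_id == advertiser_id and record.ip in suspects:
            record.flagged = True
            flagged_ids.append(record.impression_id)
    return flagged_ids
```

The returned ids are what a billing adjustment would be keyed on; scoping the match to one advertiser keeps one advertiser's suspect list from affecting another's invoice.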

For a longer treatment of the fraud-control side, see Affiliate fraud in geo-gated inventory.

What changes in 2026 vs 2024

Residential proxy pricing collapsed, so the volume of residential proxy traffic on the open web is up roughly 3x in two years. Behavioral signals matter more than they used to because IP-level signals are weaker.
AI training crawlers from datacenter ranges exploded, especially from Google (Gemini), Anthropic (ClaudeBot), and OpenAI (GPTBot). The ones that identify themselves are trivial to detect and easy to deny on impression billing: match on the declared user-agent.
Apple Private Relay is now significant in iOS traffic. It's not a VPN exactly, but its egress IPs come from ranges that Apple publishes and don't match the user's actual ISP. Treat it as "consumer VPN with high reputation" and don't block it; Apple Private Relay users convert normally.
Google's Privacy Sandbox doesn't change IP-level detection, but it does affect how you correlate the same user across visits. See Cookieless affiliate attribution.
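
Matching self-declared AI crawlers by user-agent can be sketched in a few lines. The token list covers only GPTBot and ClaudeBot, the product tokens OpenAI and Anthropic use for their crawlers, and the function name is hypothetical. A declared user-agent can be spoofed, so this suits excluding impressions from billing rather than acting as a security control.

```python
# Lowercased product tokens for self-identifying AI crawlers (not exhaustive).
AI_CRAWLER_TOKENS = ("gptbot", "claudebot")


def is_declared_ai_crawler(user_agent):
    """Return True if the user-agent string declares a known AI crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)
```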

Bottom line

VPN, proxy, and datacenter detection in 2026 isn't a binary problem. The publishers and platforms that get the most out of their affiliate inventory treat it as a routing problem — detect the population, choose the right offer set or policy, log the evidence, reconcile honestly. Blanket bans cost real revenue; blanket allows cost real money to fraud. The middle path requires data, latency discipline, and a clear policy matrix per advertiser.

