Testing Methodology

v2.1

Every score on ProxyStats is generated by automated, reproducible tests run from our own infrastructure across multiple geographic regions. No provider pays for placement — rankings are based purely on performance data. See Limitations & Disclosures for what this benchmark can and cannot tell you.

Scoring Formula

Each provider receives a composite score from 0 to 100, calculated as a weighted sum of independently measured dimensions. In v2.1 four dimensions carry weight; the fifth, Geo Integrity, is weighted 0% while it is under revision. The weights below are a design choice: they reflect "general purpose" scraping use. For use-case-specific rankings (Google SERP, Amazon, Social Media, etc.) with re-weighted formulas, see our /best/ pages.

score_v2.1 = core_network × 0.35 + session_reliability × 0.30
+ neutral_reach × 0.15 + target_reach × 0.20

v2.1 (2026-05-29): geo_integrity removed from composite (broken metric — country_match was 0% for all providers due to incomplete geo enrichment). Returns in v2.2 with multi-country probes. Full methodology changelog below.
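
The v2.1 formula above can be sketched in a few lines. This is a minimal illustration, not the production code: the function name, the dictionary layout, and the assumption that each dimension score is already on a 0 to 100 scale are ours.

```python
# Weights copied from the v2.1 formula; geo_integrity is excluded (weight 0).
WEIGHTS_V2_1 = {
    "core_network": 0.35,
    "session_reliability": 0.30,
    "neutral_reach": 0.15,
    "target_reach": 0.20,
}


def composite_score(dimensions: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores (each assumed to be 0-100)."""
    missing = WEIGHTS_V2_1.keys() - dimensions.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = sum(dimensions[name] * weight for name, weight in WEIGHTS_V2_1.items())
    return round(total, 2)


# Hypothetical dimension scores, for illustration only.
example = {
    "core_network": 90.0,
    "session_reliability": 80.0,
    "neutral_reach": 70.0,
    "target_reach": 60.0,
}
print(composite_score(example))
```

Because the four weights sum to 1.0, a provider scoring 100 on every dimension gets a composite of exactly 100.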

Core Network Performance

Plane A · 35%

Measures fundamental proxy connectivity: TCP connect time, TLS handshake duration, time to first byte (TTFB), and timeout rate. Probes hit our own controlled endpoints every 15 minutes. As of v2.1 (2026-05-29), this score uses ONLY Plane A (controlled-plane) data — previously it cross-counted neutral and high-defense probes, inflating its weight.

Metrics: connect_p50, connect_p95, tls_p50, tls_p95, ttfb_p50, ttfb_p95, timeout_rate, controlled-plane success_rate.
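
As a rough illustration of how metrics such as connect_p50, connect_p95, and timeout_rate could be derived from raw timings, the sketch below uses the nearest-rank percentile method. The percentile method, the function names, and the convention of marking a timed-out probe as None are assumptions; the source does not say how its percentiles are computed.

```python
import math
from typing import Optional


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_connect(connect_ms: list[Optional[float]]) -> dict[str, float]:
    """Summarize connect timings; None marks a probe that timed out (assumed convention)."""
    completed = [ms for ms in connect_ms if ms is not None]
    return {
        "connect_p50": percentile(completed, 50),
        "connect_p95": percentile(completed, 95),
        "timeout_rate": (len(connect_ms) - len(completed)) / len(connect_ms),
    }


# Hypothetical timings in milliseconds, one timeout among ten probes.
timings = [120, 95, 110, None, 400, 105, 98, 101, 99, 130]
summary = summarize_connect(timings)
```

The gap between p50 and p95 in this sample shows why both are reported: a single slow probe barely moves the median but dominates the tail.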

Session Reliability

Plane A · 30%

Tests how well sticky sessions maintain the same exit IP over time. We open a session, make 2–4 requests over the session TTL (60s in current version; longer TTLs ship in Phase 43.7), and check if the IP rotates unexpectedly. High unexpected rotation indicates an unstable or overloaded IP pool.

Metrics: session_survival_rate, unexpected_rotation_rate. Heavy penalty (×3) for unexpected mid-session IP changes — score floors at 0 when rotation ≥ 33%. burst_failure_rate parameter retained but not measured until Phase 43.7.
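
One way to read the ×3 penalty and the floor at 0 is sketched below. The exact way session_survival_rate and the penalty are combined is not stated in the text, so the multiplicative form here is an assumption chosen because it reaches 0 at a rotation rate of one third regardless of survival rate.

```python
ROTATION_PENALTY = 3.0  # the x3 multiplier from the text


def rotation_penalty_factor(unexpected_rotation_rate: float) -> float:
    """Returns 1.0 with no unexpected rotation, falling linearly to 0.0 at a rate of 1/3."""
    return max(0.0, 1.0 - ROTATION_PENALTY * unexpected_rotation_rate)


def session_reliability_score(session_survival_rate: float,
                              unexpected_rotation_rate: float) -> float:
    """Assumed combination: survival rate scaled to 0-100, multiplied by the penalty factor."""
    return 100.0 * session_survival_rate * rotation_penalty_factor(unexpected_rotation_rate)
```

Under this reading, a provider with 10% unexpected rotation keeps about 70% of its survival-based score, and anything at or above roughly 33% rotation scores 0.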

Geo Integrity

Plane A · 0% (in revision)

REMOVED FROM COMPOSITE IN v2.1. The country_match_rate metric was 0% for all providers due to incomplete geo enrichment (only ~10–20% of probes were calling geo lookup) and missing available_countries data. Rather than carry a broken metric in the composite, we removed it until properly measurable. Multi-country probes ship in Phase 43.5.7, after which this dimension returns with real measurement.

Planned v2.2 metrics: per-country requested_vs_observed match rate, city consistency, ASN consistency. Pilot scope: Maskify (5 countries — US/DE/FR/JP/BR).

Neutral Reachability

Plane B · 15%

Tests ability to reach neutral third-party services like httpbin.org and Cloudflare speed test. These targets don't actively block proxies, so failures here indicate fundamental connectivity issues rather than anti-bot detection. Weight raised from 10% to 15% in v2.1 (redistribution after geo removal).

Metrics: neutral_success_rate, neutral_avg_ttfb_ms. Bonus for TTFB < 500ms, penalty for > 2000ms.
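
The bonus and penalty thresholds can be illustrated as follows. The 500 ms and 2000 ms thresholds come from the text; the size of the adjustment (10 points here) and the clamping to 0 to 100 are hypothetical, since the source does not publish them.

```python
FAST_TTFB_MS = 500       # from the text: bonus below this
SLOW_TTFB_MS = 2000      # from the text: penalty above this
TTFB_ADJUSTMENT = 10.0   # hypothetical magnitude, not from the source


def neutral_reach_score(neutral_success_rate: float, neutral_avg_ttfb_ms: float) -> float:
    """Success rate scaled to 0-100, adjusted for average TTFB and clamped to the 0-100 range."""
    score = neutral_success_rate * 100.0
    if neutral_avg_ttfb_ms < FAST_TTFB_MS:
        score += TTFB_ADJUSTMENT
    elif neutral_avg_ttfb_ms > SLOW_TTFB_MS:
        score -= TTFB_ADJUSTMENT
    return min(100.0, max(0.0, score))
```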

Target Reachability

Plane C · 20%

Periodically samples high-defense targets (Google, Amazon) with strict safety rules. As of 2026-05-27, response bodies are scanned for CAPTCHA / challenge / interstitial markers — HTTP 200 with a soft-block page is classified as challenge_or_interstitial, not success. Weight raised from 10% to 20% in v2.1 — this is the most user-relevant dimension for anti-bot-protected scraping.

Metrics: target_deny_rate, target_challenge_rate. Runs every 6 hours with 24h per-IP cooldowns. ~120 samples/month/provider. Body inspected for ~15 known anti-bot markers; only first 50 KB scanned; body content never stored or logged.
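
The soft-block classification described above might look roughly like this. The marker strings, the status codes treated as a deny, and the label names other than challenge_or_interstitial are assumptions for illustration; the real marker list is not reproduced here.

```python
SCAN_LIMIT_BYTES = 50 * 1024  # only the first 50 KB of the body is scanned

# Hypothetical markers; the actual list (~15 entries) is not published in this document.
CHALLENGE_MARKERS = (b"captcha", b"unusual traffic", b"verify you are human")

# Assumed mapping of status codes to a hard deny.
DENY_STATUS_CODES = (403, 429)


def classify_response(status_code: int, body: bytes) -> str:
    """Classify a Plane C response without storing or logging the body."""
    if status_code in DENY_STATUS_CODES:
        return "deny"
    head = body[:SCAN_LIMIT_BYTES].lower()
    if any(marker in head for marker in CHALLENGE_MARKERS):
        return "challenge_or_interstitial"
    if 200 <= status_code < 300:
        return "success"
    return "other_failure"
```

Note the consequence of the scan limit: a marker that first appears after the 50 KB boundary is not detected, which is one way a soft block could still be counted as a success.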

Three-Plane Testing Architecture

Unlike traditional benchmarks that hammer target sites directly, we use a layered approach that separates controlled testing from real-world reachability measurement.

Plane A: Controlled
Frequency: Every 15 minutes
Targets: Our own endpoints (/healthz, /reflect, /tiny, /download)
Purpose: Core latency, TLS timing, session tests, geo verification

Plane B: Neutral
Frequency: Every 15 minutes
Targets: httpbin.org, Cloudflare speed test
Purpose: External reachability without anti-bot interference

Plane C: High-Defense
Frequency: Every 6 hours
Targets: Google, Amazon
Purpose: Anti-bot bypass capability (safe sampling only)

Test Infrastructure

Vantage points: 2 probe regions, EU-Central (Frankfurt) and US-East. Storage on a separate main server (no probe traffic).
Plane A frequency: Every 15 minutes (~96 cycles/day); probes write immediately to the probe_runs table
Plane C frequency: Every 6 hours with 24h per-IP cooldown (stop-after-deny)
Composite recompute: Once daily via rollup task at 00:05 UTC; dashboard values change once per day
Worker isolation: Dedicated Docker containers with egress-only networking
SSRF protection: DNS pre-resolution, CIDR blocklisting, network segmentation
Result integrity: HMAC-SHA256 on all numeric metrics
Data retention: 90 days of granular probe data, daily rollup aggregation
Confidence levels: Low (<10 probes), Medium (10–50), High (>50 probes/day)
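
The confidence thresholds listed above map directly onto a small function. The function name and the lowercase return labels are our own choices; the boundaries (below 10, 10 to 50, above 50) are taken from the table.

```python
def confidence_level(probes_per_day: int) -> str:
    """Map a daily probe count to the confidence badge thresholds from the table."""
    if probes_per_day < 10:
        return "low"
    if probes_per_day <= 50:
        return "medium"
    return "high"
```

At roughly 96 Plane A cycles per day, a provider that is probed on every cycle would land in the high band, while Plane C alone (a handful of samples per day) would not.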

Data Pipeline

probe_runs → daily rollup → leaderboard

Raw probe results are stored individually as they arrive (every 15 minutes for Plane A/B, every 6 hours for Plane C). The rollup task runs once daily at 00:05 UTC and aggregates the last 24 hours of probes per provider per region into one row in rollups_daily. The composite score on the leaderboard is recomputed at rollup time — dashboard values change once per day, not every 15 minutes. Each rollup includes a confidence badge based on probe count.
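
A simplified sketch of the rollup step is shown below, assuming each probe is a dictionary with hypothetical fields ts, provider, region, and success. The real probe_runs and rollups_daily schemas are not described in this document, so treat the field names and the single success_rate output as placeholders.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def daily_rollup(probe_runs: list[dict], now: datetime) -> list[dict]:
    """Aggregate the last 24 hours of probes into one row per (provider, region)."""
    cutoff = now - timedelta(hours=24)
    buckets: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for run in probe_runs:
        if cutoff <= run["ts"] < now:
            buckets[(run["provider"], run["region"])].append(run)

    rows = []
    for (provider, region), runs in sorted(buckets.items()):
        successes = sum(1 for run in runs if run["success"])
        rows.append({
            "provider": provider,
            "region": region,
            "probe_count": len(runs),
            "success_rate": successes / len(runs),
        })
    return rows


# Hypothetical probes around a 00:05 UTC rollup run.
now = datetime(2026, 5, 30, 0, 5, tzinfo=timezone.utc)
runs = [
    {"ts": now - timedelta(hours=1), "provider": "Maskify", "region": "eu-central", "success": True},
    {"ts": now - timedelta(hours=2), "provider": "Maskify", "region": "eu-central", "success": False},
    {"ts": now - timedelta(hours=30), "provider": "Maskify", "region": "eu-central", "success": True},
    {"ts": now - timedelta(hours=3), "provider": "Aceproxies", "region": "us-east", "success": True},
]
rows = daily_rollup(runs, now)
```

The probe from 30 hours earlier falls outside the window and is ignored, which is why a score only reflects the most recent day of measurements.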

Limitations & Disclosures

What this benchmark can and cannot tell you. We publish this section explicitly because most "best proxy" lists hide their methodology gaps.

Vantage point bias. Probes originate from EU-Central (Frankfurt) and US-East. We do not test Asia or LATAM exit performance — a provider with a strong APAC pool may underperform on our metrics through no fault of their own.

Limited provider coverage. We currently benchmark 3 active providers (Maskify, Aceproxies, GonzoProxy). Larger providers (Bright Data, Oxylabs, Smartproxy) are not yet tested — pending business relationship / budget. We do not extrapolate scores to untested providers.

Plane C sample size. Anti-bot target probes run every 6 hours, which yields ~120 samples per provider per month (4 per day × 30 days). Single-digit percentage differences in Google success rate may be statistical noise. Treat the ranking as more informative than the absolute number.
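
To make the noise claim concrete, the sketch below computes the half-width of a 95% confidence interval for a success rate using the normal approximation. The choice of this approximation is ours, for illustration; it is not stated as part of the benchmark.

```python
import math


def margin_of_error(success_rate: float, samples: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(success_rate * (1 - success_rate) / samples)


# With ~120 samples and an observed 80% success rate:
moe = margin_of_error(0.80, 120)
print(round(moe * 100, 1))  # margin in percentage points
```

At 120 samples and an 80% success rate the margin is about 7 percentage points either way, so two providers measured at 78% and 83% cannot be reliably separated on a single month of data.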

HTTP 200 ≠ usable success (now mitigated). As of 2026-05-27, Plane C responses are scanned for CAPTCHA / challenge / interstitial markers; soft-blocked HTTP 200 responses are classified as challenge_or_interstitial and excluded from success rate. Marker list is conservative — some subtle anti-bot variants may still slip through. Pre-2026-05-27 rollups used the old definition; comparing across that date is not apples-to-apples.

Session TTL coverage. Session reliability tests use a 60-second TTL window. Real-world session lifetimes (5–30 minutes for login flows) are not yet directly tested. Longer-TTL probes are planned.

IP uniqueness window. Uniqueness ratios are computed within the daily rollup window. A pool that rotates every few hours may show high uniqueness in our metric but have a smaller effective pool than the number suggests.

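Composite weights are subjective. The v2.1 split of 35/30/0/15/20 (core network / session reliability / geo integrity / neutral reachability / target reachability; 30/30/20/10/10 in v2.0) reflects our judgment for "general purpose" use, not a universal optimum. If your workflow is SERP-heavy, use the Google-specific score on the /best/ pages instead.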

What we explicitly do NOT test. We do not probe LinkedIn, X, Instagram, Booking, Expedia, Zillow, Realtor, or Redfin. These targets prohibit automated access in their ToS, and concentrated probing from our infrastructure could fingerprint and flag every provider we test simultaneously (JA4 fingerprint cascade). See our Path C commitment for the full rationale.

Score Methodology Changelog

Public versioning of every change to the scoring formula. We publish these the day a change ships, with rationale and impact. If you ever wonder "why did this score move?" — this is the answer.

v2.1 · 2026-05-29 · current

Honest self-audit while preparing for our Reddit launch. Found three structural issues in the composite formula and fixed them publicly before going live.

  • Fixed double-counting in core_network. Previously, success_rate in core_network_score was computed across all planes — same probe could feed into both core_network AND neutral_reachability dimensions. Now uses controlled-plane (Plane A) data only.
  • Removed geo_integrity from composite. country_match_rate was 0% for all providers due to incomplete geo enrichment and missing declared-country data. Returns in v2.2 with multi-country probes (Phase 43.5.7).
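  • Reweighted composite from 30/30/20/10/10 to 35/30/0/15/20 after geo removal (order: core network, session reliability, geo integrity, neutral reachability, target reachability). Target Reachability (Plane C anti-bot) gained the most because it's the most user-relevant dimension.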
  • Cleaned session_reliability signature. burst_failure_rate parameter was a placeholder, never measured. Documented and defaulted to 0 until Phase 43.7 ships extended-TTL session probes.
  • Historical rollups recomputed for the last 30 days so Score History charts reflect the new formula across the whole period. Pre-v2.1 backup retained.

Observable impact: composite scores went up across all active providers by ~14–16 points (correcting for the broken geo penalty). Ranking unchanged.

v2.0 · 2026-04

Initial Architecture 2.0 composite: five-dimension weighted score (30/30/20/10/10), three-plane testing architecture (Controlled / Neutral / High-Defense), HMAC-signed worker→backend pipeline.

White-Hat Ethical Framework

No pay-to-play. Providers cannot pay for better rankings or placement. Scores are generated purely from automated tests.

Safe target sampling. We do not aggressively scrape Google or Amazon. Plane C uses strict cooldowns (6h initial, 24h escalated) and stop-after-deny to prevent IP reputation damage.

Per-IP cooldowns. Each exit IP that receives a deny or challenge is placed on cooldown, preventing repeated probing of the same IP on protected targets.
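
A per-IP cooldown can be sketched as a small in-memory tracker. The class, its method names, and the outcome labels are hypothetical; the 24-hour duration follows the per-IP cooldown figure given for Plane C, and the sketch does not model any shorter initial cooldown.

```python
from datetime import datetime, timedelta, timezone

COOLDOWN = timedelta(hours=24)  # per-IP cooldown from the Plane C description
BLOCKING_OUTCOMES = ("deny", "challenge_or_interstitial")  # assumed labels


class CooldownTracker:
    """Tracks which exit IPs must not be probed again on protected targets yet."""

    def __init__(self) -> None:
        self._blocked_until: dict[str, datetime] = {}

    def record_result(self, exit_ip: str, outcome: str, now: datetime) -> None:
        """Start (or restart) the cooldown when an IP receives a deny or challenge."""
        if outcome in BLOCKING_OUTCOMES:
            self._blocked_until[exit_ip] = now + COOLDOWN

    def may_probe(self, exit_ip: str, now: datetime) -> bool:
        """True if the IP has no active cooldown at the given time."""
        until = self._blocked_until.get(exit_ip)
        return until is None or now >= until


# Usage with a documentation-range IP address.
t0 = datetime(2026, 5, 29, 12, 0, tzinfo=timezone.utc)
tracker = CooldownTracker()
tracker.record_result("203.0.113.7", "deny", t0)
```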

Full transparency. Our scoring formula, weights, and methodology are published publicly. The three-plane architecture ensures we measure real proxy quality without causing harm.

Tamper-evident results. Each probe result is signed with HMAC-SHA256, so any modification to stored results is detectable by whoever holds the signing key.
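
The signing step can be illustrated with Python's standard hmac module. The canonical JSON serialization, the function names, and the example key are assumptions; the source only states that HMAC-SHA256 is applied to the numeric metrics.

```python
import hashlib
import hmac
import json


def sign_metrics(metrics: dict[str, float], key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    payload = json.dumps(metrics, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_metrics(metrics: dict[str, float], signature: str, key: bytes) -> bool:
    """Constant-time comparison of the stored signature against a fresh one."""
    return hmac.compare_digest(sign_metrics(metrics, key), signature)


# Hypothetical key and metrics, for illustration only.
key = b"example-key-not-for-production"
metrics = {"connect_p50": 105.0, "ttfb_p95": 412.5, "timeout_rate": 0.01}
signature = sign_metrics(metrics, key)
```

Changing any metric value after signing makes verification fail, which is what makes edits to stored rows detectable. Sorting the keys before signing matters because two dictionaries with the same content but different key order must produce the same signature.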

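Multi-region fairness. All providers are tested from identical infrastructure in EU and US regions, so every provider is measured under the same conditions. This does not remove the vantage point bias described under Limitations & Disclosures: exit performance outside these regions is not measured.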

Questions about our methodology?

Reach out to us on X (Twitter) or Telegram. We're happy to explain any aspect of our testing process in detail.