Testing Methodology
v2.1
Every score on ProxyStats is generated by automated, reproducible tests run from our own infrastructure across multiple geographic regions. No provider pays for placement; rankings are based purely on performance data. See Limitations & Disclosures for what this benchmark can and cannot tell you.
Scoring Formula
Each provider receives a composite score from 0 to 100, calculated as a weighted sum of independently measured dimensions: four in v2.1, with Geo Integrity excluded until v2.2. The weights below are a design choice that reflects "general purpose" scraping use. For use-case-specific rankings (Google SERP, Amazon, Social Media, etc.) with re-weighted formulas, see our /best/ pages.
composite = core_network × 0.35 + session_reliability × 0.30 + neutral_reach × 0.15 + target_reach × 0.20
v2.1 (2026-05-29): geo_integrity removed from composite (broken metric — country_match was 0% for all providers due to incomplete geo enrichment). Returns in v2.2 with multi-country probes. Full methodology changelog below.
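As a minimal sketch of the weighted sum, assuming the v2.1 split of 35/30/0/15/20 given in the changelog: the dictionary keys and the function name `composite_score` are illustrative, not the production code.

```python
# v2.1 weights (35/30/0/15/20 per the changelog); geo_integrity carries no weight.
# The key names are assumptions chosen to match the dimension names in this page.
WEIGHTS = {
    "core_network": 0.35,
    "session_reliability": 0.30,
    "neutral_reach": 0.15,
    "target_reach": 0.20,
}


def composite_score(dimension_scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each expected in the range 0..100."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return round(total, 2)


# Made-up dimension scores, for illustration only.
example = {
    "core_network": 90.0,
    "session_reliability": 80.0,
    "neutral_reach": 100.0,
    "target_reach": 60.0,
}
print(composite_score(example))  # 31.5 + 24 + 15 + 12 = 82.5
```

Because the weights sum to 1.0, a provider scoring 100 on every dimension gets a composite of 100.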
Core Network Performance
Measures fundamental proxy connectivity: TCP connect time, TLS handshake duration, time to first byte (TTFB), and timeout rate. Probes hit our own controlled endpoints every 15 minutes. As of v2.1 (2026-05-29), this score uses ONLY Plane A (controlled-plane) data — previously it cross-counted neutral and high-defense probes, inflating its weight.
Metrics: connect_p50, connect_p95, tls_p50, tls_p95, ttfb_p50, ttfb_p95, timeout_rate, controlled-plane success_rate.
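The percentile metrics can be derived from raw probe timings roughly as follows. This is a sketch under assumptions: the probe record fields (`timed_out`, `connect_ms`, `tls_ms`, `ttfb_ms`) are hypothetical, and the nearest-rank percentile method is one common choice, not necessarily the one used in production.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_probes(probes: list[dict]) -> dict:
    """Summarize controlled-plane probes into p50/p95 timings and a timeout rate.

    Each probe dict is assumed to carry 'timed_out' (bool) and, when it
    completed, 'connect_ms', 'tls_ms' and 'ttfb_ms' (hypothetical field names).
    """
    if not probes:
        raise ValueError("no probes to summarize")
    completed = [p for p in probes if not p["timed_out"]]
    summary = {"timeout_rate": (len(probes) - len(completed)) / len(probes)}
    for stage in ("connect", "tls", "ttfb"):
        values = [p[f"{stage}_ms"] for p in completed]
        for pct in (50, 95):
            # Percentiles are undefined if every probe timed out.
            summary[f"{stage}_p{pct}"] = percentile(values, pct) if values else None
    return summary
```

Timed-out probes are kept out of the latency percentiles and counted only in `timeout_rate`, so a slow but reachable provider and an unreachable one are not conflated.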
Session Reliability
Tests how well sticky sessions maintain the same exit IP over time. We open a session, make 2–4 requests over the session TTL (60s in current version; longer TTLs ship in Phase 43.7), and check if the IP rotates unexpectedly. High unexpected rotation indicates an unstable or overloaded IP pool.
Metrics: session_survival_rate, unexpected_rotation_rate. Heavy penalty (×3) for unexpected mid-session IP changes — score floors at 0 when rotation ≥ 33%. burst_failure_rate parameter retained but not measured until Phase 43.7.
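One way to read the ×3 penalty and the 33% floor is a linear penalty on the unexpected rotation rate. The sketch below assumes that form; how session_survival_rate enters the real score is not specified here, so it is left out, and the function name is illustrative.

```python
ROTATION_PENALTY = 3.0  # the x3 multiplier described above


def session_reliability_score(
    unexpected_rotation_rate: float,
    burst_failure_rate: float = 0.0,
) -> float:
    """Assumed form: 100 * (1 - 3 * rotation), floored at 0.

    burst_failure_rate is accepted for signature compatibility but ignored,
    mirroring the note that it is not measured yet.
    """
    if not 0.0 <= unexpected_rotation_rate <= 1.0:
        raise ValueError("rotation rate must be between 0 and 1")
    raw = 100.0 * (1.0 - ROTATION_PENALTY * unexpected_rotation_rate)
    return max(0.0, round(raw, 2))
```

Under this assumption, a 10% unexpected rotation rate costs 30 points, and any rate of one third or more yields a score of 0.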
Geo Integrity
REMOVED FROM COMPOSITE IN v2.1. The country_match_rate metric was 0% for all providers due to incomplete geo enrichment (only ~10–20% of probes were calling geo lookup) and missing available_countries data. Rather than carry a broken metric in the composite, we removed it until properly measurable. Multi-country probes ship in Phase 43.5.7, after which this dimension returns with real measurement.
Planned v2.2 metrics: per-country requested_vs_observed match rate, city consistency, ASN consistency. Pilot scope: Maskify (5 countries — US/DE/FR/JP/BR).
Neutral Reachability
Tests ability to reach neutral third-party services like httpbin.org and Cloudflare speed test. These targets don't actively block proxies, so failures here indicate fundamental connectivity issues rather than anti-bot detection. Weight raised from 10% to 15% in v2.1 (redistribution after geo removal).
Metrics: neutral_success_rate, neutral_avg_ttfb_ms. Bonus for TTFB < 500ms, penalty for > 2000ms.
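A sketch of how the TTFB bonus and penalty could be applied on top of the success rate. The 500 ms and 2000 ms thresholds come from the text above; the size of the bonus and penalty (10 points each) is a placeholder assumption, since the actual magnitudes are not published here.

```python
FAST_TTFB_MS = 500.0   # bonus threshold from the text
SLOW_TTFB_MS = 2000.0  # penalty threshold from the text


def neutral_reachability_score(
    neutral_success_rate: float,
    neutral_avg_ttfb_ms: float,
    bonus: float = 10.0,    # hypothetical magnitude
    penalty: float = 10.0,  # hypothetical magnitude
) -> float:
    """Success rate scaled to 0..100, adjusted for TTFB, clamped to 0..100."""
    score = 100.0 * neutral_success_rate
    if neutral_avg_ttfb_ms < FAST_TTFB_MS:
        score += bonus
    elif neutral_avg_ttfb_ms > SLOW_TTFB_MS:
        score -= penalty
    return min(100.0, max(0.0, score))
```

The clamp keeps a fast provider with a near-perfect success rate from exceeding 100, and a slow provider with a very low success rate from going negative.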
Target Reachability
Periodically samples high-defense targets (Google, Amazon) with strict safety rules. As of 2026-05-27, response bodies are scanned for CAPTCHA / challenge / interstitial markers — HTTP 200 with a soft-block page is classified as challenge_or_interstitial, not success. Weight raised from 10% to 20% in v2.1 — this is the most user-relevant dimension for anti-bot-protected scraping.
Metrics: target_deny_rate, target_challenge_rate. Runs every 6 hours with 24h per-IP cooldowns. ~120 samples/month/provider. Body inspected for ~15 known anti-bot markers; only first 50 KB scanned; body content never stored or logged.
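The soft-block check can be sketched as a bounded substring scan. The marker strings, the `deny` and `other_failure` labels, the status-code mapping, and the reading of 50 KB as 50 × 1024 bytes are all assumptions for illustration; only `challenge_or_interstitial` and the 50 KB limit come from the description above.

```python
SCAN_LIMIT_BYTES = 50 * 1024  # "first 50 KB", assumed to mean 51,200 bytes

# Hypothetical markers; the real list has about 15 entries and is not published here.
MARKERS = (b"captcha", b"unusual traffic", b"verify you are human")


def classify_response(status: int, body: bytes) -> str:
    """Classify a high-defense probe response without storing the body."""
    if status in (403, 429):  # assumed mapping of hard blocks to "deny"
        return "deny"
    if status == 200:
        head = body[:SCAN_LIMIT_BYTES].lower()
        if any(marker in head for marker in MARKERS):
            return "challenge_or_interstitial"
        return "success"
    return "other_failure"
```

Only the classification label leaves the function, which matches the stated policy that body content is never stored or logged.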
Three-Plane Testing Architecture
Unlike traditional benchmarks that hammer target sites directly, we use a layered approach that separates controlled testing from real-world reachability measurement.
- Plane A (Controlled): every 15 minutes
- Plane B (Neutral): every 15 minutes
- Plane C (High-Defense): every 6 hours
Test Infrastructure
| Vantage points | 2 probe regions — EU-Central (Frankfurt) + US-East. Storage on a separate main server (no probe traffic). |
| Plane A frequency | Every 15 minutes (~96 cycles/day) — probes write immediately to probe_runs table |
| Plane C frequency | Every 6 hours with 24h per-IP cooldown (stop-after-deny) |
| Composite recompute | Once daily via rollup task at 00:05 UTC — dashboard values change once per day |
| Worker isolation | Dedicated Docker containers with egress-only networking |
| SSRF protection | DNS pre-resolution, CIDR blocklisting, network segmentation |
| Result integrity | HMAC-SHA256 on all numeric metrics |
| Data retention | 90 days of granular probe data, daily rollup aggregation |
| Confidence levels | Low (<10 probes), Medium (10–50), High (>50 probes/day) |
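The result-integrity row can be illustrated with Python's standard `hmac` module. This is a sketch only: the canonical JSON serialization, the key handling, and the function names are assumptions, not a description of the actual worker-to-backend protocol.

```python
import hashlib
import hmac
import json


def sign_metrics(metrics: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the metrics."""
    # sort_keys and fixed separators make the byte encoding deterministic.
    payload = json.dumps(metrics, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_metrics(metrics: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the expected and supplied tags."""
    return hmac.compare_digest(sign_metrics(metrics, key), signature)
```

A worker would call `sign_metrics` before submitting a probe result, and the backend would call `verify_metrics` with the same shared key; changing any numeric value after signing makes verification fail.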
Data Pipeline
Raw probe results are stored individually as they arrive (every 15 minutes for Plane A/B, every 6 hours for Plane C). The rollup task runs once daily at 00:05 UTC and aggregates the last 24 hours of probes per provider per region into one row in rollups_daily. The composite score on the leaderboard is recomputed at rollup time — dashboard values change once per day, not every 15 minutes. Each rollup includes a confidence badge based on probe count.
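A simplified sketch of the daily rollup, assuming each probe record has hypothetical `provider`, `region`, and `success` fields. The confidence thresholds follow the Test Infrastructure table; the real rollup aggregates more metrics than the single success rate shown here.

```python
from collections import defaultdict


def confidence_badge(probe_count: int) -> str:
    """Low (<10 probes), Medium (10-50), High (>50), per the table above."""
    if probe_count < 10:
        return "low"
    if probe_count <= 50:
        return "medium"
    return "high"


def rollup_daily(probes: list[dict]) -> list[dict]:
    """Aggregate one day of probes into one row per (provider, region)."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for probe in probes:
        groups[(probe["provider"], probe["region"])].append(probe)

    rows = []
    for (provider, region), items in sorted(groups.items()):
        successes = sum(1 for p in items if p["success"])
        rows.append({
            "provider": provider,
            "region": region,
            "probe_count": len(items),
            "success_rate": successes / len(items),
            "confidence": confidence_badge(len(items)),
        })
    return rows
```

Attaching the badge at rollup time means a provider with only a handful of probes in a region is still listed, but its row is visibly marked as low confidence.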
Limitations & Disclosures
What this benchmark can and cannot tell you. We publish this section explicitly because most "best proxy" lists hide their methodology gaps.
Vantage point bias. Probes originate from EU-Central (Frankfurt) and US-East. We do not test Asia or LATAM exit performance — a provider with a strong APAC pool may underperform on our metrics through no fault of their own.
Limited provider coverage. We currently benchmark 3 active providers (Maskify, Aceproxies, GonzoProxy). Larger providers (Bright Data, Oxylabs, Smartproxy) are not yet tested — pending business relationship / budget. We do not extrapolate scores to untested providers.
Plane C sample size. Anti-bot target probes run every 6 hours = ~120 samples per provider per month. Single-digit percentage differences in Google success rate may be statistical noise. Treat the ranking as more informative than the absolute number.
HTTP 200 ≠ usable success (now mitigated). As of 2026-05-27, Plane C responses are scanned for CAPTCHA / challenge / interstitial markers; soft-blocked HTTP 200 responses are classified as challenge_or_interstitial and excluded from success rate. Marker list is conservative — some subtle anti-bot variants may still slip through. Pre-2026-05-27 rollups used the old definition; comparing across that date is not apples-to-apples.
Session TTL coverage. Session reliability tests use a 60-second TTL window. Real-world session lifetimes (5–30 minutes for login flows) are not yet directly tested. Longer-TTL probes are planned.
IP uniqueness window. Uniqueness ratios are computed within the daily rollup window. A pool that rotates every few hours may show high uniqueness in our metric but have a smaller effective pool than the number suggests.
Composite weights are subjective. The current 35/30/0/15/20 split (30/30/20/10/10 before v2.1) reflects our judgment for "general purpose" use, not a universal optimum. If your workflow is SERP-heavy, use the Google-specific score on the /best/ pages instead.
What we explicitly do NOT test. We do not probe LinkedIn, X, Instagram, Booking, Expedia, Zillow, Realtor, or Redfin. These targets prohibit automated access in their ToS, and concentrated probing from our infrastructure could fingerprint and flag every provider we test simultaneously (JA4 fingerprint cascade). See our Path C commitment for the full rationale.
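To make the Plane C sample-size caveat concrete, the sketch below computes a Wilson score interval for a success rate measured on about 120 samples. The choice of the Wilson interval and the 95% level (z = 1.96) are ours for illustration; the benchmark itself does not publish intervals.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default is about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Illustrative numbers only: 96 successes out of 120 monthly samples (80%).
low, high = wilson_interval(96, 120)
print(f"{low:.3f} .. {high:.3f}")
```

With 120 samples and an observed 80% success rate, the interval spans roughly 72% to 86%, so two providers a few points apart on a Plane C target are not reliably distinguishable.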
Score Methodology Changelog
Public versioning of every change to the scoring formula. We publish these the day a change ships, with rationale and impact. If you ever wonder "why did this score move?" — this is the answer.
v2.1 (2026-05-29). An honest self-audit ahead of our Reddit launch found three structural issues in the composite formula, which we fixed publicly before going live.
- Fixed double-counting in core_network. Previously, success_rate in core_network_score was computed across all planes, so the same probe could feed into both the core_network AND neutral_reachability dimensions. Now uses controlled-plane (Plane A) data only.
- Removed geo_integrity from composite. country_match_rate was 0% for all providers due to incomplete geo enrichment and missing declared-country data. Returns in v2.2 with multi-country probes (Phase 43.5.7).
- Reweighted composite from 30/30/20/10/10 to 35/30/0/15/20 after geo removal. Target Reachability (Plane C anti-bot) gained the most because it is the most user-relevant dimension.
- Cleaned session_reliability signature. burst_failure_rate parameter was a placeholder, never measured. Documented and defaulted to 0 until Phase 43.7 ships extended-TTL session probes.
- Historical rollups recomputed for the last 30 days so Score History charts reflect the new formula across the whole period. Pre-v2.1 backup retained.
Observable impact: composite scores went up across all active providers by ~14–16 points (correcting for the broken geo penalty). Ranking unchanged.
Initial Architecture 2.0 composite: five-dimension weighted score (30/30/20/10/10), three-plane testing architecture (Controlled / Neutral / High-Defense), HMAC-signed worker→backend pipeline.
White-Hat Ethical Framework
No pay-to-play. Providers cannot pay for better rankings or placement. Scores are generated purely from automated tests.
Safe target sampling. We do not aggressively scrape Google or Amazon. Plane C uses strict cooldowns (6h initial, 24h escalated) and stop-after-deny to prevent IP reputation damage.
Per-IP cooldowns. Each exit IP that receives a deny or challenge is placed on cooldown, preventing repeated probing of the same IP on protected targets.
Full transparency. Our scoring formula, weights, and methodology are published publicly. The three-plane architecture ensures we measure real proxy quality without causing harm.
Tamper-evident results. Each probe result is signed with HMAC-SHA256, so any modification to stored results is detectable.
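Multi-region fairness. All providers are tested from identical infrastructure in EU and US regions, so every provider faces the same vantage points. This does not remove vantage point bias toward those two regions (see Limitations & Disclosures).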
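The per-IP cooldown and stop-after-deny rules can be sketched as a small in-memory registry. The class name, the outcome labels other than `challenge_or_interstitial`, and the single fixed 24-hour duration are assumptions; the 6h-to-24h escalation mentioned above is not modeled.

```python
from datetime import datetime, timedelta, timezone

COOLDOWN = timedelta(hours=24)  # per-IP cooldown; escalation is not modeled here
BLOCKING_OUTCOMES = {"deny", "challenge_or_interstitial"}  # "deny" label is assumed


class CooldownRegistry:
    """Tracks exit IPs that must not be re-probed on protected targets yet."""

    def __init__(self) -> None:
        self._blocked_until: dict[str, datetime] = {}

    def record_result(self, exit_ip: str, outcome: str, now: datetime) -> None:
        """Put the IP on cooldown after a deny or challenge (stop-after-deny)."""
        if outcome in BLOCKING_OUTCOMES:
            self._blocked_until[exit_ip] = now + COOLDOWN

    def may_probe(self, exit_ip: str, now: datetime) -> bool:
        """True if the IP has never been blocked or its cooldown has expired."""
        until = self._blocked_until.get(exit_ip)
        return until is None or now >= until


registry = CooldownRegistry()
t0 = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
registry.record_result("203.0.113.7", "deny", t0)  # documentation-range IP
print(registry.may_probe("203.0.113.7", t0 + timedelta(hours=6)))
print(registry.may_probe("203.0.113.7", t0 + timedelta(hours=24)))
```

A scheduler would check `may_probe` before each Plane C request and skip any exit IP still on cooldown, which is what keeps a single deny from turning into repeated hits against the same target.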
Questions about our methodology?
Reach out to us on X (Twitter) or Telegram. We're happy to explain any aspect of our testing process in detail.