Proxies for Web Scraping
Web scraping works best when the proxy setup matches the target site's sensitivity, request volume, and session behavior. The wrong proxy type leads to early CAPTCHAs, rate limits, or unstable results long before the scraper logic itself fails.
Guardrail: keep the exact endpoint from your portal and change only the username controls when you need rotation, sticky sessions, or country targeting. Public docs use placeholders like <ROTATING_HTTP_ENDPOINT> because live host and port values are account-specific.
What web scraping needs from a proxy
- Route rotation so repeated requests do not all come from one IP.
- Sticky session controls for pagination, carts, or multi-request flows that must stay consistent.
- Residential or mobile identity when strict targets challenge datacenter traffic.
- Predictable auth and retry behavior that works cleanly with common Python and Node.js clients.
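The rotation and sticky-session needs above come down to how the proxy username is assembled. A minimal sketch of that assembly, assuming the `--session-...` and `--duration-...` control names shown in the working example below (the `build_proxy_url` helper itself is hypothetical, not part of any SDK):

```python
from typing import Optional
from urllib.parse import quote


def build_proxy_url(username: str, api_key: str, endpoint: str,
                    session: Optional[str] = None,
                    duration: Optional[int] = None) -> str:
    """Assemble a proxy URL, appending optional username controls."""
    user = username
    if session is not None:
        # A named session pins one route for the session's lifetime.
        user += f"--session-{session}"
    if duration is not None:
        # Duration (seconds) bounds how long the pinned route is held.
        user += f"--duration-{duration}"
    # Percent-encode credentials so reserved characters survive the URL.
    return f"http://{quote(user, safe='-')}:{quote(api_key, safe='')}@{endpoint}"


# No session token: each request may take a fresh route.
rotating = build_proxy_url("<USERNAME>", "<API_KEY>", "<ROTATING_HTTP_ENDPOINT>")

# Sticky: one identity held for a bounded 120-second window.
sticky = build_proxy_url("<USERNAME>", "<API_KEY>", "<ROTATING_HTTP_ENDPOINT>",
                         session="scrape-batch-01", duration=120)
```

Omitting the session token is what makes the rotating variant rotate; the endpoint itself stays the same in both cases.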
Which NinjaProxy product fits and why
| Proxy setup | Best for | Why |
|---|---|---|
| Rotating residential | General-purpose scraping on stricter sites | Rotating residential traffic is the best default when a target blocks obvious datacenter requests. |
| Sticky residential session | Pagination, search flows, and scraping that must preserve one route briefly | Username session controls keep one identity stable while the scraper completes a bounded sequence. |
| Datacenter | Fast, lower-cost scraping on lenient targets | Datacenter routes are useful when the site is tolerant and throughput matters more than stealth. |
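The sticky-session row maps to flows like pagination, where every page must leave from the same route. A sketch of that pattern, reusing the session-pinned proxy URL from the working example below; the `?page=N` URL pattern and the `scrape_listing` helper are hypothetical:

```python
import requests

# Session token, duration, and provider follow the username format from the
# working example on this page; placeholders come from your portal.
PROXY = ("http://<USERNAME>--session-scrape-batch-01--duration-120"
         "--provider-res:<API_KEY>@<ROTATING_HTTP_ENDPOINT>")


def page_urls(base: str, pages: int) -> list:
    """Build the bounded page sequence one sticky session will walk."""
    return [f"{base}?page={n}" for n in range(1, pages + 1)]


def scrape_listing(base: str, pages: int) -> list:
    """Fetch each page through the same pinned residential route."""
    bodies = []
    with requests.Session() as s:
        s.proxies.update({"http": PROXY, "https": PROXY})
        for url in page_urls(base, pages):
            resp = s.get(url, timeout=20)
            resp.raise_for_status()
            bodies.append(resp.text)
    return bodies
```

Because the session token stays constant for the whole batch, the target sees one consistent identity until the duration window expires.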
Working code example
This Python example uses a rotating gateway endpoint and pins a residential route long enough to finish one scraping batch without guessing hostnames, ports, or undocumented parameters.
```python
import requests

TARGET_URL = "https://example.com/category/widgets"

# Username controls pin one residential route ("scrape-batch-01") for 120 s.
PROXY = "http://<USERNAME>--session-scrape-batch-01--duration-120--provider-res:<API_KEY>@<ROTATING_HTTP_ENDPOINT>"

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (compatible; NinjaProxyDocsExample/1.0)",
})

# Route both schemes through the same proxy URL.
response = session.get(
    TARGET_URL,
    proxies={"http": PROXY, "https": PROXY},
    timeout=20,
)
response.raise_for_status()
print(response.text[:500])
```
Common failure modes
| Failure | Likely cause | Fix |
|---|---|---|
| CAPTCHAs or block pages | Datacenter traffic is too easy for the target to detect. | Move the workflow to residential traffic and reduce burstiness per route. |
| Repeated 429 or 403 responses | Too many requests are landing from one session or IP. | Rotate sessions more often, lower concurrency, and stagger retries. |
| Same IP for every request | You are using an assigned/static endpoint or reusing one session token. | Use a rotating gateway and remove or vary the --session-... token when you need fresh routes. |
| 407 authentication errors | Malformed credentials or missing URL encoding. | Re-copy the username and API key, and percent-encode reserved characters if needed. |
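The 429/403 and 407 fixes above can be sketched together: percent-encode credentials before building the proxy URL, stagger retries with exponential backoff, and check distinct exit IPs to confirm rotation. The helper names (`backoff_delays`, `rotation_ok`, `get_with_retries`) are hypothetical, not part of any SDK:

```python
import time
from urllib.parse import quote

import requests

# Percent-encode reserved characters in credentials before embedding them in
# the proxy URL; a raw "@", ":" or "/" in the API key is a common 407 cause.
USERNAME = quote("<USERNAME>", safe="")
API_KEY = quote("<API_KEY>", safe="")
PROXY = f"http://{USERNAME}:{API_KEY}@<ROTATING_HTTP_ENDPOINT>"


def backoff_delays(attempts: int, base: float = 1.0) -> list:
    """Exponential backoff schedule (1 s, 2 s, 4 s, ...) between retries."""
    return [base * (2 ** i) for i in range(attempts)]


def rotation_ok(origins) -> bool:
    """More than one distinct exit IP means the gateway is rotating."""
    return len(set(origins)) > 1


def get_with_retries(url: str, attempts: int = 4):
    """Retry 429/403 responses with staggered delays instead of hammering."""
    for delay in backoff_delays(attempts):
        resp = requests.get(url, proxies={"http": PROXY, "https": PROXY},
                            timeout=20)
        if resp.status_code not in (429, 403):
            return resp
        time.sleep(delay)
    return resp
```

To use `rotation_ok`, collect the origin IPs reported by an IP-echo endpoint across several requests and pass them in; a single repeated IP means a session token or static endpoint is still pinning the route.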
Related docs
- Python integration for `requests`, `httpx`, `aiohttp`, Scrapy, and Playwright examples.
- Authentication for username + API key, whitelist mode, and rotating username controls.
- Troubleshooting for 407s, timeouts, sticky-session mistakes, and block-response triage.
- Rotating proxies for session controls, provider selection, and route overrides.
- Rate Limits for concurrency and retry guidance.
