Integrating Stat Proxies with ScraProxy

By Nicholas St. Germain

What Is ScraProxy

ScraProxy is a project developed by fabienvauchelles that lets developers rotate proxy IPs on each request without writing fragile rotation logic or running internal services. Traditionally, you'd import a list of proxies from a text file and track which proxy IP served the previous request. ScraProxy removes this headache by exposing a single super proxy whose address and credentials you can pass as an environment variable at runtime.
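For context, the hand-rolled rotation that a super proxy replaces might look like the minimal sketch below. The function names and sample addresses are illustrative assumptions (the addresses come from a documentation-only IP range), and this is not ScraProxy's own code.

```python
from itertools import cycle
from typing import Iterator


def load_proxies(text: str) -> list[str]:
    """Return the non-empty, stripped lines of a proxy list."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def rotating_proxies(proxies: list[str]) -> Iterator[str]:
    """Return an iterator that loops over the proxies in round-robin order."""
    if not proxies:
        raise ValueError("proxy list is empty")
    return cycle(proxies)


# Placeholder addresses from a documentation-only range, not real proxies.
SAMPLE_LIST = """
192.0.2.10:8080
192.0.2.11:8080
192.0.2.12:8080
"""

rotation = rotating_proxies(load_proxies(SAMPLE_LIST))
first_four = [next(rotation) for _ in range(4)]
print(first_four)  # the fourth pick wraps back around to the first proxy
```

Every scraper process would need its own copy of this state, which is the bookkeeping a single super proxy endpoint takes off your hands.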

At a glance, the main benefits of using ScraProxy are:

  • Ban minimization - Smartly rotate requests through different proxy IPs to minimize bans.
  • Request success rate - Automatically remove poorly performing proxies from your super proxy pool when a proxy repeatedly fails.
  • Infrastructure simplification - A plethora of hosting choices and configuration options are available to you.
  • Intuitive - The fundamental reason why ScraProxy was created is a tale as old as time and something most developers find relatable.
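To make the "remove poorly performing proxies" idea concrete, here is a toy sketch of a pool that drops a proxy after a run of consecutive failures. The class, the threshold, and the addresses are hypothetical; ScraProxy's actual health checks may work differently.

```python
class ProxyPool:
    """Toy pool that drops a proxy after too many consecutive failures."""

    def __init__(self, proxies: list[str], max_failures: int = 3) -> None:
        self.active = list(proxies)
        self.max_failures = max_failures  # hypothetical threshold
        self._failures = {proxy: 0 for proxy in proxies}

    def report(self, proxy: str, ok: bool) -> None:
        """Record the outcome of one request made through `proxy`."""
        if proxy not in self.active:
            return
        if ok:
            self._failures[proxy] = 0  # a success resets the failure streak
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self.max_failures:
            self.active.remove(proxy)


pool = ProxyPool(["192.0.2.10:8080", "192.0.2.11:8080"], max_failures=2)
pool.report("192.0.2.10:8080", ok=False)
pool.report("192.0.2.10:8080", ok=True)   # streak reset, proxy stays
pool.report("192.0.2.10:8080", ok=False)
pool.report("192.0.2.11:8080", ok=False)
pool.report("192.0.2.11:8080", ok=False)  # second failure in a row, proxy dropped
print(pool.active)
```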

Deployment Walkthrough

We'll walk through a quick deployment using the Docker option to see the features you can leverage with Stat Proxies' residential static proxy infrastructure. Because we're deploying locally in a developer environment, we'll leave out the NODE_ENV=production environment variable, which reduces the volume of logs generated by the container.

docker run -d -p 8890:8890 -p 8888:8888 \
  -e AUTH_LOCAL_USERNAME=admin \
  -e AUTH_LOCAL_PASSWORD=password \
  fabienvauchelles/scrapoxy

Upon successful deployment, the ScraProxy web interface will be available on port 8890; open localhost:8890 in your browser. You'll be greeted by a login page prompting you for the username and password we set when starting the Docker container. In this case, our username is admin and our password is password. Please do not use these credentials in a production deployment.
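One way to keep the admin/password pair out of scripts is to assemble the same docker run command from environment variables. The sketch below only builds the argument list. The variable names SCRAPROXY_ADMIN_USER and SCRAPROXY_ADMIN_PASS are hypothetical, and the ports, image, and NODE_ENV setting are the ones used in this walkthrough.

```python
import os
import shlex


def build_docker_command(username: str, password: str, production: bool = False) -> list[str]:
    """Assemble the walkthrough's `docker run` invocation as an argument list."""
    cmd = [
        "docker", "run", "-d",
        "-p", "8890:8890",
        "-p", "8888:8888",
        "-e", f"AUTH_LOCAL_USERNAME={username}",
        "-e", f"AUTH_LOCAL_PASSWORD={password}",
    ]
    if production:
        # The environment variable this walkthrough leaves out for local use.
        cmd += ["-e", "NODE_ENV=production"]
    cmd.append("fabienvauchelles/scrapoxy")
    return cmd


# Hypothetical variable names; the fallbacks match the local-only example above.
user = os.environ.get("SCRAPROXY_ADMIN_USER", "admin")
secret = os.environ.get("SCRAPROXY_ADMIN_PASS", "password")
command = build_docker_command(user, secret, production=True)
print(shlex.join(command))  # for inspection only; note that this prints the password
```

Passing `command` to `subprocess.run(command, check=True)` would launch the container, assuming Docker is installed.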

If all goes well, you'll see a new project page when you first log in.

New Project Configuration Settings

Before getting too overwhelmed, let's go step by step and understand what we're being asked here.

  • Name - The user-defined project name. Set this to whatever you want.
  • Username - The authentication username required for proxy authentication when making requests.
  • Password - The authentication password required for proxy authentication when making requests.
  • Renew token - A button to click for renewing the username and password.
  • Minimum proxies - The minimum number of proxies that should be online when the project status is set to CALM.
  • Auto Rotate Proxies - An option to enable automatic rotation of proxies at random intervals within a specified delay range.
  • Auto Scale Up - An option to automatically switch the project status to HOT and start all proxies upon receiving a request.
  • Auto Scale Down - An option to automatically switch the project status to CALM and stop all proxies if no requests are received after a specified delay.
  • Intercept HTTPS requests with MITM* - An option to enable ScraProxy to intercept and modify HTTPS requests and responses.
  • Certificate* - A CA certificate to install to avoid security warnings in browsers or scrapers.
  • Keep the same proxy with cookie injection* - An option to enable ScraProxy to inject a cookie to maintain the same proxy for a browser session (sticky cookie).
  • Override User-Agent* - An option to enable ScraProxy to override the User-Agent header with a value assigned to a proxy instance, ensuring all requests made with that instance have the same User-Agent header.

Options marked with * will require you to install a CA certificate on the client device using the ScraProxy super proxy.
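The Auto Scale Up and Auto Scale Down options describe a small state machine between CALM and HOT. The sketch below models that behavior as described above; the class name, the seconds unit, and the explicit `tick` call are assumptions for illustration, not ScraProxy internals.

```python
from dataclasses import dataclass

CALM = "CALM"
HOT = "HOT"


@dataclass
class ProjectScaler:
    """Toy model of the Auto Scale Up / Auto Scale Down settings."""

    scale_down_delay: float       # seconds without requests before going CALM
    status: str = CALM
    last_request_at: float = 0.0

    def on_request(self, now: float) -> None:
        """Auto Scale Up: any incoming request switches the project to HOT."""
        self.last_request_at = now
        self.status = HOT

    def tick(self, now: float) -> None:
        """Auto Scale Down: go CALM once the delay passes with no requests."""
        if self.status == HOT and now - self.last_request_at >= self.scale_down_delay:
            self.status = CALM


scaler = ProjectScaler(scale_down_delay=600)
scaler.on_request(now=0)
scaler.tick(now=300)
status_after_5_min = scaler.status   # delay not reached yet
scaler.tick(now=600)
status_after_10_min = scaler.status  # delay reached with no new requests
print(status_after_5_min, status_after_10_min)
```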

Setting Up the Provider

Since we're going to be using our Stat Proxies Static Residential Proxy list, we'll choose the Proxy List option. Click "Create" and you'll be prompted with a few more configuration options.

You'll be prompted to set a name for this provider's credentials. Since we're using a predefined list of proxies, these credentials won't actually be used, but they still need a name, so let's call them "Stat Proxy List".

Creating a Connector

After creating our placeholder credentials, we'll be brought to a new connector page, which prompts us for a few key options.

  • Credential - The credential to use for the connector, selected from the list of available credentials.
  • Name - A unique identifier for the connector within the project.
  • # of proxies - The maximum number of proxies that the connector can provide and that you intend to use.
  • Proxies Timeout - The maximum duration for connecting to a proxy before considering it as offline.
  • Proxies Kick - An option to enable the removal of a proxy from the pool if it remains offline for a duration exceeding the specified value. This value must be greater than the Proxies Timeout.

After saving the settings, ScraProxy performs a validation test to ensure the entered configuration is valid. The list of configured connectors is then displayed.
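The relationship between Proxies Timeout and Proxies Kick is easy to get wrong, so it can help to check values before entering them. This is a minimal sketch of that constraint; the field names, the use of seconds, and the sample values are assumptions, and it does not reproduce ScraProxy's own validation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConnectorSettings:
    name: str
    max_proxies: int                      # "# of proxies"
    timeout_seconds: float                # "Proxies Timeout"
    kick_seconds: Optional[float] = None  # "Proxies Kick"; None means disabled

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("connector name must not be empty")
        if self.max_proxies < 1:
            raise ValueError("# of proxies must be at least 1")
        if self.timeout_seconds <= 0:
            raise ValueError("Proxies Timeout must be positive")
        if self.kick_seconds is not None and self.kick_seconds <= self.timeout_seconds:
            raise ValueError("Proxies Kick must be greater than Proxies Timeout")


ok = ConnectorSettings("Stat Proxy List", max_proxies=25, timeout_seconds=5, kick_seconds=30)

try:
    # Kick equal to the timeout violates the "must be greater" rule.
    ConnectorSettings("Stat Proxy List", max_proxies=25, timeout_seconds=5, kick_seconds=5)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```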

Adding the Stat Proxies Proxy List

Now, we'll need to head over to our Stat Proxies client dashboard to grab our proxy list. In this case we're using a 25-pack of Stat Captcha Proxies (they are the best datacenter captcha proxies with the highest Google reCAPTCHA score).

Before proceeding, note that we'll be modifying this Proxy List connector, so the connector must be switched off. Open the configuration menu, represented by the three vertical buttons on the right-hand side of the Stat Proxy List connector, and click the "Update" option in the dropdown.

We'll want to scroll down to the Proxies section. Specifically, we'll want to copy and paste the Stat Captcha Proxy List from the dashboard into the "Add new proxies" text area. We can ignore our sources since this list is static (does not change).

Push the plus (+) button adjacent to the proxy list text area, which will create a table with all of our proxies. ScraProxy will validate that all the proxies we've set are active, and will prompt us to remove any inactive proxies automatically with a single power button option.
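Before pasting, it can be worth tidying the list locally. The sketch below assumes one `host:port:username:password` entry per line, which is an assumption about the dashboard export rather than something ScraProxy requires; it removes blanks and duplicates and sets aside lines that do not match that shape.

```python
def clean_proxy_list(raw: str) -> tuple[list[str], list[str]]:
    """Split pasted text into (usable entries, rejected lines).

    Assumes one `host:port:username:password` entry per line.
    """
    usable: list[str] = []
    rejected: list[str] = []
    seen: set[str] = set()
    for line in raw.splitlines():
        entry = line.strip()
        if not entry:
            continue
        parts = entry.split(":")
        well_formed = (
            len(parts) == 4
            and all(parts)
            and parts[1].isdecimal()
            and 0 < int(parts[1]) < 65536
        )
        if not well_formed:
            rejected.append(entry)
        elif entry not in seen:
            seen.add(entry)
            usable.append(entry)
    return usable, rejected


# Placeholder entries from a documentation-only IP range, not real proxies.
SAMPLE = """
192.0.2.10:8080:user1:pass1
192.0.2.10:8080:user1:pass1
192.0.2.11:8080:user2:pass2
not-a-proxy
"""

usable, rejected = clean_proxy_list(SAMPLE)
print(len(usable), "usable,", len(rejected), "rejected")
```

A password that itself contains a colon would be rejected by this simple check, so treat the rejected list as lines to review rather than lines to discard.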

Flipping the Switch

Once we've added and validated our proxies, we can click the "Back" button to return to the connector list page. We're now ready to turn on our ScraProxy super proxy. Flip the switch to on and head over to the Settings page to grab our credentials.

By default, ScraProxy uses username:password authentication to route traffic through our Stat Captcha Proxies. The Docker container also exposes port 8888 as the proxy port. In effect, our ScraProxy super proxy address will look something like:

localhost:8888:u9eww37vz5jqnhauz7zxgp:7ca2kuhk9prpd64ypggtj

Where:

  • localhost is our super proxy host
  • 8888 is our proxy port
  • u9eww37vz5jqnhauz7zxgp is our super proxy username
  • 7ca2kuhk9prpd64ypggtj is our super proxy password
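Most HTTP clients expect a proxy URL rather than the colon-separated form shown above. Here is a minimal sketch using only Python's standard library to convert one into the other and build an opener. SUPER_PROXY is a hypothetical environment variable name, and the fallback value is the example address from this section.

```python
import os
import urllib.request
from urllib.parse import quote


def to_proxy_url(address: str, scheme: str = "http") -> str:
    """Convert `host:port:username:password` into `scheme://user:pass@host:port`."""
    host, port, username, password = address.split(":", 3)
    # Percent-encode the credentials so characters like "@" don't break the URL.
    return f"{scheme}://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"


def build_opener(address: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS traffic through the super proxy."""
    proxy_url = to_proxy_url(address)
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# SUPER_PROXY is a hypothetical variable name; the fallback is the example above.
address = os.environ.get(
    "SUPER_PROXY", "localhost:8888:u9eww37vz5jqnhauz7zxgp:7ca2kuhk9prpd64ypggtj"
)
proxy_url = to_proxy_url(address)
opener = build_opener(address)
# opener.open("http://example.com", timeout=10) would route through the super proxy.
```

Because every request goes to the same endpoint, the scraper itself holds no rotation state; which upstream proxy serves each request is decided by the super proxy.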

Coverage and Statistics

After a little while, ScraProxy will populate both the Proxy page in the side nav bar and the Coverage page. Here we can check out some high-level statistics about our proxy infrastructure.

Concluding Thoughts

There are so many more cool features that you can explore with ScraProxy: from the rich API controls, to the MITM configuration, and even mixing different proxy providers. But we'll stop ourselves here.

If you're interested in combining the power of ScraProxy with Stat Proxies Static Residential ISP Infrastructure, pick up a pack today, and take a closer look at ScraProxy's documentation.