How to Set Up Proxy Servers in C# for Efficient Web Scraping

Web scraping allows you to extract valuable data from websites, but your requests can easily be traced back to your IP address. This raises privacy concerns and can lead to your scraper getting blocked. Proxy servers provide a solution by acting as intermediaries between your application and the web. When you use proxies, websites see the proxy's IP address instead of yours.

In this in-depth guide, we'll walk through how to implement proxy servers in C# to anonymize and scale your web scraping projects. You'll learn to set up a local proxy using mitmproxy, create a console app that rotates through a pool of proxies, and leverage a premium proxy service like Bright Data for even better results. Let's dive in!

What are Proxy Servers?

A proxy server is a computer that routes traffic between a client (like your web scraping program) and the servers it wants to reach. When you send a request through a proxy, the proxy forwards it to the target server, so the request appears to originate from the proxy's IP address rather than your own.

There are a few key reasons to use proxies for web scraping:

  1. Anonymity – Proxies mask your real IP address, making it harder for websites to identify and block your scraper.

  2. Avoiding rate limits – Sending too many requests from one IP can trigger rate limiting. By rotating proxies, you distribute requests across many IPs.

  3. Geotargeting – Some proxies let you choose IP locations to bypass regional restrictions and collect localized data.

  4. Improved performance – Premium proxies can provide faster and more reliable connections than your own IP.

Now that we understand the role of proxies, let's see how to implement them in C#.

Setting Up a Local Proxy Server

For development and testing, it's handy to run a proxy server on your own machine. We'll use mitmproxy, a popular open-source proxy.

First, download and install mitmproxy following the instructions for your operating system. Once installed, launch it from the command line:

mitmproxy

You'll see mitmproxy's interface in the terminal. It's now intercepting requests on port 8080.

Let's test it with a quick cURL command:

curl --proxy http://localhost:8080 "http://example.com"

You should see the request logged in mitmproxy, and the response printed in the terminal. Great, our proxy is working! Time to put it to use in a C# app.

Web Scraping with Proxies in C#

We'll create a .NET console application that scrapes a website via rotating proxies. You'll need:

  • Visual Studio 2022 or Visual Studio Code
  • .NET 7 SDK
  • HtmlAgilityPack library

Create a new console app and add the HtmlAgilityPack NuGet package. We'll break the code into several classes for clarity.
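
If you prefer the command line, you can scaffold the project with the .NET CLI (the project name ProxyScraper below is arbitrary):

dotnet new console -n ProxyScraper
cd ProxyScraper
dotnet add package HtmlAgilityPack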

Configuring the HttpClient

The first step is setting up an HttpClient to route requests through our proxies. Add a ProxyHttpClient class:


using System.Net;

public class ProxyHttpClient
{
    public static HttpClient CreateClient(string proxyUrl)
    {
        var httpClientHandler = new HttpClientHandler
        {
            // Route every request from this client through the given proxy
            Proxy = new WebProxy(proxyUrl),
            UseProxy = true
        };
        return new HttpClient(httpClientHandler);
    }
}

This creates an HttpClient configured with the provided proxy URL.
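
For example, you can point it at the local mitmproxy instance from earlier and watch the request show up in its interface:

// Route a request through the local mitmproxy instance
using var client = ProxyHttpClient.CreateClient("http://localhost:8080");
var html = await client.GetStringAsync("http://example.com");
Console.WriteLine(html);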

Proxy Rotation

To avoid overusing any single proxy, we'll rotate through a collection of them. The ProxyRotator class manages a list of valid proxy URLs and randomly selects one for each request:


public class ProxyRotator
{
    private readonly List<string> _validProxies;
    private readonly Random _random = new();

    public ProxyRotator(string[] proxies)
    {
        // Blocks until the health checks finish; acceptable at console-app startup
        _validProxies = ProxyChecker.GetWorkingProxies(proxies.ToList()).Result;

        if (!_validProxies.Any())
            throw new InvalidOperationException("No working proxies found.");
    }

    public HttpClient GetClient()
    {
        // Pick a random working proxy for each new client
        var proxyUrl = _validProxies[_random.Next(_validProxies.Count)];
        return ProxyHttpClient.CreateClient(proxyUrl);
    }
}

The constructor takes an array of proxy URLs and uses a ProxyChecker to filter out non-working ones. The GetClient method randomly picks a valid proxy URL and returns an HttpClient configured to use it.
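
As a quick usage sketch (the proxy URLs here are placeholders), each call to GetClient may hand back a client bound to a different proxy:

// Placeholder proxy URLs - substitute your own
var rotator = new ProxyRotator(new[]
{
    "http://proxy1.com:8080",
    "http://proxy2.com:8080"
});
using var client = rotator.GetClient();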

Checking Proxy Health

The ProxyChecker class verifies which proxies in the provided list are usable:


public static class ProxyChecker
{
    public static async Task<List<string>> GetWorkingProxies(List<string> proxies)
    {
        // Check all proxies concurrently and keep only the ones that responded
        var tasks = proxies.Select(CheckProxy);
        var results = await Task.WhenAll(tasks);
        return proxies.Where((p, i) => results[i]).ToList();
    }

    private static async Task<bool> CheckProxy(string proxyUrl)
    {
        using var client = ProxyHttpClient.CreateClient(proxyUrl);
        // Fail fast instead of waiting out HttpClient's default 100-second timeout
        client.Timeout = TimeSpan.FromSeconds(10);

        try
        {
            var response = await client.GetAsync("http://www.example.com");
            return response.IsSuccessStatusCode;
        }
        catch
        {
            return false;
        }
    }
}

It launches an async check for each proxy, sending a test request to example.com. A proxy counts as working if it returns a success status code; any failed request (e.g. a timeout) marks it as unusable.

Scraping Content

Finally, let's implement the actual web scraping logic. Our WebScraper class fetches a page's HTML content, parses it, and extracts some sample data:


using HtmlAgilityPack;

public static class WebScraper
{
    public static async Task ScrapeAsync(ProxyRotator proxyRotator, string url)
    {
        using var client = proxyRotator.GetClient();
        var response = await client.GetAsync(url);
        var content = await response.Content.ReadAsStringAsync();

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(content);

        // SelectNodes returns null when nothing matches, so guard against it
        var titles = htmlDoc.DocumentNode
            .SelectNodes("//a")
            ?.Select(a => a.InnerText)
            ?? Enumerable.Empty<string>();

        foreach (var title in titles)
            Console.WriteLine(title);
    }
}

It uses the injected ProxyRotator to get an HttpClient configured with a random proxy. After downloading the HTML, it uses HtmlAgilityPack to parse the DOM and print the text of every link on the page.
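
The //a XPath grabs every anchor on the page. If you want just the book titles, a more targeted selector works too; the sketch below assumes books.toscrape.com's markup, where each book link sits inside an <h3> element and carries the full title in a title attribute:

// Assumes books.toscrape.com's markup: //h3/a with a "title" attribute
var bookTitles = htmlDoc.DocumentNode
    .SelectNodes("//h3/a")
    ?.Select(a => a.GetAttributeValue("title", a.InnerText))
    ?? Enumerable.Empty<string>();

foreach (var title in bookTitles)
    Console.WriteLine(title);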

We can now wire everything together in the main Program class:


public static async Task Main()
{
    // Placeholder proxy URLs - replace with real ones
    var proxies = new[]
    {
        "http://proxy1.com:8080",
        "http://proxy2.com:8080",
        "http://proxy3.com:8080"
    };

    var proxyRotator = new ProxyRotator(proxies);

    var url = "http://books.toscrape.com";
    await WebScraper.ScrapeAsync(proxyRotator, url);
}

When we run the console app, it will download the content from books.toscrape.com via one of the provided proxies (assuming at least one is working) and print out all the link titles found on the page.

With a few small classes, we've implemented a robust proxy management system for web scraping in C#! However, maintaining your own proxy servers can be challenging. Next, we'll see how to easily integrate proxies from a dedicated service provider.

Using Bright Data Proxies

Bright Data is a leading proxy provider, offering a pool of over 72 million residential IPs. Their proxies are ethically sourced and highly anonymous.

Compared to managing your own proxies, using a service like Bright Data has some key advantages:

  • Extensive global coverage with proxies in every country
  • Millions of diverse residential IPs for better anonymity
  • Automatic proxy rotation for maximum reliability
  • Premium network speed and stability
  • Easy setup with standard authentication

Let's modify our C# app to use Bright Data's proxies instead of the custom list.

Configuring Bright Data Proxies

First, sign up for a Bright Data account to get your proxy authentication details. Then update the ProxyHttpClient class to use them:


public static HttpClient CreateClient()
{
    var proxyUrl = "http://zproxy.lum-superproxy.io:22225";
    var proxyUsername = "USERNAME";
    var proxyPassword = "PASSWORD";

    var credentials = new NetworkCredential(proxyUsername, proxyPassword);

    var handler = new HttpClientHandler
    {
        // Attach the credentials to the explicit proxy itself;
        // DefaultProxyCredentials only applies to the system default proxy
        Proxy = new WebProxy(proxyUrl) { Credentials = credentials },
        UseProxy = true,
        PreAuthenticate = true
    };

    return new HttpClient(handler);
}

Replace USERNAME and PASSWORD with your Bright Data credentials.
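
Hardcoding secrets in source is risky. As a minimal sketch, you could read them from environment variables instead (the variable names BRIGHTDATA_USERNAME and BRIGHTDATA_PASSWORD are arbitrary, not anything Bright Data requires):

// Hypothetical variable names - set them however your environment manages secrets
var proxyUsername = Environment.GetEnvironmentVariable("BRIGHTDATA_USERNAME")
    ?? throw new InvalidOperationException("BRIGHTDATA_USERNAME is not set.");
var proxyPassword = Environment.GetEnvironmentVariable("BRIGHTDATA_PASSWORD")
    ?? throw new InvalidOperationException("BRIGHTDATA_PASSWORD is not set.");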

We can now remove the custom ProxyRotator and ProxyChecker classes, since Bright Data automatically rotates our requests through their residential proxy pool. Update the WebScraper to directly use ProxyHttpClient:


public static async Task ScrapeAsync(string url)
{
    using var client = ProxyHttpClient.CreateClient();

    // Keep the rest of the method as is
}

And simplify the Main method:


public static async Task Main()
{
    var url = "http://books.toscrape.com";
    await WebScraper.ScrapeAsync(url);
}

That's it! When you run the program now, it will scrape books.toscrape.com using Bright Data's residential proxy network. You get all the benefits of proxy rotation without having to manage any infrastructure yourself.
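
To confirm that requests really exit through the proxy network rather than your own connection, you can fetch an IP echo service through the client; httpbin.org/ip is one public option:

// Sanity check: the printed IP should belong to the proxy network, not your machine
using var client = ProxyHttpClient.CreateClient();
var ip = await client.GetStringAsync("http://httpbin.org/ip");
Console.WriteLine(ip);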

Conclusion

Proxy servers are essential tools for secure and efficient web scraping. By routing requests through intermediary IPs, you protect your own identity and avoid anti-scraping measures.

In this guide, we learned how to set up proxies in C# applications, both by running a local proxy server and by leveraging a dedicated proxy service. While local proxies are suitable for small-scale testing, production scrapers typically integrate premium proxy networks like Bright Data for their unmatched reliability, performance, and anonymity.

When scraping with proxies, remember to:

  • Respect website terms of service and robots.txt
  • Use reasonable request rates to avoid overloading servers
  • Rotate proxy IPs to distribute requests
  • Monitor proxy health and remove non-working ones
  • Consider geotargeting for localized data collection

With proxies in your toolkit, you can build more powerful and resilient web scrapers in C#. By outsourcing proxy management to experts like Bright Data, you streamline your scraping pipelines and focus on working with data. Happy scraping!
