Can't get HTML code of a page by URL address

Asked by: Egor Sharapkov Asked: 8/22/2023 Updated: 8/23/2023 Views: 45

Question:

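I need to get the HTML code of the page at https://bakerhughesrigcount.gcs-web.com/intl-rig-count. I tried using HttpClient, but the request times out. Maybe this site has anti-bot protection? I tried adding User-Agent and Accept headers to the request so that it looks more realistic and matches a normal browser request, but that did not help.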

string url = "https://bakerhughesrigcount.gcs-web.com/intl-rig-count";

using (HttpClient client = new HttpClient())
{
    try
    {
        client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36");
        client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");

        HttpResponseMessage response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode();

        string htmlContent = await response.Content.ReadAsStringAsync();
        Console.WriteLine(htmlContent);
    }
    catch (HttpRequestException e)
    {
        Console.WriteLine($"Error: {e.Message}");
    }
}

I also tried Selenium, and with it I was able to get the HTML code. But how can I do this without Selenium or similar tools?

C# HTML HTTP parsing HttpClient

Comments

0 upvotes Dave S 8/23/2023
Use the developer tools in Google Chrome to see what the browser sends. Compare that with what your code sends to find out what is missing.
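
To make the comparison suggested in the comment easier, the headers that HttpClient is about to send can be printed from a DelegatingHandler placed in front of the real handler. This is only a sketch: HeaderDumpHandler and Describe are hypothetical names, not part of the question or of .NET. Headers that the transport layer adds later (for example Host) do not appear at this stage, so the browser's network tab will still show a few more.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: prints each outgoing request, then passes it on unchanged.
public sealed class HeaderDumpHandler : DelegatingHandler
{
    public HeaderDumpHandler(HttpMessageHandler inner) : base(inner) { }

    // Renders the method, URL and headers as text.
    // Header values are joined with ", " for display only.
    public static string Describe(HttpRequestMessage request)
    {
        var lines = request.Headers
            .Select(h => $"{h.Key}: {string.Join(", ", h.Value)}");
        return $"{request.Method} {request.RequestUri}\n" + string.Join("\n", lines);
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // DefaultRequestHeaders have already been merged into the request here.
        Console.WriteLine(Describe(request));
        return base.SendAsync(request, cancellationToken);
    }
}
```

To use it, construct the client as new HttpClient(new HeaderDumpHandler(new HttpClientHandler())) and send the request as before.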

Answers:

1 upvote Mohammed Jhosawa 8/23/2023 #1

I suggest you pass these headers as well.

client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.5");
client.DefaultRequestHeaders.Add("Accept-Encoding", "deflate,br");

Here is a working sample:

using System;
using System.Net.Http;
using System.Threading.Tasks;

public class Program
{
    public static async Task Main()
    {
        string url = "https://bakerhughesrigcount.gcs-web.com/intl-rig-count/";

        using (HttpClient client = new HttpClient())
        {
            try {
                // Browser-like headers from the question
                client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36");
                client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");
                // Additional headers suggested in this answer
                client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.5");
                client.DefaultRequestHeaders.Add("Accept-Encoding", "deflate,br");
                // Download the response body as a string and print it
                var content = await client.GetStringAsync(url);
                Console.WriteLine(content);
            } catch (HttpRequestException e) {
                Console.WriteLine($"Error: {e.Message}");
            }
        }
    }
}
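
Two caveats about the sample above may be worth checking. First, setting Accept-Encoding by hand allows the server to return a compressed body, and GetStringAsync does not decode it unless the handler is configured for automatic decompression. Second, an HttpClient timeout is reported as a TaskCanceledException, not an HttpRequestException, so the catch block used in the question would not see it. The following is a minimal sketch, assuming .NET Core 3.0 or later (for DecompressionMethods.All). PageFetcher, CreateClient and the 30-second timeout are illustrative choices, and nothing here is confirmed to get past whatever protection the site may use.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageFetcher
{
    // Hypothetical helper: builds a client that decompresses responses itself.
    public static HttpClient CreateClient(TimeSpan timeout)
    {
        var handler = new HttpClientHandler
        {
            // The handler adds a matching Accept-Encoding header
            // and decodes gzip, deflate and brotli bodies.
            AutomaticDecompression = DecompressionMethods.All
        };

        var client = new HttpClient(handler) { Timeout = timeout };
        client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36");
        client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");
        client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.5");
        return client;
    }

    public static async Task Main()
    {
        string url = "https://bakerhughesrigcount.gcs-web.com/intl-rig-count/";

        using HttpClient client = CreateClient(TimeSpan.FromSeconds(30));
        try
        {
            string content = await client.GetStringAsync(url);
            Console.WriteLine(content);
        }
        catch (HttpRequestException e)
        {
            // Connection failures and non-success status codes
            Console.WriteLine($"Request failed: {e.Message}");
        }
        catch (TaskCanceledException)
        {
            // Raised when HttpClient.Timeout elapses
            Console.WriteLine("Request timed out.");
        }
    }
}
```

If the request still times out with this client, the difference from the browser is probably not in these headers alone.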
