Asked by: Egor Sharapkov Asked: 8/22/2023 Updated: 8/23/2023 Views: 45
Can't get HTML code of a page by URL address
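Q:
I need to get the HTML of the page at https://bakerhughesrigcount.gcs-web.com/intl-rig-count. I tried HttpClient, but the request timed out. Maybe this site has anti-bot protection? I tried adding User-Agent and Accept headers to the request so that it looks more realistic and matches a normal browser request, but that did not help.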
string url = "https://bakerhughesrigcount.gcs-web.com/intl-rig-count";
using (HttpClient client = new HttpClient())
{
    try
    {
        client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36");
        client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");
        HttpResponseMessage response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode();
        string htmlContent = await response.Content.ReadAsStringAsync();
        Console.WriteLine(htmlContent);
    }
    catch (HttpRequestException e)
    {
        Console.WriteLine($"Error: {e.Message}");
    }
}
I also tried Selenium, and with it I was able to get the HTML, but how can I do this without Selenium or similar tools?
A:
1 upvote
Mohammed Jhosawa
8/23/2023
#1
I suggest you pass these headers as well.
client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.5");
client.DefaultRequestHeaders.Add("Accept-Encoding", "deflate,br");
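One caveat with the Accept-Encoding header: HttpClient does not decode compressed responses by default, so if the server honors "deflate,br" the returned string may be unreadable unless decompression is enabled on the handler. Below is a minimal sketch of that, assuming .NET Core 2.1 or later (where DecompressionMethods.Brotli is available). The names CompressedFetch and CreateHandler are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class CompressedFetch
{
    // Hypothetical helper: a handler that transparently decodes
    // gzip, deflate, and Brotli response bodies.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip
                                     | DecompressionMethods.Deflate
                                     | DecompressionMethods.Brotli
        };
    }

    public static async Task Main()
    {
        string url = "https://bakerhughesrigcount.gcs-web.com/intl-rig-count";
        using (var client = new HttpClient(CreateHandler()))
        {
            client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36");
            // The body arrives already decompressed, so it can be read as text.
            string html = await client.GetStringAsync(url);
            Console.WriteLine($"Received {html.Length} characters.");
        }
    }
}
```

With decompression enabled, the handler on modern .NET is expected to advertise the matching encodings itself when no Accept-Encoding header has been set, so the header is left out here. Exceptions are deliberately not caught in this sketch.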
Here is working sample code:
using System;
using System.Net.Http;
using System.Threading.Tasks;

public class Program
{
    public static async Task Main()
    {
        string url = "https://bakerhughesrigcount.gcs-web.com/intl-rig-count/";
        using (HttpClient client = new HttpClient())
        {
            try
            {
                // Send browser-like headers so the request resembles a normal page visit.
                client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36");
                client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");
                client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.5");
                // Advertises compressed encodings; the body is only decoded automatically
                // if the handler has AutomaticDecompression enabled.
                client.DefaultRequestHeaders.Add("Accept-Encoding", "deflate,br");
                var content = await client.GetStringAsync(url);
                Console.WriteLine(content);
            }
            catch (HttpRequestException e)
            {
                Console.WriteLine($"Error: {e.Message}");
            }
        }
    }
}
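The question mentions that the request exceeded its time limit. Note that catching only HttpRequestException does not cover that case: when HttpClient.Timeout elapses, the call fails with TaskCanceledException instead. The following is a rough sketch of telling the two failures apart. The names TimeoutAwareFetch and TryGetHtmlAsync and the 15-second timeout are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class TimeoutAwareFetch
{
    // Hypothetical helper: returns the page body, or null when the
    // request times out, is cancelled, or fails at the HTTP level.
    public static async Task<string> TryGetHtmlAsync(HttpClient client, string url)
    {
        try
        {
            return await client.GetStringAsync(url);
        }
        catch (TaskCanceledException e)
        {
            // An elapsed HttpClient.Timeout surfaces here, not as HttpRequestException.
            Console.WriteLine($"Timed out or cancelled: {e.Message}");
            return null;
        }
        catch (HttpRequestException e)
        {
            // Connection errors and non-success status codes end up here.
            Console.WriteLine($"Request failed: {e.Message}");
            return null;
        }
    }

    public static async Task Main()
    {
        // Assumed timeout value; the default is 100 seconds.
        using (var client = new HttpClient { Timeout = TimeSpan.FromSeconds(15) })
        {
            string html = await TryGetHtmlAsync(client, "https://bakerhughesrigcount.gcs-web.com/intl-rig-count/");
            Console.WriteLine(html == null ? "No content received." : $"Received {html.Length} characters.");
        }
    }
}
```

Knowing whether the failure is a timeout or an HTTP error helps narrow down whether the server is silently dropping the request or actively rejecting it.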