Asked by: Teoman shipahi Asked: 10/15/2023 Last edited by: Teoman shipahi Updated: 10/15/2023 Views: 33
HTTP Request to nasdaq RSS feed Works in Browser But Hangs with Code (C#, Node.js/Axios), Even with Identical Headers
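Q:
I'm running into a peculiar issue when making an HTTP GET request to the URL "https://www.nasdaq.com/feed/rssoutbound?category=FinTech". When I enter this URL manually in a web browser, the feed loads without any problem. However, when I make the same request programmatically (I have tried both C# HttpClient and Node.js with Axios), the request hangs indefinitely and eventually times out.
Here is my C# code: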
public async Task Execute()
{
    using HttpClient httpClient = new HttpClient();
    try
    {
        // Specify the URL you want to request
        string url = "https://www.nasdaq.com/feed/rssoutbound?category=FinTech";
        httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("*/*"));
        httpClient.DefaultRequestHeaders.AcceptLanguage.Add(new StringWithQualityHeaderValue("en-US"));
        httpClient.DefaultRequestHeaders.AcceptLanguage.Add(new StringWithQualityHeaderValue("en", 0.9));
        httpClient.DefaultRequestHeaders.Add("sec-ch-ua", "\"Chromium\";v=\"116\", \"Not)A;Brand\";v=\"24\", \"Microsoft Edge\";v=\"116\"");
        httpClient.DefaultRequestHeaders.Add("sec-ch-ua-mobile", "?0");
        httpClient.DefaultRequestHeaders.Add("sec-ch-ua-platform", "\"Windows\"");
        httpClient.DefaultRequestHeaders.Add("sec-fetch-dest", "empty");
        httpClient.DefaultRequestHeaders.Add("sec-fetch-mode", "cors");
        httpClient.DefaultRequestHeaders.Add("sec-fetch-site", "same-origin");
        // Add the User-Agent header
        httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36");
        // Send a GET request and wait for the response
        HttpResponseMessage response = await httpClient.GetAsync(url);
        // Check if the request was successful
        if (response.IsSuccessStatusCode)
        {
            // Read the content of the response as a string
            string content = await response.Content.ReadAsStringAsync();
            // Print the content to the console
            Console.WriteLine(content);
        }
        else
        {
            Console.WriteLine($"HTTP request failed with status code: {response.StatusCode}");
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
        throw;
    }
}
I also tried making the same request with Node.js and Axios, with the same result:
// Node.js code using Axios
const axios = require('axios');

async function fetchData() {
  try {
    const response = await axios.get('https://www.nasdaq.com/feed/rssoutbound?category=FinTech', {
      headers: {
        Accept: '*/*',
        'Accept-Language': 'en-US,en;q=0.9,tr;q=0.8',
        'sec-ch-ua': '"Chromium";v="116", "Not)A;Brand";v="24", "Microsoft Edge";v="116"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-origin',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'
      }
    });
    // ... (handling the response)
  } catch (error) {
    console.error('An error occurred:', error.message);
  }
}

fetchData();
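Oddly, when I retrieve feeds from other RSS sources, everything works fine. The problem seems to be specific to the Nasdaq feed. I even tried fetching the content with Puppeteer, and that hangs indefinitely as well.
I used the Edge browser tools to resend the request with every editable header cleared and with no cookies, and it still worked, so I doubt this is a cookie issue.
The following PowerShell also works: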
$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.81"
Invoke-WebRequest -UseBasicParsing -Uri "https://www.nasdaq.com/feed/rssoutbound?category=FinTech" `
    -WebSession $session `
    -Headers @{
        "authority"="www.nasdaq.com"
        "method"="GET"
        "path"="/feed/rssoutbound?category=FinTech"
        "scheme"="https"
        "accept"="*/*"
        "accept-encoding"="gzip, deflate, br"
        "accept-language"="en-US,en;q=0.9,tr;q=0.8"
        "sec-ch-ua"="`"Chromium`";v=`"116`", `"Not)A;Brand`";v=`"24`", `"Microsoft Edge`";v=`"116`""
        "sec-ch-ua-mobile"="?0"
        "sec-ch-ua-platform"="`"Windows`""
        "sec-fetch-dest"="empty"
        "sec-fetch-mode"="cors"
        "sec-fetch-site"="same-origin"
    }
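What could be causing this problem with the Nasdaq feed? Is there something unusual about their server configuration, or about the way they handle requests, that could lead to this behavior? Any insight or suggestions would be greatly appreciated.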
A:
0 votes
Teoman shipahi
10/15/2023
#1
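Apparently the problem was gzip handling: the request below sends an accept-encoding header and decompresses the gzip-encoded response itself. The following code works: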
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

public async Task GetRssFeed(string url)
{
    using HttpClient httpClient = new HttpClient();
    try
    {
        HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Add("authority", "www.nasdaq.com");
        request.Headers.Add("accept", "*/*");
        request.Headers.Add("accept-encoding", "gzip, deflate, br");
        request.Headers.Add("accept-language", "en-US,en;q=0.9,tr;q=0.8");
        request.Headers.Add("sec-ch-ua", "\"Chromium\";v=\"116\", \"Not)A;Brand\";v=\"24\", \"Microsoft Edge\";v=\"116\"");
        request.Headers.Add("sec-ch-ua-mobile", "?0");
        request.Headers.Add("sec-ch-ua-platform", "\"Windows\"");
        request.Headers.Add("sec-fetch-dest", "empty");
        request.Headers.Add("sec-fetch-mode", "cors");
        request.Headers.Add("sec-fetch-site", "same-origin");
        request.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.81");
        HttpResponseMessage response = await httpClient.SendAsync(request);
        response.EnsureSuccessStatusCode();
        if (response.Content.Headers.ContentEncoding.Contains("gzip"))
        {
            // The body is gzip-compressed: wrap the raw stream in a GZipStream before reading it as text.
            await using var responseStream = await response.Content.ReadAsStreamAsync();
            await using var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress);
            using var streamReader = new StreamReader(decompressedStream);
            string decompressedContent = await streamReader.ReadToEndAsync();
            Console.WriteLine(decompressedContent);
        }
        else
        {
            // Any other encoding is printed without decoding.
            // deflate and br are advertised in accept-encoding above but are not decompressed here.
            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
        throw;
    }
}
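The fix above comes down to checking the Content-Encoding response header and decompressing the body to match. Since the question also tried Node.js, the same idea can be sketched there. This is only a minimal sketch under assumptions: decodeBody is a hypothetical helper (it is not part of Axios or Node), it relies solely on Node's built-in zlib module, and it is exercised on a locally compressed sample string rather than on the Nasdaq feed. Whether this alone resolves the hang with Axios is not confirmed by the answer.

```javascript
// Sketch: decode a raw HTTP response body according to its Content-Encoding.
const zlib = require('node:zlib');

// decodeBody is a hypothetical helper, not an Axios or Node API.
//   body:            Buffer holding the raw (possibly compressed) response bytes
//   contentEncoding: value of the Content-Encoding response header, may be undefined
function decodeBody(body, contentEncoding) {
  const encoding = (contentEncoding || 'identity').trim().toLowerCase();
  switch (encoding) {
    case 'gzip':
    case 'x-gzip':
      return zlib.gunzipSync(body);
    case 'deflate':
      // Assumes the zlib-wrapped form; a server sending raw deflate would need inflateRawSync.
      return zlib.inflateSync(body);
    case 'br':
      return zlib.brotliDecompressSync(body);
    case 'identity':
    case '':
      return body;
    default:
      throw new Error(`Unsupported Content-Encoding: ${encoding}`);
  }
}

// Local round trip, no network involved: compress a sample feed, then decode it.
const sample = '<rss version="2.0"><channel><title>FinTech</title></channel></rss>';
const compressed = zlib.gzipSync(Buffer.from(sample, 'utf8'));
console.log(decodeBody(compressed, 'gzip').toString('utf8') === sample); // prints true
```

Advertising only the encodings the client can actually decode keeps the request and the decoding logic consistent. On the .NET side, setting HttpClientHandler.AutomaticDecompression lets HttpClient perform the equivalent decoding itself, which may be simpler than wrapping the stream by hand.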