HTTP Request to Nasdaq RSS Feed Works in Browser But Hangs with Code (C#, Node.js/Axios), Even with Identical Headers

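Asked by: Teoman shipahi  Asked: 10/15/2023  Last edited by: Teoman shipahi  Updated: 10/15/2023  Views: 33

Q:

I'm running into a peculiar issue when making an HTTP GET request to the URL "https://www.nasdaq.com/feed/rssoutbound?category=FinTech". When I enter this URL manually in a web browser, the feed loads without any problems. However, when I make the same request programmatically (I have tried C# with HttpClient and Node.js with Axios), the request hangs indefinitely and eventually times out.

Here is my C# code: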

public async Task Execute()
{
    using HttpClient httpClient = new HttpClient();
    try
    {
        // Specify the URL you want to request
        string url = "https://www.nasdaq.com/feed/rssoutbound?category=FinTech";

        httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("*/*"));
        httpClient.DefaultRequestHeaders.AcceptLanguage.Add(new StringWithQualityHeaderValue("en-US"));
        httpClient.DefaultRequestHeaders.AcceptLanguage.Add(new StringWithQualityHeaderValue("en", 0.9));
        httpClient.DefaultRequestHeaders.Add("sec-ch-ua", "\"Chromium\";v=\"116\", \"Not)A;Brand\";v=\"24\", \"Microsoft Edge\";v=\"116\"");
        httpClient.DefaultRequestHeaders.Add("sec-ch-ua-mobile", "?0");
        httpClient.DefaultRequestHeaders.Add("sec-ch-ua-platform", "\"Windows\"");
        httpClient.DefaultRequestHeaders.Add("sec-fetch-dest", "empty");
        httpClient.DefaultRequestHeaders.Add("sec-fetch-mode", "cors");
        httpClient.DefaultRequestHeaders.Add("sec-fetch-site", "same-origin");

        // Add the User-Agent header
        httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36");


        // Send a GET request and wait for the response
        HttpResponseMessage response = await httpClient.GetAsync(url);

        // Check if the request was successful
        if (response.IsSuccessStatusCode)
        {
            // Read the content of the response as a string
            string content = await response.Content.ReadAsStringAsync();

            // Print the content to the console
            Console.WriteLine(content);
        }
        else
        {
            Console.WriteLine($"HTTP request failed with status code: {response.StatusCode}");
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
        throw;
    }
}

I also tried making the same request with Node.js and Axios, with the same result:

// Node.js code using Axios
const axios = require('axios');

async function fetchData() {
    try {
        const response = await axios.get('https://www.nasdaq.com/feed/rssoutbound?category=FinTech', {
            headers: {
                Accept: '*/*',
                'Accept-Language': 'en-US,en;q=0.9,tr;q=0.8',
                'sec-ch-ua': '"Chromium";v="116", "Not)A;Brand";v="24", "Microsoft Edge";v="116"',
                'sec-ch-ua-mobile': '?0',
                'sec-ch-ua-platform': '"Windows"',
                'sec-fetch-dest': 'empty',
                'sec-fetch-mode': 'cors',
                'sec-fetch-site': 'same-origin',
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'
            }
        });
        // ... (handling the response)
    } catch (error) {
        console.error('An error occurred:', error.message);
    }
}

fetchData();

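Strangely, retrieving feeds from other RSS sources works fine. The issue seems specific to the Nasdaq feed. I even tried using Puppeteer to fetch the content, and it also hangs indefinitely.

I used the Edge browser's developer tools to resend the request with all editable headers cleared and no cookies, and it still works, so I doubt this is a cookie issue.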

The following PowerShell also works:

$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.81"
Invoke-WebRequest -UseBasicParsing -Uri "https://www.nasdaq.com/feed/rssoutbound?category=FinTech" `
-WebSession $session `
-Headers @{
"authority"="www.nasdaq.com"
  "method"="GET"
  "path"="/feed/rssoutbound?category=FinTech"
  "scheme"="https"
  "accept"="*/*"
  "accept-encoding"="gzip, deflate, br"
  "accept-language"="en-US,en;q=0.9,tr;q=0.8"
  "sec-ch-ua"="`"Chromium`";v=`"116`", `"Not)A;Brand`";v=`"24`", `"Microsoft Edge`";v=`"116`""
  "sec-ch-ua-mobile"="?0"
  "sec-ch-ua-platform"="`"Windows`""
  "sec-fetch-dest"="empty"
  "sec-fetch-mode"="cors"
  "sec-fetch-site"="same-origin"
}

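What could be causing this issue with the Nasdaq feed? Is there something unique about their server configuration, or about the way they handle requests, that could lead to this behavior? Any insights or suggestions would be greatly appreciated.

c# node.js axios rss httpclient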

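Comments

0 upvotes gunr2171 10/15/2023
Also check whether this is a cookie issue.
0 upvotes Teoman shipahi 10/15/2023
@gunr2171 Thanks for pointing that out. I resent the request from the Edge browser with all cookies cleared and it still works fine; I get a 200. I'm not sure how the way it sends the request differs from HttpClient.
0 upvotes gunr2171 10/15/2023
In the browser, look at the HTTP request in the DevTools > Network tab. Export the HTTP call to cURL/PowerShell and run it in a terminal. Does it look different, or does it fail?
0 upvotes Teoman shipahi 10/15/2023
Yes, PowerShell and curl work! But not the fetch/axios/.NET HTTP libraries.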
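
One way to follow up on the comparison suggested above is to diff the header names sent by the working and failing requests. The sketch below is only an illustration: `missingHeaders` is a hypothetical helper, and the two lists are copied by hand from the Axios call and the PowerShell export in the question (the PowerShell user agent is set on the session rather than in `-Headers`, so it is listed here as `user-agent` by assumption).

```javascript
// Header names sent by the failing Axios request in the question.
const failingHeaders = [
  'Accept', 'Accept-Language', 'sec-ch-ua', 'sec-ch-ua-mobile',
  'sec-ch-ua-platform', 'sec-fetch-dest', 'sec-fetch-mode',
  'sec-fetch-site', 'User-Agent',
];

// Header names sent by the working PowerShell request in the question.
// 'user-agent' comes from $session.UserAgent, not from the -Headers table.
const workingHeaders = [
  'authority', 'method', 'path', 'scheme', 'accept', 'accept-encoding',
  'accept-language', 'sec-ch-ua', 'sec-ch-ua-mobile', 'sec-ch-ua-platform',
  'sec-fetch-dest', 'sec-fetch-mode', 'sec-fetch-site', 'user-agent',
];

// Hypothetical helper: names present in `working` but absent from `failing`,
// compared case-insensitively because HTTP header names are case-insensitive.
function missingHeaders(working, failing) {
  const sent = new Set(failing.map((name) => name.toLowerCase()));
  return working
    .map((name) => name.toLowerCase())
    .filter((name) => !sent.has(name));
}

console.log(missingHeaders(workingHeaders, failingHeaders));
```

The first four names in the result appear to be HTTP/2 pseudo-headers carried over by the DevTools export, which leaves `accept-encoding` as the one ordinary header that the failing code never set explicitly.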

A:

0 votes Teoman shipahi 10/15/2023 #1

Apparently the problem was Gzip handling. The following code works:

 public async Task GetRssFeed(string url)
 {
     using HttpClient httpClient = new HttpClient();
     try
     {
         HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, url);
         request.Headers.Add("authority", "www.nasdaq.com");
         request.Headers.Add("accept", "*/*");
         request.Headers.Add("accept-encoding", "gzip, deflate, br");
         request.Headers.Add("accept-language", "en-US,en;q=0.9,tr;q=0.8");
         request.Headers.Add("sec-ch-ua", "\"Chromium\";v=\"116\", \"Not)A;Brand\";v=\"24\", \"Microsoft Edge\";v=\"116\"");
         request.Headers.Add("sec-ch-ua-mobile", "?0");
         request.Headers.Add("sec-ch-ua-platform", "\"Windows\"");
         request.Headers.Add("sec-fetch-dest", "empty");
         request.Headers.Add("sec-fetch-mode", "cors");
         request.Headers.Add("sec-fetch-site", "same-origin");
         request.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.81");

         HttpResponseMessage response = await httpClient.SendAsync(request);
         response.EnsureSuccessStatusCode();
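         if (response.Content.Headers.ContentEncoding.Contains("gzip"))
         {
             // Decompress the gzip-encoded body manually before reading it as text
             await using var responseStream = await response.Content.ReadAsStreamAsync();
             await using var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress);
             using var streamReader = new StreamReader(decompressedStream);
             string decompressedContent = await streamReader.ReadToEndAsync();
             Console.WriteLine(decompressedContent);
         }
         else
         {
             // Not gzip-encoded: print the body as-is (a "br" or "deflate" body would still need its own decoder)
             Console.WriteLine(await response.Content.ReadAsStringAsync());
         }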
     }
     catch (Exception e)
     {
         Console.WriteLine(e);
         throw;
     }
 }
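
For the Node.js side of the question, the same branching on `Content-Encoding` can be sketched with the built-in `zlib` module. This is a minimal sketch, not a confirmed fix for the Axios hang: `decodeBody` is a hypothetical helper, and it assumes the HTTP client hands back the raw, still-compressed bytes (for example Axios with `responseType: 'arraybuffer'` and `decompress: false`). It runs against a locally compressed sample rather than the live feed.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper: turn a raw response body into text according to the
// value of its Content-Encoding header.
function decodeBody(buffer, contentEncoding) {
  switch ((contentEncoding || '').trim().toLowerCase()) {
    case 'gzip':
      return zlib.gunzipSync(buffer).toString('utf8');
    case 'deflate':
      return zlib.inflateSync(buffer).toString('utf8');
    case 'br':
      return zlib.brotliDecompressSync(buffer).toString('utf8');
    default:
      // No (or unknown) encoding: treat the bytes as plain UTF-8 text.
      return buffer.toString('utf8');
  }
}

// Usage with a locally gzip-compressed sample standing in for the feed body.
const sample = '<rss version="2.0"><channel><title>sample</title></channel></rss>';
const compressed = zlib.gzipSync(Buffer.from(sample, 'utf8'));
console.log(decodeBody(compressed, 'gzip') === sample);
```

Unlike the C# answer, this sketch also covers `br` and `deflate`, since the request advertises all three encodings in `accept-encoding`.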