How to get specific table data with Html Agility Pack

Asked by: tanzim abir Asked: 1/22/2021 Last edited by: DisappointedByUnaccountableMod tanzim abir Updated: 1/29/2021 Views: 396

Q:

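I am building a web crawler that extracts stock information and saves it to a database. My plan is to fetch only the company name and the prices (latest price, closing price, YCP, and so on) and store each row as an object.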

URL = view-source:https://www.dsebd.org/latest_share_price_scroll_l.php (start from line 5460 if needed)

Here I need to skip the first tr (the header row) and then pull td[3-7] from every remaining row.

<div class="table-responsive inner-scroll">
                                <table class='table table-bordered background-white shares-table fixedHeader'>
                                    <thead>
                                        <tr>
                                            <th width="4%">#</th>
                                            <th width="12%">TRADING CODE</th>
                                            <th width="12%">LTP*</th>
                                            <th width="12%">HIGH</th>
                                            <th width="12%">LOW</th>
                                            <th width="12%">CLOSEP*</th>
                                            <th width="12%">YCP*</th>
                                            <th width="12%">CHANGE</th>
                                            <th width="12%">TRADE</th>
                                            <th width="12%">VALUE (mn)</th>
                                            <th width="12%">VOLUME</th>
                                        </tr>
                                    </thead>
                                    <tbody>
                                                                                <tr>
                                            <td width="4%">1</td>
                                            <td width="15%">
                                                <a href="displayCompany.php?name=1JANATAMF" class='ab1'>
                                                    1JANATAMF                                               </a>
                                            </td>
                                            <td width="10%">6.3</td>
                                            <td width="10%">6.7</td>
                                            <td width="12%">6.3</td>
                                            <td width="11%">6.5</td>
                                            <td width="12%">6.6</td>
                                            <td width="12%" style="color: red">-0.3</td>
                                            <td width="11%">218</td>
                                            <td width="11%">11.593</td>
                                            <td width="11%">1,771,986</td>
                                        </tr>
                                    </tbody>
                                                                            <tr>
                                            <td width="4%">2</td>
                                            <td width="15%">
                                                <a href="displayCompany.php?name=1STPRIMFMF" class='ab1'>
                                                    1STPRIMFMF                                              </a>
                                            </td>
                                            <td width="10%">20.2</td>
                                            <td width="10%">21.9</td>
                                            <td width="12%">20</td>
                                            <td width="11%">20.2</td>
                                            <td width="12%">21.3</td>
                                            <td width="12%" style="color: red">-1.1</td>
                                            <td width="11%">420</td>
                                            <td width="11%">16.914</td>
                                            <td width="11%">815,552</td>
                                        </tr>
                                    </tbody>... More stocks

Here is my code.

    public Worker(ILogger<Worker> logger, IParseService parseService)
    {
        _logger = logger;
        _parseService = parseService;
        _url = "https://www.dsebd.org/latest_share_price_scroll_l.php";
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var HtmlDoc = GetHtml(_url);
            var mainNode = HtmlDoc.DocumentNode.SelectSingleNode("//div[@class='table-responsive inner-scroll']/table[contains(@class, 'table table-bordered background-white shares-table fixedHeader')]").ChildNodes;

            foreach (var nodes in mainNode)
            {
                // Code to get the info
            }
        }
    }

Thanks for reading my question; any help is much appreciated.

C# web-scraping parsing html-agility-pack


A:

0 votes tanzim abir 1/27/2021 #1
    // Requires: using System.Text.RegularExpressions;
    // mainNode is the <table> HtmlNode itself (without .ChildNodes).
    // ".//tr" keeps the search inside the table; "//tr" would match every row in the document.
    var codePattern = new Regex(@"^[A-Za-z0-9]{3,}$");

    foreach (HtmlNode node in mainNode.SelectNodes(".//tr"))
    {
        // Column order follows the table header: td[1] is "#", td[2] is the trading code.
        // The header row contains only <th> cells, so every lookup there yields "".
        var tradingCode = node.SelectSingleNode("td[2]")?.InnerText.Trim() ?? "";
        var latestPrice = node.SelectSingleNode("td[3]")?.InnerText.Trim() ?? "";
        var highestPrice = node.SelectSingleNode("td[4]")?.InnerText.Trim() ?? "";
        var lowestPrice = node.SelectSingleNode("td[5]")?.InnerText.Trim() ?? "";
        var closingPrice = node.SelectSingleNode("td[6]")?.InnerText.Trim() ?? "";
        var yesterdayPrice = node.SelectSingleNode("td[7]")?.InnerText.Trim() ?? "";
        var change = node.SelectSingleNode("td[8]")?.InnerText.Trim() ?? "";
        var trade = node.SelectSingleNode("td[9]")?.InnerText.Trim() ?? "";
        var value = node.SelectSingleNode("td[10]")?.InnerText.Trim() ?? "";
        var volume = node.SelectSingleNode("td[11]")?.InnerText.Trim() ?? "";

        // Skip rows without a plausible trading code, such as the header row.
        // Codes like 1JANATAMF contain digits; adjust the pattern if other characters occur.
        if (!codePattern.IsMatch(tradingCode)) continue;

        Console.WriteLine(string.Join(" ", tradingCode, latestPrice, highestPrice, lowestPrice, closingPrice, yesterdayPrice, change, trade, value, volume));
    }
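
The question also mentions storing each row as an object. A minimal sketch of that step follows, assuming the Html Agility Pack NuGet package is referenced and the table keeps the column order shown in the question. `StockQuote` and `QuoteParser` are hypothetical names introduced here for illustration, and falling back to 0 for an unparsable cell is an assumption, not something the original code does.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using HtmlAgilityPack;

// Hypothetical container for one table row; not part of the original post.
public sealed class StockQuote
{
    public string TradingCode { get; set; } = "";
    public decimal LatestPrice { get; set; }     // LTP*
    public decimal High { get; set; }            // HIGH
    public decimal Low { get; set; }             // LOW
    public decimal ClosingPrice { get; set; }    // CLOSEP*
    public decimal YesterdayClose { get; set; }  // YCP*
}

public static class QuoteParser
{
    public static List<StockQuote> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var quotes = new List<StockQuote>();

        // tr[td] keeps only rows that have <td> cells, which skips the <th> header row.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//table[contains(@class, 'shares-table')]//tr[td]");
        if (rows == null) return quotes; // SelectNodes returns null when nothing matches

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("td");
            if (cells == null || cells.Count < 7) continue; // not a full data row

            // cells is zero-based: cells[0] is "#", cells[1] is the trading code.
            quotes.Add(new StockQuote
            {
                TradingCode = HtmlEntity.DeEntitize(cells[1].InnerText).Trim(),
                LatestPrice = ToDecimal(cells[2]),
                High = ToDecimal(cells[3]),
                Low = ToDecimal(cells[4]),
                ClosingPrice = ToDecimal(cells[5]),
                YesterdayClose = ToDecimal(cells[6])
            });
        }
        return quotes;
    }

    // Assumption: unparsable cells become 0 instead of throwing.
    private static decimal ToDecimal(HtmlNode cell)
    {
        string text = HtmlEntity.DeEntitize(cell.InnerText).Trim();
        return decimal.TryParse(text, NumberStyles.Number,
            CultureInfo.InvariantCulture, out decimal parsed) ? parsed : 0m;
    }
}
```

Calling `QuoteParser.Parse(html)` with the downloaded page source would return one `StockQuote` per data row. If the page contains more than one table with the `shares-table` class, the XPath would need to be narrowed, for example with the enclosing `div` used in the question.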