Asked by: tanzim abir Asked: 1/22/2021 Last edited by: DisappointedByUnaccountableModtanzim abir Updated: 1/29/2021 Views: 396
How to get specific table data with Html Agility Pack
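Q:
I am building a web crawler to extract stock information and save it to a database. My plan is to fetch only the company name and the prices (latest price, closing price, YCP, etc.) and store them as objects.
URL = view-source:https://www.dsebd.org/latest_share_price_scroll_l.php (start from line 5460 if needed)
Here, I first need to skip the tr, then pull each td[3-7].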
<div class="table-responsive inner-scroll">
<table class='table table-bordered background-white shares-table fixedHeader'>
<thead>
<tr>
<th width="4%">#</th>
<th width="12%">TRADING CODE</th>
<th width="12%">LTP*</th>
<th width="12%">HIGH</th>
<th width="12%">LOW</th>
<th width="12%">CLOSEP*</th>
<th width="12%">YCP*</th>
<th width="12%">CHANGE</th>
<th width="12%">TRADE</th>
<th width="12%">VALUE (mn)</th>
<th width="12%">VOLUME</th>
</tr>
</thead>
<tbody>
<tr>
<td width="4%">1</td>
<td width="15%">
<a href="displayCompany.php?name=1JANATAMF" class='ab1'>
1JANATAMF </a>
</td>
<td width="10%">6.3</td>
<td width="10%">6.7</td>
<td width="12%">6.3</td>
<td width="11%">6.5</td>
<td width="12%">6.6</td>
<td width="12%" style="color: red">-0.3</td>
<td width="11%">218</td>
<td width="11%">11.593</td>
<td width="11%">1,771,986</td>
</tr>
</tbody>
<tr>
<td width="4%">2</td>
<td width="15%">
<a href="displayCompany.php?name=1STPRIMFMF" class='ab1'>
1STPRIMFMF </a>
</td>
<td width="10%">20.2</td>
<td width="10%">21.9</td>
<td width="12%">20</td>
<td width="11%">20.2</td>
<td width="12%">21.3</td>
<td width="12%" style="color: red">-1.1</td>
<td width="11%">420</td>
<td width="11%">16.914</td>
<td width="11%">815,552</td>
</tr>
</tbody>... More stocks
Here is my code.
public Worker(ILogger<Worker> logger, IParseService parseService)
{
    _logger = logger;
    _parseService = parseService;
    _url = "https://www.dsebd.org/latest_share_price_scroll_l.php";
}

protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    while (!stoppingToken.IsCancellationRequested)
    {
        // GetHtml is a helper that is not shown in the question.
        var HtmlDoc = GetHtml(_url);
        var mainNode = HtmlDoc.DocumentNode.SelectSingleNode("//div[@class='table-responsive inner-scroll']/table[contains(@class, 'table table-bordered background-white shares-table fixedHeader')]").ChildNodes;
        foreach (var nodes in mainNode)
        {
            // Code to get the info
        }
    }
}
Thank you for reading my question; any help is much appreciated.
A:
0 votes
tanzim abir
1/27/2021
#1
// Assumes mainNode is the <table> HtmlNode itself (the SelectSingleNode result without .ChildNodes).
// ".//tr" searches only inside that table; a bare "//tr" would scan the whole document.
// Digits are allowed in the pattern because codes such as 1JANATAMF start with one.
var codePattern = new Regex(@"^[A-Za-z0-9]{3,}$");
foreach (HtmlNode node in mainNode.SelectNodes(".//tr"))
{
    // Column order follows the header row: #, TRADING CODE, LTP, HIGH, LOW, CLOSEP, YCP, CHANGE, TRADE, VALUE, VOLUME.
    string Cell(int i) => node.SelectSingleNode($"td[{i}]")?.InnerText.Trim() ?? "";
    var tradingCode = Cell(2);
    var latestPrice = Cell(3);
    var highestPrice = Cell(4);
    var lowestPrice = Cell(5);
    var closingPrice = Cell(6);
    var yesterdayPrice = Cell(7);
    var change = Cell(8);
    var trade = Cell(9);
    var value = Cell(10);
    var volume = Cell(11);
    // The header row has only <th> cells, so tradingCode is empty there and the row is skipped.
    if (!codePattern.IsMatch(tradingCode)) continue;
    Console.WriteLine(string.Join(" ", tradingCode, latestPrice, highestPrice, lowestPrice, closingPrice, yesterdayPrice, change, trade, value, volume));
}
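The question mentions storing each row as an object. A minimal sketch of that step, using only the .NET base library, might look like the following. The StockQuote class, its property names, and the TryFromCells method are hypothetical names chosen for illustration; they are not part of Html Agility Pack or of the asker's code. The sketch assumes the cell texts have already been read from the row as strings.

```csharp
using System;
using System.Globalization;

// Hypothetical container for one table row; names are illustrative only.
public sealed class StockQuote
{
    public string TradingCode { get; }
    public decimal LatestPrice { get; }     // LTP*
    public decimal ClosingPrice { get; }    // CLOSEP*
    public decimal YesterdayClose { get; }  // YCP*
    public long Volume { get; }

    private StockQuote(string code, decimal ltp, decimal closep, decimal ycp, long volume)
    {
        TradingCode = code;
        LatestPrice = ltp;
        ClosingPrice = closep;
        YesterdayClose = ycp;
        Volume = volume;
    }

    // Returns false (and a null quote) when the code is empty or a numeric
    // cell cannot be parsed, which is what happens for a header row.
    public static bool TryFromCells(string code, string ltp, string closep,
                                    string ycp, string volume, out StockQuote quote)
    {
        quote = null;
        var culture = CultureInfo.InvariantCulture;
        code = (code ?? "").Trim();
        if (code.Length == 0) return false;

        // NumberStyles.Number accepts surrounding whitespace, a sign,
        // a decimal point and thousands separators.
        if (!decimal.TryParse(ltp, NumberStyles.Number, culture, out var l)) return false;
        if (!decimal.TryParse(closep, NumberStyles.Number, culture, out var c)) return false;
        if (!decimal.TryParse(ycp, NumberStyles.Number, culture, out var y)) return false;

        // AllowThousands lets "1,771,986" parse as 1771986.
        var intStyle = NumberStyles.Integer | NumberStyles.AllowThousands;
        if (!long.TryParse(volume, intStyle, culture, out var v)) return false;

        quote = new StockQuote(code, l, c, y, v);
        return true;
    }
}
```

With the first sample row, StockQuote.TryFromCells("1JANATAMF", "6.3", "6.5", "6.6", "1,771,986", out var q) returns true and q.Volume is 1771986. Parsing with CultureInfo.InvariantCulture keeps the result independent of the machine's regional settings, which matters when the crawler runs on a server whose locale uses a different decimal separator.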