Asked by: Marco Almeida  Asked: 3/10/2023  Updated: 3/10/2023  Views: 180
How to iterate through HTML and parse specific data?
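Q:
The Python code I have extracts specific data from the HTML, but it only works when the HTML contains a single instance.
What I need is code that iterates through HTML containing multiple instances and retrieves the specific information from each one. How can I do that?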
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<title>Exported Data</title>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<link href="css/style.css" rel="stylesheet"/>
<script src="js/script.js" type="text/javascript">
</script>
</head>
<body onload="CheckLocation();">
<div class="page_wrap">
<div class="page_header">
<div class="content">
<div class="text bold">
🤖🥇 𝑬𝒂𝒔𝒚 𝑩𝒐𝒕 - 𝑶𝒗𝒆𝒓 2.5
</div>
</div>
</div>
<div class="page_body chat_page">
<div class="history">
<div class="message service" id="message-1">
<div class="body details">
9 March 2023
</div>
</div>
<div class="message default clearfix" id="message3984">
<div class="pull_left userpic_wrap">
<div class="userpic userpic2" style="width: 42px; height: 42px">
<div class="initials" style="line-height: 42px">
?
</div>
</div>
</div>
<div class="body">
<div class="pull_right date details" title="09.03.2023 00:27:10 UTC-03:00">
00:27
</div>
<div class="from_name">
🤖🥇 𝑬𝒂𝒔𝒚 𝑩𝒐𝒕 - 𝑶𝒗𝒆𝒓 2.5
</div>
<div class="text">
Easy Bot - Over 2.5<br><br>🏆 Liga: Premiership<br>🚦 Entrada: Over 2.5 FT<br>⚽ Jogos: ✅ 03:30 03:33 03:36 ( 03:39)<br><br><strong>Link: </strong><a href="https://www.bet365.com/#/AVR/B146/R%5E1/">https://www.bet365.com/#/AVR/B146/R%5E1/</a><br><br>🍀 24h:100% de acerto nas últimas 24h<br><br>✅✅✅✅✅✅ .
</div>
</div>
</div>
<div class="message default clearfix" id="message3985">
<div class="pull_left userpic_wrap">
<div class="userpic userpic2" style="width: 42px; height: 42px">
<div class="initials" style="line-height: 42px">
?
</div>
</div>
</div>
<div class="body">
<div class="pull_right date details" title="09.03.2023 00:45:16 UTC-03:00">
00:45
</div>
<div class="from_name">
🤖🥇 𝑬𝒂𝒔𝒚 𝑩𝒐𝒕 - 𝑶𝒗𝒆𝒓 2.5
</div>
<div class="text">
Easy Bot - Over 2.5<br><br>🏆 Liga: Premiership<br>🚦 Entrada: Over 2.5 FT<br>⚽ Jogos: ✅ 03:48 03:51 03:54 ( 03:57)<br><br><strong>Link: </strong><a href="https://www.bet365.com/#/AVR/B146/R%5E1/">https://www.bet365.com/#/AVR/B146/R%5E1/</a><br><br>🍀 24h:100% de acerto nas últimas 24h<br><br>✅✅✅✅✅✅ .
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
A:
1 upvote
Jack Fleeting
3/10/2023
#1
Well, this one is a bit more complicated than your previous question, so you need a few more acrobatics:
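from bs4 import BeautifulSoup

# Assumption: the exported HTML is saved as "messages.html"; adjust the path as needed.
with open("messages.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# Each chat message keeps its content in a div whose class is exactly "body".
for b in soup.select('div[class="body"]'):
    # The title attribute looks like "09.03.2023 00:27:10 UTC-03:00".
    d_str = b.select_one('div.date.details')['title']
    calendar = d_str.split(" ")
    print("Date:", calendar[0])
    print("Time:", calendar[1])
    targets = b.select('div.text')
    for target in targets:
        for sts in target.stripped_strings:
            if "⚽ Jogos: " in sts:
                # Split the line into tokens, gluing "( 03:39)" into "(03:39)" first.
                jugos = [elem for elem in sts.split('⚽ Jogos: ')[1].replace('( ', "(").split(" ") if elem]
                if "✅" in jugos:
                    # 1-based position of the checkmark among the tokens.
                    ind = jugos.index('✅') + 1
                    print("Checkmarked:", ind)
                    jugos.remove("✅")
                    print(jugos)
                else:
                    print(jugos)
                    print("Checkmarked: NA")
    print('------------------------------------')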
Output:
Date: 09.03.2023
Time: 00:27:10
Checkmarked: 1
['03:30', '03:33', '03:36', '(03:39)']
------------------------------------
Date: 09.03.2023
Time: 00:45:16
Checkmarked: 1
['03:48', '03:51', '03:54', '(03:57)']
------------------------------------
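If you would rather collect the values than print them, the date and "Jogos" handling from the loop above can be pulled out into two small helpers. This is only a minimal sketch using the standard library: the names parse_title and parse_jogos are hypothetical, the day-first date format is assumed from the "9 March 2023" header in the export, and Python 3.7+ is assumed so that %z accepts an offset written as "-03:00". Unlike the loop above, it drops the parentheses around the last time.

```python
import re
from datetime import datetime


def parse_title(title):
    """Parse a title such as '09.03.2023 00:27:10 UTC-03:00' into an aware datetime."""
    # Assumption: the date is day.month.year, as the "9 March 2023" header suggests.
    return datetime.strptime(title, "%d.%m.%Y %H:%M:%S UTC%z")


def parse_jogos(line):
    """Return (times, checkmarked) for a '⚽ Jogos: ...' line.

    times is the list of HH:MM strings in order of appearance; checkmarked is
    the 1-based position of the time that follows the ✅, or None without a ✅.
    """
    payload = line.split("Jogos:", 1)[1]
    times = re.findall(r"\d{2}:\d{2}", payload)
    checkmarked = None
    if "✅" in payload:
        # Count the times that appear before the checkmark.
        before = payload.split("✅", 1)[0]
        checkmarked = len(re.findall(r"\d{2}:\d{2}", before)) + 1
    return times, checkmarked


when = parse_title("09.03.2023 00:27:10 UTC-03:00")
print(when.isoformat())  # 2023-03-09T00:27:10-03:00
print(parse_jogos("⚽ Jogos: ✅ 03:30 03:33 03:36 ( 03:39)"))
# (['03:30', '03:33', '03:36', '03:39'], 1)
```

Inside the loop you could then call parse_title(d_str) and parse_jogos(sts) and append the results to a list, which is easier to export or filter later than printed text.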