Asked by: Yeuhan Shen | Asked: 2/11/2023 | Last edited by: HedgeHog, Yeuhan Shen | Updated: 2/11/2023 | Views: 38
The code runs, but the loop over card_elements is not working
Q:
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # assumed setup; the driver creation is not shown in the question

url = "https://github.com/marketplace?category=project-management&type=actions"
driver.get(url)

# Parse the HTML content of the page
soup = BeautifulSoup(driver.page_source, "html.parser")

# Find specific elements using HTML tags and attributes
card_elements = soup.find_all("div", class_="d-md-flex flex-wrap mb-4")

# Extract data from the elements, but this gives only 1 result
cards = []
for card_element in card_elements:
    title_element = card_element.find("h3", class_="h4")
    title = title_element.text
    description_element = card_element.find("p", class_="color-fg-muted lh-condensed wb-break-word mb-0")
    description = description_element.text
    #link = title_element["href"]
    card = {
        "title": title,
        "description": description,
        #"link": link
    }
    cards.append(card)
A:
0 votes
HedgeHog
2/11/2023
#1
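There is no need for Selenium here. Also try to select your elements more specifically, for example with CSS selectors, and avoid relying on classes; pay attention to ids or the HTML structure instead.
The main issue is that only one element has the classes you are searching for, and that element is the container of all the cards. So the ResultSet holds a single item and the loop iterates only once.
Instead, select each <a> inside that container that contains an <h3>:
soup.select('a:has(h3)')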
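The difference between matching the container and matching the individual cards can be checked offline. The following is a minimal sketch, assuming a simplified, hypothetical HTML snippet (SAMPLE_HTML and its card titles are made up for illustration and are not the real GitHub Marketplace markup):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one container div wrapping several card links.
# This is an assumption for illustration, not the real page source.
SAMPLE_HTML = """
<div class="d-md-flex flex-wrap mb-4">
  <a href="/first"><h3 class="h4">First action</h3><p>Author A</p><p>Does one thing</p></a>
  <a href="/second"><h3 class="h4">Second action</h3><p>Author B</p><p>Does another thing</p></a>
  <a href="/third"><h3 class="h4">Third action</h3><p>Only a description</p></a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Matching by the container's classes finds the single wrapper element,
# so a loop over this result runs once.
containers = soup.find_all("div", class_="d-md-flex flex-wrap mb-4")

# Matching every <a> that contains an <h3> finds one element per card.
cards = soup.select("a:has(h3)")

container_count = len(containers)
card_count = len(cards)
titles = [card.h3.get_text(strip=True) for card in cards]

print(container_count, card_count, titles)
```

Calling find() on the single container returns only the first matching descendant, which would explain why the original loop produced one title and one description.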
Example
from bs4 import BeautifulSoup
import requests

# Fetch the page with requests; Selenium is not needed here
url = 'https://github.com/marketplace?category=project-management&type=actions'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

data = []
# Each card is an <a> element that contains an <h3>
for e in soup.select('a:has(h3)'):
    data.append({
        'title': e.h3.text.strip(),
        'author': e.p.text.strip() if e.p else None,
        'description': e.select_one('p:last-of-type').text.strip()
    })

data
Output
[{'title': 'Glo Add Assignee To Cards',
'author': 'Axosoft',
'description': 'GitHub action to add an assignee to Glo Boards cards'},
{'title': 'Glo Move Cards',
'author': 'Axosoft',
'description': 'GitHub action to move Glo Boards cards to a column'},
{'title': 'Jira Find issue key',
'author': 'atlassian',
'description': 'Find an issue inside event'},
{'title': 'Jira Issue Transition',
'author': 'atlassian',
'description': 'Change status of specific Jira issue'},
{'title': 'Jira issue from TODO',
'author': 'atlassian',
'description': 'Create Jira issue for TODO comments'},
{'title': 'Jira Create issue',
'author': 'atlassian',
'description': 'Create a new Jira issue'},...]
Comments
0 votes
Yeuhan Shen
2/11/2023
When I change the URL to scrape a different page on the same site, I get an error: AttributeError: 'NoneType' object has no attribute 'text', using soup = BeautifulSoup(requests.get('https://github.com/marketplace?category=api-management&page=1&type=apps').text)
0 votes
HedgeHog
2/11/2023
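If the structure changes or an element is not available, you have to handle that case yourself; there is no single solution that fits every scraping task. If you have specific needs, check the edit to the code.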
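One way to handle missing elements is to check for None before reading the text. This is a minimal sketch, not the answerer's edit: text_or_none and parse_card are hypothetical helper names, and SAMPLE_HTML is made-up markup in which the second card has no <p> elements.

```python
from bs4 import BeautifulSoup


def text_or_none(element):
    """Return the stripped text of an element, or None if the element is missing."""
    return element.get_text(strip=True) if element is not None else None


def parse_card(card):
    """Extract fields from one card, tolerating missing child elements."""
    return {
        "title": text_or_none(card.find("h3")),
        "author": text_or_none(card.find("p")),
        "description": text_or_none(card.select_one("p:last-of-type")),
    }


# Hypothetical markup (an assumption for illustration): the second card lacks <p> tags,
# which would raise AttributeError if .text were read without a check.
SAMPLE_HTML = """
<a href="/a"><h3>Complete card</h3><p>Author</p><p>Description</p></a>
<a href="/b"><h3>Card without paragraphs</h3></a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
parsed = [parse_card(card) for card in soup.select("a:has(h3)")]
print(parsed)
```

Fields that cannot be found come back as None instead of raising, so the caller can decide whether to skip, log, or keep the incomplete card.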