Code is working; the problem is that the loop over card_elements is not working

Asked by: Yeuhan Shen · Asked: 2/11/2023 · Last edited by: HedgeHog, Yeuhan Shen · Updated: 2/11/2023 · Views: 38

Question:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # any WebDriver works; Chrome is only an example

url = "https://github.com/marketplace?category=project-management&type=actions"
driver.get(url)

# Parse the HTML content of the page
soup = BeautifulSoup(driver.page_source, "html.parser")

# Find specific elements using HTML tags and attributes
card_elements = soup.find_all("div", class_="d-md-flex flex-wrap mb-4")

# Extract data from the elements, but this only gives 1 result
cards = []
for card_element in card_elements:
    title_element = card_element.find("h3", class_="h4")
    title = title_element.text
    description_element = card_element.find("p", class_="color-fg-muted lh-condensed wb-break-word mb-0")
    description = description_element.text
    #link = title_element["href"]
    card = {
        "title": title,
        "description": description,
        #"link": link
    }
    cards.append(card)

Tags: python selenium web-scraping beautifulsoup html-parsing


Answer:

Score 0 · HedgeHog · 2/11/2023 · #1

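There is no need for Selenium here. Also, try to select your elements more specifically, for example with CSS selectors, and avoid relying on classes; pay attention to ids or the HTML structure instead.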

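The main problem is that only one element has the class you are searching for: the container that holds all the cards. So your ResultSet contains a single element and the loop runs only once.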

Instead, select the <a> elements in that container that contain an <h3>:

soup.select('a:has(h3)')
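To make the difference concrete, here is a minimal sketch on a small, hypothetical piece of markup (not the real GitHub Marketplace HTML, which may well have changed since 2023). It assumes bs4 with its default soupsieve backend, which provides `:has()`. Matching the wrapper's class finds one container, while `a:has(h3)` finds one element per card. Because the `href` sits on the `<a>` in this assumed markup, the link is read from the card itself rather than from the `<h3>`.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one wrapper div holding several cards.
HTML = """
<div class="d-md-flex flex-wrap mb-4">
  <a href="/marketplace/actions/one"><h3 class="h4">Action One</h3>
     <p>Author A</p><p>First description</p></a>
  <a href="/marketplace/actions/two"><h3 class="h4">Action Two</h3>
     <p>Author B</p><p>Second description</p></a>
  <a href="/about">About</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# A class string with spaces matches the exact class attribute value,
# so only the single wrapper div is found and a loop over it runs once.
containers = soup.find_all("div", class_="d-md-flex flex-wrap mb-4")
print(len(containers))  # → 1

# Every <a> that has an <h3> descendant is one card; the plain link is skipped.
cards = soup.select("a:has(h3)")
print(len(cards))  # → 2

for card in cards:
    # The title is in the <h3>; the href is an attribute of the <a> itself.
    print(card.h3.get_text(strip=True), card["href"])
```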

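from bs4 import BeautifulSoup
import requests

# Selenium is not needed; requests fetches the page HTML directly
url = 'https://github.com/marketplace?category=project-management&type=actions'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

data = []
# Each card is an <a> element that contains an <h3> title
for e in soup.select('a:has(h3)'):
    data.append({
        'title': e.h3.text.strip(),
        # The first <p>, if present, holds the author
        'author': e.p.text.strip() if e.p else None,
        # The last <p> holds the description
        'description': e.select_one('p:last-of-type').text.strip()
    })
data  # in a notebook or REPL this displays the list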

Output:

[{'title': 'Glo Add Assignee To Cards',
  'author': 'Axosoft',
  'description': 'GitHub action to add an assignee to Glo Boards cards'},
 {'title': 'Glo Move Cards',
  'author': 'Axosoft',
  'description': 'GitHub action to move Glo Boards cards to a column'},
 {'title': 'Jira Find issue key',
  'author': 'atlassian',
  'description': 'Find an issue inside event'},
 {'title': 'Jira Issue Transition',
  'author': 'atlassian',
  'description': 'Change status of specific Jira issue'},
 {'title': 'Jira issue from TODO',
  'author': 'atlassian',
  'description': 'Create Jira issue for TODO comments'},
 {'title': 'Jira Create issue',
  'author': 'atlassian',
  'description': 'Create a new Jira issue'},...]

Comments:

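Score 0 · Yeuhan Shen · 2/11/2023
When I change the URL to scrape another page on the same site, I get the error AttributeError: 'NoneType' object has no attribute 'text' with: soup = BeautifulSoup(requests.get('https://github.com/marketplace?category=api-management&page=1&type=apps').text)
Score 0 · HedgeHog · 2/11/2023
If the structure changes or an element is not available, you have to handle that; there is no solution that fits all scraping. If you have a specific need, see the edit to the code.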
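The `AttributeError` comes from calling `.text` on `None`, which is what `select_one()` (or `.find()`) returns when nothing matches. A minimal sketch of one way to handle that, assuming a missing element should simply become `None` in the result; `text_or_none` is a hypothetical helper name and the markup is made up for illustration:

```python
from bs4 import BeautifulSoup


def text_or_none(parent, selector):
    """Hypothetical helper: stripped text of the first match for `selector`, or None."""
    element = parent.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


# Made-up markup: the second card has no <p> elements at all.
HTML = """
<a href="/x"><h3>With description</h3><p>Author</p><p>Desc</p></a>
<a href="/y"><h3>No paragraphs</h3></a>
"""

soup = BeautifulSoup(HTML, "html.parser")
data = [
    {
        "title": text_or_none(card, "h3"),
        # Missing elements yield None instead of raising AttributeError.
        "description": text_or_none(card, "p:last-of-type"),
    }
    for card in soup.select("a:has(h3)")
]
print(data)
```

Whether `None` is the right fallback depends on the page; for a different page layout the selectors themselves usually need to change as well.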