Python - Beautifulsoup - parse multiple span elements

Asked by: JJH  Asked: 10/7/2022  Last edited by: JJH  Updated: 10/7/2022  Views: 55

Q:

I am trying to extract the "title" attribute from "span" elements.

Using the code below as an example, the output I'm looking for is 6536 and 9319, which are stored in the "title" attribute, as shown here:

<span aria-label="6536 users starred this repository" class="Counter js-social-count" data-plural-suffix="users starred this repository" data-singular-suffix="user starred this repository" data-turbo-replace="true" data-view-component="true" id="repo-stars-counter-star" title="6,536">6.5k</span>

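I'm having trouble with the last line of the code, where I call get_text(). I think a regular expression could be used to parse the star counts, but I'm not sure how.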

from bs4 import BeautifulSoup
import requests

websites = ['https://github.com/marketplace/actions/yq-portable-yaml-processor','https://github.com/marketplace/actions/TruffleHog-OSS']

for links in websites:
    URL = requests.get(links)
    detailsoup = BeautifulSoup(URL.content, "html.parser")

    # Extract stars
    socialstars = detailsoup.findAll('span', {'class': 'Counter js-social-count'})
    socialstarsList = [socialstars.get_text() for socialstars in socialstars]
Tags: python, html, regex, beautifulsoup, html-parsing


A:

2 votes  Md. Fazlul Hoque  10/7/2022  #1

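You already put the URLs in a list and iterate over it. Because the star counter on every page carries the same id, selecting that single element by its id and reading its title attribute is enough.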

from bs4 import BeautifulSoup
import requests

websites = ['https://github.com/marketplace/actions/yq-portable-yaml-processor','https://github.com/marketplace/actions/TruffleHog-OSS']

for links in websites:
    URL = requests.get(links)
    detailsoup = BeautifulSoup(URL.content, "html.parser")

    # Extract stars
    socialstars = detailsoup.select_one('#repo-stars-counter-star').get('title')
    print(socialstars)

Output:

 6,536
 9,319
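
The question asks for 6536 and 9319, while the title attribute holds "6,536" and "9,319" with a thousands separator. A minimal sketch of the remaining conversion step follows; the helper name parse_star_count is hypothetical and not part of the original answer, and it assumes the input is the title value rather than the abbreviated text such as "6.5k".

```python
import re


def parse_star_count(title):
    """Convert a counter title such as "6,536" to the integer 6536.

    Returns None when the string is empty or contains no digits.
    Assumption: the input is the full title value, not "6.5k".
    """
    # Drop every non-digit character (thousands separators, spaces).
    digits = re.sub(r"\D", "", title or "")
    return int(digits) if digits else None


# The two title values printed by the loop above.
for raw in ["6,536", "9,319"]:
    print(parse_star_count(raw))
```

Inside the loop from the answer, this would be called as parse_star_count(socialstars). Note that stripping non-digits from the text returned by get_text() would turn "6.5k" into 65, which is why the title attribute is the safer source.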