Scraping text containing certain characters and names in Python?

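Question:

I'm fairly new to Python and am working on a project in which I need to extract all quotes by certain people from a set of articles.

For this question, I'm using this article as an example: https://www.theguardian.com/us-news/2021/oct/17/jeffrey-clark-scrutiny-trump-election-subversion-scheme

Right now, using a lambda, I'm able to scrape text containing the name of the person I'm looking for with the following code: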

import requests
from bs4 import BeautifulSoup

url = 'https://www.theguardian.com/us-news/2021/oct/17/jeffrey-clark-scrutiny-trump-election-subversion-scheme'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Names of the people whose quotes we are looking for
words = ["Michael Bromwich"]

# Find the first <p> with this class whose text contains any of the names.
# (The original loop over all <p> tags repeated this same search each time.)
match = soup.find("p", {"class": "dcr-s23rjr"},
                  string=lambda text: text and any(x in text for x in words))
quotes = match.text if match else None

print(quotes)

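...which returns the text block containing "Michael Bromwich", which in this case is actually a quote in the article. But when scraping 100+ articles this doesn't do the job, because other text blocks may also contain the given name without containing a quote. I only want the text strings that contain quotes.

Hence my question: is it possible to print all HTML strings that meet the following conditions?

The text STARTS with the character “ (quotation mark) OR - (hyphen) AND contains the name "Michael Bromwich" or "John Johnson", etc.

Thanks!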
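The condition described above (starts with a quotation mark or hyphen, and contains one of the names) can be sketched as a small filter over paragraph strings. This is only a minimal sketch, not a definitive answer: the helper names `is_quote_mentioning` and `filter_quotes` are hypothetical, the set of opening characters is an assumption that may need extending, and the sample paragraphs are invented placeholders rather than text from the article.

```python
import re

# Opening characters that mark a quote: a straight or curly double quote,
# or a hyphen. This set is an assumption; extend it for other articles.
QUOTE_START = re.compile(r'^\s*["“-]')


def is_quote_mentioning(text, names):
    """Return True if text starts with a quote mark or hyphen and contains any name."""
    if not text:
        return False
    return bool(QUOTE_START.match(text)) and any(name in text for name in names)


def filter_quotes(paragraphs, names):
    """Keep only the paragraphs that satisfy both conditions."""
    return [p.strip() for p in paragraphs if is_quote_mentioning(p, names)]


names = ["Michael Bromwich", "John Johnson"]

# Invented placeholder paragraphs, for illustration only.
sample = [
    "“Example quoted sentence,” said Michael Bromwich.",
    "Michael Bromwich is mentioned here without a quote.",
    "- John Johnson, example attribution line.",
    "“Example quoted sentence,” said someone else.",
]

quotes = filter_quotes(sample, names)
for quote in quotes:
    print(quote)
```

With BeautifulSoup, the paragraph strings could come from something like `[tag.get_text() for tag in soup.find_all("p")]`, which avoids depending on a generated class name such as `dcr-s23rjr`.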

Tags: python, regex, lambda, beautifulsoup, quotes
