How to web scrape collapsible/expandable sections of a table using Beautiful Soup

Q:

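I am scraping data from the income statement table on Yahoo Finance. The table has collapsible/expandable sections that my scraper does not seem to reach. How can I retrieve the data from the collapsed sections? This is the page I am scraping: https://finance.yahoo.com/quote/AMZN/financials?p=AMZN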

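So far I have written a program that scrapes the visible part of the income statement. I was hoping to retrieve the hidden data as well, since it all sits under the same div container I am iterating over.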

import requests
from bs4 import BeautifulSoup

new_url = "https://finance.yahoo.com/quote/AMZN/financials?p=AMZN"

# Send a browser-like User-Agent header with the request.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}

new_page = requests.get(new_url, headers=headers)

new_soup = BeautifulSoup(new_page.content, "html.parser")
# Match elements whose class attribute is exactly this string.
new_table = new_soup.find_all(class_="M(0) Whs(n) BdEnd Bdc($seperatorColor) D(itb)")

# Print the text of each matched element.
for elem in new_table:
    print(elem.text)
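
One way to narrow the problem down is to check whether the collapsed rows are present at all in the HTML that requests receives. The following is only a minimal sketch, assuming the hidden rows might be added to the page by JavaScript after a click (this is not confirmed for this page). The helper name `labels_in_html`, the sample markup, and the row labels are all hypothetical and stand in for the real response.

```python
from bs4 import BeautifulSoup


def labels_in_html(html, labels):
    """Report which of the given row labels occur as text in the parsed HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # stripped_strings yields every non-empty text node with whitespace trimmed.
    found = set(soup.stripped_strings)
    return {label: label in found for label in labels}


# Hypothetical stand-in for a response body in which only a top-level row
# was served and its child row is absent.
sample_html = """
<div class="table">
  <div class="row"><span>Total Revenue</span><span>100</span></div>
</div>
"""

result = labels_in_html(sample_html, ["Total Revenue", "Operating Revenue"])
print(result)
```

With the program above, the same check could be run on `new_page.text` using the label of a row that only appears after expanding a section. If the label is reported as missing, the row is not in the downloaded HTML, so Beautiful Soup alone cannot reach it and the data would have to come from somewhere else, for example a browser-automation tool that expands the sections first.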

python html web-scraping expandable
