Asked by: Nathan Brannan Asked: 2/19/2023 Updated: 2/19/2023 Views: 42
Can't get for loop to work while parsing HTML using Beautiful Soup 4
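Q:
I am using the Beautiful Soup documentation to help me understand how to implement it. I am not very familiar with Python as a whole, so maybe I am making a syntax error, but I don't think so. The code below should print every link on the main Etsy page, but it does not. The documentation shows something similar to this, but maybe I am missing something. Here is my code: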
#!/usr/bin/python3
# import library
from bs4 import BeautifulSoup
import requests
import os.path
from os import path
# Request to website and download HTML contents
url='https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'
req=requests.get(url)
content=req.text
soup=BeautifulSoup(content, 'html.parser')
for x in soup.head.find_all('a'):
    print(x.get('href'))
If I set it up this way, the HTML prints, but I cannot get the for loop to work.
A:
0 votes
Codist
2/19/2023
#1
If you are trying to get all of the links from the specified URL, then:
url = 'https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'
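# Assumes requests and BeautifulSoup are imported as in the question.
# The 'lxml' parser needs the lxml package; the walrus operator needs Python 3.8+.
with requests.get(url) as r:
    # Raise an exception on 4xx/5xx responses instead of parsing an error page
    r.raise_for_status()
    soup = BeautifulSoup(r.text, 'lxml')
    # Search <body>, not <head>, and keep only <a> tags that have an href
    if (body := soup.body):
        for a in body.find_all('a', href=True):
            print(a['href'])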
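The same approach can be tried without touching the network. The following is only a sketch: extract_links and SAMPLE_HTML are hypothetical names and data made up for illustration, and it swaps in the built-in 'html.parser' for 'lxml' so that it runs without an extra install.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for the downloaded HTML.
SAMPLE_HTML = """
<html>
  <head><title>Demo</title><link rel="stylesheet" href="/style.css"></head>
  <body>
    <a href="/cart">Cart</a>
    <a name="top">No href here</a>
    <a href="https://example.com/help">Help</a>
  </body>
</html>
"""


def extract_links(html):
    """Return the href of every <a> inside <body> that has one."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.body
    if body is None:
        # Fragment without a <body> element: nothing to search
        return []
    return [a["href"] for a in body.find_all("a", href=True)]


links = extract_links(SAMPLE_HTML)
for link in links:
    print(link)
```

Passing href=True skips anchors that have no href attribute, so the loop never prints None.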
Comments
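<head> does not normally contain any <a> elements, so the loop has nothing to iterate over. Search the body or the whole document instead:
for x in soup.body.find_all('a'):
for x in soup.find_all('a'):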
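To see the difference between the three searches, here is a small sketch on a made-up page (the markup is an assumption for illustration, not Etsy's actual HTML). It counts the <a> tags found under soup.head, under soup.body, and in the whole document.

```python
from bs4 import BeautifulSoup

# Made-up markup: links live in <body>, while <head> holds only metadata.
html = """
<html>
  <head><title>Demo</title><link rel="stylesheet" href="/style.css"></head>
  <body><a href="/one">One</a><a href="/two">Two</a></body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

in_head = soup.head.find_all("a")  # what the original loop iterated over
in_body = soup.body.find_all("a")  # search limited to <body>
in_doc = soup.find_all("a")        # search across the whole document

print(len(in_head), len(in_body), len(in_doc))  # prints: 0 2 2
```

An empty result from find_all is not an error, which is why the original loop ran without complaint and printed nothing.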