
Can't get for loop to work while parsing HTML using Beautiful Soup 4

Asked by Nathan Brannan · Asked 2/19/2023 · Modified 2/19/2023 · Viewed 42 times

Q:

I'm working from the Beautiful Soup documentation to learn how to use it. I'm fairly new to Python overall, so maybe I've made a syntax error, but I don't think so. The code below is supposed to print any links on the Etsy homepage, but it doesn't. The documentation shows something similar, so maybe I'm missing something. Here is my code:

#!/usr/bin/python3

# import library
from bs4 import BeautifulSoup
import requests
import os.path
from os import path

# Request to website and download HTML contents
url='https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'
req=requests.get(url)
content=req.text

soup=BeautifulSoup(content, 'html.parser')

for x in soup.head.find_all('a'):
    print(x.get('href'))

If I set it up this way, the HTML prints, but I can't get the for loop to work.

python parsing beautifulsoup html-parsing

Comments

0 votes Peter F 2/19/2023
What is the output of your current program?
0 votes Codist 2/19/2023
@PeterF It doesn't produce any output
0 votes HedgeHog 2/19/2023
The main issue here is that there are no `<a>` elements in the `<head>`, which is where you are trying to select them from. Use `for x in soup.body.find_all('a'):` or simply `for x in soup.find_all('a'):` instead.
1 vote Codist 2/19/2023
If there were a syntax error (there isn't one), you would see a SyntaxError exception when you tried to run the code.

A:

0 votes Codist 2/19/2023 #1

If you're trying to get all the links from the specified URL, then:

import requests
from bs4 import BeautifulSoup

url = 'https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'

with requests.get(url) as r:
    r.raise_for_status()
    soup = BeautifulSoup(r.text, 'lxml')
    if (body := soup.body):
        for a in body.find_all('a', href=True):
            print(a['href'])
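
To see why the original loop printed nothing, here is a minimal sketch using a small inline HTML string (a hypothetical stand-in for the Etsy page, so no network request is needed): `soup.head.find_all('a')` is empty because anchors live in the `<body>`, so searching the `body`, or the whole tree, is what finds them.

```python
from bs4 import BeautifulSoup

# Tiny stand-in document: <a> tags appear in <body>, never in <head>
html = """
<html>
  <head><title>Demo</title></head>
  <body>
    <a href="https://example.com/a">A</a>
    <a href="https://example.com/b">B</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

# Searching the <head> finds nothing, so the loop body never runs
print(soup.head.find_all('a'))  # []

# Searching the <body> (or the whole soup) finds the links
print([a['href'] for a in soup.body.find_all('a', href=True)])
```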