Beautiful Soup only extracts one tag when all the others are visible in the HTML code

Asked by: Jake Wright Asked: 11/21/2021 Updated: 11/21/2021 Views: 93

Q:

I'm trying to understand how web scraping works:

import requests
from bs4 import BeautifulSoup as soup
url = "https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"
result = requests.get(url)
doc = soup(result.text, "lxml")
items = doc.find_all('div', {'class': 'col-sm-4 col-lg-4 col-md-4'})
for item in items:
    caption = item.find('div', {'class': 'caption'})
    price = item.find('h4', {'class': 'pull-right price'})
print(price.string)

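But when I run this, all that is returned is the last price on the site ($1799.00). Why does it skip all the other h4 tags and return only the last one?

Any help would be much appreciated!

Let me know if you need more information.

python web-scraping beautifulsoup html-parsing lxml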

A:

6 votes HedgeHog 11/21/2021 #1

What happens?

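You call print() only after the loop has finished iterating over your results. At that point price still holds the value from the final iteration, which is why you only get the last one.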
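The behaviour can be reproduced without any scraping. The following is a minimal sketch, assuming a short hypothetical list of price strings in place of the scraped h4 values; the names printed_outside and printed_inside are only illustrative.

```python
# Hypothetical price strings standing in for the scraped <h4> values.
prices = ["$295.99", "$299.00", "$1799.00"]

# Statement placed after the loop: it runs once, when the loop is done,
# and the loop variable still holds the value from the final iteration.
printed_outside = []
for price in prices:
    pass  # nothing is recorded per item
printed_outside.append(price)

# Statement placed inside the loop: it runs once per item.
printed_inside = []
for price in prices:
    printed_inside.append(price)

print(printed_outside)  # ['$1799.00']
print(printed_inside)   # ['$295.99', '$299.00', '$1799.00']
```

In Python the loop variable stays bound after the loop ends, so the dedented print() in the question raises no error and simply shows the last item.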

How to fix?

Put the print() inside your loop:

for item in items:
    caption = item.find('div', {'class': 'caption'})
    price = item.find('h4', {'class': 'pull-right price'})
    print(price.string)

Output

$295.99
$299.00
$299.00
$306.99
$321.94
$356.49
$364.46
$372.70
$379.94
$379.95
$391.48
$393.88
$399.00
$399.99
$404.23
$408.98
$409.63
$410.46
$410.66
$416.99
$433.30
$436.29
$436.29
$439.73
$454.62
$454.73
$457.38
$465.95
$468.56
$469.10
$484.23
$485.90
$487.80
$488.64
$488.78
$494.71
$497.17
$498.23
$520.99
$564.98
$577.99
$581.99
$609.99
$679.00
$679.00
$729.00
$739.99
$745.99
$799.00
$809.00
$899.00
$999.00
$1033.99
$1096.02
$1098.42
$1099.00
$1099.00
$1101.83
$1102.66
$1110.14
$1112.91
$1114.55
$1123.87
$1123.87
$1124.20
$1133.82
$1133.91
$1139.54
$1140.62
$1143.40
$1144.20
$1144.40
$1149.00
$1149.00
$1149.73
$1154.04
$1170.10
$1178.19
$1178.99
$1179.00
$1187.88
$1187.98
$1199.00
$1199.00
$1199.73
$1203.41
$1212.16
$1221.58
$1223.99
$1235.49
$1238.37
$1239.20
$1244.99
$1259.00
$1260.13
$1271.06
$1273.11
$1281.99
$1294.74
$1299.00
$1310.39
$1311.99
$1326.83
$1333.00
$1337.28
$1338.37
$1341.22
$1347.78
$1349.23
$1362.24
$1366.32
$1381.13
$1399.00
$1399.00
$1769.00
$1769.00
$1799.00

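Instead of just printing the results while iterating, store them in a structured way in a list of dicts, then print or save them after the for loop: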

import requests
from bs4 import BeautifulSoup as soup
url = "https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"
result = requests.get(url)
doc = soup(result.text, "lxml")
items = doc.find_all('div', {'class': 'col-sm-4 col-lg-4 col-md-4'})
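# Collect one dict per product so the results can be reused after the loop
data = []
for item in items:
    data.append({
        # The product name is stored in the title attribute of the item's first link
        'caption': item.a['title'],
        'price': item.find('h4', {'class': 'pull-right price'}).string
    })

# Runs once, after the loop, with every item already collected
print(data)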
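For the "save" part, one option is the standard library's csv module. This is only a sketch under assumptions: the two rows below are hypothetical stand-ins in the same shape as data, and rows_to_csv and the file name laptops.csv are illustrative names, not part of the original answer.

```python
import csv
import io

# Hypothetical rows with the same keys as the dicts collected in `data`.
data = [
    {"caption": "Example Laptop A", "price": "$295.99"},
    {"caption": "Example Laptop B", "price": "$299.00"},
]


def rows_to_csv(rows, fieldnames=("caption", "price")):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(data)
print(csv_text)

# To write the same text to disk (the file name is an assumption):
# with open("laptops.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

Building the text in memory first keeps the example free of side effects; in a real script the DictWriter could just as well be given the open file directly.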