Asked by: Omar Kabil Asked: 6/19/2023 Updated: 6/19/2023 Views: 19
I'm trying to scrape an HTML file from page source, but the code gives me an error when I run it
Q:
from bs4 import BeautifulSoup
import csv
from itertools import zip_longest

with open('dhsh.html', 'r') as main_page:
    src = main_page.read()

rod_length = []
rod_power = []
line_rating = []
number_of_pc = []
lure_rating = []
rod_handle_type = []
number_of_guides_with_tip = []
water_type = []
model_number = []

soup = BeautifulSoup(src, 'lxml')
rod_lengths = soup.find_all('td', {'class', 'rod_lengths'})
rod_powers = soup.find_all('td', {'class', 'rod_powers'})
line_ratings = soup.find_all('td', {'class', 'line_ratings'})
number_of_pcs = soup.find_all('td', {'class', 'num_of_pcs'})
rod_handle_types = soup.find_all('td', {'class', 'rod_handle_type'})
lure_ratings = soup.find_all('td', {'class', 'lure_ratings'})
water_types = soup.find_all('td', {'class', 'water_type'})
model_numbers = soup.find_all('a', {'class', 'model_numbers'})
# ugly_tech = {'stik type':}

for i in range(len(rod_lengths)):
    rod_length.append(rod_lengths[i].text)
    rod_power.append(rod_powers[i].text)
    line_rating.append(line_ratings[i].text)
    number_of_pc.append(number_of_pcs[i].text)
    rod_handle_type.append(rod_handle_types[i].text)
    lure_rating.append(lure_ratings[i].text)
    water_type.append(water_types[i].text)
    model_number.append(model_numbers[i].text)

gx22s = [rod_length, rod_power, line_rating, number_of_pc, rod_handle_type, lure_rating, water_type, model_number]
unpaked = zip_longest(*gx22s)

with open('ugly_stik.csv', 'w') as ugly_stik:
    wr = csv.writer(ugly_stik)
    wr.writerow(['rod lengths', 'rod powers', 'line ratings', 'number of pcs', 'rod handle types', 'lure ratings', 'water types', 'model numbers'])
    wr.writerows(unpaked)
Traceback (most recent call last):
  File "c:\Users\Dkabil\web scrapping\gx2.py", line 36, in <module>
    model_number.append(model_numbers[i].text)
                        ~~~~~~~~~~~~~^^^
IndexError: list index out of range
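The traceback points at `model_numbers[i]`, which suggests that the `find_all` call for `model_numbers` matched fewer elements than the one for `rod_lengths`, so the shared index `i` runs past the end of the shorter list. Below is a minimal sketch of one possible workaround, assuming it is acceptable to leave missing cells blank: build each column independently and let `zip_longest` pad the short ones. The helper name `columns_to_rows` and the sample values are hypothetical stand-ins, not data from the actual page.

```python
import csv
import io
from itertools import zip_longest


def columns_to_rows(columns, fill=""):
    """Pad unequal-length columns with `fill` and return them as rows."""
    return [list(row) for row in zip_longest(*columns, fillvalue=fill)]


# Stand-in data (hypothetical values). In the script above, each list could be
# built on its own, e.g. [tag.text for tag in rod_lengths], so that no index
# is shared between result lists of different lengths.
rod_length = ["6 ft", "7 ft", "8 ft"]
model_number = ["MODEL-A", "MODEL-B"]  # one fewer match than rod_length

rows = columns_to_rows([rod_length, model_number])

# StringIO is used only so the sketch runs without touching disk; for a real
# file, the csv module documentation recommends open(..., 'w', newline='').
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["rod lengths", "model numbers"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Padding only fills the end of a short column, so if the missing model number belongs to a row in the middle, later rows would shift up by one. Iterating over each table row and looking up its cells inside that row is likely more robust, but that depends on the page's markup, which is not shown here. Separately, `{'class', 'rod_lengths'}` is a set rather than a dict; `{'class': 'rod_lengths'}` is the documented form for an attribute filter.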
A: No answers yet