Asked by: Ahmad Abdelbaset  Asked: 5/21/2023  Last edited by: Ahmad Abdelbaset  Updated: 6/18/2023  Views: 99
Why is "requests-html" not rendering all HTML content?
Q:
I am trying to scrape data, but the script does not load all of the HTML content, even though I changed the render time. See the code below:
from requests_html import HTMLSession, AsyncHTMLSession

url = 'https://www.aliexpress.com/w/wholesale-test.html?catId=0&initiative_id=SB_20230516115154&SearchText=test&spm=a2g0o.home.1000002.0'

def create_session(url):
    session = HTMLSession()
    request = session.get(url)
    print("Before ", len(request.html.html), "\n\n")
    # The site is dynamic, so wait for the page to load before querying it
    request.html.render(sleep=5, timeout=20)
    prod = request.html.find('#root > div > div > div.right--container--1WU9aL4.right--hasPadding--52H__oG > div > div.content--container--2dDeH1y > div.list--gallery--34TropR > a:nth-child(1) > div.manhattan--content--1KpBbUi')
    print("After ", len(request.html.html), "\n\n")
    print("output:", prod)
    session.close()

create_session(url)
The first time I ran the code, the output was:
Before 55448
After 542927
output: [<Element 'div' class=('manhattan--content--1KpBbUi',)>]
When I ran the program again (without changing anything in the code), I got:
Before 55448
After 251734
output: []
When I changed the sleep time from 5 to 100, that is, from
request.html.render(sleep=5,timeout=20)
to
request.html.render(sleep=100,timeout=20)
I got similar output:
Before 55448
After 242881
output: []
It does not render all of the HTML content.
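The runs above return different "After" sizes for identical code, so the rendered page appears to vary from one request to the next. The following is a minimal sketch of one way to cope with that, not a confirmed fix: it retries the fetch-and-render step until the selector matches, and it matches on the class prefix with a CSS substring selector instead of the full generated class chain. The helper names `retry_until_nonempty` and `find_products`, the attempt count, and the delay are all hypothetical choices made for illustration; whether the `manhattan--content` prefix stays stable across page loads is an assumption.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_until_nonempty(
    action: Callable[[], List[T]], attempts: int = 3, delay: float = 2.0
) -> List[T]:
    """Call `action` up to `attempts` times and return the first non-empty result.

    Returns the last (empty) result if every attempt comes back empty.
    """
    result: List[T] = []
    for attempt in range(attempts):
        result = action()
        if result:
            break
        # Pause between attempts, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return result


def find_products(url: str) -> list:
    """Fetch and render `url`, retrying until product elements are found."""
    # Imported here so the retry helper can be reused without requests-html.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        def render_and_find() -> list:
            response = session.get(url)
            # Same render settings as in the question; not tuned for this site.
            response.html.render(sleep=5, timeout=20)
            # Assumption: the class prefix is stable even if the hashed
            # suffix (e.g. "1KpBbUi") changes between page loads.
            return response.html.find('div[class*="manhattan--content"]')

        return retry_until_nonempty(render_and_find, attempts=3, delay=2.0)
    finally:
        # Close the session (and its headless browser) even on errors.
        session.close()
```

Calling `find_products(url)` with the URL from the question returns a possibly empty list of elements. If it is still empty after every attempt, comparing `len(response.html.html)` across attempts, as the question already does, would show whether the rendered page itself differs between runs.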
A: No answers yet