Parsing a website with "load more"/"show more" buttons

Asked by: Rustam  Asked: 7/13/2023  Last edited by: ChimdumebiNebolisa, Rustam  Updated: 7/13/2023  Views: 49

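Q:

For my project, I need to get all of a company's reviews on Capterra (the approach should work for different companies). I am not very experienced with web scraping, so I am having a hard time with this. I tried to get the reviews for HubSpot (link to the reviews section: https://www.capterra.com/p/152373/HubSpot-CRM/reviews/).

I am trying to do this in Python. (If there are any JS options, I would be happy to hear about those too.)

Obviously, the basic bs4 HTML approach does not work, and for Capterra I don't think looking at its Fetch/XHR requests helps either (I could not find anything related to the page content there).

At this point I feel a bit stuck and don't really know how to approach this task. I recently stumbled upon Playwright and Puppeteer, but have not really looked into them yet.

P.S.: The methods mentioned above might well work; it is possible that I simply did not apply them correctly because of my inexperience in this area.

python web-scraping html-parsing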

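Comments

0 votes  Gugu72  7/13/2023
You may want to look into Selenium, it fits your needs exactly!
0 votes  Community  7/13/2023
Please provide enough code so that others can better understand or reproduce the problem.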

A:

0 votes  Andrej Kesely  7/13/2023  #1

You can use the Ajax pagination API to load more reviews, for example (be careful about the captcha page, though):

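import requests

api_url = "https://www.capterra.com/spotlight/rest/reviews"

# "from" is the offset of the first review to return, "size" is the page size
params = {"apiVersion": "2", "productId": "152373", "from": "0", "size": "25"}
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0'}

with requests.Session() as s:
    # send a browser-like User-Agent with every request of this session
    s.headers.update(headers)

    n_pages = 2
    # request offsets 0, 25, ...: one request per page of 25 reviews
    for offset in range(0, 25 * n_pages, 25):
        params["from"] = str(offset)
        data = s.get(api_url, params=params).json()
        for h in data["hits"]:
            # print the first 30 characters of each field in aligned columns
            print(f'{h["generalComments"][:30]:<30}   {h["prosText"][:30]:<30}   {h["consText"][:30]:<30}')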

Prints:

                                 HubSpot is absolutely amazing!   I cannot think of any, to be h
It's great for making a sales    Hubspot manages our sales, mar   HubSpot's appointment calendar
I had to switch to different s   It has some interesting and us   As far as it is advertised as 
We are very satisfied with Hub   I love marketing, sales and cu   The onboarding of the HubSpot 
Easy to learn and easy to mana   It's so clean and intuitive, t   It's expensive. Although you'r
I really appreciate the user-f   HubSpot CRM is a powerful tool   it allows users to segment con
HubSpot is one of the most cri   The integration of everything    There is the occasional rough 
The CRM can easily be deployed   Hubspot is very scaleable and    The pricing model always keeps
Userfriendly, great tool to ha   I like the fact you can make c   I think the inbox feature was 
                                 I liked it for its ease of use   I did not like the waiting tim
We've made amazing sense of ou   I'm usually very leery about u   Sometimes the procedures for t
Bad things aside, I genuinely    It is an incredibly powerful C   I only have a few issues with 
Overall this is by far the bes   HubSpot is easy to deploy and    There are some features that a
As a small business, managing    HubSpot CRM is really great be   I have noticed that some featu
HubSpot has worked well for us   The biggest priorities for us    The biggest con for us is that
It has provided a central plac   I use the free version. It pro   _It is not intuitive to use. Y
Hubspot has a great starter pl   The email marketing tool is ve   The amount of emails you can s
Hubspot has been there every s   Hubspot has a ton of amazing f   Pricing per contact starts to 
Using Sequences has saved a hu   Keeping track of the customers   The full integration with cale
I recommend to anyone who make   Intuitive, easy to use, great    A few missing features, but Hu
absolutely perfect if goals ar   Hubspot is absolutely great as   Like almost every other CRM, H
Must-Try -  A Complete No Brai   > It is free and offers unlimi   I really can not say anything 
Once you get to know your way    It's easy to keep track of eve   It very rarely can be glitchy 
Overall, Hubspot is an incredi   I like the overall ease of use   Hubspot is a wonderful softwar
Overall, the ease of setup and   Without a doubt, the ability t   Some aspects of the software a
PlatformUser150 words about Th   HubSpot is a powerful and comp   While HubSpot offers many adva
I used to hate CRM software. I   The email campaigns, customer    The price you need to pay to p


...and so on.
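
Since the question asks for all reviews rather than a fixed number of pages, the loop above could be extended to keep requesting until the API returns no more hits. The following is only a minimal sketch under assumptions: `fetch_page` and `iter_reviews` are hypothetical helper names, an empty `hits` list as the end-of-data signal has not been verified against Capterra, and a response that fails to parse as JSON is treated as a possible captcha page.

```python
from typing import Callable, Iterator

API_URL = "https://www.capterra.com/spotlight/rest/reviews"
PAGE_SIZE = 25


def fetch_page(session, product_id: str, offset: int) -> list:
    """Fetch one page of reviews using a requests.Session-like object.

    Raises RuntimeError if the body is not JSON (assumed to indicate
    a captcha or other HTML page instead of API data).
    """
    params = {
        "apiVersion": "2",
        "productId": product_id,
        "from": str(offset),
        "size": str(PAGE_SIZE),
    }
    resp = session.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    try:
        return resp.json().get("hits", [])
    except ValueError as exc:  # raised when the body cannot be decoded as JSON
        raise RuntimeError("non-JSON response, possibly a captcha page") from exc


def iter_reviews(get_page: Callable[[int], list], max_pages: int = 1000) -> Iterator[dict]:
    """Yield reviews page by page until a page comes back empty.

    The empty page as end marker is an assumption; max_pages is a
    safety cap so the loop always terminates.
    """
    for page in range(max_pages):
        hits = get_page(page * PAGE_SIZE)
        if not hits:
            break
        yield from hits


# Usage (performs real network requests, so it is left commented out):
#   import requests
#   with requests.Session() as s:
#       s.headers.update({"User-Agent": "Mozilla/5.0 ..."})
#       for review in iter_reviews(lambda off: fetch_page(s, "152373", off)):
#           print(review["prosText"][:30])
```

Passing the page-fetching function into `iter_reviews` keeps the stop condition separate from the HTTP details, so a different product ID or a rate-limited fetcher can be swapped in without touching the loop.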