Failing to parse most Google Play applications

Asked by: John Niroupman Asked: 6/10/2023 Updated: 6/10/2023 Views: 30

Q:

This is the code I run for most of the apps I want to parse:

import requests
from bs4 import BeautifulSoup
r = requests.get("https://play.google.com/store/apps/details?id=com.pocketly")
soup = BeautifulSoup(r.text, "html.parser")

The result I get most of the time is:

<!DOCTYPE html>
<html><head><meta content="text/html;charset=utf-8" http-equiv="content-type"/><meta content="width=device-width, initial-scale=1" name="viewport"/><link href="//www.gstatic.com/android/market_images/web/favicon_v3.ico" rel="shortcut icon"/><title>Not Found</title>
<style nonce="o8Z-lUeTEzblbdo5fMv2Ew">
  body {
    font-family: arial,sans-serif;
    margin: 50px 10px;
    padding: 0;
    text-align: center;
  }
  img {
    border: 0
  }
  .rounded {
    -webkit-border-radius: 5px;
    -moz-border-radius: 5px;
    border-radius: 5px;
  }
  #content {
    margin: 0 auto;
    width: 750px;
  }
  #error-section {
    background-color: #d2e3fb;
    border: 1px solid #a1b4d9;
    color: #666;
    font-weight: bold;
    padding: 12px 0;
  }
  #search-section {
    border: 1px solid #a1b4d9;
    margin: 10px 0;
  }
  #play-logo {
    float: left;
    margin: 17px;
  }
  #search-box {
    float: left;
    margin: 20px;
  }
  #debug {
    margin-top: 50px;
    text-align:left;
  }
  </style>
</head><body bgcolor="#ffffff" dir="ltr" text="#000000"><div id="content"><div class="uaxL4e" id="error-section">We're sorry, the requested URL was not found on this server.</div><div class="uaxL4e" id="search-section"><a href="/store"><img alt="Google Play" id="play-logo" src="//www.gstatic.com/android/market_images/web/play_prism_hlock_v2_1x.png" srcset="//www.gstatic.com/android/market_images/web/play_prism_hlock_v2_2x.png 2x"/></a><form action="/store/search" id="search-box" method="get" style="margin: 32px 10px;"><input name="q" type="text" value=""/><input type="submit" value="Search"/></form><div style="clear:both"></div></div></div></body></html>

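Some apps return a normal result with the full page information, but most of them come back like the page above.

What could be the problem? Please help.

python parsing beautifulsoup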

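Comments

0 votes Andrej Kesely 6/10/2023
What information do you need to get from that page?
0 votes John Niroupman 6/10/2023
@AndrejKesely I need to get the app information, such as its description, number of downloads, etc.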

A:

1 vote Andrej Kesely 6/10/2023 #1

It seems you need to send a User-Agent HTTP header with the request for the server to return the correct page:

import requests
from bs4 import BeautifulSoup

url = 'https://play.google.com/store/apps/details?id=com.pocketly'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

desc = soup.select_one('[data-g-id="description"]').text
print(desc)

Prints:

Pocketly – Your Go-To Personal Loan App for Instant LoansExample | Repayment Time | APR | Amounts | LendersProcessing fees of INR 20 to INR 120 or 3%-7%. GST extra as applicable.

...
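
When scraping many apps in a loop, it may help to fail loudly instead of silently parsing an error page. The following is only a sketch built on the snippet above: the helper names `extract_description` and `fetch_description` are hypothetical, the 10-second timeout is an arbitrary choice, and it assumes the "Not Found" page is served with an HTTP error status, which this thread does not confirm. The `None` check covers the case where a page loads but contains no description element.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup

# Same User-Agent string as in the answer above.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) "
                  "Gecko/20100101 Firefox/114.0"
}


def extract_description(html: str) -> Optional[str]:
    """Return the description text, or None if the element is missing."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one('[data-g-id="description"]')
    return node.get_text() if node is not None else None


def fetch_description(app_id: str) -> Optional[str]:
    """Hypothetical helper: fetch one app's details page and extract its description."""
    response = requests.get(
        "https://play.google.com/store/apps/details",
        params={"id": app_id},
        headers=HEADERS,
        timeout=10,  # arbitrary value, chosen for illustration
    )
    # Raises requests.HTTPError on 4xx/5xx responses (assumes the error
    # page is not returned with a 200 status).
    response.raise_for_status()
    return extract_description(response.text)
```

Calling `fetch_description("com.pocketly")` should then either return the description text, return `None`, or raise `requests.HTTPError`, so a failed request is not mistaken for a parsed page.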

Comments

1 vote John Niroupman 6/10/2023
Thanks! That worked.