Asked by: John Niroupman Asked: 6/10/2023 Updated: 6/10/2023 Views: 30
Failing to parse most of the Google Play applications
Q:
This is the code I run for most of the apps I want to parse:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://play.google.com/store/apps/details?id=com.pocketly")
soup = BeautifulSoup(r.text, "html.parser")
The result I get most of the time is:
<!DOCTYPE html>
<html><head><meta content="text/html;charset=utf-8" http-equiv="content-type"/><meta content="width=device-width, initial-scale=1" name="viewport"/><link href="//www.gstatic.com/android/market_images/web/favicon_v3.ico" rel="shortcut icon"/><title>Not Found</title>
<style nonce="o8Z-lUeTEzblbdo5fMv2Ew">
body {
font-family: arial,sans-serif;
margin: 50px 10px;
padding: 0;
text-align: center;
}
img {
border: 0
}
.rounded {
-webkit-border-radius: 5px;
-moz-border-radius: 5px;
border-radius: 5px;
}
#content {
margin: 0 auto;
width: 750px;
}
#error-section {
background-color: #d2e3fb;
border: 1px solid #a1b4d9;
color: #666;
font-weight: bold;
padding: 12px 0;
}
#search-section {
border: 1px solid #a1b4d9;
margin: 10px 0;
}
#play-logo {
float: left;
margin: 17px;
}
#search-box {
float: left;
margin: 20px;
}
#debug {
margin-top: 50px;
text-align:left;
}
</style>
</head><body bgcolor="#ffffff" dir="ltr" text="#000000"><div id="content"><div class="uaxL4e" id="error-section">We're sorry, the requested URL was not found on this server.</div><div class="uaxL4e" id="search-section"><a href="/store"><img alt="Google Play" id="play-logo" src="//www.gstatic.com/android/market_images/web/play_prism_hlock_v2_1x.png" srcset="//www.gstatic.com/android/market_images/web/play_prism_hlock_v2_2x.png 2x"/></a><form action="/store/search" id="search-box" method="get" style="margin: 32px 10px;"><input name="q" type="text" value=""/><input type="submit" value="Search"/></form><div style="clear:both"></div></div></div></body></html>
Some apps return a normal result with the full page content, but most of them come back like the page above.
What could be the problem? Please help.
A:
1 upvote
Andrej Kesely
6/10/2023
#1
It seems you need to send a User-Agent HTTP header with the request for the server to return the correct page:
import requests
from bs4 import BeautifulSoup
url = 'https://play.google.com/store/apps/details?id=com.pocketly'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
desc = soup.select_one('[data-g-id="description"]').text
print(desc)
Prints:
Pocketly – Your Go-To Personal Loan App for Instant LoansExample | Repayment Time | APR | Amounts | LendersProcessing fees of INR 20 to INR 120 or 3%-7%. GST extra as applicable.
...
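Since the failing requests come back as a small "Not Found" page rather than raising an error, it may help to detect that page before trying to parse it. The following is only a sketch, not part of the answer above: it uses the standard library (urllib and html.parser) instead of requests and BeautifulSoup so that it runs without extra packages, and the helper names (TitleParser, page_title, is_not_found_page, build_request, fetch_app_html) are made up for illustration. It also assumes the error page can be recognised by its <title>, as in the output shown in the question.

```python
import urllib.error
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Same User-Agent string as in the answer above; this is just one browser-like value.
USER_AGENT = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) "
              "Gecko/20100101 Firefox/114.0")


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the stripped <title> text, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def is_not_found_page(html):
    # Assumption: the error page is identified by the title "Not Found",
    # matching the HTML pasted in the question.
    return page_title(html) == "Not Found"


def build_request(app_id):
    """Build a request for an app's details page with a User-Agent header."""
    query = urllib.parse.urlencode({"id": app_id})
    url = "https://play.google.com/store/apps/details?" + query
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_app_html(app_id):
    """Return the page HTML, or None if the server reports an error page."""
    try:
        with urllib.request.urlopen(build_request(app_id), timeout=10) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError:
        return None
    return None if is_not_found_page(html) else html
```

If you stay with requests, the same check can be applied to the response text, e.g. is_not_found_page(r.text), before handing it to BeautifulSoup; that way a missing description is reported as a failed fetch instead of an AttributeError on None.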
Comments
1 upvote
John Niroupman
6/10/2023
Thanks! That works.