Asked by: CaptainG Asked: 10/31/2022 Updated: 10/31/2022 Views: 78
How to export a specific div from a webpage to a dataframe?
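Q:
I want to export a specific div from a webpage. In this case, it is the div with the id "producer-votes-wrapper"; this part of the page contains all the numbers (data) I want to get.
Based on earlier examples and questions, I tried to do it myself but did not get the desired result. My current code looks like this: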
import urllib.request
url= 'https://bloks.io/account/eoseouldotio#votes'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/106.0.1370.52',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
}
request=urllib.request.Request(url,None,headers) #The assembled request
response = urllib.request.urlopen(request)
data = response.read() # The data u need
data
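I was trying to fetch all the data shown on the webpage, then parse it and keep only the most important parts. But with this code I just get useless text and information.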
A:
0 votes
Andrej Kesely
10/31/2022
#1
You can use js2py/requests to get the list, for example:
import js2py
import requests
api_url = "https://eos.hyperion.eosrio.io/v2/state/get_voters?producer=eoseouldotio&limit=100"
data = requests.get(api_url).json()
js_func = """\
function $(num) {
    var CHAIN = 'eos';
    var CORE_PRECISION = 4;

    function calculateVoteWeight() {
        let e = 'wax' === CHAIN ? 13 : 52,
            t = 946684800000,
            n = Date.now() / 1000 - t / 1000,
            a = Math.floor(n / 604800) / e;
        return Math.pow(2, a)
    }

    function weightedVoteToNumber(e, t) {
        return + e / t / Math.pow(10, CORE_PRECISION)
    }

    return weightedVoteToNumber(num, calculateVoteWeight());
}
"""
get_votes = js2py.eval_js(js_func)
for i, v in enumerate(data["voters"], 1):
    print("{:>3} {:<15} {}".format(i, v["account"], get_votes(v["weight"])))
Prints:
1 genpoolproxy 12223404.962632878
2 proxy4nation 9988023.902857617
3 brockpierce1 7567714.822700241
4 votetobestbp 5000016.0222
5 infstonespxy 4893994.756118013
6 3g33s3yvm14o 4000186.264188379
7 ujmbnmhhgege 3454463.383436105
8 votefordefi3 3000006.4445
9 boxproxyacc2 2991160.2925
10 vjyfeimpfipm 2960178.44507899
11 bestbpproxy3 2769515.4555716044
12 eosblueocean 2017680.053802568
13 votefordefi2 2000004.4447
14 goodbpproxy2 2000001.279242206
15 votebpproxy2 2000000.2716760403
16 bestbpproxy2 2000000.1161000005
17 foreosproxy2 2000000.0773
18 22kon3zii2xj 1973517.3920720608
19 eosproxy2222 1846278.1276148804
20 eosuniverse4 1729434.0745837307
21 max1research 1727233.497893132
22 dfsbpsproxy1 1346609.485410652
23 newdexproxy1 1190464.2408884696
24 newdexproxy4 1146121.3897043345
25 newposproxy4 1077755.6027183502
26 iloveeosbp12 1000004.1188
27 iloveeosbp11 1000002.0762
28 tothemoonbp2 1000000.9467
29 foreosvote12 1000000.6146
30 tothemoonbp1 1000000.2276999998
31 goodbpproxy1 1000000.1421999998
32 kvotebpproxy 1000000.1080999998
33 foreosvote11 1000000.0982
34 foreosproxy1 1000000.0768
35 votebpproxy1 1000000.0764
36 zgjenxyhyr2f 986758.6966280856
37 fzrv13xeykgc 986758.6963320582
38 wynxe2mjlun3 986758.6959373545
39 rbqdtrfzkjin 986679.7549457861
40 binanceprxy4 963293.8089988151
41 binanceprxy5 963245.2874580923
42 5hedxzqf2up3 948077.5151924414
43 43vxynfoilbp 948077.5150976336
44 vwdkolamq5xh 948077.5150028258
45 newdexproxy5 833220.2371066288
46 newposproxy2 744271.3075562501
47 newposproxy3 654466.93982948
48 newdexproxy3 609038.8722571591
49 newposproxyp 546820.5754945058
50 newposproxy1 528395.2334034096
51 2lnyrx1ojudz 493379.74351089983
52 newposproxy5 461594.99237992516
53 newdexproxy2 372770.3207974433
54 newposproxyc 205402.62646861575
55 newposproxyk 165242.03631287487
56 newposproxyl 147335.32452166546
57 newposproxyo 147117.71954609765
58 whaleexproxy 124536.93289501838
59 haydinruhage 120025.38310000002
60 newposproxyd 119111.79127485356
61 newposproxyq 113303.98983725329
62 newposproxya 89222.96972714498
63 bloksioproxy 75091.47129018814
64 newposproxyb 65037.34360974116
65 newposproxyf 62185.83976808534
66 newposproxyi 47540.47210508377
67 newposproxyj 44846.98636416771
68 maxmaxmaxmac 42472.890608290276
69 edenproxyrew 42039.79163716058
70 proxy2nation 37521.97772731668
71 newposproxye 24979.75775478982
72 whaleexvotee 24651.26629999998
73 whaleexvote2 18216.5297
74 edenproxydon 16143.093300650424
75 shankooooooo 10230.596108077649
76 fvztvnwifaaa 9242.21950142053
77 newposproxyn 5833.500486037437
78 newposproxym 5376.35260346395
79 whaleexvotei 3239.2693
80 4cdfa3451ba3 2908.8954212365957
81 start11.io 1603.294770884103
82 rexproxy.tp 1582.6070414383278
83 freeandpeace 1155.2506065612258
84 charityalfap 1001.7386707908703
85 bywire.com 853.1515670570791
86 whaleexvotej 707.0379
87 buziliauskas 490.1615893241484
88 voteproxy134 358.65694154656836
89 holdyourfire 355.73633343799105
90 guytsobyg4ge 300
91 gu2tcnjvguge 279.737124995751
92 gm2teojvhege 275.5550399573742
93 darks1de2111 260.70447331984116
94 tvuskwallete 203.2589396612221
95 xy1gullq42cp 201.3439671772095
96 fvztvnwifaab 199.76702136820313
97 songchunlong 194.0824157723007
98 nxcpnfylm.gm 187.10474602129185
99 tlorusso1.gm 184.74382433954523
100 boidcomproxy 182.69725348851
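Since the question asks for a dataframe, the same recalculation can also be done without js2py. The following is only a sketch and not part of the original answer: a plain-Python port of the two JavaScript helpers shown above. The names calculate_vote_weight, weighted_vote_to_number and voters_to_rows, and the optional now parameter (added so the result can be checked against a fixed timestamp), are assumptions introduced here.

```python
import math
import time

CORE_PRECISION = 4           # same value as CORE_PRECISION in the JS snippet
EPOCH_2000 = 946684800       # 946684800000 ms in the JS, expressed in seconds
SECONDS_PER_WEEK = 604800
WEEKS_PER_YEAR = 52          # the JS uses 13 instead when CHAIN is 'wax'


def calculate_vote_weight(now=None):
    """Port of calculateVoteWeight(): 2 ** (whole weeks since 2000 / 52)."""
    now = time.time() if now is None else now
    weeks = math.floor((now - EPOCH_2000) / SECONDS_PER_WEEK)
    return 2 ** (weeks / WEEKS_PER_YEAR)


def weighted_vote_to_number(weight, now=None):
    """Port of weightedVoteToNumber(): undo the time weighting and precision."""
    return float(weight) / calculate_vote_weight(now) / 10 ** CORE_PRECISION


def voters_to_rows(voters, now=None):
    """Turn the API's voter entries into plain dicts, one per row."""
    return [
        {"account": v["account"], "votes": weighted_vote_to_number(v["weight"], now)}
        for v in voters
    ]
```

With pandas installed, pd.DataFrame(voters_to_rows(data["voters"])) should then give a dataframe with an account column and a votes column, because the DataFrame constructor accepts a list of dicts.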
Comments
0 votes
CaptainG
10/31/2022
May I ask where you got the API URL from?
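0 votes
Andrej Kesely
10/31/2022
@CaptainG I opened Firefox Developer Tools -> Network tab (it lists all the requests the page makes). But the values returned by the API call need to be recalculated using the weightedVoteToNumber function (it can be found in the page's JavaScript source).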
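Once a request URL has been spotted in the Network tab, it can be rebuilt from its parts so that the producer and limit are easy to change. This is a rough sketch using only the standard library; the helper names build_voters_url and fetch_voters and the timeout default are assumptions made for illustration, and whether the endpoint accepts requests without browser-like headers has not been checked here.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the answer above.
API_BASE = "https://eos.hyperion.eosrio.io/v2/state/get_voters"


def build_voters_url(producer, limit=100):
    """Assemble the query string seen in the browser's Network tab."""
    query = urllib.parse.urlencode({"producer": producer, "limit": limit})
    return f"{API_BASE}?{query}"


def fetch_voters(producer, limit=100, timeout=30):
    """Call the API and return the list stored under the "voters" key."""
    url = build_voters_url(producer, limit)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["voters"]
```

For example, build_voters_url("eoseouldotio") reproduces the URL used in the answer.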
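0 votes
CaptainG
10/31/2022
Thanks a lot! One more question (since I don't have much experience with JS and APIs): when I work with these kinds of websites, I assume an API won't always be available, right? If not, I'll have to keep using selenium, as mentioned. Am I right?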
0 votes
Andrej Kesely
10/31/2022
@CaptainG In many cases the data has to come from somewhere, so the page is making an Ajax call :) But yes, sometimes selenium or something similar is necessary.
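1 vote
CaptainG
10/31/2022
Given that I don't have much experience with JS, it would be a bit hard for me to generalize this solution to other cases even if I found the API; but most importantly, it helped me in this case. Thanks a lot!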
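One way to tell which of the two routes from the comments applies (an Ajax/API call versus selenium or similar) is to check whether the target element already contains text in the raw HTML returned by a plain request. If the element is missing or empty there, the page most likely fills it in with JavaScript. The sketch below uses the standard library's html.parser; the class DivTextProbe and the function static_text_of are hypothetical names, and this is a heuristic rather than a definitive test.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DivTextProbe(HTMLParser):
    """Collects the text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # > 0 while the parser is inside the target element
        self.found = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.text.append(data.strip())


def static_text_of(html, target_id):
    """Return (element_found, text_inside_it) for the raw HTML string."""
    probe = DivTextProbe(target_id)
    probe.feed(html)
    probe.close()
    return probe.found, " ".join(probe.text)
```

Calling static_text_of(data.decode(), "producer-votes-wrapper") on the bytes read in the question's code would show whether the div carries any data before JavaScript runs.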