
Retrieving all hrefs in anchor tags

Asked by: Paul Corcoran · Asked: 3/18/2023 · Updated: 3/18/2023 · Views: 27

Q:

import warnings
import numpy as np
from datetime import datetime
import json
import requests
from bs4 import BeautifulSoup

warnings.filterwarnings('ignore')

url = "https://understat.com/league/EPL/2022"
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")

for link in soup.find_all("a", class_="match-info"):
    href = link.get("href")
    print(href)

Unfortunately, this code finds no results. The desired results are the hrefs in this part of the web page:

<a class="match-info" data-isresult="true" href="match/18265">

Any ideas?

python beautifulsoup

Comments

0 · sytech · 3/18/2023
Are you sure this is actually in the response? print(response.text) may show you otherwise. Keep in mind that requests.get does not execute JavaScript code.

A:

1 · Andrej Kesely · 3/18/2023 · #1

The data you see on the page is encoded inside a <script> element, so beautifulsoup doesn't see it. To decode it and load it into a pandas DataFrame you can use this example:

import re
import json
import requests
import pandas as pd


url = "https://understat.com/league/EPL/2022"
html_doc = requests.get(url).text

# Extract the JSON string that the page assigns to datesData
data = re.search(r"datesData\s*=\s*JSON\.parse\('(.*?)'\)", html_doc).group(1)
# Decode \xNN hex escapes back into the characters they encode
data = re.sub(r'\\x([\dA-F]{2})', lambda g: chr(int(g.group(1), 16)), data)
data = json.loads(data)

all_data = []
for d in data:
    all_data.append({
        'Team 1': d['h']['title'],
        'Team 2': d['a']['title'],
        'Goals': f'{d["goals"]["h"]} - {d["goals"]["a"]}',
        'Date': d['datetime'],
        'xG': [d['xG']['h'], d['xG']['a']],
        'forecast': list(d.get('forecast', {}).values())
    })

df = pd.DataFrame(all_data)
print(df)

Prints:

                      Team 1                   Team 2        Goals                 Date                     xG                  forecast
0             Crystal Palace                  Arsenal        0 - 2  2022-08-05 19:00:00     [1.20637, 1.43601]  [0.2864, 0.2912, 0.4224]
1                     Fulham                Liverpool        2 - 2  2022-08-06 11:30:00     [1.26822, 2.34111]  [0.1225, 0.2133, 0.6642]
2                Bournemouth              Aston Villa        2 - 0  2022-08-06 14:00:00   [0.588341, 0.488895]   [0.3213, 0.4397, 0.239]
3                      Leeds  Wolverhampton Wanderers        2 - 1  2022-08-06 14:00:00     [0.88917, 1.10119]  [0.2798, 0.3166, 0.4036]
4           Newcastle United        Nottingham Forest        2 - 0  2022-08-06 14:00:00     [1.8591, 0.235825]  [0.8023, 0.1695, 0.0282]
5                  Tottenham              Southampton        4 - 1  2022-08-06 14:00:00     [1.6172, 0.386546]  [0.7002, 0.2209, 0.0789]
6                    Everton                  Chelsea        0 - 1  2022-08-06 16:30:00    [0.541983, 1.92315]    [0.06, 0.1717, 0.7683]
7          Manchester United                 Brighton        1 - 2  2022-08-07 13:00:00      [1.42103, 1.7289]      [0.281, 0.269, 0.45]
8                  Leicester                Brentford        2 - 2  2022-08-07 13:00:00   [0.455695, 0.931067]  [0.1615, 0.3491, 0.4894]

...and so on.
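The hex-unescaping step above can be illustrated on a tiny synthetic payload (the escaped string below is made up for illustration; Understat's real datesData payload is far larger):

```python
import re
import json

# Understat stores its JSON with \xNN hex escapes (e.g. \x22 for '"').
# This synthetic string stands in for the real datesData payload.
escaped = r"\x7B\x22goals\x22:2\x7D"

# Replace each \xNN escape with the character it encodes,
# exactly as the re.sub call in the answer does.
decoded = re.sub(r'\\x([\dA-Fa-f]{2})',
                 lambda m: chr(int(m.group(1), 16)),
                 escaped)

print(decoded)  # {"goals":2}
print(json.loads(decoded))  # {'goals': 2}
```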

Comments

1 · Paul Corcoran · 3/18/2023
Fantastic answer and a great explanation.

1 · sytech · 3/18/2023 · #2

The problem is that those anchors on the page are generated by JavaScript. They are not part of the response retrieved by requests.get.

You can use a browser tool, selenium, to fetch the page and render its full contents, as an alternative to using requests. Because selenium controls a browser, it will execute the JS that renders the HTML you expect.

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
url = ...
driver.get(url)

rendered_html = driver.page_source

soup = BeautifulSoup(rendered_html, "html.parser")

for link in soup.find_all("a", class_="match-info"):
    href = link.get("href")
    print(href)

Alternatively, you can inspect the content of the page returned by requests.get (as opposed to the rendered HTML) and base your parsing on what is actually there. If the JS issues additional server requests to render the page, you would have to account for those as well.
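That raw-response approach can be sketched as follows, reusing the regex from the first answer. The HTML below is a synthetic stand-in for what requests.get(url).text would return, and the "id" field name is an assumption for illustration:

```python
import re
import json

# Synthetic stand-in for the raw page source; the real page embeds a much
# larger JSON.parse('...') payload, and the "id" field here is assumed.
html_doc = """
<script>
var datesData = JSON.parse('[{"id":"18265","isResult":true}]');
</script>
"""

# Pull the embedded JSON out of the <script> tag instead of running a browser.
raw = re.search(r"datesData\s*=\s*JSON\.parse\('(.*?)'\)", html_doc).group(1)
matches = json.loads(raw)

for m in matches:
    print(f"match/{m['id']}")  # the href each rendered anchor would carry
```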