Adding Column to data frame based on list content in a loop? - Python

Asked by: cvel  Asked: 1/24/2023  Updated: 1/24/2023  Views: 49

Q:

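I'm pulling data from the NHL API to get player stats on a per-game basis. I'm trying to build a loop that calls the API, parses the JSON, and creates a dictionary from which I can then build a DataFrame for the whole team. The code before the loop looks like this: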

API_URL = "https://statsapi.web.nhl.com/api/v1"

response = requests.get(API_URL + "/people/8477956/stats?stats=gameLog", params={"Content-Type": "application/json"})

data = json.loads(response.text)

df_list_dict = []
for game in data['stats'][0]['splits']:
  curr_dict = game['stat']
  curr_dict['date'] = game['date']
  curr_dict['isHome'] = game['isHome']
  curr_dict['isWin'] = game['isWin']
  curr_dict['isOT'] = game['isOT']
  curr_dict['team'] = game['team']['name']
  curr_dict['opponent'] = game['opponent']['name']

  df_list_dict.append(curr_dict)

df = pd.DataFrame.from_dict(df_list_dict)
print(df)

This gives me a readable DataFrame for a single player (/people/{player}/....).

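I want to iterate over a list (the list is an NHL team's roster), adding a column that identifies the player and concatenating the DataFrames that are created. My attempt so far looks like this: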

import requests 
import json
import pandas as pd

Rangers  = ['8478550', '8476459', '8479323', '8476389', '8475184', '8480817', '8480078', '8476624', '8481554', '8482109', '8476918', '8476885', '8479324', 
'8482073', '8479328', '8480833', '8478104', '8477846', '8477380', '8477380', '8477433', '8479333', '8479991']


def callapi(player):
    response = (requests.get(f'https://statsapi.web.nhl.com/api/v1/people/{player}/stats?stats=gameLog', params={"Content-Type": "application/json"}))
    data = json.loads(response.text)
    df_list_dict = []
    for game in data['stats'][0]['splits']:
        curr_dict = game['stat']
        curr_dict['date'] = game['date']
        curr_dict['isHome'] = game['isHome']
        curr_dict['isWin'] = game['isWin']
        curr_dict['isOT'] = game['isOT']
        curr_dict['team'] = game['team']['name']
        curr_dict['opponent'] = game['opponent']['name']
        
        df_list_dict.append(curr_dict)
    df = pd.DataFrame.from_dict(df_list_dict)
    print(df)

for player in Rangers:  
    callapi(player)
    print(callapi)

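In the printed output I can see all of the DataFrames being created. I can't use curr_dict[] to add a column based on the list position (the player ID), because the index must be a slice or an integer, not a string.

What I'm hoping to end up with is a single DataFrame in which the stats are identified by a player ID column.

My Python knowledge is pretty scattered, and I feel that with the progress I've made I should know how to get this done, but I've hit a wall. Any help would be greatly appreciated.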

python pandas dataframe loops data-manipulation

A:

0 votes  Jason Baker  1/24/2023  #1

You can use concurrent.futures to run the requests in parallel before concatenating the results, and json_normalize to parse the JSON.

import concurrent.futures
import json
import os

import pandas as pd
import requests


class Scrape:
    def main(self) -> pd.DataFrame:
        rangers = ["8478550", "8476459", "8479323", "8476389", "8475184", "8480817", "8480078",
                   "8476624", "8481554", "8482109", "8476918", "8476885", "8479324", "8482073",
                   "8479328", "8480833", "8478104", "8477846", "8477380", "8477380", "8477433",
                   "8479333", "8479991"]

        with concurrent.futures.ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
            return pd.concat(executor.map(self.get_stats, rangers)).reset_index(drop=True).fillna(0)

    @staticmethod
    def get_stats(player: str) -> pd.DataFrame:
        url = f"https://statsapi.web.nhl.com/api/v1/people/{player}/stats?stats=gameLog"

        with requests.Session() as request:
            response = request.get(url, timeout=30)
        # Raise an exception on an HTTP error response (4xx/5xx) rather than printing None.
        response.raise_for_status()

        data = json.loads(response.text)

        df = (
            pd.json_normalize(data=data, record_path=["stats", "splits"])
            .rename(columns={"team.id": "team_id", "team.name": "team_name",
                             "opponent.id": "opponent_id", "opponent.name": "opponent_name"})
            .assign(player_id=player)
        )

        df = df[df.columns.drop(list(df.filter(regex="link|gamePk")))]
        df.columns = df.columns.str.split(".").str[-1]

        if "faceOffPct" not in df.columns:
            df["faceOffPct"] = 0

        return df


if __name__ == "__main__":
    stats = Scrape().main()
    print(stats)
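
If the parallelism is not needed, a simpler sequential variant may be easier to follow. The sketch below keeps the dictionary-building loop from the question, but has the function return its DataFrame (instead of only printing it), adds a player_id key to every row, and concatenates the per-player frames with pd.concat. The names games_to_frame, team_frame, and fake_fetch are hypothetical, and the sample payload is made up so the example runs without calling the API; it only assumes the response has the stats/splits shape used in the question.

```python
from typing import Callable, Iterable

import pandas as pd


def games_to_frame(player_id: str, payload: dict) -> pd.DataFrame:
    """Flatten one player's gameLog payload and tag every row with the player ID."""
    rows = []
    for game in payload["stats"][0]["splits"]:
        row = dict(game["stat"])  # copy so the original payload is not mutated
        row["date"] = game["date"]
        row["isHome"] = game["isHome"]
        row["isWin"] = game["isWin"]
        row["isOT"] = game["isOT"]
        row["team"] = game["team"]["name"]
        row["opponent"] = game["opponent"]["name"]
        row["player_id"] = player_id  # the identifying column the question asks for
        rows.append(row)
    return pd.DataFrame(rows)


def team_frame(player_ids: Iterable[str], fetch: Callable[[str], dict]) -> pd.DataFrame:
    """Build one DataFrame per player, then concatenate them into a single frame."""
    frames = [games_to_frame(pid, fetch(pid)) for pid in player_ids]
    return pd.concat(frames, ignore_index=True)


def fake_fetch(player_id: str) -> dict:
    """Hypothetical stand-in for the HTTP call; returns made-up sample values."""
    return {"stats": [{"splits": [{
        "stat": {"goals": 1, "assists": 0},
        "date": "2023-01-01", "isHome": True, "isWin": True, "isOT": False,
        "team": {"name": "Team A"},
        "opponent": {"name": "Team B"},
    }]}]}


stats = team_frame(["8478550", "8476459"], fake_fetch)
print(stats[["player_id", "date", "goals"]])
```

To use it against the real endpoint, fake_fetch could be swapped for a function that requests the gameLog URL for the given player and returns response.json(). Passing the fetcher in as an argument is only a design choice that keeps the DataFrame logic testable offline.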