提问人:cvel 提问时间:1/24/2023 更新时间:1/24/2023 访问量:49
根据循环中的列表内容将列添加到数据框?-蟒
Adding Column to data frame based on list content in a loop? - Python
问:
我正在从 NHL API 中提取数据,以获取基于单个比赛的球员统计数据。我正在尝试创建一个循环来调用数据,解析 JSON,创建一个字典,然后我可以从中为整个团队创建一个数据帧。循环前的代码如下所示:
API_URL = "https://statsapi.web.nhl.com/api/v1"
response = requests.get(API_URL + "/people/8477956/stats?stats=gameLog", params={"Content-Type": "application/json"})
data = json.loads(response.text)
df_list_dict = []
for game in data['stats'][0]['splits']:
curr_dict = game['stat']
curr_dict['date'] = game['date']
curr_dict['isHome'] = game['isHome']
curr_dict['isWin'] = game['isWin']
curr_dict['isOT'] = game['isOT']
curr_dict['team'] = game['team']['name']
curr_dict['opponent'] = game['opponent']['name']
df_list_dict.append(curr_dict)
df = pd.DataFrame.from_dict(df_list_dict)
print(df)
这为我提供了一个单人玩家的可理解数据帧。(/人/{玩家}/....
我想遍历一个列表(该列表是一支 NHL 球队),同时添加一个列来标识球员并连接创建的数据帧。到目前为止,我的尝试如下所示:
import requests
import json
import pandas as pd
Rangers = ['8478550', '8476459', '8479323', '8476389', '8475184', '8480817', '8480078', '8476624', '8481554', '8482109', '8476918', '8476885', '8479324',
'8482073', '8479328', '8480833', '8478104', '8477846', '8477380', '8477380', '8477433', '8479333', '8479991']
def callapi(player):
response = (requests.get(f'https://statsapi.web.nhl.com/api/v1/people/{player}/stats?stats=gameLog', params={"Content-Type": "application/json"}))
data = json.loads(response.text)
df_list_dict = []
for game in data['stats'][0]['splits']:
curr_dict = game['stat']
curr_dict['date'] = game['date']
curr_dict['isHome'] = game['isHome']
curr_dict['isWin'] = game['isWin']
curr_dict['isOT'] = game['isOT']
curr_dict['team'] = game['team']['name']
curr_dict['opponent'] = game['opponent']['name']
df_list_dict.append(curr_dict)
df = pd.DataFrame.from_dict(df_list_dict)
print(df)
for player in Rangers:
callapi(player)
print(callapi)
打印出来后,我可以看到创建的所有数据框。我不能使用 curr_dict[] 根据列表位置(玩家 ID)添加列,因为必须是切片或整数,而不是字符串。
我希望做的是制作一个数据帧,其中统计数据由玩家 ID 列标识。
我的 python 知识非常分散,我觉得随着我取得的进步,我应该知道如何完成这项工作,但我只是碰壁了。任何帮助将不胜感激。
答:
0赞
Jason Baker
1/24/2023
#1
您可以使用 concurrent.futures 在将请求全部连接在一起之前并行化请求,并使用 json_normalize 来解析 json。
import concurrent.futures
import json
import os
import pandas as pd
import requests
class Scrape:
def main(self) -> pd.DataFrame:
rangers = ["8478550", "8476459", "8479323", "8476389", "8475184", "8480817", "8480078",
"8476624", "8481554", "8482109", "8476918", "8476885", "8479324", "8482073",
"8479328", "8480833", "8478104", "8477846", "8477380", "8477380", "8477433",
"8479333", "8479991"]
with concurrent.futures.ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
return pd.concat(executor.map(self.get_stats, rangers)).reset_index(drop=True).fillna(0)
@staticmethod
def get_stats(player: str) -> pd.DataFrame:
url = f"https://statsapi.web.nhl.com/api/v1/people/{player}/stats?stats=gameLog"
with requests.Session() as request:
response = request.get(url, timeout=30)
if response.status_code != 200:
print(response.raise_for_status())
data = json.loads(response.text)
df = (pd.
json_normalize(data=data, record_path=["stats", "splits"])
.rename(columns={"team.id": "team_id", "team.name": "team_name",
"opponent.id": "opponent_id", "opponent.name": "opponent_name"})
).assign(player_id=player)
df = df[df.columns.drop(list(df.filter(regex="link|gamePk")))]
df.columns = df.columns.str.split(".").str[-1]
if "faceOffPct" not in df.columns:
df["faceOffPct"] = 0
return df
if __name__ == "__main__":
stats = Scrape().main()
print(stats)
上一个:根据自定义字典对数据帧进行排序
下一个:R 中多列的 If 语句
评论