Asked by: Tez  Asked: 12/12/2022  Updated: 12/12/2022  Views: 38
Nested JSON extraction in a dataframe
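Q:
I have managed to convert the JSON below into a dataframe, but the nested lists are more complicated, namely the lists inside lineups, which contain player events, player IDs and shirt numbers. They are not easy to merge with the rest of the data because they are nested themselves, and all of this data sits inside a single column of the dataframe.
For example, I managed to extract all the data into a dataframe with columns, but each row holds all the nested data in a single cell. I'm struggling to work out how to unnest all the data so that it is still reflected in one row of the dataframe for the match it belongs to.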
{
"data": {
"GoalCount_2hg": 1,
"HTGoalCount": 2,
"attacks_recorded": 1,
"attendance": 31830,
"avg_potential": 3.04,
"awayGoalCount": 1,
"awayGoals": [
"85"
],
"awayID": 218,
"away_image": "teams/england-brentford-fc.png",
"away_name": "Brentford",
"away_ppg": 1.16,
"away_url": "/clubs/brentford-fc-218",
"homeID": 108,
"home_image": "teams/england-leicester-city-fc.png",
"home_name": "Leicester City",
"home_ppg": 1.79,
"home_url": "/clubs/leicester-city-fc-108",
"ht_goals_team_a": 2,
"ht_goals_team_b": 0,
"id": 1308560,
"lineups": {
"team_a": [
{
"player_events": [],
"player_id": 3212,
"shirt_number": 1
},
{
"player_events": [],
"player_id": 3219,
"shirt_number": 18
},
{
"player_events": [
{
"event_time": "20",
"event_type": "Goal"
}
],
"player_id": 21538,
"shirt_number": 27
},
{
Thanks
A:
0 votes
Jason Baker
12/12/2022
#1
Parse with json_normalize and merge with functools.reduce:
from functools import reduce

import pandas as pd

# `data` is the parsed JSON from the question (a dict with a top-level "data" key).
# Match-level columns: drop the nested lineups and give each away goal its own row.
main = (
    pd.json_normalize(data=data["data"])
    .drop(columns=["lineups.team_a", "lineups.team_b"])
    .explode("awayGoals")
)
# One row per player event, carrying the match id, player id and shirt number.
team_a = pd.json_normalize(
    data=data["data"],
    meta=["id", ["lineups", "team_a", "player_id"], ["lineups", "team_a", "shirt_number"]],
    record_path=["lineups", "team_a", "player_events"],
)
team_b = pd.json_normalize(
    data=data["data"],
    meta=["id", ["lineups", "team_b", "player_id"], ["lineups", "team_b", "shirt_number"]],
    record_path=["lineups", "team_b", "player_events"],
)
# Merge all three frames on the match id, then strip the leading "lineups." prefix.
dfs = [main, team_a, team_b]
df_final = reduce(lambda left, right: pd.merge(left, right, on="id"), dfs)
df_final.columns = df_final.columns.str.split(pat=".", n=1).str[-1]
print(df_final)
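Note that merging team_a and team_b on id pairs every team_a event with every team_b event of the same match, and that players with an empty player_events list produce no rows at all. If one row per event is what you are after, a possible variation is to stack the two teams with pd.concat instead. The sketch below is only an illustration under assumptions: the payload is a cut-down version of the JSON in the question, the team_b entries are invented (the original is truncated), and player_events is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical, cut-down payload shaped like the JSON in the question.
# The team_b entry is invented for illustration; the original is truncated.
data = {
    "data": {
        "id": 1308560,
        "home_name": "Leicester City",
        "away_name": "Brentford",
        "awayGoals": ["85"],
        "lineups": {
            "team_a": [
                {"player_events": [], "player_id": 3212, "shirt_number": 1},
                {
                    "player_events": [{"event_time": "20", "event_type": "Goal"}],
                    "player_id": 21538,
                    "shirt_number": 27,
                },
            ],
            "team_b": [
                {
                    "player_events": [{"event_time": "85", "event_type": "Goal"}],
                    "player_id": 9001,  # made-up id
                    "shirt_number": 9,  # made-up number
                },
            ],
        },
    }
}


def player_events(match: dict, team: str) -> pd.DataFrame:
    """Return one row per player event for `team` (hypothetical helper)."""
    events = pd.json_normalize(
        data=match,
        record_path=["lineups", team, "player_events"],
        meta=["id", ["lineups", team, "player_id"], ["lineups", team, "shirt_number"]],
    )
    # Shorten e.g. "lineups.team_a.player_id" to "player_id" and record the team.
    events.columns = events.columns.str.rsplit(".", n=1).str[-1]
    events["team"] = team
    return events


# Stack both teams so each event appears on exactly one row.
events = pd.concat(
    [player_events(data["data"], team) for team in ("team_a", "team_b")],
    ignore_index=True,
)
# Meta columns come back as object dtype; cast the key before merging.
events["id"] = events["id"].astype("int64")

# Match-level columns, one row per match, without the nested lineups.
main = pd.json_normalize(data["data"]).drop(
    columns=["lineups.team_a", "lineups.team_b"]
)

# Attach the match-level columns to every event row.
result = events.merge(main, on="id", how="left")
print(result)
```

Here each event row carries its player_id, shirt_number and team alongside the match-level columns, so the link to the match is kept through id without multiplying rows.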