提问人:Aakash 提问时间:9/21/2023 更新时间:9/21/2023 访问量:37
Python Pandas - 来自嵌套 JSON 的数据帧
Python Pandas- Dataframe from Nested JSON
问:
我正在尝试规范化 pandas 中的嵌套 JSON 文件。
我无法在开始时获取ab_id列,如当前输出屏幕截图中所示。
此外,由于如果我从代码中删除记录前缀,我会看到一个错误,如果我添加它,它会生成几列空。
当前使用的代码:
df=pd.json_normalize(data=response['val'],record_path=['activity'],meta=['msn','iis','ica','iada'],errors='ignore', record_prefix='_')
JSON文件:
{
"id":"ijewiofn23441312223",
"val":[
{
"ab_id":"ab_123",
"activity":[
{
"msn":"acpfile_source_conn",
"iia":true,
"ica":false,
"iada":false
},
{
"msn":"adefile_source_conn",
"iia":true,
"ica":false,
"iada":false
}
}
]
},
{
"ab_id":"ab_421",
"activity":[
{
"msn":"adbfile_source_conn",
"iia":true,
"ica":true,
"iada":false
},
{
"msn":"aile_source_conn",
"iia":true,
"ica":false,
"iada":false
}
}
]
}
]
}
有人可以帮忙吗?
提前非常感谢。
答:
2赞
Сервер Чауш
9/21/2023
#1
import pandas as pd
from pandas import json_normalize
data = {
"id": "ijewiofn23441312223",
"val": [
{
"ab_id": "ab_123",
"activity": [
{
"msn": "acpfile_source_conn",
"iia": True,
"ica": False,
"iada": False
},
{
"msn": "adefile_source_conn",
"iia": True,
"ica": False,
"iada": False
}
]
},
{
"ab_id": "ab_421",
"activity": [
{
"msn": "adbfile_source_conn",
"iia": True,
"ica": True,
"iada": False
},
{
"msn": "aile_source_conn",
"iia": True,
"ica": False,
"iada": False
}
]
}
]
}
df = pd.json_normalize(data['val'], record_path=['activity'], meta=['ab_id'])
df = df[['ab_id'] + [col for col in df.columns if col != 'ab_id']]
df.columns = ['id'] + df.columns[1:].tolist()
print(df)
1赞
Andrej Kesely
9/21/2023
#2
您可以尝试使用模块并手动构造 DataFrame:json
import json
with open("data.json", "r") as f_in:
data = json.load(f_in)
df = pd.DataFrame(
[{"ab_id": v["ab_id"], **a} for v in data["val"] for a in v["activity"]]
)
print(df)
指纹:
ab_id msn iia ica iada
0 ab_123 acpfile_source_conn True False False
1 ab_123 adefile_source_conn True False False
2 ab_421 adbfile_source_conn True True False
3 ab_421 aile_source_conn True False False
内容:data.json
{
"id":"ijewiofn23441312223",
"val":[
{
"ab_id":"ab_123",
"activity":[
{
"msn":"acpfile_source_conn",
"iia":true,
"ica":false,
"iada":false
},
{
"msn":"adefile_source_conn",
"iia":true,
"ica":false,
"iada":false
}
]
},
{
"ab_id":"ab_421",
"activity":[
{
"msn":"adbfile_source_conn",
"iia":true,
"ica":true,
"iada":false
},
{
"msn":"aile_source_conn",
"iia":true,
"ica":false,
"iada":false
}
]
}
]
}
1赞
Timeless
9/21/2023
#3
你的 json 示例中有两个额外的内容。此外,这个与 I/O 映像不匹配。}
但是您仍然可以尝试以下方法:
df = pd.json_normalize(response["val"], "activity", "ab_id")
# if columns-order is important
df = df[np.roll(df.columns, 1)]
输出:
print(df)
ab_id msn iia ica iada
0 ab_123 acpfile_source_conn True False False
1 ab_123 adefile_source_conn True False False
2 ab_421 adbfile_source_conn True True False
3 ab_421 aile_source_conn True False False
评论