提问人:Megan 提问时间:2/22/2023 最后编辑:YnjxsjmhMegan 更新时间:2/22/2023 访问量:38
使用具有不同行数和列数的数据透视表
use pivot table with number of different rows and columns
问:
嗨,我有一个 json 文件,其中包含几列问题,有些是相同的,有些则与以下 json 文件中的变化相同
apend = [
{
"first_name": "Raúl Pedro",
"last_name": "Moreno Zavaleta",
"email": "[email protected]",
"custom_questions": [
{
"title": "a",
"value": "si"
},
{
"title": "b",
"value": "no"
},
{
"title": "c",
"value": "001"
}
],
"status": "approved",
"create_time": "2023-02-18T17:25:30Z"
},
{
"first_name": "Milagritos",
"last_name": "Canales Lora",
"email": "[email protected]",
"custom_questions": [
{
"title": "a",
"value": "no"
},
{
"title": "b",
"value": "si"
}
],
"status": "approved",
"create_time": "2023-02-21T23:07:24Z",
},
{
"first_name": "Eliza",
"last_name": "Carbajal Leon",
"email": "[email protected]",
"custom_questions": [
{
"title": "a",
"value": "no"
},
{
"title": "e",
"value": "identiti"
}
],
"status": "approved",
"create_time": "2023-02-21T23:07:24Z",
}
]
我应用了以下代码来规范化数据
pp1 = pd.json_normalize(apend)
pp = pd.DataFrame.from_dict(np.concatenate(pp1\['custom_questions'\]).tolist())
crear = pd.pivot_table(pp, values='value',columns='title', aggfunc= list).reset_index()
crear = (crear.apply(lambda x: x.apply(pd.Series).stack()).reset_index().drop('index', 1))
ee = crear.drop(\["level_0", "level_1"\], axis=1).reset_index(drop=True)
unir = pd.merge(pp1,ee, how = "outer", left_index = True, right_index = True)
unir = unir.drop(\['custom_questions'\], axis = 1)
我得到这样的输出
但是我丢失了第三个用户的信息。列 e 列列为第一个用户的信息。我需要在利用或使用枢轴时,可以正确排序信息。
输出应该是这样的
答:
1赞
Jason Baker
2/22/2023
#1
链接 json.normalize() 和 pivot():
meta_columns = ["first_name", "last_name", "email", "status", "create_time"]
df = pd.json_normalize(
data=apend,
meta=meta_columns,
record_path="custom_questions"
).pivot(
index=meta_columns,
columns="title",
values="value"
).reset_index().rename_axis(None, axis=1)
print(df)
输出:
first_name last_name email status create_time a b c e
0 Eliza Carbajal Leon [email protected] approved 2023-02-21T23:07:24Z no NaN NaN identiti
1 Milagritos Canales Lora [email protected] approved 2023-02-21T23:07:24Z no si NaN NaN
2 Raúl Pedro Moreno Zavaleta [email protected] approved 2023-02-18T17:25:30Z si no 001 NaN
评论
0赞
Megan
2/23/2023
我用过它,但收到此错误 TypeError:字符串索引必须是整数
0赞
Jason Baker
2/23/2023
更具体地说是哪一行导致了该错误。这适用于您的示例数据,也许是您的真实数据导致了它
下一个:如何从数据框列表中删除行
评论