使用具有不同行数和列数的数据透视表

use pivot table with number of different rows and columns

提问人:Megan 提问时间:2/22/2023 最后编辑:YnjxsjmhMegan 更新时间:2/22/2023 访问量:38

问:

嗨,我有一个 json 文件,其中包含几列问题,有些是相同的,有些则与以下 json 文件中的变化相同

apend = [
    {
        "first_name": "Raúl Pedro",
        "last_name": "Moreno Zavaleta",
        "email": "[email protected]",
        "custom_questions": [
            {
                "title": "a",
                "value": "si"
            },
            {
                "title": "b",
                "value": "no"
            },
            {
                "title": "c",
                "value": "001"
            } 
        ],
        "status": "approved",
        "create_time": "2023-02-18T17:25:30Z"
    },
    {
        
        "first_name": "Milagritos",
        "last_name": "Canales Lora",
        "email": "[email protected]",
        "custom_questions": [
            {
                "title": "a",
                "value": "no"
            },
            {
                "title": "b",
                "value": "si"
            }
                        
        ],
        "status": "approved",
        "create_time": "2023-02-21T23:07:24Z",

    },
    {
            
        "first_name": "Eliza",
        "last_name": "Carbajal Leon",
        "email": "[email protected]",
        "custom_questions": [
            {
                "title": "a",
                "value": "no"
            },
            {
                "title": "e",
                "value": "identiti"
            }
                        
        ],
        "status": "approved",
        "create_time": "2023-02-21T23:07:24Z",

    }
]

我应用了以下代码来规范化数据

pp1 = pd.json_normalize(apend)
pp = pd.DataFrame.from_dict(np.concatenate(pp1\['custom_questions'\]).tolist())
crear = pd.pivot_table(pp, values='value',columns='title', aggfunc= list).reset_index()
crear = (crear.apply(lambda x: x.apply(pd.Series).stack()).reset_index().drop('index', 1))
ee = crear.drop(\["level_0", "level_1"\], axis=1).reset_index(drop=True)
unir = pd.merge(pp1,ee, how = "outer", left_index = True, right_index = True)
unir = unir.drop(\['custom_questions'\], axis = 1)

我得到这样的输出

enter image description here

但是我丢失了第三个用户的信息。列 e 列列为第一个用户的信息。我需要在利用或使用枢轴时,可以正确排序信息。

输出应该是这样的

enter image description here

python json pandas 嵌套 嵌套列表

评论


答:

1赞 Jason Baker 2/22/2023 #1

链接 json.normalize() 和 pivot():

meta_columns = ["first_name", "last_name", "email", "status", "create_time"]

df = pd.json_normalize(
    data=apend,
    meta=meta_columns,
    record_path="custom_questions"
).pivot(
    index=meta_columns,
    columns="title",
    values="value"
).reset_index().rename_axis(None, axis=1)

print(df)

输出:

   first_name        last_name                      email    status           create_time   a    b    c         e
0       Eliza    Carbajal Leon  [email protected]  approved  2023-02-21T23:07:24Z  no  NaN  NaN  identiti
1  Milagritos     Canales Lora  [email protected]  approved  2023-02-21T23:07:24Z  no   si  NaN       NaN
2  Raúl Pedro  Moreno Zavaleta    [email protected]  approved  2023-02-18T17:25:30Z  si   no  001       NaN

评论

0赞 Megan 2/23/2023
我用过它,但收到此错误 TypeError:字符串索引必须是整数
0赞 Jason Baker 2/23/2023
更具体地说是哪一行导致了该错误。这适用于您的示例数据,也许是您的真实数据导致了它