Python Pandas - 来自嵌套 JSON 的数据帧

Python Pandas- Dataframe from Nested JSON

提问人:Aakash 提问时间:9/21/2023 更新时间:9/21/2023 访问量:37

问:

我正在尝试规范化 pandas 中的嵌套 JSON 文件。
我无法在开始时获取ab_id列,如当前输出屏幕截图中所示。
此外,由于如果我从代码中删除记录前缀,我会看到一个错误,如果我添加它,它会生成几列空。

当前和预期的输出附在下面的屏幕截图中:
enter image description here

当前使用的代码:

df=pd.json_normalize(data=response['val'],record_path=['activity'],meta=['msn','iis','ica','iada'],errors='ignore', record_prefix='_')

JSON文件:

{
   "id":"ijewiofn23441312223",
   "val":[
      {
         "ab_id":"ab_123",
         "activity":[
            {
               "msn":"acpfile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            },
            {
               "msn":"adefile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }
            }
         ]
      },
      {
         "ab_id":"ab_421",
         "activity":[
            {
               "msn":"adbfile_source_conn",
               "iia":true,
               "ica":true,
               "iada":false
            },
            {
               "msn":"aile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }
            }
         ]
      }
   ]
}

有人可以帮忙吗?
提前非常感谢。

python json pandas 嵌套 规范化

评论


答:

2赞 Сервер Чауш 9/21/2023 #1
import pandas as pd
from pandas import json_normalize

data = {
    "id": "ijewiofn23441312223",
    "val": [
        {
            "ab_id": "ab_123",
            "activity": [
                {
                    "msn": "acpfile_source_conn",
                    "iia": True,
                    "ica": False,
                    "iada": False
                },
                {
                    "msn": "adefile_source_conn",
                    "iia": True,
                    "ica": False,
                    "iada": False
                }
            ]
        },
        {
            "ab_id": "ab_421",
            "activity": [
                {
                    "msn": "adbfile_source_conn",
                    "iia": True,
                    "ica": True,
                    "iada": False
                },
                {
                    "msn": "aile_source_conn",
                    "iia": True,
                    "ica": False,
                    "iada": False
                }
            ]
        }
    ]
}
df = pd.json_normalize(data['val'], record_path=['activity'], meta=['ab_id'])
df = df[['ab_id'] + [col for col in df.columns if col != 'ab_id']]
df.columns = ['id'] + df.columns[1:].tolist()

print(df)
1赞 Andrej Kesely 9/21/2023 #2

您可以尝试使用模块并手动构造 DataFrame:json

import json

with open("data.json", "r") as f_in:
    data = json.load(f_in)

df = pd.DataFrame(
    [{"ab_id": v["ab_id"], **a} for v in data["val"] for a in v["activity"]]
)

print(df)

指纹:

    ab_id                  msn   iia    ica   iada
0  ab_123  acpfile_source_conn  True  False  False
1  ab_123  adefile_source_conn  True  False  False
2  ab_421  adbfile_source_conn  True   True  False
3  ab_421     aile_source_conn  True  False  False

内容:data.json

{
   "id":"ijewiofn23441312223",
   "val":[
      {
         "ab_id":"ab_123",
         "activity":[
            {
               "msn":"acpfile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            },
            {
               "msn":"adefile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }

         ]
      },
      {
         "ab_id":"ab_421",
         "activity":[
            {
               "msn":"adbfile_source_conn",
               "iia":true,
               "ica":true,
               "iada":false
            },
            {
               "msn":"aile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }

         ]
      }
   ]
}
1赞 Timeless 9/21/2023 #3

你的 json 示例中有两个额外的内容。此外,这个与 I/O 映像不匹配。}

但是您仍然可以尝试以下方法:

df = pd.json_normalize(response["val"], "activity", "ab_id")
​
# if columns-order is important
df = df[np.roll(df.columns, 1)]

​ 输出:

print(df)

    ab_id                  msn   iia    ica   iada
0  ab_123  acpfile_source_conn  True  False  False
1  ab_123  adefile_source_conn  True  False  False
2  ab_421  adbfile_source_conn  True   True  False
3  ab_421     aile_source_conn  True  False  False