json文件的扁平化

Flattening of json file

提问人:Nano 提问时间:11/3/2023 最后编辑:Nano 更新时间:11/4/2023 访问量:72

问:

我在将 json 文件转换为数据帧时遇到问题。json 文件结构如下:

"results": [
    {
      "submissions": [
        {
          "submission_type": "SUPPL",
          "submission_number": "26",
          "submission_status": "AP",
          "submission_status_date": "20110902",
          "submission_class_code": "LABELING",
          "submission_class_code_description": "Labeling",
          "application_docs": [
            {
              "id": "39507",
              "url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2011/076175s026lbl.pdf",
              "date": "20120516",
              "type": "Label"
            }
          ]
        },
        {
          "submission_type": "SUPPL",
          "submission_number": "30",
          "submission_status": "AP",
          "submission_status_date": "20130726",
          "review_priority": "STANDARD",
          "submission_class_code": "LABELING",
          "submission_class_code_description": "Labeling",
          "application_docs": [
            {
              "id": "39508",
              "url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2013/076175s030lbl.pdf",
              "date": "20130729",
              "type": "Label"
            }
          ]
        },
        {
          "submission_type": "ORIG",
          "submission_number": "1",
          "submission_status": "AP",
          "submission_status_date": "20020220",
          "application_docs": [
            {
              "id": "18441",
              "url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2002/76175_Mefloquine Hydrochloride_Prntlbl.pdf",
              "date": "20031224",
              "type": "Label"
            },
            {
              "id": "22542",
              "url": "http://www.accessdata.fda.gov/drugsatfda_docs/anda/2002/076175_mefloquine_toc.cfm",
              "date": "20030804",
              "type": "Review"
            },
            {
              "id": "31095",
              "url": "http://www.accessdata.fda.gov/drugsatfda_docs/appletter/2002/76175.ap.pdf",
              "date": "20030411",
              "type": "Letter"
            }
          ]
        },
        {
     
      ],
      "application_number": "ANDA076175",
      "sponsor_name": "SANDOZ",`your text`
      "products": [
        {
          "product_number": "001",
          "reference_drug": "No",
          "brand_name": "MEFLOQUINE HYDROCHLORIDE",
          "active_ingredients": [
            {
              "name": "MEFLOQUINE HYDROCHLORIDE",
              "strength": "250MG"
            }
          ],
          "reference_standard": "No",
          "dosage_form": "TABLET",
          "route": "ORAL",
          "marketing_status": "Discontinued"
        }
      ]
    },

到目前为止,我编写的代码是:

   **df_flattened = pd.json_normalize(data=rawData['results'])

     df_flattened.tail()**

然后为了进一步规范化数据,我正在尝试在提交和产品列上执行此操作:

**df_submissions = pd.json_normalize(rawData, record_path = rawData['results']['submissions'], meta = ['application_name', 'sponsor_name'])

df_submissions.head()**

但我得到错误说:

TypeError 回溯(最近一次调用,最后一次) 在 ----> 1 df_submissions = pd.json_normalize(rawData, record_path = rawData['results']['submissions'], meta = ['application_name', 'sponsor_name']) 2 df_submissions.head() 3 TypeError:列表索引必须是整数或切片,而不是 str

我无法转换提交和产品中的嵌套字典列表。提交列和产品列是带有字典的嵌套列表json_normalize不适用于它们。我尝试使用它,但出现错误。如何将此json文件转换为dataframe?

对此的任何意见都会有所帮助

python json pandas 数据帧 json-规范化

评论

0赞 JonSG 11/3/2023
请不要发布代码图片。他们很难合作,在某种程度上,你要求人们从头开始重新输入你的代码,以试图帮助你。你能指出你希望加载这个json的最终结果是什么吗?您是否尝试过参数和?json_normalize()record_pathmeta
1赞 Nano 11/3/2023
亲爱的乔恩,对不起,我是新手,所以我要记住。我不是要求重新输入所有内容。我想知道是否有办法将提交广告产品列中的数据展平。我使用了 record_path 和 meta,但它不起作用。我收到错误“列表索引应该是整数而不是 str”,我怀疑这是因为提交和产品是字典列表,因此出现错误。我想要一种方法,以便我可以进一步将这些列展平到数据框中。谢谢!
0赞 JonSG 11/3/2023
不用担心:-)您能否为我们编辑问题并将您的代码粘贴到代码块中?

答:

0赞 Chinmay T 11/4/2023 #1

我了解了这个库,它减少了大量代码,并将嵌套的 json 对象扁平化到最低级别并制作单独的列。欲了解更多信息,请点击这里

#https://pypi.org/project/flatten-json/
from flatten_json import flatten
with open('.\\Results.Json') as json_data:
    data = json.load(json_data)

print(data)

dic_flattened = [flatten(d) for d in data['results']]     
df = pd.DataFrame(dic_flattened)

print(df)

输出:

enter image description here