Asked by: Nano | Asked: 11/3/2023 | Last edited by: Nano | Updated: 11/4/2023 | Views: 72
Flattening a JSON file
Q:
I am having trouble converting a JSON file into a DataFrame. The JSON file is structured as follows:
"results": [
{
"submissions": [
{
"submission_type": "SUPPL",
"submission_number": "26",
"submission_status": "AP",
"submission_status_date": "20110902",
"submission_class_code": "LABELING",
"submission_class_code_description": "Labeling",
"application_docs": [
{
"id": "39507",
"url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2011/076175s026lbl.pdf",
"date": "20120516",
"type": "Label"
}
]
},
{
"submission_type": "SUPPL",
"submission_number": "30",
"submission_status": "AP",
"submission_status_date": "20130726",
"review_priority": "STANDARD",
"submission_class_code": "LABELING",
"submission_class_code_description": "Labeling",
"application_docs": [
{
"id": "39508",
"url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2013/076175s030lbl.pdf",
"date": "20130729",
"type": "Label"
}
]
},
{
"submission_type": "ORIG",
"submission_number": "1",
"submission_status": "AP",
"submission_status_date": "20020220",
"application_docs": [
{
"id": "18441",
"url": "http://www.accessdata.fda.gov/drugsatfda_docs/label/2002/76175_Mefloquine Hydrochloride_Prntlbl.pdf",
"date": "20031224",
"type": "Label"
},
{
"id": "22542",
"url": "http://www.accessdata.fda.gov/drugsatfda_docs/anda/2002/076175_mefloquine_toc.cfm",
"date": "20030804",
"type": "Review"
},
{
"id": "31095",
"url": "http://www.accessdata.fda.gov/drugsatfda_docs/appletter/2002/76175.ap.pdf",
"date": "20030411",
"type": "Letter"
}
]
}
],
"application_number": "ANDA076175",
"sponsor_name": "SANDOZ",
"products": [
{
"product_number": "001",
"reference_drug": "No",
"brand_name": "MEFLOQUINE HYDROCHLORIDE",
"active_ingredients": [
{
"name": "MEFLOQUINE HYDROCHLORIDE",
"strength": "250MG"
}
],
"reference_standard": "No",
"dosage_form": "TABLET",
"route": "ORAL",
"marketing_status": "Discontinued"
}
]
},
The code I have written so far is:
import pandas as pd
df_flattened = pd.json_normalize(data=rawData['results'])
df_flattened.tail()
Then, to normalize the data further, I am trying to do the following on the submissions and products columns:
df_submissions = pd.json_normalize(rawData, record_path = rawData['results']['submissions'], meta = ['application_name', 'sponsor_name'])
df_submissions.head()
But I get the following error:
TypeError                                 Traceback (most recent call last)
----> 1 df_submissions = pd.json_normalize(rawData, record_path = rawData['results']['submissions'], meta = ['application_name', 'sponsor_name'])
      2 df_submissions.head()
      3
TypeError: list indices must be integers or slices, not str
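I am unable to convert the nested lists of dictionaries in submissions and products. Both columns hold nested lists of dictionaries, and json_normalize does not work on them: I tried it, but got the error above. How can I convert this JSON file into a DataFrame?
Any input on this would be helpful.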
A:
0 votes
Chinmay T
11/4/2023
#1
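I came across this library, which saves a lot of code: it flattens nested JSON objects down to the lowest level and creates a separate column for each value. For more information, see the link below.
# https://pypi.org/project/flatten-json/
import json

import pandas as pd
from flatten_json import flatten

# Load the raw JSON file
with open('.\\Results.Json') as json_data:
    data = json.load(json_data)
print(data)

# Flatten each record under "results" into a single-level dict, one row per record
dic_flattened = [flatten(d) for d in data['results']]
df = pd.DataFrame(dic_flattened)
print(df)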
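To make the shape of the result easier to picture, here is a minimal, dependency-free sketch of this kind of flattening. It is a hand-rolled stand-in, not the flatten-json implementation: `flatten_record` is a hypothetical helper, and the underscore-joined key naming (with list positions as numbers) is an assumption chosen to resemble what the library produces. The sample record is a trimmed copy of the data in the question.

```python
def flatten_record(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts and lists into a single-level dict.

    Hypothetical helper for illustration only; not part of flatten-json.
    """
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten_record(value, new_key, sep))
    elif isinstance(obj, list):
        # List positions become part of the key: products_0_..., products_1_...
        for index, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten_record(value, new_key, sep))
    else:
        # Leaf value: store it under the accumulated key
        items[parent_key] = obj
    return items


# Trimmed copy of one record from the question
record = {
    "application_number": "ANDA076175",
    "sponsor_name": "SANDOZ",
    "products": [
        {
            "product_number": "001",
            "active_ingredients": [
                {"name": "MEFLOQUINE HYDROCHLORIDE", "strength": "250MG"}
            ],
        }
    ],
}

flat = flatten_record(record)
for key, value in flat.items():
    print(f"{key} = {value}")
```

Note the trade-off: each application becomes one wide row, and the number of columns grows with the longest list (one set of columns per submission, per document, and so on). If one row per submission is wanted instead, a long format is probably a better fit.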
Comments
json_normalize()
record_path
meta
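For reference, a minimal sketch of how json_normalize() can be called with record_path and meta on data shaped like the question's. The TypeError in the question arises because rawData['results'] is a list, so indexing it with the string 'submissions' fails before json_normalize is even called; record_path expects the key name (or a list of key names), not the data itself. Note also that the sample data has application_number, not application_name, so the meta list uses that key here. The `raw_data` dictionary below is a trimmed, assumed stand-in for the full file.

```python
import pandas as pd

# Trimmed stand-in for the parsed JSON file (assumption: same shape as the question's data)
raw_data = {
    "results": [
        {
            "submissions": [
                {
                    "submission_type": "SUPPL",
                    "submission_number": "26",
                    "application_docs": [{"id": "39507", "type": "Label"}],
                },
                {
                    "submission_type": "ORIG",
                    "submission_number": "1",
                    "application_docs": [
                        {"id": "18441", "type": "Label"},
                        {"id": "22542", "type": "Review"},
                    ],
                },
            ],
            "application_number": "ANDA076175",
            "sponsor_name": "SANDOZ",
            "products": [{"product_number": "001", "dosage_form": "TABLET"}],
        }
    ]
}

# One row per submission; record_path is the key name, meta lists parent-level keys to carry along
df_submissions = pd.json_normalize(
    raw_data["results"],
    record_path="submissions",
    meta=["application_number", "sponsor_name"],
)

# One row per product, keyed back to the application
df_products = pd.json_normalize(
    raw_data["results"],
    record_path="products",
    meta=["application_number"],
)

print(df_submissions)
print(df_products)
```

In this sketch the application_docs column still holds lists of dictionaries; it would need a further normalization step if one row per document is required. Keys that appear in only some submissions, such as review_priority in the question's data, come out as NaN in the rows that lack them.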