Asked by: gondev Asked: 11/5/2023 Updated: 11/5/2023 Views: 35
Reading Data from a JSON File in pandas with multiple objects
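Q:
I am working with a JSON file from ACN Data on EV charging behavior. I want to read it in Python and convert it to a pandas DataFrame. The problem is that the JSON contains multiple nested objects, and I am having trouble collecting all of the data, including userInputs, into the rows of a single table.
The JSON is below: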
{
"_meta":
{
"end": "Sat, 01 Jan 2022 08:00:00 GMT",
"min_kWh": null,
"site": "caltech",
"start": "Mon, 01 Jan 2018 08:00:00 GMT"
},
"_items": [
{
"_id": "5bc90cb9f9af8b0d7fe77cd2",
"clusterID": "0039",
"connectionTime": "Wed, 25 Apr 2018 11:08:04 GMT",
"disconnectTime": "Wed, 25 Apr 2018 13:20:10 GMT",
"doneChargingTime": "Wed, 25 Apr 2018 13:21:10 GMT",
"kWhDelivered": 7.932,
"sessionID": "2_39_78_362_2018-04-25 11:08:04.400812",
"siteID": "0002",
"spaceID": "CA-496",
"stationID": "2-39-78-362",
"timezone": "America/Los_Angeles",
"userID": null,
"userInputs": null
},
{
"_id": "5ca2ad12f9af8b68e0cb5d47",
"clusterID": "0039",
"connectionTime": "Sat, 16 Mar 2019 14:39:41 GMT",
"disconnectTime": "Sat, 16 Mar 2019 18:39:04 GMT",
"doneChargingTime": "Sat, 16 Mar 2019 18:25:28 GMT",
"kWhDelivered": 24.804,
"sessionID": "2_39_124_22_2019-03-16 14:39:40.648349",
"siteID": "0002",
"spaceID": "CA-312",
"stationID": "2-39-124-22",
"timezone": "America/Los_Angeles",
"userID": "000001039",
"userInputs": [
{
"WhPerMile": 271,
"kWhRequested": 65.04,
"milesRequested": 240,
"minutesAvailable": 203,
"modifiedAt": "Sat, 16 Mar 2019 14:40:30 GMT",
"paymentRequired": true,
"requestedDeparture": "Sat, 16 Mar 2019 18:02:41 GMT",
"userID": 1039
}
]
}
]}
I tried reading only "_items", and that worked. But I still cannot create rows that contain all of the userInputs data.
The Python code is below:
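import json
import pandas as pd

# Build the frame from the "_items" list only; passing the whole dict (with "_meta")
# to pd.DataFrame raises "ValueError: Mixing dicts with non-Series may lead to ambiguous ordering".
with open("content.json") as f:
    data = json.load(f)
acn_ev_set_json = pd.DataFrame(data["_items"])
acn_ev_set_json.tail(3)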
Some links that helped me:
Dataset: https://ev.caltech.edu/dataset
Paper on the dataset: https://ev.caltech.edu/assets/pub/ACN_Data_Analysis_and_Applications.pdf
Stack Overflow question: Reading JSON into a pandas DataFrame - ValueError: Mixing dicts with non-Series may lead to ambiguous ordering
A:
0 votes
gtomer
11/5/2023
#1
Try this:
pd.concat([acn_ev_set_json.drop(['userInputs'], axis=1), acn_ev_set_json['userInputs'].explode().apply(pd.Series)], axis=1)
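A couple of caveats about the one-liner above, plus a hedged alternative. Both the session and the userInputs entry carry a userID key, so the concatenated frame ends up with two columns named userID. Also, explode() repeats the index when a session has more than one userInputs entry, which I believe can break the axis=1 concat. The sketch below flattens the records in plain Python before building the frame. The helper name flatten_sessions and the "userInputs." column prefix are my own choices, not part of pandas or of the dataset, and the inline data is a trimmed copy of the sample from the question.

```python
import pandas as pd

# Two sessions trimmed from the sample in the question (only a few keys kept).
data = {
    "_meta": {"site": "caltech"},
    "_items": [
        {"_id": "5bc90cb9f9af8b0d7fe77cd2", "kWhDelivered": 7.932,
         "userID": None, "userInputs": None},
        {"_id": "5ca2ad12f9af8b68e0cb5d47", "kWhDelivered": 24.804,
         "userID": "000001039",
         "userInputs": [{"WhPerMile": 271, "kWhRequested": 65.04,
                         "milesRequested": 240, "userID": 1039}]},
    ],
}


def flatten_sessions(items, prefix="userInputs."):
    """Hypothetical helper: one row per (session, userInputs entry).

    Sessions whose userInputs is null or empty still get exactly one row,
    with the prefixed columns left missing.
    """
    rows = []
    for item in items:
        # Session-level fields, without the nested list.
        base = {k: v for k, v in item.items() if k != "userInputs"}
        # Treat null / empty userInputs as a single "no input" placeholder.
        inputs = item.get("userInputs") or [None]
        for user_input in inputs:
            row = dict(base)
            if user_input is not None:
                # Prefix nested keys so the inner userID does not collide
                # with the session-level userID column.
                row.update({prefix + k: v for k, v in user_input.items()})
            rows.append(row)
    return pd.DataFrame(rows)


sessions = flatten_sessions(data["_items"])
print(sessions)
```

With the real file, the same call would be flatten_sessions(data["_items"]) after json.load. A session with several userInputs entries then becomes several rows that share the same _id, which may or may not be what you want for your analysis.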