提问人:ahmad 提问时间:11/17/2023 最后编辑:ahmad 更新时间:11/18/2023 访问量:42
格式问题:使用主对象中嵌入的多个 JSON 对象将 JSON 转换为 CSV
Issues with Formatting: Converting JSON into CSV using multiple JSON objects embedded with in the main object
问:
在将JSON对象转换为CSV时,我在格式方面遇到了一些问题
我的 json 文件是 json 对象的列表,它们都共享相同的主键名称,除了 Dict2 有些可能没有它,Dict1 也可能不是字典类型,也可能包括其他字典。输出将是所有这些对象的 csv 文件,尽管它们具有参数
我的 Json 对象是:
[{
"Name": "NameOfObject1",
"Number": "objectNumber1",
"Dict1": {
"Key1": "true",
"Key2": "false",
"Key3": {
"key1InK3": "Hello",
"key2InK3": "There",
}
},
"BooleanValue": false,
},
{
"Name": "NameOfObject2",
"Number": "objectNumber2",
"Dict1": {
"Key1": "true",
"Key2": "false",
"Key3": "true"
},
"BooleanValue": false,
"Dict2":{
"otherKey1" : "this ",
"otherKey2" : "is ",
"otherKey3" : "a ",
"otherKey4" : "test ",
"otherKey5" : "haha ",
"otherKey6" : "i ",
"otherKey7" : "like ",
"otherKey8" : "strawberry "
}
},
{
"Name": "NameOfObject3",
"Number": "objectNumber",
"Dict1": "this is a string",
"BooleanValue": false
}
]
我的输出如下所示:
Name Number Dic1 Dic1 Values Dic1 Values Of dictionaries Inside Boolean Value Dic2 Dic2 Values
NameOfObject1 objectNumber1 FALSE
NameOfObject1 objectNumber1 Key1 TRUE FALSE
NameOfObject1 objectNumber1 Key2 FALSE
NameOfObject1 objectNumber1 Key2 key1InK2 FALSE
NameOfObject1 objectNumber1 Key2 key1InK2 Hello FALSE
NameOfObject1 objectNumber1 Key2 key2InK2 FALSE
NameOfObject1 objectNumber1 Key2 key2InK2 There FALSE
NameOfObject2 objectNumber2 TRUE
NameOfObject1 objectNumber1 Key3 FALSE
NameOfObject1 objectNumber1 Key3 key1InK3 FALSE
NameOfObject1 objectNumber1 Key3 key1InK3 Good FALSE
NameOfObject1 objectNumber1 Key3 key2InK3 FALSE
NameOfObject1 objectNumber1 Key3 key2InK3 Morining FALSE
NameOfObject2 objectNumber2 Key1 TRUE TRUE
NameOfObject2 objectNumber2 Key2 FALSE TRUE
NameOfObject2 objectNumber2 Key3 TRUE TRUE
NameOfObject2 objectNumber2 TRUE otherKey1 This
NameOfObject2 objectNumber2 TRUE otherKey2 is
NameOfObject2 objectNumber2 TRUE otherKey3 a
NameOfObject2 objectNumber2 TRUE otherKey4 test
NameOfObject2 objectNumber2 TRUE otherKey5 haha
NameOfObject2 objectNumber2 TRUE otherKey6 i
NameOfObject2 objectNumber2 TRUE otherKey7 like
NameOfObject2 objectNumber2 TRUE otherKey8 strawberry
NameOfObject3 objectNumber3 This is a String FALSE
我希望它看起来像这样:注意最后 2 列是如何上移的,而对象 2 的最后 3 行是如何被删除的。还要注意,对象 2 中 Dic1 值为空的所有行也被删除
Name Number Dic1 Dic1 Values Dic1 Values Of dictionaries Inside Boolean Value Dic2 Dic2 Values
NameOfObject1 objectNumber1 FALSE
NameOfObject1 objectNumber1 Key1 TRUE FALSE
NameOfObject1 objectNumber1 Key2 key1InK2 Hello FALSE
NameOfObject1 objectNumber1 Key2 key2InK2 There FALSE
NameOfObject2 objectNumber2 TRUE
NameOfObject1 objectNumber1 Key3 key1InK3 Good FALSE
NameOfObject1 objectNumber1 Key3 key2InK3 Morining FALSE
NameOfObject2 objectNumber2 TRUE TRUE
NameOfObject2 objectNumber2 Key1 TRUE TRUE otherKey1 this
NameOfObject2 objectNumber2 Key2 FALSE TRUE otherKey2 is
NameOfObject2 objectNumber2 Key3 TRUE TRUE otherKey3 a
NameOfObject2 objectNumber2 TRUE otherKey4 test
NameOfObject2 objectNumber2 TRUE otherKey5 haha
NameOfObject2 objectNumber2 TRUE otherKey6 i
NameOfObject2 objectNumber2 TRUE otherKey7 like
NameOfObject2 objectNumber2 TRUE otherKey8 strawberry
NameOfObject2 objectNumber2 TRUE otherKey i
NameOfObject2 objectNumber2 TRUE otherKey7 like
NameOfObject2 objectNumber2 TRUE otherKey8 strawberry
NameOfObject3 objectNumber3 This is a String FALSE
我不知道该怎么做,我看不出背后的逻辑。我不想重写整个代码。 代码太长,无法在此处发布,但这是其背后的逻辑
ListOfLines = []
Loop Through Dictionary keys file
if key is dictionary:
append a new line and recursively call the function
else: (key is not dictionary)
add the keys and values to a line
Formats all the lines and cleans up the list then writes to the output file
您会看到,在 if 语句中,创建了一个新行,因此字典输出始终位于下一行。当只有 1 个字典时,这工作正常,但有 2 个字典,我得到了上面所示的输出,这还不错,但也不好。除非绝对必要,否则我不想更改代码。
我的思路是简单地使用 python 将单元格组向上移动几行,但我不知道该怎么做。即使我知道我不能简单地将其向上移动 3 行,因为其他文件在 Dict1 中可能有 5-7 个键。
总之,我的问题是: 如何根据这个逻辑使用 python 向上移动一组单元格,我想不出可以放入代码的方法? 对此有什么帮助吗? 提前致谢
答:
您可以尝试:pd.json_normalize(..., max_level=0)
import pandas as pd
import json
data = json.loads(jdata) # jdata is your JSON as string
df = pd.json_normalize(data, max_level=0)
# Extract dict columns
dcols = [k for k, v in data.items() if isinstance(v, dict)]
# Explode dict as DataFrame
dfs = []
for col in dcols:
dd = (pd.json_normalize(df.pop(col)).squeeze()
.rename_axis(col).rename(f'{col} Values')
.reset_index())
dfs.append(dd)
# Create the final dataframe
out = df.merge(pd.concat(dfs, axis=1), how='cross')
输出:
>>> out
Name Number BooleanValue Dict1 Dict1 Values Dict2 Dict2 Values
0 NameOfObject objectNumber False Key1 true otherKey1 this
1 NameOfObject objectNumber False Key2 false otherKey2 is
2 NameOfObject objectNumber False Key3 true otherKey3 a
3 NameOfObject objectNumber False NaN NaN otherKey4 test
4 NameOfObject objectNumber False NaN NaN otherKey5 haha
5 NameOfObject objectNumber False NaN NaN otherKey6 i
6 NameOfObject objectNumber False NaN NaN otherKey7 like
7 NameOfObject objectNumber False NaN NaN otherKey8 strawberry
>>> df
Name Number Dict1 BooleanValue Dict2
0 NameOfObject objectNumber {'Key1': 'true', 'Key2': 'false', 'Key3': 'true'} False {'otherKey1': 'this ', 'otherKey2': 'is ', 'ot...
输入:
jdata = """
{
"Name": "NameOfObject",
"Number": "objectNumber",
"Dict1": {
"Key1": "true",
"Key2": "false",
"Key3": "true"
},
"BooleanValue": false,
"Dict2":{
"otherKey1" : "this ",
"otherKey2" : "is ",
"otherKey3" : "a ",
"otherKey4" : "test ",
"otherKey5" : "haha ",
"otherKey6" : "i ",
"otherKey7" : "like ",
"otherKey8" : "strawberry "
}
}
"""
评论
example:{"Key":"value"}
pd.json_normalize(df.pop(col)).squeeze().rename_axis(col).rename(f'{col} Values').reset_index()
我以 Corraline 的答案为基础,让它按照我想要的方式工作
import pandas as pd
import json
with open('file.json','r') as f:
fileStuff = json.load(f)
for obj in fileStuff:
jdata=json.dump(obj)
data = json.loads(jdata) # jdata is your JSON as string
df = pd.json_normalize(data, max_level=0)
# Extract dict columns
dcols = [k for k, v in data.items() if isinstance(v, dict)]
# Explode dict as DataFrame
dfs = []
for col in dcols:
dd = (pd.json_normalize(df.pop(col)).squeeze()
.rename_axis(col).rename(f'{col} Values')
.reset_index())
dfs.append(dd)
# Create the final dataframe
if len(dcols)>0: #concat throws an exception when it a basic json object
df = df.merge(pd.concat(dfs, axis=1), how='cross')
df.to_csv('out.csv', 'a',index=False)
之后,您就拥有一个包含所有 json 对象的 csv 文件。该文件需要一些清理,但它实际上只是处理基本的 pandas 数据帧,并取决于您的情况
评论