格式问题:使用主对象中嵌入的多个 JSON 对象将 JSON 转换为 CSV

Issues with Formatting: Converting JSON into CSV using multiple JSON objects embedded with in the main object

提问人:ahmad 提问时间:11/17/2023 最后编辑:ahmad 更新时间:11/18/2023 访问量:42

问:

在将JSON对象转换为CSV时,我在格式方面遇到了一些问题

我的 json 文件是 json 对象的列表,它们都共享相同的主键名称,除了 Dict2 有些可能没有它,Dict1 也可能不是字典类型,也可能包括其他字典。输出将是所有这些对象的 csv 文件,尽管它们具有参数

我的 Json 对象是:

[{
        "Name": "NameOfObject1",
        "Number": "objectNumber1",
        "Dict1": {
            "Key1": "true",
            "Key2": "false",
            "Key3": {
                     "key1InK3": "Hello",
                     "key2InK3": "There",
}
        },
        "BooleanValue": false,
    },
{
        "Name": "NameOfObject2",
        "Number": "objectNumber2",
        "Dict1": {
            "Key1": "true",
            "Key2": "false",
            "Key3": "true"
        },
        "BooleanValue": false,
        "Dict2":{
            "otherKey1" : "this ",
            "otherKey2" : "is ",
            "otherKey3" : "a ",
            "otherKey4" : "test ",
            "otherKey5" : "haha ",
            "otherKey6" : "i ",
            "otherKey7" : "like ",
            "otherKey8" : "strawberry "
        }
    },
{
        "Name": "NameOfObject3",
        "Number": "objectNumber",
        "Dict1": "this is a string",
        "BooleanValue": false
    }
]

我的输出如下所示:

Name           Number        Dic1             Dic1 Values Dic1 Values Of dictionaries Inside Boolean Value Dic2          Dic2 Values   
         
NameOfObject1 objectNumber1                                                                 FALSE                                     
NameOfObject1 objectNumber1 Key1             TRUE                                           FALSE                                     
NameOfObject1 objectNumber1 Key2                                                            FALSE                                     
NameOfObject1 objectNumber1 Key2             key1InK2                                       FALSE                                     
NameOfObject1 objectNumber1 Key2             key1InK2    Hello                              FALSE                                     
NameOfObject1 objectNumber1 Key2             key2InK2                                       FALSE                                     
NameOfObject1 objectNumber1 Key2             key2InK2    There                              FALSE                                     
NameOfObject2 objectNumber2                              TRUE                                                                         
NameOfObject1 objectNumber1 Key3                                                            FALSE                                     
NameOfObject1 objectNumber1 Key3             key1InK3                                       FALSE                                     
NameOfObject1 objectNumber1 Key3             key1InK3    Good                               FALSE                                     
NameOfObject1 objectNumber1 Key3             key2InK3                                       FALSE                                     
NameOfObject1 objectNumber1 Key3             key2InK3    Morining                           FALSE                                     
NameOfObject2 objectNumber2 Key1             TRUE                                           TRUE                                      
NameOfObject2 objectNumber2 Key2             FALSE                                          TRUE                                      
NameOfObject2 objectNumber2 Key3             TRUE                                           TRUE                                      
NameOfObject2 objectNumber2                                                                 TRUE          otherKey1     This          
NameOfObject2 objectNumber2                                                                 TRUE          otherKey2     is            
NameOfObject2 objectNumber2                                                                 TRUE          otherKey3     a             
NameOfObject2 objectNumber2                                                                 TRUE          otherKey4     test          
NameOfObject2 objectNumber2                                                                 TRUE          otherKey5     haha          
NameOfObject2 objectNumber2                                                                 TRUE          otherKey6     i             
NameOfObject2 objectNumber2                                                                 TRUE          otherKey7     like          
NameOfObject2 objectNumber2                                                                 TRUE          otherKey8     strawberry    
NameOfObject3 objectNumber3 This is a String                                                FALSE                                     

我希望它看起来像这样:注意最后 2 列是如何上移的,而对象 2 的最后 3 行是如何被删除的。还要注意,对象 2 中 Dic1 值为空的所有行也被删除

Name           Number        Dic1             Dic1 Values Dic1 Values Of dictionaries Inside Boolean Value Dic2          Dic2 Values   

NameOfObject1 objectNumber1                                                                 FALSE                                     
NameOfObject1 objectNumber1 Key1             TRUE                                           FALSE                                     
NameOfObject1 objectNumber1 Key2             key1InK2    Hello                              FALSE                                     
NameOfObject1 objectNumber1 Key2             key2InK2    There                              FALSE                                     
NameOfObject2 objectNumber2                              TRUE                                                                         
NameOfObject1 objectNumber1 Key3             key1InK3    Good                               FALSE                                     
NameOfObject1 objectNumber1 Key3             key2InK3    Morining                           FALSE                                     
NameOfObject2 objectNumber2                  TRUE                                           TRUE                                      
NameOfObject2 objectNumber2 Key1             TRUE                                           TRUE          otherKey1     this          
NameOfObject2 objectNumber2 Key2             FALSE                                          TRUE          otherKey2     is            
NameOfObject2 objectNumber2 Key3             TRUE                                           TRUE          otherKey3     a             
NameOfObject2 objectNumber2                                                                 TRUE          otherKey4     test          
NameOfObject2 objectNumber2                                                                 TRUE          otherKey5     haha          
NameOfObject2 objectNumber2                                                                 TRUE          otherKey6     i             
NameOfObject2 objectNumber2                                                                 TRUE          otherKey7     like          
NameOfObject2 objectNumber2                                                                 TRUE          otherKey8     strawberry    
NameOfObject2 objectNumber2                                                                 TRUE          otherKey      i             
NameOfObject2 objectNumber2                                                                 TRUE          otherKey7     like          
NameOfObject2 objectNumber2                                                                 TRUE          otherKey8     strawberry    
NameOfObject3 objectNumber3 This is a String                                                FALSE                                     

我不知道该怎么做,我看不出背后的逻辑。我不想重写整个代码。 代码太长,无法在此处发布,但这是其背后的逻辑

ListOfLines = []
Loop Through Dictionary keys file
    if key is dictionary:
          append a new line and recursively call the function
     else: (key is not dictionary)
          add the keys and values to a line
Formats all the lines and cleans up the list then writes to the output file

您会看到,在 if 语句中,创建了一个新行,因此字典输出始终位于下一行。当只有 1 个字典时,这工作正常,但有 2 个字典,我得到了上面所示的输出,这还不错,但也不好。除非绝对必要,否则我不想更改代码。

我的思路是简单地使用 python 将单元格组向上移动几行,但我不知道该怎么做。即使我知道我不能简单地将其向上移动 3 行,因为其他文件在 Dict1 中可能有 5-7 个键。

总之,我的问题是: 如何根据这个逻辑使用 python 向上移动一组单元格,我想不出可以放入代码的方法? 对此有什么帮助吗? 提前致谢

Python 熊猫 Excel

评论


答:

0赞 Corralien 11/17/2023 #1

您可以尝试:pd.json_normalize(..., max_level=0)

import pandas as pd
import json

data = json.loads(jdata)  # jdata is your JSON as string
df = pd.json_normalize(data, max_level=0)

# Extract dict columns
dcols = [k for k, v in data.items() if isinstance(v, dict)]

# Explode dict as DataFrame
dfs = []
for col in dcols:
    dd = (pd.json_normalize(df.pop(col)).squeeze()
            .rename_axis(col).rename(f'{col} Values')
            .reset_index())
    dfs.append(dd)

# Create the final dataframe
out = df.merge(pd.concat(dfs, axis=1), how='cross')

输出:

>>> out
           Name        Number  BooleanValue Dict1 Dict1 Values      Dict2 Dict2 Values
0  NameOfObject  objectNumber         False  Key1         true  otherKey1        this 
1  NameOfObject  objectNumber         False  Key2        false  otherKey2          is 
2  NameOfObject  objectNumber         False  Key3         true  otherKey3           a 
3  NameOfObject  objectNumber         False   NaN          NaN  otherKey4        test 
4  NameOfObject  objectNumber         False   NaN          NaN  otherKey5        haha 
5  NameOfObject  objectNumber         False   NaN          NaN  otherKey6           i 
6  NameOfObject  objectNumber         False   NaN          NaN  otherKey7        like 
7  NameOfObject  objectNumber         False   NaN          NaN  otherKey8  strawberry 

>>> df
           Name        Number                                              Dict1  BooleanValue                                              Dict2
0  NameOfObject  objectNumber  {'Key1': 'true', 'Key2': 'false', 'Key3': 'true'}         False  {'otherKey1': 'this ', 'otherKey2': 'is ', 'ot...

输入:

jdata = """
{
    "Name": "NameOfObject",
    "Number": "objectNumber",
    "Dict1": {
        "Key1": "true",
        "Key2": "false",
        "Key3": "true"
    },
    "BooleanValue": false,
    "Dict2":{
        "otherKey1" : "this ",
         "otherKey2" : "is ",
         "otherKey3" : "a ",
         "otherKey4" : "test ",
         "otherKey5" : "haha ",
         "otherKey6" : "i ",
         "otherKey7" : "like ",
         "otherKey8" : "strawberry "
    }
}
"""

评论

0赞 ahmad 11/17/2023
谢谢,但我想我应该更清楚,我的 json 文件是 json 对象的列表,它们都共享相同的主键名称,除了 Dict2 有些可能没有它,Dict1 也可能不是字典类型,也可能包括其他字典。输出将是所有这些对象的 csv 文件,尽管它们的参数。虽然这种方法比我拥有的好 100 倍,但我认为它对我不起作用
0赞 Corralien 11/17/2023
因此,请用一个简单的示例(例如真实案例)更新您的帖子,我明天会回答。
0赞 ahmad 11/24/2023
快速问题,当 dict 的大小为 1 时,它不起作用,它给出“ 'str' 对象没有属性 'rename_axis',所以我删除了所有其他函数,它只是 'pd.json_normalize(df.pop(col))',它返回一个 df,其中 key 作为标题,值作为行,我尝试转置它,但它不起作用,你知道我如何让它工作吗?example:{"Key":"value"}pd.json_normalize(df.pop(col)).squeeze().rename_axis(col).rename(f'{col} Values').reset_index()
0赞 ahmad 11/18/2023 #2

我以 Corraline 的答案为基础,让它按照我想要的方式工作

import pandas as pd
import json
with open('file.json','r') as f:
    fileStuff = json.load(f)
for obj in fileStuff:
   jdata=json.dump(obj)
   data = json.loads(jdata)  # jdata is your JSON as string
   df = pd.json_normalize(data, max_level=0)

   # Extract dict columns
   dcols = [k for k, v in data.items() if isinstance(v, dict)]

   # Explode dict as DataFrame
   dfs = []
   for col in dcols:
       dd = (pd.json_normalize(df.pop(col)).squeeze()
            .rename_axis(col).rename(f'{col} Values')
            .reset_index())
       dfs.append(dd)

   # Create the final dataframe
   if len(dcols)>0: #concat throws an exception when it a basic json object
      df = df.merge(pd.concat(dfs, axis=1), how='cross')
   df.to_csv('out.csv', 'a',index=False)

之后,您就拥有一个包含所有 json 对象的 csv 文件。该文件需要一些清理,但它实际上只是处理基本的 pandas 数据帧,并取决于您的情况