使用 Python 脚本将嵌套 JSON 转换为 CSV

Convert Nested JSON to CSV using Python script

提问人:john 提问时间:11/5/2023 最后编辑:tripleeejohn 更新时间:11/5/2023 访问量:97

问:

我对 Python 非常陌生,正在尝试将嵌套的 JSON 转换为 CSV。以下是我正在尝试的 Python 脚本,但我没有得到所需的输出。

import json
import pandas as pd

# Load via context manager and read_json() method
with open('employee_data1.json', 'r')as file:
    # load JSON data and parse into Dictionary object
    data = json.load(file)
    
# Load JSON as DataFrame 
df = pd.json_normalize(data)


# Print Result
print(df)

# output DataFrame to CSV file
df.to_csv('employee_data.csv')

我实际上正在使用上述代码尝试 2 个 JSON 数据,每个数据获得不同的输出。

employee_data1.json:

{
    "features": [
        {
            "candidate": {
                "first_name": "Margaret",
                "last_name": "Mcdonald",
                "skills": [
                    "skLearn",
                    "Java",
                    "R",
                    "SQL",
                    "Spark",
                    "C++"
                ],
                "state": "AL",
                "specialty": "Database",
                "experience": [
                    {
                        "company": "XYZ Corp",
                        "position": "Software Engineer",
                        "start_date": "2016-01-01",
                        "end_date": "2021-03-01"
                    },
                    {
                        "company": "ABC Inc",
                        "position": "Senior Software Engineer",
                        "start_date": "2021-04-01",
                        "end_date": null
                    }
                ],
                "relocation": "no"
            }
        },
        {
            "candidate": {
                "first_name": "Michael",
                "last_name": "Carter",
                "skills": [
                    "TensorFlow",
                    "R",
                    "Spark",
                    "MongoDB",
                    "C++",
                    "SQL"
                ],
                "state": "AR",
                "specialty": "Statistics",
                "experience": [
                    {
                        "company": "DFC Corp",
                        "position": "Software Engineer",
                        "start_date": "2016-01-01",
                        "end_date": "2021-03-01"
                    },
                    {
                        "company": "SDC Inc",
                        "position": "Senior Software Engineer",
                        "start_date": "2021-04-01",
                        "end_date": null
                    }
                ],
                "relocation": "yes"
            }
        }
    ]
}

employee_data2.json:

{
    "features": 
      {
        "candidate": {
          "first_name": "Margaret",
          "last_name": "Mcdonald",
          "skills": [
            "skLearn",
            "Java",
            "R",
            "SQL",
            "Spark",
            "C++"
          ],
          "state": "AL",
          "specialty": "Database",
          "experience": [
            {
              "company": "XYZ Corp",
              "position": "Software Engineer",
              "start_date": "2016-01-01",
              "end_date": "2021-03-01"
            },
            {
              "company": "ABC Inc",
              "position": "Senior Software Engineer",
              "start_date": "2021-04-01",
              "end_date": null
            }
          ],
          "relocation": "no"
        }
      }
  }

下面,我只选择了几个字段,而不是所有字段。我期待以下期望的输出。如果有人能帮助我解决这个问题,我会很高兴。

candidate.first_name, candidate.last_name, candidate.skills, candidate.state, candidate.experience.company, candidate.experience.position

Margaret, Mcdonald, "['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++']", AL, XYZ Corp, Software Engineer
json python-3.x csv

评论

4赞 Tim Roberts 11/5/2023
你为什么要这样做?JSON 是存储和传输此数据的一种更智能的方式。没有标准允许在 CSV 文件中使用括号列表。
0赞 tripleee 11/5/2023
也许另见 stackoverflow.com/a/65338582/874188
0赞 john 11/6/2023
@TimRoberts 是否可以将嵌套的json数组存储在SQL表中?,json到csv,然后将csv数据存储到SQL表中?
0赞 Tim Roberts 11/6/2023
同样,这不是一条明智的道路。将数组存储在字段中不是一个好的选择。如果需要长期存储,请使用像MongoDB这样本机存储JSON文档的东西

答:

0赞 Bushmaster 11/5/2023 #1

你可以像这样使用 json_normalize():

df = pd.json_normalize(your_json_data,record_path=['features',["candidate","experience"]],
                       meta=[["features","candidate","first_name"],["features","candidate","last_name"],
                              ["features","candidate","relocation"],["features","candidate","skills"],
                                    ["features","candidate","specialty"],["features","candidate","state"]])

但它会抛出这个错误:

ValueError: operands could not be broadcast together with shape (12,) (2,)

这可能是一个错误。在github上查看有关此问题的问题: BUG:json_normalize失败,数组/列表为空。为了避免这个错误,你应该将列表转换为字符串,然后使用json_normalize最后将字符串类型的列表转换为列表:

if len(your_json_data["features"]) > 1:
    for i in your_json_data["features"]:
        i["candidate"]["skills"] = str(i["candidate"]["skills"])
else:
    your_json_data["features"]["candidate"]["skills"] = str(your_json_data["features"]["candidate"]["skills"])

json_normalize后:

df ["features.candidate.skills"] = df["features.candidate.skills"].apply(ast.literal_eval)

输出

|    | company   | position                 | start_date   | end_date   | features.candidate.first_name   | features.candidate.last_name   | features.candidate.relocation   | features.candidate.skills                             | features.candidate.specialty   | features.candidate.state   |
|---:|:----------|:-------------------------|:-------------|:-----------|:--------------------------------|:-------------------------------|:--------------------------------|:------------------------------------------------------|:-------------------------------|:---------------------------|
|  0 | XYZ Corp  | Software Engineer        | 2016-01-01   | 2021-03-01 | Margaret                        | Mcdonald                       | no                              | ['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++']       | Database                       | AL                         |
|  1 | ABC Inc   | Senior Software Engineer | 2021-04-01   | nan        | Margaret                        | Mcdonald                       | no                              | ['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++']       | Database                       | AL                         |
|  2 | DFC Corp  | Software Engineer        | 2016-01-01   | 2021-03-01 | Michael                         | Carter                         | yes                             | ['TensorFlow', 'R', 'Spark', 'MongoDB', 'C++', 'SQL'] | Statistics                     | AR                         |
|  3 | SDC Inc   | Senior Software Engineer | 2021-04-01   | nan        | Michael                         | Carter                         | yes                             | ['TensorFlow', 'R', 'Spark', 'MongoDB', 'C++', 'SQL'] | Statistics                     | AR                         |

完整代码

import ast
if len(your_json_data["features"]) > 1:
    for i in your_json_data["features"]:
        i["candidate"]["skills"] = str(i["candidate"]["skills"])
else:
    your_json_data["features"]["candidate"]["skills"] = str(your_json_data["features"]["candidate"]["skills"])

df = pd.json_normalize(your_json_data,record_path=['features',["candidate","experience"]],
                       meta=[["features","candidate","first_name"],["features","candidate","last_name"],
                ["features","candidate","relocation"],["features","candidate","skills"],
                ["features","candidate","specialty"],["features","candidate","state"]])

df["features.candidate.skills"] = df["features.candidate.skills"].apply(ast.literal_eval)

评论

0赞 john 11/5/2023
嘿,首先非常感谢您的代码,您的代码适用于 employee_data1.json,但不适用于employee_data2.json,出现一些错误。.
0赞 Bushmaster 11/5/2023
好。我编辑了我的答案。你能检查一下吗?
0赞 john 11/6/2023
非常感谢,现在为这两个jsons工作。如果您不介意,您也请进行一些其他更改,即。“经验”字段列值应仅显示“相似技能”字段列值,而不是在每条记录中显示同一人的每个体验。您可以将features.candidate.first_name重命名为 first_name(没有 features.candidate),这同样适用于其他字段。再次感谢!
0赞 Tim Roberts 11/6/2023
这不是一项免费的咨询服务。您需要主动清理此处提出的建议。
0赞 john 11/6/2023
@TimRoberts我已经尝试过了,请找到我在实际主题中尝试过的脚本代码。所以来这里帮忙。我没有从我的角度尝试就问了。顺便说一句,我对 Python 很陌生。.