提问人:john 提问时间:11/5/2023 最后编辑:tripleeejohn 更新时间:11/5/2023 访问量:97
使用 Python 脚本将嵌套 JSON 转换为 CSV
Convert Nested JSON to CSV using Python script
问:
我对 Python 非常陌生,正在尝试将嵌套的 JSON 转换为 CSV。以下是我正在尝试的 Python 脚本,但我没有得到所需的输出。
import json
import pandas as pd
# Load via context manager and read_json() method
with open('employee_data1.json', 'r')as file:
# load JSON data and parse into Dictionary object
data = json.load(file)
# Load JSON as DataFrame
df = pd.json_normalize(data)
# Print Result
print(df)
# output DataFrame to CSV file
df.to_csv('employee_data.csv')
我实际上正在使用上述代码尝试 2 个 JSON 数据,每个数据获得不同的输出。
employee_data1.json
:
{
"features": [
{
"candidate": {
"first_name": "Margaret",
"last_name": "Mcdonald",
"skills": [
"skLearn",
"Java",
"R",
"SQL",
"Spark",
"C++"
],
"state": "AL",
"specialty": "Database",
"experience": [
{
"company": "XYZ Corp",
"position": "Software Engineer",
"start_date": "2016-01-01",
"end_date": "2021-03-01"
},
{
"company": "ABC Inc",
"position": "Senior Software Engineer",
"start_date": "2021-04-01",
"end_date": null
}
],
"relocation": "no"
}
},
{
"candidate": {
"first_name": "Michael",
"last_name": "Carter",
"skills": [
"TensorFlow",
"R",
"Spark",
"MongoDB",
"C++",
"SQL"
],
"state": "AR",
"specialty": "Statistics",
"experience": [
{
"company": "DFC Corp",
"position": "Software Engineer",
"start_date": "2016-01-01",
"end_date": "2021-03-01"
},
{
"company": "SDC Inc",
"position": "Senior Software Engineer",
"start_date": "2021-04-01",
"end_date": null
}
],
"relocation": "yes"
}
}
]
}
employee_data2.json
:
{
"features":
{
"candidate": {
"first_name": "Margaret",
"last_name": "Mcdonald",
"skills": [
"skLearn",
"Java",
"R",
"SQL",
"Spark",
"C++"
],
"state": "AL",
"specialty": "Database",
"experience": [
{
"company": "XYZ Corp",
"position": "Software Engineer",
"start_date": "2016-01-01",
"end_date": "2021-03-01"
},
{
"company": "ABC Inc",
"position": "Senior Software Engineer",
"start_date": "2021-04-01",
"end_date": null
}
],
"relocation": "no"
}
}
}
下面,我只选择了几个字段,而不是所有字段。我期待以下期望的输出。如果有人能帮助我解决这个问题,我会很高兴。
candidate.first_name, candidate.last_name, candidate.skills, candidate.state, candidate.experience.company, candidate.experience.position
Margaret, Mcdonald, "['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++']", AL, XYZ Corp, Software Engineer
答:
0赞
Bushmaster
11/5/2023
#1
你可以像这样使用 json_normalize():
df = pd.json_normalize(your_json_data,record_path=['features',["candidate","experience"]],
meta=[["features","candidate","first_name"],["features","candidate","last_name"],
["features","candidate","relocation"],["features","candidate","skills"],
["features","candidate","specialty"],["features","candidate","state"]])
但它会抛出这个错误:
ValueError: operands could not be broadcast together with shape (12,) (2,)
这可能是一个错误。在github上查看有关此问题的问题: BUG:json_normalize失败,数组/列表为空
。为了避免这个错误,你应该将列表转换为字符串,然后使用json_normalize最后将字符串类型的列表转换为列表:
if len(your_json_data["features"]) > 1:
for i in your_json_data["features"]:
i["candidate"]["skills"] = str(i["candidate"]["skills"])
else:
your_json_data["features"]["candidate"]["skills"] = str(your_json_data["features"]["candidate"]["skills"])
json_normalize后:
df ["features.candidate.skills"] = df["features.candidate.skills"].apply(ast.literal_eval)
输出:
| | company | position | start_date | end_date | features.candidate.first_name | features.candidate.last_name | features.candidate.relocation | features.candidate.skills | features.candidate.specialty | features.candidate.state |
|---:|:----------|:-------------------------|:-------------|:-----------|:--------------------------------|:-------------------------------|:--------------------------------|:------------------------------------------------------|:-------------------------------|:---------------------------|
| 0 | XYZ Corp | Software Engineer | 2016-01-01 | 2021-03-01 | Margaret | Mcdonald | no | ['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++'] | Database | AL |
| 1 | ABC Inc | Senior Software Engineer | 2021-04-01 | nan | Margaret | Mcdonald | no | ['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++'] | Database | AL |
| 2 | DFC Corp | Software Engineer | 2016-01-01 | 2021-03-01 | Michael | Carter | yes | ['TensorFlow', 'R', 'Spark', 'MongoDB', 'C++', 'SQL'] | Statistics | AR |
| 3 | SDC Inc | Senior Software Engineer | 2021-04-01 | nan | Michael | Carter | yes | ['TensorFlow', 'R', 'Spark', 'MongoDB', 'C++', 'SQL'] | Statistics | AR |
完整代码:
import ast
if len(your_json_data["features"]) > 1:
for i in your_json_data["features"]:
i["candidate"]["skills"] = str(i["candidate"]["skills"])
else:
your_json_data["features"]["candidate"]["skills"] = str(your_json_data["features"]["candidate"]["skills"])
df = pd.json_normalize(your_json_data,record_path=['features',["candidate","experience"]],
meta=[["features","candidate","first_name"],["features","candidate","last_name"],
["features","candidate","relocation"],["features","candidate","skills"],
["features","candidate","specialty"],["features","candidate","state"]])
df["features.candidate.skills"] = df["features.candidate.skills"].apply(ast.literal_eval)
评论
0赞
john
11/5/2023
嘿,首先非常感谢您的代码,您的代码适用于 employee_data1.json,但不适用于employee_data2.json,出现一些错误。.
0赞
Bushmaster
11/5/2023
好。我编辑了我的答案。你能检查一下吗?
0赞
john
11/6/2023
非常感谢,现在为这两个jsons工作。如果您不介意,您也请进行一些其他更改,即。“经验”字段列值应仅显示“相似技能”字段列值,而不是在每条记录中显示同一人的每个体验。您可以将features.candidate.first_name重命名为 first_name(没有 features.candidate),这同样适用于其他字段。再次感谢!
0赞
Tim Roberts
11/6/2023
这不是一项免费的咨询服务。您需要主动清理此处提出的建议。
0赞
john
11/6/2023
@TimRoberts我已经尝试过了,请找到我在实际主题中尝试过的脚本代码。所以来这里帮忙。我没有从我的角度尝试就问了。顺便说一句,我对 Python 很陌生。.
评论