Asked by: JiR | Asked: 9/16/2023 | Modified: 9/17/2023 | Views: 79
Convert/Parse XML to CSV file
Question:
Can anyone here show me how to convert this XML to CSV using Python 3.9? I'm currently having a hard time parsing it.
Below is my XML structure:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
<cmData type="actual">
<header>
<log dateTime="2023-09-11T09:32:44.000+08:00" action="created" appInfo="ActualExporter">UIValues are used</log>
</header>
<managedObject class="LNCEL" version="FLF22R3_2207_10_2207_10" distName="PLMN-PLMN/MRBTS-10000/LNBTS-100007/LNCEL-10" id="10000">
<p name="mcc">100</p>
<p name="mnc">20</p>
<p name="name">Cell01</p>
<p name="a1TimeToTriggerDeactInterMeas">320ms</p>
<p name="a2RedirectQci1">disabled</p>
</managedObject>
</cmData>
</raml>
My desired output looks like this: https://i.stack.imgur.com/jK3hd.png
Thanks, everyone. I look forward to your comments.
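The root element declares a default namespace (`xmlns="raml20.xsd"`), which is a common reason a file like this seems hard to parse: every tag is qualified with that namespace, so unqualified lookups find nothing. A minimal sketch with the standard-library `xml.etree.ElementTree`, assuming the XML is held in a string; `SAMPLE` and `NS` are illustrative names, and the sample is trimmed (declaration, DOCTYPE and most parameters omitted):

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the question's XML.
SAMPLE = """<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="actual">
    <managedObject class="LNCEL" distName="PLMN-PLMN/MRBTS-10000/LNBTS-100007/LNCEL-10" id="10000">
      <p name="mcc">100</p>
      <p name="mnc">20</p>
    </managedObject>
  </cmData>
</raml>"""

# Map an arbitrary prefix ("r") to the default namespace URI declared on <raml>.
NS = {"r": "raml20.xsd"}

root = ET.fromstring(SAMPLE)

# ElementTree stores the tag as "{namespace}localname".
print(root.tag)  # {raml20.xsd}raml

# An unqualified path does not match namespaced elements...
print(root.find(".//managedObject"))  # None

# ...but a prefixed path (or the "{raml20.xsd}managedObject" form) does.
mo = root.find(".//r:managedObject", NS)
print(mo.get("distName"))  # PLMN-PLMN/MRBTS-10000/LNBTS-100007/LNCEL-10
print({p.get("name"): p.text for p in mo.findall("r:p", NS)})  # {'mcc': '100', 'mnc': '20'}
```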
Answers:
0 votes
Andrej Kesely
9/16/2023
#1
Here is an example of how to parse the XML with BeautifulSoup:
import pandas as pd
from bs4 import BeautifulSoup

# The "xml" parser feature requires lxml to be installed
with open("your_file.xml", "r") as f_in:
    soup = BeautifulSoup(f_in.read(), "xml")

# The export timestamp is in <header><log dateTime="...">
header = soup.header
dt = header.log["dateTime"]

all_data = []
for mo in soup.select("managedObject"):
    version = mo["version"]
    dist_name = mo["distName"]
    moid = mo["id"]
    all_data.append(
        {
            "DATETIME": dt,
            "VERSION": version,
            "DISTNAME": dist_name,
            "MOID": moid,
            # One column per <p name="..."> element
            **{p["name"]: p.text for p in mo.select("p")},
        }
    )

df = pd.DataFrame(all_data)
print(df)
df.to_csv("output.csv", index=False)  # write the CSV file the question asks for
Prints:
DATETIME VERSION DISTNAME MOID mcc mnc name a1TimeToTriggerDeactInterMeas a2RedirectQci1
0 2023-09-11T09:32:44.000+08:00 FLF22R3_2207_10_2207_10 PLMN-PLMN/MRBTS-10000/LNBTS-100007/LNCEL-10 10000 100 20 Cell01 320ms disabled
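Because the approach above builds one dict per `managedObject`, the columns come from whatever `p` names are present and pandas fills any gaps. If pandas and BeautifulSoup are not available, a standard-library sketch of the same idea collects the union of column names first and writes with `csv.DictWriter`, leaving missing values blank. The helper name `raml_to_csv` and the two-object sample are hypothetical, made up to show the gap filling:

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"r": "raml20.xsd"}  # default namespace declared on <raml>


def raml_to_csv(xml_text):
    """Hypothetical helper: return CSV text with one row per managedObject."""
    root = ET.fromstring(xml_text)
    log = root.find(".//r:header/r:log", NS)
    dt = log.get("dateTime", "") if log is not None else ""

    rows = []
    for mo in root.iterfind(".//r:managedObject", NS):
        row = {
            "DATETIME": dt,
            "VERSION": mo.get("version", ""),
            "DISTNAME": mo.get("distName", ""),
            "MOID": mo.get("id", ""),
        }
        # Each <p name="..."> becomes a column; empty elements give "".
        row.update({p.get("name"): p.text or "" for p in mo.findall("r:p", NS)})
        rows.append(row)

    # Union of keys in first-seen order, so objects with different
    # parameter sets share one header row.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical input: two objects with different parameters.
sample = """<raml xmlns="raml20.xsd"><cmData>
<header><log dateTime="2023-09-11T09:32:44.000+08:00"/></header>
<managedObject version="v1" distName="A" id="1"><p name="mcc">100</p></managedObject>
<managedObject version="v1" distName="B" id="2"><p name="mnc">20</p></managedObject>
</cmData></raml>"""
print(raml_to_csv(sample))
```

When writing the result to a real file, open it with `newline=""`, as the `csv` module documentation recommends.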
0 votes
Anis Rafid
9/17/2023
#2
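You can do this with Python's xml.etree.ElementTree for parsing the XML and the built-in csv module for writing the CSV file: parse the XML to extract the data you need, then write that data to the CSV file. The code is as follows: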
import xml.etree.ElementTree as ET
import csv

xml_string = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
<cmData type="actual">
<header>
<log dateTime="2023-09-11T09:32:44.000+08:00" action="created" appInfo="ActualExporter">UIValues are used</log>
</header>
<managedObject class="LNCEL" version="FLF22R3_2207_10_2207_10" distName="PLMN-PLMN/MRBTS-10000/LNBTS-100007/LNCEL-10" id="10000">
<p name="mcc">100</p>
<p name="mnc">20</p>
<p name="name">Cell01</p>
<p name="a1TimeToTriggerDeactInterMeas">320ms</p>
<p name="a2RedirectQci1">disabled</p>
</managedObject>
</cmData>
</raml>'''

# The document declares a default namespace, so tags are written as {raml20.xsd}tag
root = ET.fromstring(xml_string)
log_header = root.find('.//{raml20.xsd}log')
date_time = log_header.attrib.get('dateTime', '') if log_header is not None else ''

with open('output.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    # Header order must match the order of row_data below
    headers = ['dateTime', 'version', 'distName', 'MOID', 'mcc', 'mnc', 'name', 'a1TimeToTriggerDeactInterMeas', 'a2RedirectQci1']
    csv_writer.writerow(headers)
    for mo in root.findall('.//{raml20.xsd}managedObject'):
        dist_name = mo.attrib['distName']
        version = mo.attrib['version']
        moid = mo.attrib.get('id', '')
        # Map each <p name="..."> to its text value
        properties = {p.attrib['name']: p.text for p in mo.findall('.//{raml20.xsd}p')}
        row_data = [
            date_time,
            version,
            dist_name,
            moid,
            properties.get('mcc', ''),
            properties.get('mnc', ''),
            properties.get('name', ''),
            properties.get('a1TimeToTriggerDeactInterMeas', ''),
            properties.get('a2RedirectQci1', ''),
        ]
        csv_writer.writerow(row_data)
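The code above loads the whole document into memory with `fromstring`. Assuming real configuration exports can be much larger than this sample, one option is `ET.iterparse`, which handles each `managedObject` as its closing tag is read and then clears it. This sketch keeps the fixed column list used above; `PARAMS`, `COLUMNS`, `stream_raml_to_csv` and the in-memory sample are illustrative names, not part of any library:

```python
import csv
import io
import xml.etree.ElementTree as ET

NS_URI = "{raml20.xsd}"  # default namespace from <raml xmlns="raml20.xsd">
# Illustrative fixed column list, same parameters as the answer above.
PARAMS = ["mcc", "mnc", "name", "a1TimeToTriggerDeactInterMeas", "a2RedirectQci1"]
COLUMNS = ["dateTime", "version", "distName", "MOID"] + PARAMS


def stream_raml_to_csv(source, out):
    """Hypothetical helper: write one CSV row per managedObject while streaming."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    date_time = ""
    # "end" events fire once an element and all its children have been read.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NS_URI + "log":
            date_time = elem.get("dateTime", "")
        elif elem.tag == NS_URI + "managedObject":
            props = {p.get("name"): p.text or "" for p in elem.iter(NS_URI + "p")}
            writer.writerow(
                [date_time, elem.get("version", ""), elem.get("distName", ""), elem.get("id", "")]
                + [props.get(name, "") for name in PARAMS]
            )
            elem.clear()  # drop the finished object's children and attributes


sample = b"""<raml xmlns="raml20.xsd"><cmData>
<header><log dateTime="2023-09-11T09:32:44.000+08:00"/></header>
<managedObject version="v1" distName="X" id="7"><p name="mcc">100</p><p name="name">Cell01</p></managedObject>
</cmData></raml>"""
buf = io.StringIO()
stream_raml_to_csv(io.BytesIO(sample), buf)
print(buf.getvalue())
```

For a file on disk, `iterparse` also accepts a filename, so the call would be `stream_raml_to_csv("your_file.xml", f)` with `f` opened via `open("output.csv", "w", newline="")`.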
Comments:
import xml.etree.ElementTree as ET