Convert/Parse XML to CSV file

Asked by: JiR  Asked: 9/16/2023  Updated: 9/17/2023  Views: 79

Q:

Can anyone here guide me on how to convert this XML to CSV using Python 3.9? I am currently having a hard time parsing this XML.

Below is my XML structure:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="actual">
    <header>
      <log dateTime="2023-09-11T09:32:44.000+08:00" action="created" appInfo="ActualExporter">UIValues are used</log>
    </header>
    <managedObject class="LNCEL" version="FLF22R3_2207_10_2207_10" distName="PLMN-PLMN/MRBTS-10000/LNBTS-100007/LNCEL-10" id="10000">
      <p name="mcc">100</p>
      <p name="mnc">20</p>
      <p name="name">Cell01</p>
      <p name="a1TimeToTriggerDeactInterMeas">320ms</p>
      <p name="a2RedirectQci1">disabled</p>
    </managedObject>
  </cmData>
</raml>

The output I want looks like this: https://i.stack.imgur.com/jK3hd.png

Thanks, everyone. I look forward to your comments.

python-3.x xml parsing beautifulsoup

Comments

0 votes D.L 9/16/2023
You can use this module for parsing: ...import xml.etree.ElementTree as ET
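
A minimal sketch of that suggestion, assuming a trimmed copy of the question's XML held in a string (the XML declaration and DOCTYPE are left out for brevity). The likely stumbling block is that the root element declares a default namespace (xmlns="raml20.xsd"), so element names have to be qualified when searching. The prefix "r" and the variable names are arbitrary choices made for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (declaration and DOCTYPE omitted).
xml_string = """<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="actual">
    <managedObject class="LNCEL" distName="PLMN-PLMN/MRBTS-10000/LNBTS-100007/LNCEL-10" id="10000">
      <p name="mcc">100</p>
      <p name="mnc">20</p>
    </managedObject>
  </cmData>
</raml>"""

root = ET.fromstring(xml_string)

# "r" is an arbitrary prefix chosen here; it maps to the document's default namespace.
ns = {"r": "raml20.xsd"}

# An unqualified search matches nothing, because every element is in the namespace.
unqualified = root.findall(".//managedObject")

# A namespace-qualified search finds the element.
qualified = root.findall(".//r:managedObject", ns)

# Collect the <p name="...">value</p> children into a dict.
params = {p.get("name"): p.text for p in qualified[0].findall("r:p", ns)}
print(len(unqualified), len(qualified), params)
```

If the unqualified search is the one being used, an empty result list is the expected symptom rather than an exception.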

A:

0 votes Andrej Kesely 9/16/2023 #1

Here is an example of how to parse the XML with BeautifulSoup:

import pandas as pd
from bs4 import BeautifulSoup

with open("your_file.xml", "r") as f_in:
    soup = BeautifulSoup(f_in.read(), "xml")

header = soup.header
dt = header.log["dateTime"]

all_data = []
for mo in soup.select("managedObject"):
    version = mo["version"]
    dist_name = mo["distName"]
    moid = mo["id"]
    all_data.append(
        {
            "DATETIME": dt,
            "VERSION": version,
            "DISTNAME": dist_name,
            "MOID": moid,
            # One column per <p name="...">value</p> child of the managed object.
            **{p["name"]: p.text for p in mo.select("p")},
        }
    )

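df = pd.DataFrame(all_data)
print(df)

# Write the CSV the question asks for ("output.csv" is an example filename).
df.to_csv("output.csv", index=False)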

Prints:

                        DATETIME                  VERSION                                     DISTNAME   MOID  mcc mnc    name a1TimeToTriggerDeactInterMeas a2RedirectQci1
0  2023-09-11T09:32:44.000+08:00  FLF22R3_2207_10_2207_10  PLMN-PLMN/MRBTS-10000/LNBTS-100007/LNCEL-10  10000  100  20  Cell01                         320ms       disabled

0 votes Anis Rafid 9/17/2023 #2

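You can do this with Python's xml.etree.ElementTree for parsing the XML and the built-in csv module for writing the CSV file. You need to parse the XML to extract the required data, then write the extracted data to a CSV file. The code is as follows: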

import xml.etree.ElementTree as ET
import csv

xml_string = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="actual">
    <header>
      <log dateTime="2023-09-11T09:32:44.000+08:00" action="created" appInfo="ActualExporter">UIValues are used</log>
    </header>
    <managedObject class="LNCEL" version="FLF22R3_2207_10_2207_10" distName="PLMN-PLMN/MRBTS-10000/LNBTS-100007/LNCEL-10" id="10000">
      <p name="mcc">100</p>
      <p name="mnc">20</p>
      <p name="name">Cell01</p>
      <p name="a1TimeToTriggerDeactInterMeas">320ms</p>
      <p name="a2RedirectQci1">disabled</p>
    </managedObject>
  </cmData>
</raml>'''

root = ET.fromstring(xml_string)

log_header = root.find('.//{raml20.xsd}log')
date_time = log_header.attrib.get('dateTime', '') if log_header is not None else ''

with open('output.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    # Header order must match the order of the values in row_data below.
    headers = ['dateTime', 'version', 'distName', 'MOID', 'mcc', 'mnc', 'name', 'a1TimeToTriggerDeactInterMeas', 'a2RedirectQci1']
    csv_writer.writerow(headers)

    for mo in root.findall('.//{raml20.xsd}managedObject'):
        dist_name = mo.attrib['distName']
        version = mo.attrib['version']
        moid = mo.attrib.get('id', '')
        properties = {p.attrib['name']: p.text for p in mo.findall('.//{raml20.xsd}p')}
        row_data = [
            date_time,
            version,
            dist_name,
            moid,
            properties.get('mcc', ''),
            properties.get('mnc', ''),
            properties.get('name', ''),
            properties.get('a1TimeToTriggerDeactInterMeas', ''),
            properties.get('a2RedirectQci1', '')
        ]
        csv_writer.writerow(row_data)
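
The header list above is hard-coded to the five parameters in the sample. As a hedged sketch for exports where managed objects may carry different parameters, the column set can instead be collected from the data and written with csv.DictWriter. The helper names raml_to_rows and rows_to_csv, the prefix "r", the in-memory buffer, and the shortened sample values ("v1", "A/LNCEL-10", and so on) are all assumptions made for illustration; they are not part of the original answer.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"r": "raml20.xsd"}  # "r" is an arbitrary prefix for the default namespace


def raml_to_rows(xml_text):
    """Return one dict per managedObject (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    log = root.find(".//r:log", NS)
    date_time = log.get("dateTime", "") if log is not None else ""
    rows = []
    for mo in root.findall(".//r:managedObject", NS):
        row = {
            "dateTime": date_time,
            "version": mo.get("version", ""),
            "distName": mo.get("distName", ""),
            "MOID": mo.get("id", ""),
        }
        # Direct <p> children only; any nested structures are ignored in this sketch.
        for p in mo.findall("r:p", NS):
            row[p.get("name")] = p.text or ""
        rows.append(row)
    return rows


def rows_to_csv(rows, out):
    """Write rows to a file-like object, one column per key seen in any row."""
    # Union of all keys, kept in first-seen order.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample with two managed objects that have different parameters.
sample = """<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="actual">
    <header><log dateTime="2023-09-11T09:32:44.000+08:00" action="created"/></header>
    <managedObject class="LNCEL" version="v1" distName="A/LNCEL-10" id="10000">
      <p name="mcc">100</p>
    </managedObject>
    <managedObject class="LNCEL" version="v1" distName="A/LNCEL-11" id="10001">
      <p name="mnc">20</p>
    </managedObject>
  </cmData>
</raml>"""

buffer = io.StringIO()  # swap for open("output.csv", "w", newline="") to write a file
rows_to_csv(raml_to_rows(sample), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Parameters missing from a given managed object are left as empty cells through restval, so rows with different parameter sets still line up under one header.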