How to go to a sub-child node using Python xml.dom.minidom

Asked by: Smi  Asked: 4/23/2023  Last edited by: Smi  Updated: 4/24/2023  Views: 101

Q:

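I have the XML structure below and need to extract the data in the following format for export to a spreadsheet.

[Image: expected spreadsheet output]

Source XML data: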

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root testAttr="testValue">
    <family name="Hardwood">
        <child name="Jack">First</child>
        <child name="Rose">Second</child>
        <child name="Blue Ivy">Third
                <grandchildren>
                    <data>One</data>
                    <data>Two</data>
                    <unique>Twins</unique>
                </grandchildren>
            </child>
        <child name="Jane">Fourth</child>
    </family>
    <family name="Downie">
        <child name="Bill">First</child>
        <child name="Rosie">Second</child>
        <child name="Edward">
            Third
            </child>
        <child name="Jane">Fourth</child>
    </family>
</root>

I have tried the following, but I cannot get into the grandchildren tags.

import xml.dom.minidom
doc=xml.dom.minidom.parse('Sample_XML.xml')
children = doc.getElementsByTagName('family')
for child in children:
    print(child.getAttribute('name'))
    print(child.getElementsByTagName('child')[0].childNodes[0].nodeValue)
    print(child.getElementsByTagName('child')[1].childNodes[0].nodeValue)
    print(child.getElementsByTagName('child')[2].childNodes[0].nodeValue)
    print(child.getElementsByTagName('child')[2].childNodes[0].nodeValue)

Tags: python, xml, parsing, dom, minidom

Answers:

0 votes Jack Fleeting 4/23/2023 #1

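There may be an easier way, but this is doable if you use lxml instead of minidom, since lxml has better XPath support. lxml parses the document, and pandas then builds a DataFrame from the extracted rows; the DataFrame can be used as is or saved as a CSV or Excel file. It isn't simple (because your XML is deeply nested), but I believe it's self-explanatory if you know XPath. If you don't, you should read up on it.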

from lxml import etree
import pandas as pd

doc = etree.parse("Sample_XML.xml")
families = doc.xpath('//family')

rows = []
cols = ["Family", "Child", "Child Name", "Grandchild"]

for family in families:
    for child in family.xpath('.//child'):
        # Values shared by every row produced for this child
        row = [family.xpath('./@name')[0], child.xpath('./text()')[0].strip(), child.xpath('./@name')[0]]
        # Text of every element nested under <grandchildren> (empty list if there is none)
        entries = child.xpath('.//grandchildren//*/text()')
        if entries:
            # One row per grandchild
            for ent in entries:
                rows.append(row + [ent])
        else:
            # No grandchildren: keep the child with a placeholder value
            rows.append(row + ["none"])

df = pd.DataFrame(rows, columns=cols)
print(df)

The output should look like this (pardon the formatting):

    Family    Child   Child Name  Grandchild
0   Hardwood  First   Jack        none
1   Hardwood  Second  Rose        none
2   Hardwood  Third   Blue Ivy    One
3   Hardwood  Third   Blue Ivy    Two
4   Hardwood  Third   Blue Ivy    Twins
5   Hardwood  Fourth  Jane        none
6   Downie    First   Bill        none
7   Downie    Second  Rosie       none
8   Downie    Third   Edward      none
9   Downie    Fourth  Jane        none
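
If installing lxml and pandas is not an option, roughly the same traversal can probably be done with the standard library's xml.etree.ElementTree, which supports a limited XPath subset. The following is only a sketch under that assumption, not the method used above: the XML is a trimmed copy of the question's data inlined as a string, `extract_rows` is a hypothetical helper name, and the rows are plain lists rather than a DataFrame.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, inlined so the sketch runs on its own
SAMPLE = """<root testAttr="testValue">
    <family name="Hardwood">
        <child name="Jack">First</child>
        <child name="Blue Ivy">Third
            <grandchildren>
                <data>One</data>
                <data>Two</data>
                <unique>Twins</unique>
            </grandchildren>
        </child>
    </family>
</root>"""


def extract_rows(xml_text):
    """Return [family, child text, child name, grandchild] rows (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    for family in root.findall("family"):
        for child in family.findall("child"):
            # child.text holds only the text that precedes the first sub-element
            base = [family.get("name"), (child.text or "").strip(), child.get("name")]
            # Every element directly under <grandchildren>, whatever its tag
            grandchildren = child.findall("./grandchildren/*")
            if grandchildren:
                rows.extend(base + [gc.text] for gc in grandchildren)
            else:
                rows.append(base + ["none"])
    return rows


for row in extract_rows(SAMPLE):
    print(row)
```

The rows could then be passed to pd.DataFrame(rows, columns=cols) or written out with the csv module, depending on which dependencies are available.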

Comments

0 votes Smi 4/27/2023
Thanks for the solution; it gives the required output.

0 votes Parfait 4/24/2023 #2

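Consider multiple nested loops, from family to child to grandchildren, while building a list of dictionaries. Special handling is required for the optional grandchildren nodes. From there, use DictWriter to write the data to CSV.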

from csv import DictWriter
import xml.dom.minidom as md

doc = md.parse('Sample_XML.xml')

# PARSE XML
data = []
for family in doc.getElementsByTagName('family'):
    for child in family.getElementsByTagName('child'):
        # Fields shared by every row produced for this child
        base = {
            "Family": family.getAttribute('name'),
            "Child": child.childNodes[0].nodeValue.strip(),
            "Child Name": child.getAttribute('name'),
        }

        # One row per element nested under <grandchildren>, if present
        rows = []
        for grandchildren in child.getElementsByTagName('grandchildren'):
            for grandchild in grandchildren.childNodes:
                if grandchild.nodeType == md.Node.ELEMENT_NODE:
                    rows.append({**base, "Grandchild": grandchild.childNodes[0].nodeValue})

        # A child without grandchildren still gets a single row; appending
        # unconditionally here would duplicate the last grandchild row
        data.extend(rows or [{**base, "Grandchild": None}])

# WRITE CSV
dkeys = list(data[0].keys())

with open("Output.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=dkeys)
    dw.writeheader()    
    dw.writerows(data)
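
The nested-loop idea can also be tried without any files on disk. The sketch below is an illustration under a few assumptions: the XML is a trimmed copy of the question's data parsed with md.parseString, the CSV is written to an in-memory buffer, and `direct_text`, `element_children`, `extract_rows` and `rows_to_csv` are hypothetical helper names. It filters childNodes by nodeType instead of relying on childNodes[0], because a child's childNodes mixes text nodes and element nodes, which is likely why indexing alone never reaches the grandchildren.

```python
import csv
import io
import xml.dom.minidom as md

# Trimmed copy of the question's XML, inlined so the sketch runs on its own
SAMPLE = """<root testAttr="testValue">
    <family name="Hardwood">
        <child name="Jack">First</child>
        <child name="Blue Ivy">Third
            <grandchildren>
                <data>One</data>
                <data>Two</data>
                <unique>Twins</unique>
            </grandchildren>
        </child>
    </family>
</root>"""

FIELDS = ["Family", "Child", "Child Name", "Grandchild"]


def direct_text(node):
    """Join the text nodes that are direct children of node, stripped."""
    parts = [n.data for n in node.childNodes if n.nodeType == md.Node.TEXT_NODE]
    return "".join(parts).strip()


def element_children(node):
    """Return only the element nodes among node's direct children."""
    return [n for n in node.childNodes if n.nodeType == md.Node.ELEMENT_NODE]


def extract_rows(xml_text):
    """Build one dict per grandchild, or one per child when it has none."""
    doc = md.parseString(xml_text)
    rows = []
    for family in doc.getElementsByTagName("family"):
        for child in family.getElementsByTagName("child"):
            base = {
                "Family": family.getAttribute("name"),
                "Child": direct_text(child),
                "Child Name": child.getAttribute("name"),
            }
            grandchildren = [
                gc
                for wrapper in child.getElementsByTagName("grandchildren")
                for gc in element_children(wrapper)
            ]
            if grandchildren:
                for gc in grandchildren:
                    rows.append({**base, "Grandchild": direct_text(gc)})
            else:
                rows.append({**base, "Grandchild": None})
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text using DictWriter."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(extract_rows(SAMPLE)))
```

Swapping md.parseString(xml_text) for md.parse('Sample_XML.xml') and the buffer for an open file should give the file-based behaviour.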

Comments

0 votes Smi 4/27/2023
Thanks for the solution; it gives the required output.