How to go to a sub-child node using Python xml.dom.minidom

Question:

I have the XML structure below, and I need to get its data into a tabular format that can be exported to a spreadsheet.

Source XML data:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root testAttr="testValue">
    <family name="Hardwood">
        <child name="Jack">First</child>
        <child name="Rose">Second</child>
        <child name="Blue Ivy">Third
                <grandchildren>
                    <data>One</data>
                    <data>Two</data>
                    <unique>Twins</unique>
                </grandchildren>
            </child>
        <child name="Jane">Fourth</child>
    </family>
    <family name="Downie">
        <child name="Bill">First</child>
        <child name="Rosie">Second</child>
        <child name="Edward">
            Third
            </child>
        <child name="Jane">Fourth</child>
    </family>
</root>

I tried the following, but I can't get into the grandchildren tag.

import xml.dom.minidom
doc = xml.dom.minidom.parse('Sample_XML.xml')
children = doc.getElementsByTagName('family')
for child in children:
    # Family name attribute
    print(child.getAttribute('name'))
    # First text node of each <child>, stripped of layout whitespace
    print(child.getElementsByTagName('child')[0].childNodes[0].nodeValue.strip())
    print(child.getElementsByTagName('child')[1].childNodes[0].nodeValue.strip())
    print(child.getElementsByTagName('child')[2].childNodes[0].nodeValue.strip())
    print(child.getElementsByTagName('child')[3].childNodes[0].nodeValue.strip())

Tags: python xml parsing dom minidom
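A note on where the attempt above gets stuck: getElementsByTagName can be called on any element, not only on the document, and then it searches only that element's subtree. So you can call it on each <child> to reach <grandchildren>, and then skip the whitespace text nodes that minidom keeps between elements. A minimal sketch, assuming a trimmed inline copy of the sample XML instead of Sample_XML.xml; the helper name element_children is hypothetical:

```python
import xml.dom.minidom as md

# Trimmed copy of the sample XML (assumption: same layout as Sample_XML.xml)
XML = """<root>
  <family name="Hardwood">
    <child name="Jack">First</child>
    <child name="Blue Ivy">Third
      <grandchildren>
        <data>One</data>
        <data>Two</data>
        <unique>Twins</unique>
      </grandchildren>
    </child>
  </family>
</root>"""


def element_children(node):
    """Hypothetical helper: direct children that are elements (skips whitespace text nodes)."""
    return [n for n in node.childNodes if n.nodeType == md.Node.ELEMENT_NODE]


doc = md.parseString(XML)
found = []
for child in doc.getElementsByTagName('child'):
    # Called on an element, getElementsByTagName searches only that element's subtree
    for gc_block in child.getElementsByTagName('grandchildren'):
        for gc in element_children(gc_block):
            # gc.firstChild is the text node inside <data> / <unique>
            found.append((child.getAttribute('name'), gc.tagName, gc.firstChild.nodeValue))

for item in found:
    print(item)
```

Jack has no <grandchildren>, so only the three grandchildren of Blue Ivy are collected.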

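Answer: There may be a simpler way, but you can do this with lxml instead of minidom, since lxml has much better XPath support. The code below parses the document with lxml and builds a pandas DataFrame, which can be kept as-is or saved as a CSV or Excel file. It isn't trivial (your XML is deeply nested), but I think it is fairly self-explanatory if you know XPath. If you don't, you should read up on it...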

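from lxml import etree
import pandas as pd

doc = etree.parse("Sample_XML.xml")
families = doc.xpath('//family')

rows = []
cols = ["Family", "Child", "Child Name", "Grandchild"]

for family in families:
    children = family.xpath('.//child')
    for child in children:
        gcs = child.xpath('.//grandchildren')
        # Text of every element inside each <grandchildren> block, or None if there is none
        entry = [gc.xpath('.//*/text()') for gc in gcs] if gcs else None
        # Family name, the child's own text (whitespace stripped) and its name attribute
        row = [family.xpath('./@name')[0], child.xpath('./text()')[0].strip(), child.xpath('./@name')[0]]
        if entry is not None:
            # One output row per grandchild
            for ent in entry[0]:
                subrow = row + [ent]
                rows.append(subrow)
        else:
            row.append("none")
            rows.append(row)

df = pd.DataFrame(rows, columns=cols)
print(df)
# Save as CSV; df.to_excel("Output.xlsx", index=False) also works if an Excel engine such as openpyxl is installed
df.to_csv("Output.csv", index=False)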

The output should look like this (excuse the formatting):

    Family    Child   Child Name  Grandchild
0   Hardwood  First   Jack        none
1   Hardwood  Second  Rose        none
2   Hardwood  Third   Blue Ivy    One
3   Hardwood  Third   Blue Ivy    Two
4   Hardwood  Third   Blue Ivy    Twins
5   Hardwood  Fourth  Jane        none
6   Downie    First   Bill        none
7   Downie    Second  Rosie       none
8   Downie    Third   Edward      none
9   Downie    Fourth  Jane        none
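If installing lxml is not an option, the standard library's xml.etree.ElementTree supports a limited XPath subset (child paths, *, attribute predicates) that is enough for this layout. This is a swapped-in alternative, not the answer's lxml code; a sketch assuming the XML is available as a string (here a trimmed copy of the sample) and using a hypothetical function name, flatten, that produces the same kind of rows as the table above:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML (assumption: same structure as Sample_XML.xml)
XML = """<root>
  <family name="Hardwood">
    <child name="Rose">Second</child>
    <child name="Blue Ivy">Third
      <grandchildren>
        <data>One</data>
        <data>Two</data>
        <unique>Twins</unique>
      </grandchildren>
    </child>
  </family>
  <family name="Downie">
    <child name="Edward">
      Third
    </child>
  </family>
</root>"""


def flatten(root):
    """Hypothetical helper: one row per grandchild, or one row ending in "none" if there are none."""
    rows = []
    for family in root.findall('./family'):
        for child in family.findall('./child'):
            # child.text is the text before the first sub-element; strip the layout whitespace
            base = [family.get('name'), (child.text or '').strip(), child.get('name')]
            # Every element directly inside <grandchildren>
            grandkids = child.findall('./grandchildren/*')
            if grandkids:
                rows.extend(base + [gc.text] for gc in grandkids)
            else:
                rows.append(base + ['none'])
    return rows


for row in flatten(ET.fromstring(XML)):
    print(row)
```

For the full file, ET.parse("Sample_XML.xml").getroot() would take the place of ET.fromstring(XML).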

Another answer, staying with minidom and writing the CSV with the standard csv module:

from csv import DictWriter
import xml.dom.minidom as md

doc = md.parse('Sample_XML.xml')

# PARSE XML
data = []
for family in doc.getElementsByTagName('family'):
    for child in family.getElementsByTagName('child'):
        # Fields shared by every row for this <child>
        base = {
            "Family": family.getAttribute('name'),
            "Child": child.childNodes[0].nodeValue.strip(),
            "Child Name": child.getAttribute('name'),
        }
        found = False
        for grandchildren in child.getElementsByTagName('grandchildren'):
            for grandchild in grandchildren.childNodes:
                # Skip the whitespace text nodes between elements
                if grandchild.nodeType == md.Node.ELEMENT_NODE:
                    data.append({**base, "Grandchild": grandchild.childNodes[0].nodeValue})
                    found = True
        # A child without grandchildren still gets exactly one row
        if not found:
            data.append({**base, "Grandchild": None})

# WRITE CSV
dkeys = list(data[0].keys())

with open("Output.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=dkeys)
    dw.writeheader()
    dw.writerows(data)
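The answer above reads the child's label with child.childNodes[0].nodeValue, which relies on the text being the very first node inside <child>. A slightly more defensive sketch joins only the element's direct text nodes, so the text of nested <grandchildren> is ignored; it also shows that csv.DictWriter writes None as an empty cell. The helper name own_text and the in-memory io.StringIO target are assumptions for illustration:

```python
import csv
import io
import xml.dom.minidom as md

# One <child> from the sample, with text both before and after <grandchildren>
XML = """<root><family name="Hardwood">
  <child name="Blue Ivy">Third
    <grandchildren><data>One</data></grandchildren>
  </child>
</family></root>"""


def own_text(element):
    """Hypothetical helper: concatenate only the element's direct text nodes, stripped."""
    parts = [n.nodeValue for n in element.childNodes
             if n.nodeType in (md.Node.TEXT_NODE, md.Node.CDATA_SECTION_NODE)]
    return "".join(parts).strip()


doc = md.parseString(XML)
child = doc.getElementsByTagName('child')[0]
row = {
    "Family": child.parentNode.getAttribute('name'),  # parent <family> element
    "Child": own_text(child),
    "Child Name": child.getAttribute('name'),
    "Grandchild": None,  # written as an empty cell
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buf.getvalue())
```

The csv module ends rows with "\r\n" by default, which is why the file in the answer is opened with newline="".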
