Asked by: Smi · Asked: 4/23/2023 · Last edited by: Smi · Updated: 4/24/2023 · Views: 101
How to go to sub child node using Python xml.dom.minidom
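Question:
I have the XML structure below, and I need to extract its data in the following format so I can export it to a spreadsheet:
[Image: the output to be written to the spreadsheet]
Source XML data: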
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root testAttr="testValue">
    <family name="Hardwood">
        <child name="Jack">First</child>
        <child name="Rose">Second</child>
        <child name="Blue Ivy">Third
            <grandchildren>
                <data>One</data>
                <data>Two</data>
                <unique>Twins</unique>
            </grandchildren>
        </child>
        <child name="Jane">Fourth</child>
    </family>
    <family name="Downie">
        <child name="Bill">First</child>
        <child name="Rosie">Second</child>
        <child name="Edward">
            Third
        </child>
        <child name="Jane">Fourth</child>
    </family>
</root>
I have tried the following, but I can't get down into the grandchildren tags.
import xml.dom.minidom

doc = xml.dom.minidom.parse('Sample_XML.xml')
children = doc.getElementsByTagName('family')
for child in children:
    # the family's name attribute, e.g. "Hardwood"
    print(child.getAttribute('name'))
    # first text node of each <child>; nothing below it is visited
    print(child.getElementsByTagName('child')[0].childNodes[0].nodeValue)
    print(child.getElementsByTagName('child')[1].childNodes[0].nodeValue)
    print(child.getElementsByTagName('child')[2].childNodes[0].nodeValue)
    print(child.getElementsByTagName('child')[3].childNodes[0].nodeValue)
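The piece missing from the attempt above is that getElementsByTagName is available on every element, not only on the document, and it searches all descendants. So calling it on each child element finds its grandchildren wrapper when there is one. A minimal minidom sketch, assuming a trimmed copy of the sample inlined as the hypothetical string SAMPLE and parsed with parseString instead of reading Sample_XML.xml:

```python
import xml.dom.minidom as md

# Hypothetical trimmed copy of the sample XML, inlined so the sketch needs no file
SAMPLE = """<root>
  <family name="Hardwood">
    <child name="Jack">First</child>
    <child name="Blue Ivy">Third
      <grandchildren>
        <data>One</data>
        <data>Two</data>
        <unique>Twins</unique>
      </grandchildren>
    </child>
  </family>
</root>"""

doc = md.parseString(SAMPLE)
result = []
for family in doc.getElementsByTagName('family'):
    for child in family.getElementsByTagName('child'):
        names = []
        # Called on the child element, this only searches that child's subtree
        for wrapper in child.getElementsByTagName('grandchildren'):
            for node in wrapper.childNodes:
                # childNodes also contains the whitespace Text nodes between tags,
                # so keep element nodes only
                if node.nodeType == md.Node.ELEMENT_NODE:
                    names.append(node.firstChild.nodeValue)
        result.append((family.getAttribute('name'), child.getAttribute('name'), names))

for row in result:
    print(row)
# ('Hardwood', 'Jack', [])
# ('Hardwood', 'Blue Ivy', ['One', 'Two', 'Twins'])
```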
Answers:
0 votes
Jack Fleeting
4/23/2023
#1
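There may be a simpler way, but you can do this if you use lxml instead of minidom, since lxml has much better XPath support. The code below parses the document with lxml and builds a pandas DataFrame, which you can use as is or save as a CSV or Excel file. It isn't simple (your XML is deeply nested), but I believe it's fairly self-explanatory if you know XPath. If not, you should read up on it...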
from lxml import etree
import pandas as pd

doc = etree.parse("Sample_XML.xml")
families = doc.xpath('//family')
rows = []
cols = ["Family", "Child", "Child Name", "Grandchild"]
for family in families:
    children = family.xpath('.//child')
    for child in children:
        gcs = child.xpath('.//grandchildren')
        # text of every element under <grandchildren>, or None if the child has none
        entry = [gc.xpath('.//*/text()') for gc in gcs] if gcs else None
        row = [family.xpath('./@name')[0], child.xpath('./text()')[0].strip(), child.xpath('./@name')[0]]
        if entry is not None:
            # one row per grandchild
            for ent in entry[0]:
                subrow = row + [ent]
                rows.append(subrow)
        else:
            row.append("none")
            rows.append(row)

df = pd.DataFrame(rows, columns=cols)
print(df)
# to save: df.to_csv("Output.csv", index=False) or df.to_excel("Output.xlsx", index=False)
The output should look like this (excuse the formatting):
   Family    Child   Child Name  Grandchild
0  Hardwood  First   Jack        none
1  Hardwood  Second  Rose        none
2  Hardwood  Third   Blue Ivy    One
3  Hardwood  Third   Blue Ivy    Two
4  Hardwood  Third   Blue Ivy    Twins
5  Hardwood  Fourth  Jane        none
6  Downie    First   Bill        none
7  Downie    Second  Rosie       none
8  Downie    Third   Edward      none
9  Downie    Fourth  Jane        none
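If installing lxml and pandas isn't an option, the same flattening can be sketched with only the standard library. This swaps in xml.etree.ElementTree, which supports a limited XPath subset (paths like .//child and .//grandchildren/*, attributes via .get), and returns plain lists instead of a DataFrame. The function name flatten_families is hypothetical, and "none" mirrors the placeholder used above.

```python
import xml.etree.ElementTree as ET


def flatten_families(root):
    """Return one [family, child text, child name, grandchild] row per grandchild,
    or a single row ending in "none" when a child has no grandchildren."""
    rows = []
    for family in root.iter('family'):
        for child in family.findall('.//child'):
            # child.text is the text before the first sub-element, e.g. "Third\n"
            base = [family.get('name'), (child.text or '').strip(), child.get('name')]
            # './/grandchildren/*' selects every element directly under <grandchildren>
            grandchildren = [(gc.text or '').strip() for gc in child.findall('.//grandchildren/*')]
            if grandchildren:
                rows.extend(base + [gc] for gc in grandchildren)
            else:
                rows.append(base + ['none'])
    return rows
```

For the question's file this would be called as flatten_families(ET.parse("Sample_XML.xml").getroot()), and the resulting rows can be passed straight to csv.writer(...).writerows().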
Comments
0 votes
Smi
4/27/2023
Thank you for the solution, it gives the required output.
0 votes
Parfait
4/24/2023
#2
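Consider several nested loops, going from family to child to grandchildren, while building a list of dictionaries. The optional grandchildren node needs special handling. From there, use DictWriter to write the data to CSV.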
from csv import DictWriter
import xml.dom.minidom as md

doc = md.parse('Sample_XML.xml')

# PARSE XML
data = []
for family in doc.getElementsByTagName('family'):
    for child in family.getElementsByTagName('child'):
        # fields shared by every row produced for this child
        inner = {}
        inner["Family"] = family.getAttribute('name')
        inner["Child"] = child.childNodes[0].nodeValue.strip()
        inner["Child Name"] = child.getAttribute('name')
        inner["Grandchild"] = None

        has_grandchild = False
        for grandchildren in child.getElementsByTagName('grandchildren'):
            for grandchild in grandchildren.childNodes:
                # skip the whitespace text nodes between the elements
                if grandchild.nodeType == md.Node.ELEMENT_NODE:
                    # one row per grandchild, copying the shared fields
                    data.append({**inner, "Grandchild": grandchild.childNodes[0].nodeValue})
                    has_grandchild = True

        # a child without grandchildren gets a single row with an empty Grandchild
        if not has_grandchild:
            data.append(inner)

# WRITE CSV
dkeys = list(data[0].keys())
with open("Output.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=dkeys)
    dw.writeheader()
    dw.writerows(data)
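Children without grandchildren keep "Grandchild": None, so it helps to know how DictWriter renders that: the csv module writes None as an empty string, which leaves those cells blank in the spreadsheet. A small check, assuming two hand-written rows shaped like the ones built above and an io.StringIO buffer standing in for Output.csv:

```python
import io
from csv import DictWriter

# Two rows shaped like the dictionaries built above (values copied from the sample)
rows = [
    {"Family": "Hardwood", "Child": "First", "Child Name": "Jack", "Grandchild": None},
    {"Family": "Hardwood", "Child": "Third", "Child Name": "Blue Ivy", "Grandchild": "One"},
]

# In-memory buffer instead of a real file
buf = io.StringIO()
dw = DictWriter(buf, fieldnames=list(rows[0].keys()))
dw.writeheader()
dw.writerows(rows)

# None is written as an empty field, so Jack's Grandchild cell ends up blank
for line in buf.getvalue().splitlines():
    print(line)
# Family,Child,Child Name,Grandchild
# Hardwood,First,Jack,
# Hardwood,Third,Blue Ivy,One
```

If you'd rather see "none" in those cells, as in the pandas answer, set that value instead of None when building the dictionary.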
Comments
0 votes
Smi
4/27/2023
Thank you for the solution, it gives the required output.