How to sort xml file by xsi:type?

Asked by: pinkiko · Asked: 10/16/2023 · Last edited by: pinkiko · Updated: 10/18/2023 · Views: 33

Q:

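I know there are already several questions about sorting XML, but none of them seem to apply to my case. I have the following XML file, which represents an excerpt of an Esri file geodatabase schema: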

import xml.etree.ElementTree as ET
from operator import attrgetter

data = """<esri:Workspace xmlns:esri='http://www.esri.com/schemas/ArcGIS/10.8' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:xs='http://www.w3.org/2001/XMLSchema'>
    <WorkspaceDefinition xsi:type='esri:WorkspaceDefinition'>
        <WorkspaceType>esriLocalDatabaseWorkspace</WorkspaceType>
        <Version/>
        <Domains xsi:type='esri:ArrayOfDomain'/>
        <Sequences xsi:type='esri:ArrayOfSequence'/>
        <DatasetDefinitions xsi:type='esri:ArrayOfDataElement'>
            <DataElement xsi:type='esri:DEFeatureClass'/>
            <DataElement xsi:type='esri:DEFeatureClass'/>
            <DataElement xsi:type='esri:DEFeatureClass'/>
            <DataElement xsi:type='esri:DEFeatureDataset'/>
            <DataElement xsi:type='esri:DEFeatureClass'/>
            <DataElement xsi:type='esri:DEFeatureClass'/>
        </DatasetDefinitions>
    </WorkspaceDefinition>
    <WorkspaceData xsi:type='esri:WorkspaceData'/>
</esri:Workspace>"""    
    
root_1 = ET.fromstring(data)

I want to sort it by tag and by DataElement type, so that it ends up ordered like this:

WorkspaceData {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:WorkspaceData'}
WorkspaceDefinition {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:WorkspaceDefinition'}
     DatasetDefinitions {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfDataElement'}
         DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
         DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
         DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
         DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
         DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
         DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureDataset'}
     Domains {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfDomain'}
     Sequences {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfSequence'}
     Version {}
     WorkspaceType {}

So far I have managed to sort by tag, but how do I sort by DataElement type? This is my code so far:

root_1[:] = sorted(root_1,  key=attrgetter("tag")) # WorkspaceData, WorkspaceDefinition
for node in root_1.findall("*"):  # DatasetDefinitions, Domains, Sequences, Version, WorkspaceType
    node[:] = sorted(node, key=attrgetter("tag"))
    print(node)
    for subnode in node.findall("*"): #DataElement, Domain
        subnode[:] = sorted(subnode, key=attrgetter("tag"))
        #subnode[:] = sorted(subnode, key=subnode.get['xsi:type']) # not working!
        print("\t", subnode.tag, subnode.attrib)
        for subsubnode in subnode.findall("*"): 
            print("\t\t", subsubnode.tag, subsubnode.attrib)
            subsubnode[:] = sorted(subsubnode,  key=attrgetter("tag"))
python xml sorting elementtree xsitype

A:

1 upvote · Andrej Kesely · 10/16/2023 · #1

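IIUC, you can slightly change the key= parameter in sorted(), so that it sorts on the tag first and then on the namespace-qualified xsi:type attribute: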

import xml.etree.ElementTree as ET
from operator import attrgetter

data = """<esri:Workspace xmlns:esri='http://www.esri.com/schemas/ArcGIS/10.8' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:xs='http://www.w3.org/2001/XMLSchema'>
    <WorkspaceDefinition xsi:type='esri:WorkspaceDefinition'>
        <WorkspaceType>esriLocalDatabaseWorkspace</WorkspaceType>
        <Version/>
        <Domains xsi:type='esri:ArrayOfDomain'/>
        <Sequences xsi:type='esri:ArrayOfSequence'/>
        <DatasetDefinitions xsi:type='esri:ArrayOfDataElement'>
            <DataElement xsi:type='esri:DEFeatureClass'/>
            <DataElement xsi:type='esri:DEFeatureClass'/>
            <DataElement xsi:type='esri:DEFeatureClass'/>
            <DataElement xsi:type='esri:DEFeatureDataset'/>
            <DataElement xsi:type='esri:DEFeatureClass'/>
            <DataElement xsi:type='esri:DEFeatureClass'/>
        </DatasetDefinitions>
    </WorkspaceDefinition>
    <WorkspaceData xsi:type='esri:WorkspaceData'/>
</esri:Workspace>"""

root_1 = ET.fromstring(data)

root_1[:] = sorted(root_1, key=attrgetter("tag"))  # WorkspaceData, WorkspaceDefinition

for node in root_1.findall(
    "*"
):  # DatasetDefinitions, Domains, Sequences, Version, WorkspaceType
    node[:] = sorted(node, key=attrgetter("tag"))
    print(node)
    for subnode in node.findall("*"):  # DataElement, Domain
        subnode[:] = sorted(
            subnode,
            key=lambda node: (            # <--- change key= here
                node.tag,
                node.get("{http://www.w3.org/2001/XMLSchema-instance}type"),
            ),
        )
        print("\t", subnode.tag, subnode.attrib)
        for subsubnode in subnode.findall("*"):
            print("\t\t", subsubnode.tag, subsubnode.attrib)
            subsubnode[:] = sorted(
                subsubnode,
                key=attrgetter("tag"),
            )

Prints:

<Element 'WorkspaceData' at 0x7f5ff630bec0>
<Element 'WorkspaceDefinition' at 0x7f5ff6316610>
         DatasetDefinitions {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfDataElement'}
                 DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
                 DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
                 DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
                 DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
                 DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureClass'}
                 DataElement {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:DEFeatureDataset'}
         Domains {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfDomain'}
         Sequences {'{http://www.w3.org/2001/XMLSchema-instance}type': 'esri:ArrayOfSequence'}
         Version {}
         WorkspaceType {}
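
If the nesting depth is not known in advance, the same key can be applied recursively instead of writing one loop per level. The following is only a minimal sketch, not part of the answer above: the names XSI_TYPE, sort_key, sort_tree and dump are hypothetical helpers, and the sample document is a shortened version of the one in the question. It relies on ElementTree storing a prefixed attribute such as xsi:type under its expanded name, {namespace-uri}type, which is also why looking it up as "xsi:type" finds nothing. A missing xsi:type falls back to an empty string, so elements with the same tag never end up comparing a string against None.

```python
import xml.etree.ElementTree as ET

# Expanded ("Clark notation") name under which ElementTree stores xsi:type.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def sort_key(element):
    # Sort by tag first, then by xsi:type; "" stands in for a missing attribute
    # so that a str is never compared with None.
    return (element.tag, element.get(XSI_TYPE, ""))


def sort_tree(element):
    """Sort the children of `element` in place, then recurse into each child."""
    element[:] = sorted(element, key=sort_key)
    for child in element:
        sort_tree(child)


def dump(element, depth=0):
    """Return one indented 'tag xsi:type' line per descendant of `element`."""
    lines = []
    for child in element:
        label = f"{child.tag} {child.get(XSI_TYPE, '')}".rstrip()
        lines.append("  " * depth + label)
        lines.extend(dump(child, depth + 1))
    return lines


# Shortened, hypothetical variant of the document from the question.
sample = """<esri:Workspace xmlns:esri='http://www.esri.com/schemas/ArcGIS/10.8' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
    <WorkspaceDefinition xsi:type='esri:WorkspaceDefinition'>
        <Version/>
        <DatasetDefinitions xsi:type='esri:ArrayOfDataElement'>
            <DataElement xsi:type='esri:DEFeatureDataset'/>
            <DataElement xsi:type='esri:DEFeatureClass'/>
        </DatasetDefinitions>
    </WorkspaceDefinition>
    <WorkspaceData xsi:type='esri:WorkspaceData'/>
</esri:Workspace>"""

root = ET.fromstring(sample)
sort_tree(root)
print("\n".join(dump(root)))
```

Because sorted() is stable, elements that share both tag and xsi:type keep their original relative order.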