添加/替换 XML 标记的 Python 脚本

Python script that adds/replaces XML tags

提问人:E-Flo 提问时间:9/6/2023 更新时间:9/6/2023 访问量:51

问:

我有这个 Python 脚本,它应该在 XML 文档中查找现有标签,并用新的、更具描述性的标签替换它们。问题是,在我运行脚本后,它似乎只捕获我输入的文本字符串的每几个实例。我确信它为什么会这样,这背后有一些原因,但我似乎无法弄清楚。

import xml.etree.ElementTree as ET
from lxml import etree

def replace_specific_line_tags(input_file, output_file, replacements):
    # Parse the XML file using lxml
    tree = etree.parse(input_file)
    root = tree.getroot()

    for target_text, replacement_tag in replacements:
        # Find all <line> tags with the specific target text under <content> and replace them with the new tag
        for line_tag in root.xpath('.//content/page/line[contains(., "{}")]'.format(target_text)):
            parent = line_tag.getparent()

            # Create the new tag with the desired tag name
            new_tag = etree.Element(replacement_tag)

            # Copy the attributes of the original <line> tag to the new tag
            for attr, value in line_tag.attrib.items():
                new_tag.set(attr, value)

            # Copy the text of the original <line> tag to the new tag
            new_tag.text = line_tag.text

            # Replace the original <line> tag with the new tag
            parent.replace(line_tag, new_tag)

    # Write the updated XML back to the file
    with open(output_file, 'wb') as f:
        tree.write(f, encoding='utf-8', xml_declaration=True)

if __name__ == '__main__':
    input_file_name = 'beforeTagEdits.xml'
    output_file_name = 'afterTagEdits.xml'
    
    # List of target texts and their corresponding replacement tags
    replacements = [
        ('The Washington Post', 'title'),

        # Add more target texts and their replacement tags as needed
    ]
    
    replace_specific_line_tags(input_file_name, output_file_name, replacements)

由于代码正在工作,只是没有完全预期,我尝试更改一些文本字符串以匹配原始文件中已知的确切字符串,但这似乎并不能解决问题。下面是当前 XML 文档的示例:

<root>
     <content>
          <line>The Washington Post</line>
          <line>The Washington Post</line>
     </content>
</root>
python xml xml 文本解析

评论

1赞 Andrej Kesely 9/6/2023
你能编辑问题并把预期的输出放在那里吗?你能用吗?beautifulsoup
1赞 Yitzhak Khabinsky 9/6/2023
最好将 XSLT 用于此类任务。
1赞 LMC 9/6/2023
您的 xpath 需要,但 XML 结构是content/page/linecontent/line
0赞 balderman 9/7/2023
共享输入(作为有效的 XML)、所需的输出(作为有效的 XML)和转换逻辑,

答:

0赞 Hermann12 9/6/2023 #1

您可以在找到搜索到的文本后对树进行 iter() 并重命名标签:

import xml.etree.ElementTree as ET

xml= """<root>
     <content>
          <line>The Washington Post</line>
          <line>The Washington Post</line>
          <tag>Der Spiegel</tag>
     </content>
</root>"""

root = ET.fromstring(xml)

pattern ={'title':['The Washington Post', 'Der Spiegel']}

for k, v in pattern.items():
    for elem in root.iter():
        if elem.text in v:
            elem.tag = k
            
ET.dump(root)

输出:

<root>
     <content>
          <title>The Washington Post</title>
          <title>The Washington Post</title>
          <title>Der Spiegel</title>
     </content>
</root>