添加/替换 XML 标记的 Python 脚本-解网

问：

我有这个 Python 脚本，它应该在 XML 文档中查找现有标签，并用新的、更具描述性的标签替换它们。问题是，在我运行脚本后，它似乎只捕获我输入的文本字符串的每几个实例。我确信它为什么会这样，这背后有一些原因，但我似乎无法弄清楚。

import xml.etree.ElementTree as ET
from lxml import etree

def replace_specific_line_tags(input_file, output_file, replacements):
    # Parse the XML file using lxml
    tree = etree.parse(input_file)
    root = tree.getroot()

    for target_text, replacement_tag in replacements:
        # Find all <line> tags with the specific target text under <content> and replace them with the new tag
        for line_tag in root.xpath('.//content/page/line[contains(., "{}")]'.format(target_text)):
            parent = line_tag.getparent()

            # Create the new tag with the desired tag name
            new_tag = etree.Element(replacement_tag)

            # Copy the attributes of the original <line> tag to the new tag
            for attr, value in line_tag.attrib.items():
                new_tag.set(attr, value)

            # Copy the text of the original <line> tag to the new tag
            new_tag.text = line_tag.text

            # Replace the original <line> tag with the new tag
            parent.replace(line_tag, new_tag)

    # Write the updated XML back to the file
    with open(output_file, 'wb') as f:
        tree.write(f, encoding='utf-8', xml_declaration=True)

if __name__ == '__main__':
    input_file_name = 'beforeTagEdits.xml'
    output_file_name = 'afterTagEdits.xml'
    
    # List of target texts and their corresponding replacement tags
    replacements = [
        ('The Washington Post', 'title'),

        # Add more target texts and their replacement tags as needed
    ]
    
    replace_specific_line_tags(input_file_name, output_file_name, replacements)

由于代码正在工作，只是没有完全预期，我尝试更改一些文本字符串以匹配原始文件中已知的确切字符串，但这似乎并不能解决问题。下面是当前 XML 文档的示例：

<root>
     <content>
          <line>The Washington Post</line>
          <line>The Washington Post</line>
     </content>
</root>

python xml xml 文本解析

import xml.etree.ElementTree as ET

xml= """<root>
     <content>
          <line>The Washington Post</line>
          <line>The Washington Post</line>
          <tag>Der Spiegel</tag>
     </content>
</root>"""

root = ET.fromstring(xml)

pattern ={'title':['The Washington Post', 'Der Spiegel']}

for k, v in pattern.items():
    for elem in root.iter():
        if elem.text in v:
            elem.tag = k
            
ET.dump(root)

输出：

<root>
     <content>
          <title>The Washington Post</title>
          <title>The Washington Post</title>
          <title>Der Spiegel</title>
     </content>
</root>

上一个：Oracle XML 多个节点

下一个：嵌套自定义类型时 xml Unmarshal 中缺少值

添加/替换 XML 标记的 Python 脚本

Python script that adds/replaces XML tags

评论