Groovy - remove non-unique values in an XML payload

Asked by: N21RL Asked: 10/7/2023 Updated: 10/7/2023 Views: 43

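Q:

I am trying to extract two nodes from each element of an XML payload, but this produces some duplicate values. Is there a way to get only the unique combinations of values, or to remove the duplicates afterwards?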

import java.text.*
import groovy.xml.*

def text = '''
<root>
  <results>
    <loc>Loc 10</loc>
    <city>ABC</city>
    <points>3</points>
    <StartDate>2023-09-11T22:39:40Z</StartDate>
    <EndDate>2023-09-13T22:45:36.437000Z</EndDate>
  </results>
  <results>
    <loc>Loc 11</loc>
    <city>ABC</city>
    <points>4</points>
    <StartDate>2023-09-18T22:39:40Z</StartDate>
    <EndDate>2023-09-18T22:45:36.437000Z</EndDate>
  </results>
  <results>
    <loc>Loc 11</loc>
    <city>ABC</city>
    <points>4</points>
    <StartDate>2023-02-16T22:39:40Z</StartDate>
    <EndDate>2023-09-18T22:45:36.437000Z</EndDate>
  </results>
  <results>
    <loc>Loc 11</loc>
    <city>XYZ</city>
    <points>4</points>
    <StartDate>2023-09-16T22:39:40Z</StartDate>
    <EndDate>2023-12-18T22:45:36.437000Z</EndDate>
  </results>
</root>
'''
def xml = new XmlSlurper().parseText( text )
def output = new XmlParser().parseText("<root/>")

xml.results.each { resXml ->
    Node resultsNode = output.appendNode( new QName("results"), [:] )
    // Copy every child except points, StartDate and EndDate
    resXml.children().findAll { child -> child.name() != "points" && child.name() != "StartDate" && child.name() != "EndDate" }.each { child ->
        resultsNode.appendNode( new QName(child.name()), [:], child.text() )
    }
}

println XmlUtil.serialize( output )

The code above produces the following output:

<?xml version="1.0" encoding="UTF-8"?><root>
    
  <results xmlns="">
        
    <loc>Loc 10</loc>
        
    <city>ABC</city>
      
  </results>
    
  <results xmlns="">
        
    <loc>Loc 11</loc>
        
    <city>ABC</city>
        
    <loc_name>Loc Desc 11</loc_name>
      
  </results>
    
  <results xmlns="">
        
    <loc>Loc 11</loc>
        
    <city>ABC</city>
      
  </results>
    
  <results xmlns="">
        
    <loc>Loc 11</loc>
        
    <city>XYZ</city>
      
  </results>
  
</root>

It produces some duplicates. Is there a way to remove them, or to add only unique values to the new payload?

xml groovy xml-parsing


A:

1 upvote injecteer 10/7/2023 #1

I would add a Set to check for duplicates:

import java.text.*
import groovy.xml.*

def text = '''
<root>
  <results>
    <loc>Loc 10</loc>
    <city>ABC</city>
    <points>3</points>
    <StartDate>2023-09-11T22:39:40Z</StartDate>
    <EndDate>2023-09-13T22:45:36.437000Z</EndDate>
  </results>
  <results>
    <loc>Loc 11</loc>
    <city>ABC</city>
    <points>4</points>
    <StartDate>2023-09-18T22:39:40Z</StartDate>
    <EndDate>2023-09-18T22:45:36.437000Z</EndDate>
  </results>
  <results>
    <loc>Loc 11</loc>
    <city>ABC</city>
    <points>4</points>
    <StartDate>2023-02-16T22:39:40Z</StartDate>
    <EndDate>2023-09-18T22:45:36.437000Z</EndDate>
  </results>
  <results>
    <loc>Loc 11</loc>
    <city>XYZ</city>
    <points>4</points>
    <StartDate>2023-09-16T22:39:40Z</StartDate>
    <EndDate>2023-12-18T22:45:36.437000Z</EndDate>
  </results>
</root>
'''
def xml = new XmlSlurper().parseText( text )
def output = new XmlParser().parseText("<root/>")

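// Keys of the loc/city combinations that have already been written
Set uniques = new HashSet()

xml.results.each { resXml ->
    // add() returns false when the key is already in the Set;
    // return then skips this element, like continue in a loop
    if( !uniques.add( resXml.loc.text() + '-' + resXml.city.text() ) ) return

    Node resultsNode = output.appendNode( new QName("results"), [:] )
    resXml.children().findAll { child -> child.name() in [ 'loc', 'city' ] }.each { child ->
        resultsNode.appendNode( new QName(child.name()), [:], child.text() )
    }
}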

XmlUtil.serialize( output )

This returns:

<?xml version="1.0" encoding="UTF-8"?><root>
    
  <results>
        
    <loc>Loc 10</loc>
        
    <city>ABC</city>
      
  </results>
    
  <results>
        
    <loc>Loc 11</loc>
        
    <city>ABC</city>
      
  </results>
    
  <results>
        
    <loc>Loc 11</loc>
        
    <city>XYZ</city>
      
  </results>
  
</root>
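
As a variation on the Set approach, the loc/city pairs could be collected and de-duplicated first, with the output built afterwards. The following is only a sketch under a few assumptions: it uses a shortened copy of the sample payload, it keeps each pair as a list in a LinkedHashSet so that first-seen order is preserved, and it writes the result with MarkupBuilder rather than XmlParser and XmlUtil, which is a swapped-in choice and not what the answer does. The names pairs and writer are illustrative.

```groovy
import groovy.xml.*

// Shortened copy of the sample payload (assumption: only loc and city matter here)
def text = '''<root>
  <results><loc>Loc 10</loc><city>ABC</city><points>3</points></results>
  <results><loc>Loc 11</loc><city>ABC</city><points>4</points></results>
  <results><loc>Loc 11</loc><city>ABC</city><points>4</points></results>
  <results><loc>Loc 11</loc><city>XYZ</city><points>4</points></results>
</root>'''

def xml = new XmlSlurper().parseText(text)

// Lists are compared by content, so a repeated [loc, city] pair is stored only once.
// LinkedHashSet keeps the pairs in the order they were first seen.
def pairs = new LinkedHashSet(xml.results.collect { [it.loc.text(), it.city.text()] })

// Build the reduced document from the unique pairs
def writer = new StringWriter()
new MarkupBuilder(writer).root {
    pairs.each { pair ->
        results {
            loc(pair[0])
            city(pair[1])
        }
    }
}

println writer
```

Separating the de-duplication from the XML building keeps the uniqueness rule in one place, which may be easier to change if more fields have to be part of the key later.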

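Comments

0 upvotes N21RL 10/7/2023
Thanks for the solution! Is there a way to add the combination of loc and city to the Set, since the combination of the two can also differ?
0 upvotes injecteer 10/7/2023
You can concatenate both values -> see the update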
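
One thing to watch with a concatenated key is that the delimiter may also occur inside the values. The sketch below uses made-up records (not taken from the payload in the question) to show how two different loc/city combinations can collapse into the same string key, and how a list key avoids that. The names records, stringKeys and listKeys are hypothetical.

```groovy
// Hypothetical values, chosen so that the '-' delimiter becomes ambiguous
def records = [
    [loc: 'A-B', city: 'C'],
    [loc: 'A',   city: 'B-C'],
]

Set stringKeys = new HashSet()
Set listKeys   = new HashSet()

records.each { r ->
    stringKeys.add(r.loc + '-' + r.city)   // both records produce 'A-B-C'
    listKeys.add([r.loc, r.city])          // ['A-B', 'C'] and ['A', 'B-C'] stay distinct
}

assert stringKeys.size() == 1   // the two combinations were merged by mistake
assert listKeys.size() == 2     // the list key keeps them apart
```

For values like those in the question ('Loc 11', 'ABC') the concatenated key works fine; the list key only matters if the values might contain the delimiter.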