ICDAR2015 Text in Video: XML to CSV file conversion

Question:

I am using the ICDAR2015 Text in Video dataset and have downloaded both the text localization data set and the end-to-end data set. For each video, all the text is listed in one place: the Video___GT.txt file holds all clearly visible text with its ID, and the Video___GT.xml file holds every frame with its <frame ID>, the transcription, the quality of the text, and the bounding box coordinates. I want to convert this XML file to CSV files, where each frame ID gives the name of a gt_frameID.txt file that contains the bounding box coordinates along with the transcription. Here is a sample ground truth XML file.

`<Frames>
<frame ID="1">
<object Transcription="##DONT#CARE##" ID="1001" Language="Spanish" Mirrored="Unmirrored" Quality="LOW">
<Point x="422" y="56"/>
<Point x="494" y="53"/>
<Point x="495" y="71"/>
<Point x="423" y="74"/>
</object>
<object Transcription="##DONT#CARE##" ID="1002" Language="Spanish" Mirrored="Unmirrored" Quality="LOW">
<Point x="422" y="23"/>
<Point x="495" y="22"/>
<Point x="495" y="37"/>
<Point x="422" y="37"/>
</object>
<object Transcription="##DONT#CARE##" ID="1003" Language="Spanish" Mirrored="Unmirrored" Quality="LOW">
<Point x="327" y="123"/>
<Point x="345" y="123"/>
<Point x="345" y="138"/>
<Point x="328" y="138"/>
</object>
</frame>
</Frames>`

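The XML file contains many more frames. I tried the following code to write each frame to a file named after its frame ID, but I cannot get all the transcription values apart from ##DONT#CARE##, and every output file contains only 3 rows of values.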

    import xml.etree.ElementTree as ET
    import os
    
    os.chdir("E:\\myPrj20\\k_E\\ch3_train")
    tree = ET.parse('Video_2_1_2_GT.xml')
    os.mkdir('Video_2_1_2_GTtxt')   # directory for the gt files created from frame IDs
    os.chdir('.\\Video_2_1_2_GTtxt\\')  # move into the directory created above
    root = tree.getroot()
    print(root)
    print(len(root[0]))

    print('____________sub element of root node_____________')
    d=root.find('frame')
    print(list(d))
    print('____________sub element of frame element_____________')
    e=d.find('object')
    print(list(e))
    
    print('____________sub element of object element_____________')
    object_p={}
    obj_v={}
    sub1=e.findall('*/Point')
    print(len(sub1))
    all_data=[]
    str1=""
    xy_str=""

    for id in root:
        print('_____frame id_______',id.attrib['ID'])
        fname=id.attrib['ID']    # the output file is named after the frame ID
        file1 = open(id.attrib['ID'], "w") 
        for obj in root.find('frame'):
            obj_key=(obj.attrib.keys())
            obj_v=obj.attrib.values()
            print(obj_key)
            print(obj_v)
            str1=str(obj.attrib['Transcription'])+str(obj.attrib['Language'])+str(obj.attrib['ID'])
            print(obj.attrib['Language'],obj.attrib['ID'])
            record={}
            
            for pt in obj.findall('Point'):
                rec_key=(pt.attrib.keys())
                
                print('x=',pt.attrib['x'],'y=',pt.attrib['y'])
                
                xy_str=str(pt.attrib['x'])+","+str(pt.attrib['y']+",")
                file1.write(xy_str)
            file1.write(","+str1+"\n")
    print("current directory is",os.getcwd())
    print('done')

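Output of the above code (screenshot):

In the output screenshot, the frame ID is 248 and one of the transcriptions is "social", but the ground truth text file contains no such value. Frame ID 248 has 5 sets of bounding box coordinates, yet the text file shows only 3, and even the coordinate values are different. How can I solve this? Any help is appreciated.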
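
One likely cause, judging from the code above: the inner loop iterates over `root.find('frame')`, which always returns the first `<frame>` element, so every output file receives the same three objects from frame 1. Below is a minimal sketch of a corrected loop that walks each frame's own `<object>` children instead. The function name `frames_to_gt_files`, the `gt_<frameID>.txt` naming, the column order, and the two-frame `SAMPLE_XML` (including the "social" object and all its coordinates) are assumptions made for illustration; they are not taken from the dataset or its tooling.

```python
import csv
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical two-frame sample in the same layout as the ground truth XML.
SAMPLE_XML = """<Frames>
<frame ID="1">
<object Transcription="##DONT#CARE##" ID="1001" Language="Spanish" Mirrored="Unmirrored" Quality="LOW">
<Point x="422" y="56"/><Point x="494" y="53"/><Point x="495" y="71"/><Point x="423" y="74"/>
</object>
</frame>
<frame ID="2">
<object Transcription="social" ID="2001" Language="Spanish" Mirrored="Unmirrored" Quality="HIGH">
<Point x="10" y="10"/><Point x="50" y="10"/><Point x="50" y="30"/><Point x="10" y="30"/>
</object>
<object Transcription="##DONT#CARE##" ID="2002" Language="Spanish" Mirrored="Unmirrored" Quality="LOW">
<Point x="60" y="60"/><Point x="90" y="60"/><Point x="90" y="80"/><Point x="60" y="80"/>
</object>
</frame>
</Frames>"""


def frames_to_gt_files(xml_path, out_dir):
    """Write one gt_<frameID>.txt per <frame>; return {frame_id: row_count}."""
    root = ET.parse(xml_path).getroot()
    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    for frame in root.findall("frame"):
        frame_id = frame.attrib["ID"]
        out_path = os.path.join(out_dir, f"gt_{frame_id}.txt")
        rows = 0
        with open(out_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            # Use the objects of *this* frame; root.find('frame') would
            # return the first <frame> on every iteration.
            for obj in frame.findall("object"):
                coords = []
                for pt in obj.findall("Point"):
                    coords += [pt.attrib["x"], pt.attrib["y"]]
                # Row layout (assumed): x1,y1,...,x4,y4,transcription
                writer.writerow(coords + [obj.attrib["Transcription"]])
                rows += 1
        counts[frame_id] = rows
    return counts


if __name__ == "__main__":
    # Small demo on the hypothetical sample, written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        sample_path = os.path.join(tmp, "sample_GT.xml")
        with open(sample_path, "w", encoding="utf-8") as fh:
            fh.write(SAMPLE_XML)
        print(frames_to_gt_files(sample_path, os.path.join(tmp, "gt")))
```

For the real data, the call would presumably look like `frames_to_gt_files('Video_2_1_2_GT.xml', 'Video_2_1_2_GTtxt')`. Opening each file in a `with` block also ensures it is closed and flushed, which the original loop never does explicitly.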

python xml csv
