Asked by: VRashi Asked: 11/10/2023 Updated: 11/10/2023 Views: 13
ICDAR2015 Text in Video: converting the XML ground truth to CSV files
Q:
I am using the ICDAR2015 Text in Video dataset and have downloaded both the text localization and the end-to-end data sets. For each video there is a Video___GT.txt file that lists all clearly visible text with its ID, and a Video___GT.xml file that lists every frame (<frame ID>) with the transcription, text quality, and bounding box coordinates of each text object. I want to convert this XML file to CSV files, one per frame, named gt_frameID.txt, each holding the bounding box coordinates along with the transcription. Here is a sample ground truth XML file.
`<Frames>
<frame ID="1">
<object Transcription="##DONT#CARE##" ID="1001" Language="Spanish" Mirrored="Unmirrored" Quality="LOW">
<Point x="422" y="56"/>
<Point x="494" y="53"/>
<Point x="495" y="71"/>
<Point x="423" y="74"/>
</object>
<object Transcription="##DONT#CARE##" ID="1002" Language="Spanish" Mirrored="Unmirrored" Quality="LOW">
<Point x="422" y="23"/>
<Point x="495" y="22"/>
<Point x="495" y="37"/>
<Point x="422" y="37"/>
</object>
<object Transcription="##DONT#CARE##" ID="1003" Language="Spanish" Mirrored="Unmirrored" Quality="LOW">
<Point x="327" y="123"/>
<Point x="345" y="123"/>
<Point x="345" y="138"/>
<Point x="328" y="138"/>
</object>
</frame>
</Frames>`
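To make the target layout concrete, the sample can be flattened into one row per <object>: frame ID, the eight corner coordinates, then the transcription. This is only a minimal sketch. The helper name `frame_rows` and the row layout are assumptions made for illustration, and `SAMPLE` is a trimmed copy of the XML above with the root element closed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample above (two objects), with the root closed.
SAMPLE = """<Frames>
<frame ID="1">
<object Transcription="##DONT#CARE##" ID="1001" Language="Spanish" Mirrored="Unmirrored" Quality="LOW">
<Point x="422" y="56"/>
<Point x="494" y="53"/>
<Point x="495" y="71"/>
<Point x="423" y="74"/>
</object>
<object Transcription="##DONT#CARE##" ID="1002" Language="Spanish" Mirrored="Unmirrored" Quality="LOW">
<Point x="422" y="23"/>
<Point x="495" y="22"/>
<Point x="495" y="37"/>
<Point x="422" y="37"/>
</object>
</frame>
</Frames>"""


def frame_rows(root):
    """Yield (frame_id, [x1, y1, ..., x4, y4], transcription) per <object>."""
    for frame in root.iter("frame"):
        # Only the objects that belong to this particular frame.
        for obj in frame.findall("object"):
            coords = []
            for pt in obj.findall("Point"):
                coords += [int(pt.get("x")), int(pt.get("y"))]
            yield frame.get("ID"), coords, obj.get("Transcription")


rows = list(frame_rows(ET.fromstring(SAMPLE)))
for row in rows:
    print(row)
```

Each tuple carries everything one line of a per-frame file needs, so the same generator could feed either a single CSV or one file per frame.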
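The XML file contains many more frames. I tried the following code to write one file per frame, named after the frame ID, but I cannot get the real transcription values: every entry comes out as ##DONT#CARE##, and every file contains only 3 rows.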
import xml.etree.ElementTree as ET
import os
os.chdir("E:\\myPrj20\\k_E\\ch3_train")
tree = ET.parse('Video_2_1_2_GT.xml')
os.mkdir('Video_2_1_2_GTtxt')  # directory for the gt files created from the frame IDs
os.chdir('.\\Video_2_1_2_GTtxt\\')  # move into the directory created above
root = tree.getroot()
print(root)
print(len(root[0]))
print('____________sub elemnt of root node_____________')
d=root.find('frame')
print(list(d))
print('____________sub elemnt of frame element_____________')
e=d.find('object')
print(list(e))
print('____________sub elemnt of object element_____________')
object_p={}
obj_v={}
sub1=e.findall('*/Point')
print(len(sub1))
all_data=[]
str1=""
xy_str=""
for id in root:
    print('_____frame id_______',id.attrib['ID'])
    fname=id.attrib['ID']  # open a file named after the frame ID
    file1 = open(id.attrib['ID'], "w")
    for obj in root.find('frame'):
        obj_key=(obj.attrib.keys())
        obj_v=obj.attrib.values()
        print(obj_key)
        print(obj_v)
        str1=str(obj.attrib['Transcription'])+str(obj.attrib['Language'])+str(obj.attrib['ID'])
        print(obj.attrib['Language'],obj.attrib['ID'])
        record={}
        for pt in obj.findall('Point'):
            rec_key=(pt.attrib.keys())
            print('x=',pt.attrib['x'],'y=',pt.attrib['y'])
            xy_str=str(pt.attrib['x'])+","+str(pt.attrib['y']+",")
            file1.write(xy_str)
        file1.write(","+str1+"\n")
print("current directroy is",os.getcwd())
print('done')
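In the output screenshot, the frame ID is 248 and one of its transcriptions is "social", but the generated ground truth text file contains no such value. Frame 248 has 5 sets of bounding box coordinates in the XML, but the text file shows only 3, and even the coordinate values differ. How can I fix this? Any help is appreciated.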
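Judging from the code above, a likely cause is the inner loop `for obj in root.find('frame')`. `find` returns the first matching <frame> every time, so each output file would repeat the 3 objects of frame 1, whatever the current frame ID is. Iterating over the objects of the loop variable itself should give every frame its own rows. Below is a minimal sketch under that assumption. The gt_<frameID>.txt naming follows the question, while the function name `write_frame_files` and the column order (eight coordinates, then the transcription) are illustrative choices, not part of any dataset tooling.

```python
import csv
import os
import xml.etree.ElementTree as ET


def write_frame_files(xml_path, out_dir):
    """Write one gt_<frameID>.txt per <frame>; return the number of files."""
    root = ET.parse(xml_path).getroot()
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for frame in root.findall("frame"):
        out_path = os.path.join(out_dir, f"gt_{frame.get('ID')}.txt")
        # The with-block closes each file, so all rows are flushed to disk.
        with open(out_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            # Use this frame's own objects, not root.find('frame').
            for obj in frame.findall("object"):
                coords = []
                for pt in obj.findall("Point"):
                    coords += [pt.get("x"), pt.get("y")]
                writer.writerow(coords + [obj.get("Transcription")])
        count += 1
    return count
```

With the paths from the question, a call might look like `write_frame_files("Video_2_1_2_GT.xml", "Video_2_1_2_GTtxt")`. `csv.writer` is used rather than manual string concatenation so that a transcription containing a comma is quoted instead of shifting the columns.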
A: No answers yet