Asked by: justnewbie89 Asked: 11/17/2023 Updated: 11/17/2023 Views: 80
How to Parse XML Data into a Pandas DataFrame (to get specific attribute information) [closed]
Q:
I have this XML data:
<export>
  <studentinfo school="SCHOOL 001">
    <remark>JAVA STUDENT</remark>
    <SocmedUrl>NONE</SocmedUrl>
    <programme level="elementary school">
      <summary/>
    </programme>
    <score math="90" physics="100" biology="80">
      <summary/>
    </score>
    <score economic="70" geography="80" sociology="70">
      <summary/>
    </score>
    <address region="central java">
      <summary/>
    </address>
    <birthdate birthdate="2004-05-10" dayofmonth="10" monthofyear="5" year="2004" city="kendal">
      <summary/>
    </birthdate>
  </studentinfo>
</export>
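The goal is to parse the XML data into CSV using Python. I previously asked about this in the thread "How Parse XML Data to Pandas Dataframe". Now I have a new use case: parsing the XML but keeping only specific attributes of an element. For example, only math from the first score element and only economic from the second score element (with the score elements merged into one column), and only the birthdate attribute of the birthdate element. The goal is to have a table like this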
A:
0 votes
jdweng
11/17/2023
#1
Using a PowerShell script:
using assembly System.Xml.Linq

$inputFilename = "c:\temp\test.xml"
$outputFilename = "c:\temp\test.csv"

# Load the XML document and take the first studentinfo element
$doc = [System.Xml.Linq.XDocument]::Load($inputFilename)
$table = [System.Collections.ArrayList]::new()
$studentinfo = $doc.Descendants('studentinfo')[0]

$remark = $studentinfo.Element('remark').Value
$programme = $studentinfo.Element('programme').Attribute('level').Value

# Collect every attribute of every score element as name:value pairs
$scores = $studentinfo.Elements('score')
$scoreArray = [System.Collections.ArrayList]::new()
foreach ($score in $scores)
{
    foreach ($attribute in $score.Attributes())
    {
        # Out-Null suppresses the index that ArrayList.Add returns
        $scoreArray.Add([string]::Format('{0}:{1}', $attribute.Name.LocalName, $attribute.Value)) | Out-Null
    }
}

$address = $studentinfo.Element('address').Attribute('region').Value
$birthdate = $studentinfo.Element('birthdate').Attribute('birthdate').Value
$city = $studentinfo.Element('birthdate').Attribute('city').Value

# Build one output row; all score pairs are merged into a single column
$newRow = [pscustomobject]@{
    remark = $remark
    programme = $programme
    score = [string]::Join(',', @($scoreArray))
    address = $address
    birthdate = [string]::Format('{0}, city:{1}', $birthdate, $city)
}
$table.Add($newRow) | Out-Null

# Write the table to CSV without the type header line
$table | Export-Csv -Path $outputFilename -NoTypeInformation
Result:
"remark","programme","score","address","birthdate"
"JAVA STUDENT","elementary school","math:90,physics:100,biology:80,economic:70,geography:80,sociology:70","central java","2004-05-10, city:kendal"
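Since the question asks for Python and for only specific attributes, the same extraction could be sketched with the standard library's xml.etree.ElementTree. This is a minimal sketch under stated assumptions, not the answerer's method. The names SCORE_ATTRS, COLUMNS, attr_of, student_rows and rows_to_csv are hypothetical, and because the target table is missing from the question, the name:value layout of the score column simply mirrors the PowerShell output above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample document copied from the question.
XML_TEXT = """<export>
  <studentinfo school="SCHOOL 001">
    <remark>JAVA STUDENT</remark>
    <SocmedUrl>NONE</SocmedUrl>
    <programme level="elementary school"><summary/></programme>
    <score math="90" physics="100" biology="80"><summary/></score>
    <score economic="70" geography="80" sociology="70"><summary/></score>
    <address region="central java"><summary/></address>
    <birthdate birthdate="2004-05-10" dayofmonth="10" monthofyear="5"
               year="2004" city="kendal"><summary/></birthdate>
  </studentinfo>
</export>"""

# Assumption: the n-th name here is the attribute kept from the n-th <score>.
SCORE_ATTRS = ["math", "economic"]
COLUMNS = ["remark", "programme", "score", "address", "birthdate"]


def attr_of(parent, tag, name):
    """Return attribute `name` of the first child `tag`, or "" if missing."""
    child = parent.find(tag)
    return child.get(name, "") if child is not None else ""


def student_rows(xml_text):
    """Build one dict per <studentinfo>, keeping only the selected attributes."""
    rows = []
    for student in ET.fromstring(xml_text).iter("studentinfo"):
        # Pair each <score> with the attribute wanted from it, in document order.
        picked = [
            f"{name}:{score.get(name)}"
            for score, name in zip(student.findall("score"), SCORE_ATTRS)
            if score.get(name) is not None
        ]
        rows.append({
            "remark": student.findtext("remark", default=""),
            "programme": attr_of(student, "programme", "level"),
            "score": ",".join(picked),  # score elements merged into one column
            "address": attr_of(student, "address", "region"),
            "birthdate": attr_of(student, "birthdate", "birthdate"),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with every field quoted."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(student_rows(XML_TEXT)))
```

If pandas is installed, passing the list returned by student_rows to pandas.DataFrame should give the equivalent DataFrame, which can then be written out with DataFrame.to_csv.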