Asked by: ScriptPhoenix Asked: 7/13/2023 Last edited by: ScriptPhoenix Updated: 7/13/2023 Views: 45
Importing an XML File With Repeating Tags Including the Parent Info In Each Row
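Q:
I am trying to import an XML file and get a table or DataFrame with one row per P and one column for the HEAD of each parent (DIV6 and DIV8).
Sample text is below; for the full XML, see https://www.ecfr.gov/api/versioner/v1/full/2023-07-11/title-14.xml?part=27.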
<DIV5 N="27" TYPE="PART" VOLUME="1" hierarchy_metadata="{&quot;path&quot;:&quot;/on/_SUBSTITUTE_DATE_/title-14/part-27&quot;,&quot;citation&quot;:&quot;14 CFR Part 27&quot;,&quot;alternate_reference&quot;:&quot;FAR Part 27&quot;}">
<DIV6 N="A" TYPE="SUBPART" hierarchy_metadata="{&quot;path&quot;:&quot;/on/_SUBSTITUTE_DATE_/title-14/part-27/subpart-A&quot;,&quot;citation&quot;:&quot;14 CFR Part 27 Subpart A&quot;,&quot;alternate_reference&quot;:&quot;FAR Part 27 Subpart A&quot;}">
<HEAD>Subpart A—General</HEAD>
<DIV8 N="27.1" TYPE="SECTION" VOLUME="1" hierarchy_metadata="{&quot;path&quot;:&quot;/on/_SUBSTITUTE_DATE_/title-14/section-27.1&quot;,&quot;citation&quot;:&quot;14 CFR 27.1&quot;,&quot;alternate_reference&quot;:&quot;FAR 27.1&quot;}">
<HEAD>§ 27.1 Applicability.</HEAD>
<P>(a) This part prescribes airworthiness standards for the issue of type certificates, and changes to those certificates, for normal category rotorcraft with maximum weights of 7,000 pounds or less and nine or less passenger seats. </P>
<P>(b) Each person who applies under Part 21 for such a certificate or change must show compliance with the applicable requirements of this part. </P>
<P>(c) Multiengine rotorcraft may be type certified as Category A provided the requirements referenced in appendix C of this part are met. </P>
<CITA TYPE="N">[Doc. No. 5074, 29 FR 15695, Nov. 24, 1964, as amended by Amdt. 27–33, 61 FR 21906, May 10, 1996; Amdt. 27–37, 64 FR 45094, Aug. 18, 1999] </CITA>
</DIV8>
<DIV8 N="27.2" TYPE="SECTION" VOLUME="1" hierarchy_metadata="{&quot;path&quot;:&quot;/on/_SUBSTITUTE_DATE_/title-14/section-27.2&quot;,&quot;citation&quot;:&quot;14 CFR 27.2&quot;,&quot;alternate_reference&quot;:&quot;FAR 27.2&quot;}">
<HEAD>§ 27.2 Special retroactive requirements.</HEAD>
<P>(a) For each rotorcraft manufactured after September 16, 1992, each applicant must show that each occupant's seat is equipped with a safety belt and shoulder harness that meets the requirements of paragraphs (a), (b), and (c) of this section. </P>
<P>(1) Each occupant's seat must have a combined safety belt and shoulder harness with a single-point release. Each pilot's combined safety belt and shoulder harness must allow each pilot, when seated with safety belt and shoulder harness fastened, to perform all functions necessary for flight operations. There must be a means to secure belts and harnesses, when not in use, to prevent interference with the operation of the rotorcraft and with rapid egress in an emergency. </P>
<P>(2) Each occupant must be protected from serious head injury by a safety belt plus a shoulder harness that will prevent the head from contacting any injurious object. </P>
<P>(3) The safety belt and shoulder harness must meet the static and dynamic strength requirements, if applicable, specified by the rotorcraft type certification basis. </P>
<P>(4) For purposes of this section, the date of manufacture is either— </P>
<P>(i) The date the inspection acceptance records, or equivalent, reflect that the rotorcraft is complete and meets the FAA-Approved Type Design Data; or </P>
<P>(ii) The date the foreign civil airworthiness authority certifies that the rotorcraft is complete and issues an original standard airworthiness certificate, or equivalent, in that country. </P>
<P>(b) For rotorcraft with a certification basis established prior to October 18, 1999— </P>
<CITA TYPE="N">[Doc. No. 26078, 56 FR 41051, Aug. 16, 1991, as amended by Amdt. 27–37, 64 FR 45094, Aug. 18, 1999] </CITA>
</DIV8>
</DIV6>
</DIV5>
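The code below is close: it at least gets the DIV8 HEAD and the last P. However, I am missing all the remaining P elements. I think the problem is that the P tags all share the same name and several of them sit inside one section, but I have not been able to find a working solution. Bonus points if you can also suggest how to get a column with the DIV6 HEAD. I am not tied to pandas; it is just the closest I have come to a solution.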
import pandas as pd
x = pd.read_xml(
    "https://www.ecfr.gov/api/versioner/v1/full/2023-07-11/title-14.xml?part=27",
    xpath="/DIV5/DIV6//DIV8",
)
x.to_csv("Export.csv")
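FYI, I did notice that some P tags contain italic markup. I will handle those offline with a simple search-and-replace on the italic tags before processing the XML file, so that is not a problem.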
A:
0 votes
Siebe Jongebloed
7/13/2023
#1
How about using this XPath:
xpath="//*[HEAD]"
This will select all DIV? elements that have a HEAD.
The problem is that some of them have a P and others do not... (I am not familiar with pandas.)
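One possible way to get the HEAD values onto the same row as each P, sketched with the standard library's xml.etree.ElementTree instead of pandas: loop over every DIV6, then every DIV8 inside it, then every P, and emit one record per P. The SAMPLE string is a shortened stand-in for the eCFR file (attributes and paragraph text trimmed), and the function name and the column names subpart, section and text are made up for illustration. It also assumes each P is a direct child of its DIV8.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the eCFR document (attributes and text trimmed).
SAMPLE = """
<DIV5 N="27" TYPE="PART">
  <DIV6 N="A" TYPE="SUBPART">
    <HEAD>Subpart A—General</HEAD>
    <DIV8 N="27.1" TYPE="SECTION">
      <HEAD>§ 27.1 Applicability.</HEAD>
      <P>(a) This part prescribes airworthiness standards. </P>
      <P>(b) Each person who applies must show compliance. </P>
      <CITA TYPE="N">[Doc. No. 5074] </CITA>
    </DIV8>
    <DIV8 N="27.2" TYPE="SECTION">
      <HEAD>§ 27.2 Special retroactive requirements.</HEAD>
      <P>(a) For each rotorcraft manufactured after September 16, 1992. </P>
    </DIV8>
  </DIV6>
</DIV5>
"""


def rows_per_paragraph(root):
    """Return one dict per P, carrying the HEAD of its DIV6 and DIV8."""
    rows = []
    for div6 in root.iter("DIV6"):
        # findtext("HEAD") only looks at direct children, so this is the DIV6's own HEAD.
        subpart = div6.findtext("HEAD", default="").strip()
        for div8 in div6.iter("DIV8"):
            section = div8.findtext("HEAD", default="").strip()
            # Assumption: every P is a direct child of its DIV8.
            for p in div8.findall("P"):
                # itertext() also collects text inside nested inline tags.
                text = "".join(p.itertext()).strip()
                rows.append({"subpart": subpart, "section": section, "text": text})
    return rows


rows = rows_per_paragraph(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

If a CSV export is still wanted, the list of dicts can be handed to pandas.DataFrame(rows) and written out with to_csv as in the question.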
Comments
0 votes
ScriptPhoenix
7/14/2023
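Thanks, this does return all the HEAD fields, which is great! But they are not on the same row as the P fields, so I may do some grouping afterwards. With pandas, at least, each HEAD field ends up in a single column in document order rather than grouped with its paragraphs.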
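The grouping could also be done from the P side: take every P and climb up to its enclosing DIV8 and DIV6 to read their HEAD. ElementTree elements do not keep a reference to their parent, so the sketch below first builds a child-to-parent map. Everything here is illustrative: the sample XML is invented, nearest_head and the column names are hypothetical, and the I tag is only an assumed stand-in for the italic markup mentioned in the question.

```python
import xml.etree.ElementTree as ET

# Invented sample; <I> is an assumed stand-in for the italic markup.
SAMPLE = """
<DIV5 N="27" TYPE="PART">
  <DIV6 N="A" TYPE="SUBPART">
    <HEAD>Subpart A—General</HEAD>
    <DIV8 N="27.1" TYPE="SECTION">
      <HEAD>§ 27.1 Applicability.</HEAD>
      <P>(a) First <I>sample</I> paragraph. </P>
      <P>(b) Second sample paragraph. </P>
    </DIV8>
  </DIV6>
</DIV5>
"""

root = ET.fromstring(SAMPLE)

# ElementTree has no parent pointer, so map every child to its parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def nearest_head(elem, tag):
    """Walk up from elem to the closest ancestor named tag and return its HEAD text."""
    node = parent_of.get(elem)
    while node is not None and node.tag != tag:
        node = parent_of.get(node)
    return "" if node is None else node.findtext("HEAD", default="").strip()


records = [
    {
        "div6_head": nearest_head(p, "DIV6"),
        "div8_head": nearest_head(p, "DIV8"),
        # itertext() flattens inline tags, so no search-and-replace is needed.
        "text": "".join(p.itertext()).strip(),
    }
    for p in root.iter("P")
]

for record in records:
    print(record)
```

Because the lookup starts at each P, it does not matter how deeply the paragraph is nested, and a P with no DIV6 or DIV8 ancestor simply gets an empty string in that column.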