Asked by: CraZyDroiD  Asked: 11/8/2023  Modified: 11/8/2023  Viewed: 56 times
Extract content from an XML file
Question:
I have the following XML content:
<Artificial name="Artifical name">
<Machine>
<MachineEnvironment uri="environment" />
</Machine>
<Mobile>taken phone, test
when r1
100m SUV
then
FireFly is High
end
when r2
Order of the Phonenix
then
Magic is High
end
</Mobile>
</Artificial>
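I want to write a function that takes a line (string) and the content (string), and returns the content of the closest tag that the given line belongs to.
For example, if I provide the line "FireFly is High", it should return the following, because <Mobile> is the closest tag that the provided line belongs to: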
<Mobile>taken phone, test
when r1
100m SUV
then
FireFly is High
end
when r2
Order of the Phonenix
then
Magic is High
end
</Mobile>
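One way to get this behavior without a single large regex: scan the tags from left to right, keep a stack of currently open elements, and return the first element that closes around the position of the given line (the first one to close around it is the innermost). This is a minimal sketch, not the asker's code; findEnclosingElement is a hypothetical name, it only looks at the first occurrence of the line, and it assumes simple well-formed XML with no comments, CDATA sections, or ">" inside attribute values.

```javascript
// Hypothetical helper (not from the question): returns the source text of the
// innermost element enclosing the first occurrence of `line` in `content`.
// Assumes simple, well-formed XML: no comments, CDATA, processing
// instructions, or ">" inside attribute values.
function findEnclosingElement(line, content) {
  const needle = line.trim();
  const pos = content.indexOf(needle);
  if (needle === "" || pos === -1) return null;

  // Groups: 1 = "/" for closing tags, 2 = tag name, 3 = "/" for self-closing tags
  const tagRe = /<(\/?)([A-Za-z_][\w.-]*)[^>]*?(\/?)>/g;
  const stack = []; // start offsets of the currently open elements
  let m;
  while ((m = tagRe.exec(content)) !== null) {
    const [whole, closing, , selfClosing] = m;
    const start = m.index;
    const end = start + whole.length;
    if (selfClosing) {
      // A self-closing tag only "encloses" the line if the line is the tag itself.
      if (start <= pos && pos + needle.length <= end) return whole;
    } else if (!closing) {
      stack.push(start);
    } else {
      const openStart = stack.pop();
      // The first element that closes around the line is the innermost one.
      if (openStart !== undefined && openStart <= pos && pos + needle.length <= end) {
        return content.slice(openStart, end);
      }
    }
  }
  return null;
}

const sampleXml = `<Artificial name="Artifical name">
<Machine>
<MachineEnvironment uri="environment" />
</Machine>
<Mobile>taken phone, test
when r1
100m SUV
then
FireFly is High
end
when r2
Order of the Phonenix
then
Magic is High
end
</Mobile>
</Artificial>`;

// Prints the whole <Mobile>...</Mobile> element
console.log(findEnclosingElement("FireFly is High", sampleXml));
```

Because tags are matched with a regex, this is still only a sketch for simple input; the parser-based answers below handle real-world XML more robustly.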
Here is my code:
getLineContent(line: string, content: string) {
  const trimmedLine = line.trim()
  const isSelfClosingTag = /\/\s*>$/.test(trimmedLine)
  const isPlainTextLine = !/<|>/.test(trimmedLine)
  const regex = new RegExp(`(${trimmedLine}[^>]*>)([\\s\\S]*?)</(${trimmedLine.split(' ')[0].substr(1)}>)`)
  const isClosingTag = /^<\/\w+>$/.test(trimmedLine)
  const match = content.match(regex)
  if (!isClosingTag) {
    if (isSelfClosingTag) {
      return trimmedLine
    }
    if (match && match[2]) {
      return match[1] + match[2] + match[3]
    }
    if (isPlainTextLine) {
      const regex = new RegExp(`(<[^>]*>)([\\s\\S]*?${trimmedLine.split(' ')[0].substr(1)}[\\s\\S]*?</[a-zA-Z]+>)`)
      const match = content.match(regex)
      console.log('isPlainTextLine', match)
      if (match && match[1] && match[2]) {
        return match[2]
      }
    }
    return trimmedLine
  }
}
It almost works, but not quite. The problem is in this part of the code:
if (isPlainTextLine) {
  const regex = new RegExp(`(<[^>]*>)([\\s\\S]*?${trimmedLine.split(' ')[0].substr(1)}[\\s\\S]*?</[a-zA-Z]+>)`)
  const match = content.match(regex)
  console.log('isPlainTextLine', match)
  if (match && match[1] && match[2]) {
    return match[2]
  }
}
For example, if I provide "FireFly is High", the return value is:
<Machine>
<MachineEnvironment uri="environment" />
</Machine>
<Mobile>taken phone, test
when r1
100m SUV
then
FireFly is High
end
when r2
Order of the Phonenix
then
Magic is High
end
</Mobile>
Regular expressions are not my strong point. Any help is appreciated.
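A short diagnosis of the problem snippet, runnable in Node or a browser console against the sample XML from the question. Two things go wrong. First, split(' ')[0].substr(1) turns "FireFly is High" into "ireFly". Second, match() reports the leftmost match: the engine tries start positions from left to right, and at position 0 (<Artificial ...>) the pattern already succeeds, because [\s\S] matches anything, including tags; the laziness of *? only affects the match length, not where it starts. The last step shows a narrower pattern as one possible alternative, under stated assumptions: escapeRegExp is a hypothetical helper, and forbidding "<" between the opening tag and the line means it only works when the enclosing element has no child elements around that line.

```javascript
const sample = `<Artificial name="Artifical name">
<Machine>
<MachineEnvironment uri="environment" />
</Machine>
<Mobile>taken phone, test
when r1
100m SUV
then
FireFly is High
end
when r2
Order of the Phonenix
then
Magic is High
end
</Mobile>
</Artificial>`;

const trimmedLine = "FireFly is High";

// 1. split(' ')[0].substr(1) drops the first character of the first word.
const needle = trimmedLine.split(" ")[0].substr(1);
console.log(needle); // → ireFly

// 2. The question's regex: the leftmost possible match starts at the very first
//    tag, so group 2 stretches across <Machine>...</Machine> to reach the text.
const questionRegex = new RegExp(`(<[^>]*>)([\\s\\S]*?${needle}[\\s\\S]*?</[a-zA-Z]+>)`);
const m = sample.match(questionRegex);
console.log(m.index, m[1]); // → 0 <Artificial name="Artifical name">

// 3. Narrower sketch: escape the whole line and forbid "<" between the opening
//    tag and the line, so an ancestor tag can no longer be the match start.
//    \1 requires the closing tag to have the same name as the opening tag.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const fixedRegex = new RegExp(
  `<([A-Za-z_][\\w.-]*)[^>]*>[^<]*?${escapeRegExp(trimmedLine)}[^<]*<\\/\\1>`
);
const fixed = sample.match(fixedRegex);
console.log(fixed[0].startsWith("<Mobile>")); // → true
```

Escaping matters on its own: the question's code interpolates the line into new RegExp unescaped, so a line containing characters such as "(" or "?" would change the pattern or throw.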
Answer:
2 votes
trincot
11/8/2023
#1
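Regular expressions are not the right tool for this task. Use an XML parser instead; there are many to choose from. For example, you could use fast-xml-parser, which converts XML into a nested object structure. Demo: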
const { XMLParser } = require("fast-xml-parser");

// Depth-first search: returns { key: text } for the first string value that contains `find`
function findText(obj, find, key="") {
  if (typeof obj === "string" && obj.includes(find)) {
    return { [key]: obj };
  }
  if (Object(obj) === obj) {
    for (const key in obj) {
      const result = findText(obj[key], find, key);
      if (result) return result;
    }
  }
}

const xml = `<Artificial name="Artifical name">
<Machine>
<MachineEnvironment uri="environment" />
</Machine>
<Mobile>taken phone, test
...
FireFly is High
...
</Mobile>
</Artificial>`;
const obj = new XMLParser().parse(xml);
const result = findText(obj, "FireFly");
console.log(result); // { Mobile: "taken phone, ....... " }
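Because findText only walks plain objects, its search logic can be tried without installing the parser. The sketch below is a hypothetical variant, findTextPath, that also records the chain of keys leading to the matching text. The parsed object is written by hand and is only an assumed approximation of what fast-xml-parser returns with default options (attributes dropped, surrounding whitespace trimmed); repeated sibling elements would typically appear as arrays, whose indices show up in the path as strings such as "0".

```javascript
// Hypothetical variant of findText: returns { path, text } for the first string
// value containing `find`, where `path` is the list of keys leading to it.
function findTextPath(obj, find, path = []) {
  if (typeof obj === "string") {
    return obj.includes(find) ? { path, text: obj } : undefined;
  }
  if (obj !== null && typeof obj === "object") {
    for (const [key, value] of Object.entries(obj)) {
      const result = findTextPath(value, find, [...path, key]);
      if (result) return result;
    }
  }
  return undefined;
}

// Hand-written stand-in for a parse result (assumed shape, not produced by
// fast-xml-parser here).
const parsed = {
  Artificial: {
    Machine: { MachineEnvironment: "" },
    Mobile: "taken phone, test\nwhen r1\n100m SUV\nthen\nFireFly is High\nend\nwhen r2\nOrder of the Phonenix\nthen\nMagic is High\nend",
  },
};

console.log(findTextPath(parsed, "FireFly").path); // → [ 'Artificial', 'Mobile' ]
```

Only string values are searched, not keys, so a tag name such as "Machine" is not found by this function.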
A second option, in a browser context, is the DOMParser Web API:
function *iterNodes(doc, whatToShow) { // Generator for createTreeWalker
  const walk = doc.createTreeWalker(doc.documentElement, whatToShow, null, false);
  for (let node; node = walk.nextNode(); null) yield node;
}

// Returns the serialized parent element of the first text node that contains `content`
function findTagByContent(xml, content) {
  const doc = new DOMParser().parseFromString(xml, "text/xml");
  for (const node of iterNodes(doc, NodeFilter.SHOW_TEXT)) {
    if (node.textContent.includes(content)) return node.parentNode.outerHTML;
  }
}

// Example run
const xml = `<Artificial name="Artifical name">
<Machine>
<MachineEnvironment uri="environment" />
</Machine>
<Mobile>taken phone, test
...
FireFly is High
...
</Mobile>
</Artificial>`;
console.log(findTagByContent(xml, "FireFly"));