Extract content from an XML file

Question:

I have the following XML content:

<Artificial name="Artifical name">
    <Machine>
        <MachineEnvironment uri="environment" />
    </Machine>
    <Mobile>taken phone, test

when r1
    100m SUV
then
    FireFly is High
end


when r2
    Order of the Phonenix 
    
then
    Magic is High
end


</Mobile>
</Artificial>

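I want to write a function that accepts a line (string) and the content (string) and returns the content of the closest tag that the provided line belongs to.

For example, if I provide the line "FireFly is High", it should return the following, because that is the closest tag the provided line belongs to: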

<Mobile>taken phone, test

when r1
    100m SUV
then
    FireFly is High
end


when r2
    Order of the Phonenix 

then
    Magic is High
end


</Mobile>

Here is my code:

getLineContent(line: string, content: string) {
    const trimmedLine = line.trim()
    const isSelfClosingTag = /\/\s*>$/.test(trimmedLine)
    const isPlainTextLine = !/<|>/.test(trimmedLine)
    const regex = new RegExp(`(${trimmedLine}[^>]*>)([\\s\\S]*?)</(${trimmedLine.split(' ')[0].substr(1)}>)`)
    const isClosingTag = /^<\/\w+>$/.test(trimmedLine)
    const match = content.match(regex)

    if (!isClosingTag) {
      if (isSelfClosingTag) {
        return trimmedLine
      }

      if (match && match[2]) {
        return match[1] + match[2] + match[3]
      }
      if (isPlainTextLine) {
        const regex = new RegExp(`(<[^>]*>)([\\s\\S]*?${trimmedLine.split(' ')[0].substr(1)}[\\s\\S]*?</[a-zA-Z]+>)`)
        const match = content.match(regex)
        console.log('isPlainTextLine', match)
        if (match && match[1] && match[2]) {
          return match[2]
        }
      }
      return trimmedLine
    }
  }

It works almost perfectly, but not quite. The problem is in this part of the code:

if (isPlainTextLine) {
        const regex = new RegExp(`(<[^>]*>)([\\s\\S]*?${trimmedLine.split(' ')[0].substr(1)}[\\s\\S]*?</[a-zA-Z]+>)`)
        const match = content.match(regex)
        console.log('isPlainTextLine', match)
        if (match && match[1] && match[2]) {
          return match[2]
        }
      }

For example, if I provide "FireFly is High", the returned value is:

<Machine>
        <MachineEnvironment uri="environment" />
    </Machine>
    <Mobile>taken phone, test

when r1
    100m SUV
then
    FireFly is High
end


when r2
    Order of the Phonenix 

then
    Magic is High
end


</Mobile>

Regex is not my strong suit. Any help is appreciated.

Tags: javascript, regex, typescript

Answer:
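
The plain-text branch over-matches because a regular expression always reports the leftmost match. The lazy quantifiers only stop the match from running further to the right; they do not move its starting point. The pattern therefore anchors on the first tag in the document rather than on the tag nearest the line. The following is a minimal sketch of that behaviour, reusing a shortened copy of the XML from the question (the variable names content, line, and fragment are chosen here for illustration only):

```javascript
// Shortened copy of the XML from the question (only one rule line is kept).
const content = `<Artificial name="Artifical name">
    <Machine>
        <MachineEnvironment uri="environment" />
    </Machine>
    <Mobile>taken phone, test
    FireFly is High
</Mobile>
</Artificial>`;

const line = "FireFly is High";
// Same fragment the question builds: the first word minus its first character.
const fragment = line.trim().split(" ")[0].slice(1); // "ireFly"
const regex = new RegExp(`(<[^>]*>)([\\s\\S]*?${fragment}[\\s\\S]*?</[a-zA-Z]+>)`);
const match = content.match(regex);

// Group 1 is the first tag in the document, not the tag closest to the line.
console.log(match[1]); // <Artificial name="Artifical name">
// Group 2 starts right after that tag, so it drags <Machine> along.
console.log(match[2].trimStart().startsWith("<Machine>")); // true
console.log(match[2].endsWith("</Mobile>")); // true
```

A parser sidesteps this, since it knows which element each piece of text belongs to. The examples below take that route, first with the fast-xml-parser package.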

const { XMLParser } = require("fast-xml-parser");

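// Depth-first search of the parsed object for the first string containing `find`.
// Returns { tagName: text } for the innermost matching property, or undefined.
function findText(obj, find, key = "") {
    if (typeof obj === "string" && obj.includes(find)) {
        return { [key]: obj };
    }
    if (Object(obj) === obj) { // only objects and arrays are traversed
        for (const childKey in obj) {
            const result = findText(obj[childKey], find, childKey);
            if (result) return result;
        }
    }
}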

const xml = `<Artificial name="Artifical name">
    <Machine>
        <MachineEnvironment uri="environment" />
    </Machine>
    <Mobile>taken phone, test
    ...
    FireFly is High
    ...
    </Mobile>
</Artificial>`;

const obj = new XMLParser().parse(xml);
const result = findText(obj, "FireFly");
console.log(result); // { Mobile: "taken phone, ....... " }

The second example applies in a browser context, where DOMParser from the Web API can be used:

// Generator that yields every node a TreeWalker visits under the document root
function* iterNodes(doc, whatToShow) {
    const walker = doc.createTreeWalker(doc.documentElement, whatToShow);
    for (let node = walker.nextNode(); node; node = walker.nextNode()) yield node;
}

// Returns the serialized parent element of the first text node containing `content`
function findTagByContent(xml, content) {
    const doc = new DOMParser().parseFromString(xml, "text/xml");
    for (const node of iterNodes(doc, NodeFilter.SHOW_TEXT)) {
        if (node.textContent.includes(content)) return node.parentNode.outerHTML;
    }
}

// Example run

const xml = `<Artificial name="Artifical name">
    <Machine>
        <MachineEnvironment uri="environment" />
    </Machine>
    <Mobile>taken phone, test
    ...
    FireFly is High
    ...
    </Mobile>
</Artificial>`;

console.log(findTagByContent(xml, "FireFly"));
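
Where neither a parser package nor DOMParser is available, the same idea (locate the line, then return the element that encloses it) can be approximated with a small tag scanner that keeps a stack of open elements. This is only a rough sketch and not part of the examples above. The function name closestElement and the variable sample are hypothetical. It assumes well-formed input with no comments, CDATA sections, or processing instructions, no ">" inside attribute values, and it looks only at the first occurrence of the line.

```javascript
// Sketch only: a tiny tag scanner, NOT a conforming XML parser (see assumptions above).
function closestElement(content, line) {
    const needle = line.trim();
    const target = content.indexOf(needle); // first occurrence only
    if (target === -1) return undefined;

    // Captures: optional "/" of a closing tag, the tag name, optional "/" of a self-closing tag.
    const tagPattern = /<(\/?)([A-Za-z_][\w.-]*)[^>]*?(\/?)>/g;
    const open = []; // stack of { name, start } for elements that are not closed yet

    for (const m of content.matchAll(tagPattern)) {
        const [tag, closing, name, selfClosing] = m;
        const end = m.index + tag.length;

        if (selfClosing) {
            // A self-closing tag encloses the line only when the line is the tag itself.
            if (m.index <= target && target < end) return tag;
            continue;
        }
        if (!closing) {
            open.push({ name, start: m.index });
            continue;
        }
        const element = open.pop();
        if (!element || element.name !== name) return undefined; // mismatched tags
        // Inner elements close before outer ones, so the first closed element
        // whose range covers the line is the innermost one.
        if (element.start <= target && target < end) {
            return content.slice(element.start, end);
        }
    }
    return undefined;
}

const sample = `<Artificial name="Artifical name">
    <Machine>
        <MachineEnvironment uri="environment" />
    </Machine>
    <Mobile>taken phone, test
when r1
    100m SUV
then
    FireFly is High
end
</Mobile>
</Artificial>`;

// Logs the whole <Mobile>...</Mobile> element.
console.log(closestElement(sample, "FireFly is High"));
```

Because the scanner tracks nesting explicitly, a line inside <Mobile> can never pick up <Machine>. For real documents a proper parser remains the safer choice.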
