Regular Expressions in Javascript - Find Multiple Lines of Text After a Tag

Asked by: alyssaeliyah  Asked: 3/10/2023  Last edited by: Peter Seliger, alyssaeliyah  Updated: 3/15/2023  Views: 67

Q:

I have the text below stored in a variable named `description`:

This is a code update

Official Name: None

Pub: https://content.upcodes.co/viewer/washington/wa-mechanical-code-2021

Agency:  

Reference: https://web.archive.org/web/20230226234118/https://lawfilesext.leg.wa.gov/law/wsr/agency/BuildingCodeCouncil.htm

Citation: WAC 51-52 / WSR 23-02-055

Draft Doc Title: WSR 23-02-055 (#1)

Draft Source Doc: https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#1)

Draft Drive: https://drive.google.com/file/d/1pYmwQS3t-ZX-Vyg9yBabtIpXZ7By2G6f/view?usp=share_link ( #1)

Final Doc Title: 

IECC Com Update(#1)

IECC Res Update (#2)

Final Source Doc: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)

https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#2)

Final Drive: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)

https://web.archive.org/web/2023030302fdfdfg2130/https://apps.legfdg.gov/wac/default.aspx?cite=51-52&fdsfullfdsf=true&pfdsfdf=true  (#2)

Effective Date:  January 4, 2023

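I want to extract the information after the "Final Doc Title:" tag. It should give me two values: `IECC Com Update(#1)` and `IECC Res Update (#2)`. The code below extracts the text after the tag until a newline character is found.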

//8. Extract Final Doc Title
var final_doc_title = description.search("Final Doc Title:");
if(final_doc_title != -1){
    final_doc_title = description.match(/(?<=^Final Doc Title:)[^\n\r]+/m);
    final_doc_title = final_doc_title?.[0].trim();
}else{
    final_doc_title = '';
}
console.log('Final Doc Title: ' + final_doc_title);

The problem with this code is that it returns an empty string, because there is a newline character right after "Final Doc Title:".

Final Doc Title:\n
IECC Com Update(#1)\n
IECC Res Update (#2)\n

How would I modify my code so that it returns both lines? Thanks!

javascript regex match capturing-group

Comments

1 upvote Wiktor Stribiżew 3/10/2023
Perhaps `/(?<=^Final Doc Title:\s*\n).*(?:\r?\n.*)?/m`? Note that in JS, `.` does not match `\r`, so you can safely replace `[^\r\n]` with `.`.
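The point about `.` in the comment above can be checked with a small sketch. The sample string and variable names here are made up for illustration; the input uses `\r\n` line endings on the assumption that the description may come from a Windows-style source.

```javascript
// Without the `s` (dotAll) flag, `.` does not match line terminators,
// so for \n or \r\n input it behaves like `[^\r\n]`.
const crlfText =
  "Final Doc Title:\r\nIECC Com Update(#1)\r\nIECC Res Update (#2)\r\n";

const dotMatches = crlfText.match(/.+/g);
const classMatches = crlfText.match(/[^\r\n]+/g);

console.log(dotMatches);
// [ 'Final Doc Title:', 'IECC Com Update(#1)', 'IECC Res Update (#2)' ]
console.log(classMatches);
// same three lines

console.log(/./.test("\r"), /./.test("\n")); // false false
```

The two forms differ only for the rarer terminators `\u2028` and `\u2029`, which `.` also skips but `[^\r\n]` matches.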

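Answers:

2 upvotes trincot 3/10/2023 #1

You can match those newline characters with `\s*`, assuming you are not interested in the whitespace that precedes the text you are looking for.

If the text you are looking for ends just before a line that contains a colon (as in `Final Source Doc: https:....`), then you can do the following: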

const description = "This is a code update\n\nOfficial Name: None\n\nPub: https://content.upcodes.co/viewer/washington/wa-mechanical-code-2021\n\nAgency:  \n\nReference: https://web.archive.org/web/20230226234118/https://lawfilesext.leg.wa.gov/law/wsr/agency/BuildingCodeCouncil.htm\n\nCitation: WAC 51-52 / WSR 23-02-055\n\nDraft Doc Title: WSR 23-02-055 (#1)\n\nDraft Source Doc: https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#1)\n\nDraft Drive: https://drive.google.com/file/d/1pYmwQS3t-ZX-Vyg9yBabtIpXZ7By2G6f/view?usp=share_link ( #1)\n\nFinal Doc Title: \n\nIECC Com Update(#1)\n\nIECC Res Update (#2)\n\nFinal Source Doc: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)\n\nhttps://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#2)\n\nFinal Drive: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)\n\nhttps://web.archive.org/web/2023030302fdfdfg2130/https://apps.legfdg.gov/wac/default.aspx?cite=51-52&fdsfullfdsf=true&pfdsfdf=true  (#2)\n\nEffective Date:  January 4, 2023\nI want to extract the information after 'Final Doc Title:' tag. It should give me two values. The first value is IECC Com Update(#1) and IECC Res Update (#2). I have a code below that extracts the text after the tag until a new line character is found.\n\n//8. Extract Final Doc Title";

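// Capture the block of colon-free lines that follows the "Final Doc Title:" label.
var result = description.match(/^Final Doc Title:\s*((?:\s*^(?:[^:\r\n]*)$)*)/m)?.[1];
// Split the captured block into its non-empty lines.
var parts = result?.match?.(/.+/gm);
console.log(parts);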
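If the same extraction is needed for more than one label, the pattern above could be wrapped in a small function. This is only a sketch: `escapeRegExp` and `extractLinesAfterTag` are hypothetical helper names, and the shortened sample text with its `example.invalid` URL is made up for illustration.

```javascript
// Hypothetical helper: escape regex metacharacters so a label is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the non-empty, colon-free lines that follow `label:`.
function extractLinesAfterTag(text, label) {
  const pattern = new RegExp(
    "^" + escapeRegExp(label) + ":\\s*((?:\\s*^(?:[^:\\r\\n]*)$)*)",
    "m"
  );
  const block = text.match(pattern)?.[1] ?? "";
  return (block.match(/.+/g) ?? []).map((line) => line.trim());
}

// Shortened, made-up sample in the same shape as the question's text.
const sample = [
  "Draft Doc Title: WSR 23-02-055 (#1)",
  "",
  "Final Doc Title: ",
  "",
  "IECC Com Update(#1)",
  "",
  "IECC Res Update (#2)",
  "",
  "Final Source Doc: https://example.invalid/doc (#1)",
].join("\n");

console.log(extractLinesAfterTag(sample, "Final Doc Title"));
// [ 'IECC Com Update(#1)', 'IECC Res Update (#2)' ]
console.log(extractLinesAfterTag(sample, "Missing Label")); // []
```

As with the pattern it wraps, only values on the lines after the label are captured: a value that starts on the label's own line, such as the one after `Draft Doc Title:`, yields an empty array.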

2 upvotes Peter Seliger 3/10/2023 #2

A simple regex that uses only the multiline flag, like this ...

/^Final Doc Title:\s+(.+)\s+(.+)/m

... and that has 2 capturing groups (which do not necessarily have to be named), already does the job.

const regXDocTitles = /^Final Doc Title:\s+(.+)\s+(.+)/m;
const sampleText =
`This is a code update

Draft Source Doc: https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#1)

Draft Drive: https://drive.google.com/file/d/1pYmwQS3t-ZX-Vyg9yBabtIpXZ7By2G6f/view?usp=share_link ( #1)

Final Doc Title: 

IECC Com Update(#1)

IECC Res Update (#2)

Final Source Doc: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)


Final Drive: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)

Effective Date:  January 4, 2023`;

console.log(
  sampleText?.match(regXDocTitles)
)
console.log(
  sampleText?.match(regXDocTitles)?.slice(-2)
)
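The answer notes that the two groups do not have to be named. For comparison, here is a sketch of the named variant; the group names `firstTitle` and `secondTitle` and the shortened sample string are illustrative choices, not part of the original answer.

```javascript
// Same pattern, but with named capturing groups (the names are illustrative).
const regXNamedDocTitles =
  /^Final Doc Title:\s+(?<firstTitle>.+)\s+(?<secondTitle>.+)/m;

// Shortened, made-up sample in the same shape as the question's text.
const namedSample =
  "Final Doc Title: \n\nIECC Com Update(#1)\n\nIECC Res Update (#2)\n\nFinal Source Doc: ...";

// Fall back to an empty object when the label is not found.
const { firstTitle, secondTitle } =
  namedSample.match(regXNamedDocTitles)?.groups ?? {};

console.log(firstTitle);  // IECC Com Update(#1)
console.log(secondTitle); // IECC Res Update (#2)
```

Because the pattern always consumes two non-empty lines, it assumes that exactly two titles follow the label. With a single title, the second group would capture the next labelled line instead.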