Asked by: P. Luo  Asked: 9/19/2023  Updated: 9/19/2023  Views: 58
How to extract a Jira Id from a GitHub squashed commit message using bash regex, sed, grep or groovy?
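Q:
I want to extract a Jira ID from the commit message string produced by squash-merging a GitHub PR. If there are multiple Jira IDs, I want to extract the first one.
How can I do this using bash regex, sed, grep, groovy, or any other available command-line tool?
Test cases: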
ABCD-1231 dummy title XYZ-566 (#423)
=> ABCD-1231
[ABCD-1232, XYZ-566] dummy title (#424)
=> ABCD-1232
[ABCD-1233] dummy title (#425)
=> ABCD-1233
ABCD-1234: dummy title (#426)
=> ABCD-1234
XYZ-567 dummy title (#427)
=> XYZ-567
(XYZ-568) dummy title (#428)
=> XYZ-568
"XYZ-569" dummy title (#429)
=> XYZ-569
dummy title XYZ-570 dummy title (#430)
=> XYZ-570
DUMMY title XYZ-571 dummy title (#431)
=> XYZ-571
'feature/XYZ-572' dummy title (#432)
=> XYZ-572
FEATURE|XYZ-573 dummy title (#433)
=> XYZ-573
<Feature\XYZ-574> dummy title (#434)
=> XYZ-574
dummy title FAKE-XYZ-575 dummy title (#435)
=> <nothing>
dummy title abcdXYZ-576 dummy title (#436)
=> <nothing>
In Bash,
"[^A-Z]*(([A-Z]+-[0-9]+)*.*) \(#([0-9]+)\)"
fails on
- "FEATURE|XYZ-573 dummy title (#433)"
- "[Feature\XYZ-574] dummy title (#434)"
- ...
Bash does not seem to support negative lookbehind, as in: ".*[^A-Z]*((?<!([A-Z]+)-?)[A-Z]+-[0-9]+).*"
Can anyone suggest a solution, or the right tool for this task?
A:
2 votes
Patrick Janser
9/19/2023
#1
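You could try grep with the -P option to enable the PCRE engine. Whether this is available may depend on the version of grep installed on your server. It will let you use lookarounds.
You can also limit the number of matching lines with the -m 1 option, but it won't help us much here because the commit message is on a single line. I would still use it in case your commit message spans several lines. It could save a few CPU cycles.
The -o option outputs only the matches. We can then pipe the output to head in order to keep only the first match.
For the pattern, I would try:
(?<=^|[^\w-])[A-Z]+-\d+
I used a positive lookbehind that matches either the beginning of the line or any character that is not a word character or a hyphen.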
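To make the role of each flag concrete, here is a minimal sketch. It assumes GNU grep built with PCRE support (-P). The variable names and the multi-line message are made up for illustration and are not part of the question.

```shell
#!/bin/bash
# Sketch only: assumes GNU grep with PCRE support (-P).
pattern='(?<=^|[^\w-])[A-Z]+-\d+'

# A single line containing two IDs: -o prints every match, one per line.
one_line='ABCD-1231 dummy title XYZ-566 (#423)'
all_matches=$(printf '%s\n' "$one_line" | grep -oP "$pattern")

# head -n1 keeps only the first printed match.
first_match=$(printf '%s\n' "$one_line" | grep -oP "$pattern" | head -n1)

# Hypothetical multi-line message: -m 1 makes grep stop after the first
# matching line, so the ID mentioned in the body is never reported.
multi_line=$'XYZ-567 dummy title (#427)\n\nSee also ABCD-1.'
first_line_only=$(printf '%s\n' "$multi_line" | grep -oP -m 1 "$pattern")

printf 'all matches:\n%s\n' "$all_matches"
printf 'first match: %s\n' "$first_match"
printf 'with -m 1:   %s\n' "$first_line_only"
```

In other words, -m 1 limits the number of matching lines, while head -n1 limits the number of matches printed, which is why the command uses both.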
I tested all of your commit messages with the following bash script:
#!/bin/bash
commits=(
"ABCD-1231 dummy title XYZ-566 (#423)"
"[ABCD-1232, XYZ-566] dummy title (#424)"
"[ABCD-1233] dummy title (#425)"
"ABCD-1234: dummy title (#426)"
"XYZ-567 dummy title (#427)"
"(XYZ-568) dummy title (#428)"
'"XYZ-569" dummy title (#429)'
"dummy title XYZ-570 dummy title (#430)"
"DUMMY title XYZ-571 dummy title (#431)"
"'feature/XYZ-572' dummy title (#432)"
"FEATURE|XYZ-573 dummy title (#433)"
"<Feature\XYZ-574> dummy title (#434)"
"dummy title FAKE-XYZ-575 dummy title (#435)"
"dummy title abcdXYZ-576 dummy title (#436)"
)
for commit in "${commits[@]}"
do
    # Quote the expansion: unquoted, a message such as [ABCD-1233] would be
    # treated as a glob pattern and word-split by the shell.
    printf '%s\n' "$commit"
    # A) My first attempt, using head to only get the first match.
    printf '%s\n' "$commit" | grep -P -m 1 -o '(?<=^|[^\w-])[A-Z]+-\d+' | head -n1
    # B) InSync's more sophisticated solution to match only the first
    #    occurrence with the help of \K, which resets the starting point
    #    of the reported match. This is a good way to consume characters
    #    which we don't want in the output. It's also used because we can't
    #    solve this with a positive lookbehind, as the latter has to be a
    #    fixed-length pattern (not the case because of the ungreedy .*? pattern).
    #    My positive lookbehind (?<=^|[^\w-]) can also be replaced by a
    #    shorter negative lookbehind (?<![\w-]).
    printf '%s\n' "$commit" | grep -P -m 1 -o '^.*?\K(?<![\w-])[A-Z]+-\d+'
done
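EDIT
@InSync, thanks for your clever solution, which avoids head by using PCRE's \K to reset the start of the reported match. It is used to consume, in an ungreedy way, all the characters before the first Jira ID. I have added it to the script above under point B).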
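As a rough illustration of the \K variant, the sketch below wraps it in a function and checks it against a few of the question's test cases. It assumes GNU grep with PCRE support (-P). The names first_jira_id and check are hypothetical helpers, not part of the original answer.

```shell
#!/bin/bash
# Sketch only: assumes GNU grep with PCRE support (-P).
# first_jira_id and check are hypothetical helper names.
first_jira_id() {
    # ^.*? lazily consumes the prefix, \K drops it from the reported match,
    # and (?<![\w-]) rejects IDs glued to a word character or a hyphen.
    printf '%s\n' "$1" | grep -oP -m 1 '^.*?\K(?<![\w-])[A-Z]+-\d+'
}

# Compare the extracted ID with the expected one (empty means "no match").
check() {
    local expected=$1 message=$2 actual
    actual=$(first_jira_id "$message")
    if [[ $actual == "$expected" ]]; then
        printf 'ok    %s\n' "$message"
    else
        printf 'FAIL  %s (got "%s", wanted "%s")\n' "$message" "$actual" "$expected"
    fi
}

# Expected values are taken from the test cases in the question.
check 'ABCD-1232' '[ABCD-1232, XYZ-566] dummy title (#424)'
check 'XYZ-573'   'FEATURE|XYZ-573 dummy title (#433)'
check 'XYZ-574'   '<Feature\XYZ-574> dummy title (#434)'
check ''          'dummy title FAKE-XYZ-575 dummy title (#435)'
check ''          'dummy title abcdXYZ-576 dummy title (#436)'
```

Because grep exits with a non-zero status when nothing matches, first_jira_id can also be used directly in an if statement to detect messages without a Jira ID.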
Output, for A) and B):
ABCD-1231 dummy title XYZ-566 (#423)
ABCD-1231
ABCD-1231
[ABCD-1232, XYZ-566] dummy title (#424)
ABCD-1232
ABCD-1232
[ABCD-1233] dummy title (#425)
ABCD-1233
ABCD-1233
ABCD-1234: dummy title (#426)
ABCD-1234
ABCD-1234
XYZ-567 dummy title (#427)
XYZ-567
XYZ-567
(XYZ-568) dummy title (#428)
XYZ-568
XYZ-568
"XYZ-569" dummy title (#429)
XYZ-569
XYZ-569
dummy title XYZ-570 dummy title (#430)
XYZ-570
XYZ-570
DUMMY title XYZ-571 dummy title (#431)
XYZ-571
XYZ-571
'feature/XYZ-572' dummy title (#432)
XYZ-572
XYZ-572
FEATURE|XYZ-573 dummy title (#433)
XYZ-573
XYZ-573
<Feature\XYZ-574> dummy title (#434)
XYZ-574
XYZ-574
dummy title FAKE-XYZ-575 dummy title (#435)
dummy title abcdXYZ-576 dummy title (#436)
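If grep -P is not available (as noted above, this depends on the installed grep), the same idea can probably be approximated in pure Bash. This is only a sketch, not part of the original answer: since the ERE syntax used by [[ =~ ]] has no lookbehind, the character before the ID is matched in a first capture group and the ID is read from the second one. The function name first_jira_id_bash is hypothetical, and [[:alnum:]] is locale-dependent, so it is only an approximation of PCRE's \w.

```shell
#!/bin/bash
# Pure-Bash sketch; first_jira_id_bash is a hypothetical helper name.
first_jira_id_bash() {
    # ERE has no lookbehind, so group 1 matches what precedes the ID
    # (start of string, or anything except alphanumerics, "_" and "-")
    # and group 2 captures the ID itself.
    local re='(^|[^[:alnum:]_-])([[:upper:]]+-[0-9]+)'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[2]}"
}

first_jira_id_bash 'ABCD-1231 dummy title XYZ-566 (#423)'
first_jira_id_bash "'feature/XYZ-572' dummy title (#432)"
first_jira_id_bash 'dummy title FAKE-XYZ-575 dummy title (#435)' || echo '<nothing>'
```

The regex is kept in a variable and expanded unquoted on the right-hand side of =~, because quoting it there would make Bash treat it as a literal string rather than a pattern.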
Comments
0 votes
Patrick Janser
9/19/2023
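Many thanks @InSync! Very nice simplification suggestion! I have added it to the answer. I kept both solutions, because \K might feel a bit complicated for less advanced users.
0 votes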
P. Luo
9/20/2023
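Thank you very much for your help! Inspired by your answer, I can now also extract the PR number (e.g. "433") with
grep -oP -m 1 '^.+?\(#\K\d+(?=\))'
and extract the description before the PR number part with
grep -oP -m 1 '^.+?(?=$| \(#\d+\))'
0 votes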
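Putting the pieces of this thread together, one squashed commit message could be split into its three parts roughly as follows. This is a sketch assuming GNU grep with PCRE support (-P). The variable names are illustrative only.

```shell
#!/bin/bash
# Sketch only: assumes GNU grep with PCRE support (-P).
# The variable names are illustrative.
message='FEATURE|XYZ-573 dummy title (#433)'

# First Jira ID: \K discards the lazily consumed prefix.
jira_id=$(printf '%s\n' "$message" | grep -oP -m 1 '^.*?\K(?<![\w-])[A-Z]+-\d+')

# PR number: digits between "(#" and ")".
pr_number=$(printf '%s\n' "$message" | grep -oP -m 1 '^.+?\(#\K\d+(?=\))')

# Description: everything before " (#NNN)", or the whole line if absent.
description=$(printf '%s\n' "$message" | grep -oP -m 1 '^.+?(?=$| \(#\d+\))')

printf 'Jira ID:     %s\n' "$jira_id"
printf 'PR number:   %s\n' "$pr_number"
printf 'Description: %s\n' "$description"
```

Each variable is left empty when its pattern does not match, so a caller can test for that with [[ -z ... ]] before using the value.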
Patrick Janser
9/20/2023
@P. Luo: Great! Glad you managed to solve your problem completely!