将 n> 0 的 a^n b^n c^n 与 PCRE 匹配-解网

问：

如何将 n > 0 的 a^n b^n c^n 与 PCRE 匹配？

以下情况应匹配：

abc
aabbcc
aaabbbccc

以下情况不应匹配：

abbc
aabbc
aabbbccc

这是我“尝试过”的; 但这与 n > 0 的 a^n b^n 匹配：/^(a(?1)?b)$/gmx

ab
aabb
aaabbb

Online demo

注意：这个问题与这个问题相同，只是语言发生了变化。

正则表达式 PCRE

^                    # Assert begin of line
    (?=              # First zero-with lookahead
        (            # Capturing group 1
            a        # Match a
            (?1)?    # Recurse group 1 optionally
            b        # Match b
        )            # End of group 1
        c+           # Match c one or more times
    )                # End of the first lookahead

    (?=              # Second zero-with lookahead
        a+           # Match a one or more times
        (            # Capturing group 2
            b        # Match b
            (?2)?    # Recurse group 2 optionally
            c        # Match c
        )            # End of group 2
    )                # End of the second lookahead
a+b+c+               # Match each of a,b and c one or more times
$                    # Assert end of line

Online demo

基本上，我们首先断言存在 a^n b^n，然后我们断言 b^n c^n，这将导致 a^n b^n c^n。

Qtax技巧

（强大的自参捕捕获组）

^(?:a(?=a*(\1?+b)b*(\2?+c)))+\1\2$

这个解决方案也被命名为“Qtax 技巧”，因为它使用的技术与 Qtax 在 ASCII“图像”中的“垂直”正则表达式匹配相同的技术。

所讨论的问题归结为需要断言三个组是相同长度的匹配。作为简化版本，为了匹配：

xyz

其中和实际上只是一个具有匹配长度为的变量的子模式。使用具有自引用捕获组的 lookahead 的表达式，我们指定的字符会添加到 lookahead 的每次重复中，该字符可以有效地用于“计数”：xyznabc

aaabbbccc
 ^  ^  ^

这是通过以下方式实现的：

(?:a…)+子模式的字符匹配。使用，我们直接跳到“计数器”。a(?=a*
(\1?+b)捕获组（）有效地消耗以前匹配的任何内容（如果存在），并使用不允许回溯的所有格匹配，如果计数器不同步，则匹配失败 - 也就是说，子模式多于子模式。在第一次迭代中，这不存在，并且没有匹配的内容。然后，匹配子模式的字符。它被添加到捕获组中，有效地“计数”组中的 b 之一。使用，我们直接跳到下一个“计数器”。\1babb*
(\2?+c)捕获组（）会有效地消耗之前匹配的任何内容，就像上面一样。由于此附加字符捕获的工作方式与上一组相同，因此允许字符在这些字符组中同步长度。假设连续序列为：\2a..b..c..

_{（请原谅我的艺术。}

第一次迭代：

| The first 'a' is matched by the 'a' in '^(?:a…)'.
| The pointer is stuck after it as we begin the lookahead.
v,- Matcher pointer
aaaa...bbbbbbbb...cccc...
 ^^^   |^^^       ^
skipped| skipped  Matched by c in (\2?+c);
by a*  | by b*         \2 was "nothing",
       |               now it is "c".
       Matched by b
       in (\1?+b).
     \1 was "nothing", now it is "b".

第二次迭代：

 | The second 'a' is matched by the 'a' in '^(?:a…)'.
 | The pointer is stuck after it as we begin the lookahead.
 v,- Matcher pointer
aaaa...bbbbbbbb...cccc...
       /|^^^      |^
eaten by| skipped |Matched by c in (\2?+c);
\1?+    | by b*   |     '\2' was "nothing",
  ^^    |      \2?+     now it is "cc".
 skipped|
 by a*  \ Matched by b
          in (\1?+b).
          '\1' was "nothing", now it is "bb".

由于上面讨论的三个组分别“消耗”了、中的一个，因此它们以循环方式匹配，并分别由和组“计数”。通过额外的锚定和捕获我们开始的内容，我们可以断言我们分别匹配、和和和（表示上面的每个组）。abc(?:a…)+(\1?+b)(\2?+c)xyzxyzaⁿbⁿcⁿ

作为奖励，要“数”更多，可以这样做：

Pattern: ^(?:a(?=a*(\1?+b)b*(\2?+c)))+\1{3}\2$
Matches: abbbc
aabbbbbbcc
aaabbbbbbbbbccc

Pattern: ^(?:a(?=a*(\1?+bbb)b*(\2?+c)))+\1\2$
Matches: abbbc
aabbbbbbcc
aaabbbbbbbbbccc

[伎俩命名咆哮]这真的叫Qtax Trick吗？Qtax的回答参考了PolygeneLubricants的答案作为来源。无论哪种方式，我认为“自引用捕获组”更清晰。[/诡计命名咆哮][致谢与尊重]尊重所有相关方，包括您 Unihedron - 很好的答案！[/承认和尊重]

上一个：Numpy 矢量化小于/大于比较

下一个：找不到要加载 libjscexecutor.so 的 DSO，原因如下：dlopen 失败：找不到符号“--emutls_get_address

将 n> 0 的 a^n b^n c^n 与 PCRE 匹配

Matching a^n b^n c^n for n > 0 with PCRE

评论

评论

Qtax技巧

评论

将 n&gt; 0 的 a^n b^n c^n 与 PCRE 匹配

Matching a^n b^n c^n for n > 0 with PCRE

评论

评论

Qtax技巧

评论

将 n> 0 的 a^n b^n c^n 与 PCRE 匹配