ANTLR4 grammar not correctly matching escaped quotes in strings

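Q:

I am trying to create a grammar for a language that uses double quotes for strings and allows quotes to be escaped with a backslash. I am using ANTLR4 to parse the input.

I have defined the following rules to match strings: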

STRING:
    '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;
fragment
ESC_SEQ
    :   '\\'
        (   // The standard escaped character set such as tab, newline, etc.
            [btnfr"'\\]
            |
        |   // A Java style Unicode escape sequence
            UNICODE_ESC
        |   // Invalid escape
            .
        |   // Invalid escape at end of file
            EOF
        )
    ;

fragment
UNICODE_ESC
    :   'u' (HEX_DIGIT (HEX_DIGIT (HEX_DIGIT HEX_DIGIT?)?)?)?
;

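However, this rule does not seem to correctly match strings that contain an escaped quote at the end of the string. For example, the string "test \"string\" that works" is parsed correctly, but the rule does not work when my string looks like "test string that does \"not work\"". It also works with \n and other escape characters.

(I am expecting to see the output "test string that "works"")

I tried modifying the rule to escape the backslash in the quote character, like this: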

STRING:
    '"' ( ESC_SEQ | ~('\\'|'"') )* '"' | ('\\' '"'))
fragment
ESC_SEQ
    :   '\\'
        (   // The standard escaped character set such as tab, newline, etc.
            [btnfr"'\\]
            |
        |   // A Java style Unicode escape sequence
            UNICODE_ESC
        |   // Invalid escape
            .
        |   // Invalid escape at end of file
            EOF
        )
    ;

fragment
UNICODE_ESC
    :   'u' (HEX_DIGIT (HEX_DIGIT (HEX_DIGIT HEX_DIGIT?)?)?)?
;
    ;

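But this still does not work.

What am I doing wrong? How can I modify the grammar to correctly match strings with escaped quotes?

Tags: parsing, antlr, antlr4, ebnf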

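Hint: this cannot be done with ANTLR alone; you need to fall back on code handlers for the various use cases... For example, for \u1234 to become a Unicode character you have to tell it how that should be parsed: 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT? { setText(Character.toString((char) Integer.parseInt(getText().substring(2), 16))); }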

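0 votes Oguzhan Kose 4/4/2023

What do you mean by ESC_SEQ not "unescaping" the sequence? Sorry, I could not understand.

0 votes Cine 4/14/2023

@OguzhanKose Quite literally: you have a rule named ESC_SEQ that parses a '\' followed by something else, e.g. '"', but it only parses it. It does not interpret it, so a '\"' in the INPUT is still a '\"' in the OUTPUT.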
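
To make the comment above concrete: the lexer hands back the raw token text, so turning \" into " has to happen in ordinary code afterwards. The following is only a minimal sketch of such a post-processing step. StringUnescaper and unescape are hypothetical names, not part of ANTLR, and the handling of invalid escapes (keep the escaped character) and of Unicode escapes (decode only when exactly four hex digits follow) are assumptions made for illustration. Note that it also strips the surrounding quotes, which the expected output in the question keeps.

```java
// Hypothetical helper, not part of ANTLR: converts the raw text of a STRING
// token (surrounding quotes included) into the value it denotes.
final class StringUnescaper {

    static String unescape(String tokenText) {
        // Drop the opening and closing double quote matched by the STRING rule.
        String body = tokenText.substring(1, tokenText.length() - 1);
        StringBuilder out = new StringBuilder(body.length());

        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\' || i + 1 >= body.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char next = body.charAt(++i); // the character after the backslash
            switch (next) {
                case 'b': out.append('\b'); break;
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'f': out.append('\f'); break;
                case 'r': out.append('\r'); break;
                case 'u':
                    // Assumption: decode only when four hex digits follow the 'u'.
                    if (hasFourHexDigits(body, i + 1)) {
                        out.append((char) Integer.parseInt(body.substring(i + 1, i + 5), 16));
                        i += 4;
                    } else {
                        out.append('\\').append('u'); // leave a malformed escape untouched
                    }
                    break;
                default:
                    // Covers the quote characters and the backslash itself.
                    // Assumption: an unknown escape simply yields the escaped character.
                    out.append(next);
            }
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String s, int start) {
        if (start + 4 > s.length()) return false;
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Raw token text as the lexer would return it: "test string that \"works\""
        String token = "\"test string that \\\"works\\\"\"";
        System.out.println(unescape(token));
    }
}
```

Such a helper could be called on t.getText() after lexing, or from an embedded action via setText(...) as in the hint above. Which of the two fits better depends on whether later stages still need the original source text.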

0 votes Bart Kiers 4/3/2023 #2

I cannot reproduce this. The following 4 strings are all matched:

""
"simple"
"test \"string\" that works"
"test string that does \"not work\""

Tested with the grammar:

lexer grammar StringLexer;

STRING
    :   '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

SPACE
    : [ \t\r\n] -> skip
    ;

fragment
ESC_SEQ
    :   '\\'
        (   // The standard escaped character set such as tab, newline, etc.
            [btnfr"'\\]
            |
        |   // A Java style Unicode escape sequence
            UNICODE_ESC
        |   // Invalid escape
            .
        |   // Invalid escape at end of file
            EOF
        )
    ;

fragment
UNICODE_ESC
    :   'u' (HEX_DIGIT (HEX_DIGIT (HEX_DIGIT HEX_DIGIT?)?)?)?
    ;

fragment
HEX_DIGIT
    :   [0-9a-fA-F]
    ;

And the Java code:

String source = "\"\" \"simple\" \"test \\\"string\\\" that works\" \"test string that does \\\"not work\\\"\"";
StringLexer lexer = new StringLexer(CharStreams.fromString(source));
CommonTokenStream stream = new CommonTokenStream(lexer);
stream.fill();

for (Token t : stream.getTokens()) {
    System.out.printf("%-20s '%s'%n",
            StringLexer.VOCABULARY.getSymbolicName(t.getType()),
            t.getText().replace("\n", "\\n"));
}

Which prints:

STRING               '""'
STRING               '"simple"'
STRING               '"test \"string\" that works"'
STRING               '"test string that does \"not work\""'
EOF                  '<EOF>'
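
As an independent cross-check that does not require generating a lexer, the shape of the STRING rule can be approximated with java.util.regex. This is a different technique from the ANTLR test above and only a rough sketch: StringRuleCheck and isString are hypothetical names, and ESC_SEQ is simplified to "a backslash followed by any single character", which ignores the EOF alternative and the Unicode escape details.

```java
import java.util.regex.Pattern;

// Hypothetical cross-check, not part of ANTLR.
final class StringRuleCheck {

    // Rough regex counterpart of:  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    // Assumption: ESC_SEQ is reduced to a backslash followed by any one character.
    // DOTALL lets '.' match line terminators, like '.' in an ANTLR lexer rule.
    static final Pattern STRING =
            Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"", Pattern.DOTALL);

    // True when the whole input is exactly one string literal.
    static boolean isString(String input) {
        return STRING.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "\"\"",
            "\"simple\"",
            "\"test \\\"string\\\" that works\"",
            "\"test string that does \\\"not work\\\"\""
        };
        for (String s : samples) {
            System.out.println(isString(s) + "  " + s);
        }
    }
}
```

Under these assumptions all four sample strings are accepted, including the one that ends with an escaped quote. That is consistent with the lexer output above and suggests the problem lies outside the STRING rule itself.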
