Match any string also containing escaped characters and newlines with Go

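Question:

A tool written in Go needs to find every (format) string in a file of C or C++ code, including strings that contain escaped characters or newlines. Examples: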

..."foo"...
...`foo:"foo"`...
..."foo
foo"...
..."foo\r\nfoo"...
...`foo"foo-

lish`

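The C/C++ source may also be matched inside comments or disabled code, so there is no need to exclude those parts.

I had success with

/(["'`])(?:(?=(\\?))\2.)*?\1/gms

while searching for a solution at https://regex101.com/r/FDhldb/1.

Unfortunately, this does not compile in Go: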

const (
patFmtString = `(?Us)(["'])(?:(?=(\\?))\2.)*?\1`
)
var (
matchFmtString = regexp.MustCompile(patFmtString)
)

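Even the simplified pattern (?Us)(["'])(?:(\\?).)*?\1 fails with "error parsing regexp: invalid escape sequence: `\1`".

How can I implement this correctly in Go, ideally so that it also runs fast?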

Tags: c, string, go, parsing

Answer:

import "bufio"

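// stringLiterals is a bufio.SplitFunc that yields each quoted literal
// ("...", '...' or `...`), delimiters included, and skips everything else.
var stringLiterals bufio.SplitFunc = func(data []byte, atEOF bool) (advance int, token []byte, err error) {
    scanning := false // true while inside a literal
    var delim byte    // delimiter that opened the current literal
    start := 0        // index of the opening delimiter
    i := 0
    for i < len(data) {
        b := data[i]
        switch b {
        case '\\': // skip the backslash and the character it escapes
            i += 2
            continue
        case '"', '\'', '`':
            if scanning && delim == b {
                // closing delimiter: emit the literal
                return i + 1, data[start : i+1], nil
            } else if !scanning {
                scanning = true
                start = i
                delim = b
            }
        }
        i++
    }
    if atEOF {
        // discard the rest, including any unterminated literal
        return len(data), nil, nil
    }
    if !scanning {
        // No literal is open: drop what was scanned so the buffer does not
        // grow, but keep a trailing backslash so the character it escapes
        // is still skipped on the next call.
        if i > len(data) {
            return len(data) - 1, nil, nil
        }
        return len(data), nil, nil
    }
    // keep the unfinished literal and ask for more data
    return start, nil, nil
}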

and use it like this:

func main() {
    input := /* some reader */
    scanner := bufio.NewScanner(input)
    scanner.Split(stringLiterals)
    for scanner.Scan() {
        stringLit := scanner.Text()
        // do something with `stringLit`
    }
}

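For your examples, this returns exactly the matches your regular expression produced, although I am not sure whether it really corresponds to how the grammar defines C++ string literals.

You can try it on the playground.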
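
If a regular expression is still preferred, the pattern from the question has to be restated. Go's regexp package follows RE2 syntax, which has neither lookahead such as (?=...) nor backreferences such as \1, so the "same delimiter" condition is spelled out as one alternative per quote character. This is a sketch under that assumption, not the original answer's method; literalPattern, literalRE and findLiterals are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// literalPattern has one alternative per delimiter because RE2 cannot
// refer back to "the quote that opened the literal". Inside each
// alternative the body is either an escape (a backslash plus any one
// character) or any character that is neither the delimiter nor a
// backslash. (?s) lets "." match a newline, so a backslash followed by
// a line break also counts as an escape.
const literalPattern = `(?s)"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|` +
	"`(?:\\\\.|[^`\\\\])*`"

var literalRE = regexp.MustCompile(literalPattern)

// findLiterals returns every quoted literal in src, delimiters included.
func findLiterals(src string) []string {
	return literalRE.FindAllString(src, -1)
}

func main() {
	src := "..\"foo\"..`foo:\"foo\"`..\"foo\nfoo\"..\"foo\\r\\nfoo\".."
	for _, lit := range findLiterals(src) {
		fmt.Printf("%q\n", lit)
	}
}
```

The two approaches are not identical. The split function also skips escapes outside literals and swallows an unterminated literal up to the end of the input, while the regexp simply fails at an unterminated opening quote and moves on to the next character. The regexp needs the whole input in memory, but Go's regexp package guarantees matching time linear in the input size.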
