How to convert a cmake header file with comments into a TSV/CSV file?

Question:

I have many cmake header files, from which I want to extract the comments and the `#cmakedefine` names into a CSV (or TSV) file.

A typical input looks like this:

/**
 * 1st Multi-line brief description of what the following 
 * cmakedefine does.
 *
 * Second more complicated multi-line full description, of <c>SOMETHING</c> to be enabled in the
 * configuration.
 *
 * Possibly additional lines of full description.
 */
#cmakedefine SOMETHING

As a first step, the output would be something like this:

SOMETHING
1st Multi-line brief description of what the following cmakedefine does.
Second more complicated multi-line full description , of <c>SOMETHING</c> to be enabled in the configuration. Possibly additional lines of full description.
...

Ultimately, the output I want is this:

SOMETHING, "1st Multi-line brief description of what the following cmakedefine does.", "Second more complicated multi-line full description, of <c>SOMETHING</c> to be enabled in the configuration. Possibly additional lines of full description."

SOMETHING_ELSE, "Brief description", "Long Description"

(The column headers can be implied as: `cmakedefine, Brief_Description, Long_Description`.)

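I tried doing this in sed without success, and it's not a good way to kill time. I also tried awk, also without success. At this point I don't care which tool is used, I just want to get the job done. But I think Python may be better suited for this.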

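Notes:

All comments start with an empty `/**` line.
All `Brief` comments start with ` * ` and can span multiple lines.
The `Brief` and `Long` comments are separated by an empty comment line (` *`).
All `Long` comments can have multiple paragraphs (as shown), separated in the same manner.
The relevant `cmakedefine` follows the comment.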

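Update:

The cmake files are more complicated than I thought, because:

There are many unrelated standalone comments that do not belong to any following `#cmakedefine`.
Some comments are followed by several `#cmakedefine` lines.
Sometimes the comments for a `#cmakedefine` even include strings, commas, and other characters, such as `<c>`.
Sometimes the comments contain single quotes (`'`) and double quotes (`"`).

A more complicated file might look like this: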

/**
 * Only a "Brief" comment
 */ 
#cmakedefine SIMPLE

/**
 * 1st multi-line Brief description of what the following 
 * cmakedefine does.
 *
 * 2nd more complicated multi-line Full description, of <c>SOMETHING</c> to be enabled in the
 * configuration.
 *
 * [Sometimes] additional paragraph-1 of full description, 
 * going on several lines.
 *
 * [Sometimes] additional paragraph-2 of full "description", 
 * going on several lines. (double quoted)
 *
 * ...
 * [Sometimes] additional paragraph-N of full 'description', 
 * going on several lines. (single quoted)
 */
#cmakedefine SOMETHING

/**
 * Some useless unrelated comment
 */ 

/**
 * 1st Multi-line brief description of what the following 
 * cmakedefine does.
 *
 * Second more complicated multi-line full description, of <c>SOMETHING</c> to be enabled in the
 * configuration.
 *
 * Possibly additional lines of full description.
 */
#cmakedefine SOMETHING_ELSE
#cmakedefine ANOTHER_SOMETHING

Tags: python, c, bash, parsing, cmakelists-options

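awk '
# Comment body lines start with " * ". gsub strips that prefix and
# returns non-zero only on such lines, so it doubles as the pattern.
gsub(/^ \* */, "") {
    if ($0 == "/") {
        # closing " */" line: nothing to do
    } else if ($0) {
        comment = comment (comment ? " " : "") $0
    } else if (!firstline) {
        # first empty comment line: the text so far is the brief
        firstline = comment
        comment = ""
    }
}
gsub(/^#cmakedefine /, "") {
    print $0 ", \"" firstline "\", \"" comment "\""
    firstline = ""
    comment = ""
}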
' <<EOF
/**
 * 1st Multi-line brief description of what the following 
 * cmakedefine does.
 *
 * Second more complicated multi-line full description <c>SOMETHING</c> to be enabled in the
 * configuration.
 *
 * Possibly additional lines of full description.
 */
#cmakedefine SOMETHING
EOF

Output:

SOMETHING, "1st Multi-line brief description of what the following  cmakedefine does.", "Second more complicated multi-line full description <c>SOMETHING</c> to be enabled in the configuration. Possibly additional lines of full description."

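Here is a decent awk tutorial: https://www.grymoire.com/Unix/Awk.html

The next steps would be to add some state (for example, not dropping empty comments), maybe clean the input with a few regexes, and quote the output properly for CSV, TSV, or whatever format you want. As the complexity grows, you also have to answer the question of why not Python and JSON.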
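As a rough illustration of the "add some state" step, here is a minimal line-by-line sketch in Python rather than awk. The function name `parse_cmake_header` and the sample text are hypothetical, and the sketch assumes the comment layout described in the question (`/**` and `*/` on their own lines). A comment is attached only to the `#cmakedefine` lines that directly follow it; any other line, such as a blank one, detaches it, which is one way to drop the unrelated comments.

```python
import re

DEFINE_RE = re.compile(r"#cmakedefine\s+(\w+)")

def parse_cmake_header(text):
    """Return (name, brief, long) tuples; a sketch, not a full parser."""
    rows = []
    paragraphs = []   # finished paragraphs of the most recent comment
    current = []      # lines of the paragraph being collected
    in_comment = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/**"):
            in_comment, paragraphs, current = True, [], []
        elif in_comment:
            if line.startswith("*/"):
                in_comment = False
                if current:
                    paragraphs.append(" ".join(current))
                    current = []
            else:
                body = line.lstrip("*").strip()
                if body:
                    current.append(body)
                elif current:
                    # empty " *" line closes the current paragraph
                    paragraphs.append(" ".join(current))
                    current = []
        else:
            m = DEFINE_RE.match(line)
            if m and paragraphs:
                # first paragraph is the brief, the rest form the long text
                rows.append((m.group(1), paragraphs[0], " ".join(paragraphs[1:])))
            elif not m:
                # anything else (e.g. a blank line) detaches the comment
                paragraphs = []
    return rows

# Hypothetical sample, written with explicit "\n" so trailing spaces stay visible.
SAMPLE = (
    "/**\n"
    ' * Only a "Brief" comment\n'
    " */ \n"
    "#cmakedefine SIMPLE\n"
    "\n"
    "/**\n"
    " * Some useless unrelated comment\n"
    " */ \n"
    "\n"
    "/**\n"
    " * Brief line one, \n"
    " * brief line two.\n"
    " *\n"
    " * Long paragraph 1.\n"
    " *\n"
    " * Long 'paragraph' 2.\n"
    " */\n"
    "#cmakedefine SOMETHING_ELSE\n"
    "#cmakedefine ANOTHER_SOMETHING\n"
)

for row in parse_cmake_header(SAMPLE):
    print(row)
```

In this sketch a `#cmakedefine` without a preceding comment is skipped; whether that is the desired behavior is an open choice.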

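This almost worked. However, it doesn't like commas (`,`) within a line, and then puts the following line into a new CSV field. Sorry, my example didn't show that, but I have added it now. Also, it doesn't join the subsequent lines of the long description (separated by empty comment lines) and treats them as separate CSV columns. Maybe it just got confused by the commas in the earlier lines?

0 upvotes KamilCuk 8/24/2023

"The next steps would be to add ... quote the output properly for CSV or TSV ... answer the question of why not Python and JSON." There is also a link to an awk tutorial. The awk manual is available online as well. Newer awk versions can also load libraries such as csv from gawkextlib. (But then, why not use Python with `import csv`?)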
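To make the "Python with `import csv`" suggestion concrete, here is a small sketch of how the standard `csv` module handles the commas and quotes the question mentions. The row contents and the helper name `render` are made up for illustration.

```python
import csv
import io

# Hypothetical row: name, brief, long description.
row = [
    "SOMETHING",
    'Brief with a comma, and a "quoted" word',
    "Long description with <c>SOMETHING</c> and 'single quotes'",
]

def render(fields, **fmt):
    """Write one row with csv.writer and return it as a string."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(fields)
    return buf.getvalue()

# QUOTE_ALL wraps every field in double quotes and doubles embedded ones.
csv_line = render(row, quoting=csv.QUOTE_ALL)
# For TSV, only the delimiter changes; the default quoting is QUOTE_MINIMAL.
tsv_line = render(row, delimiter="\t")

print(csv_line, end="")
print(tsv_line, end="")
```

Because the writer escapes embedded double quotes by doubling them, a reader such as `csv.reader` can recover the original fields even when a description contains commas.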

1 upvote Timeless 8/25/2023 #2

If you want to use Python with regular expressions:

import re
from itertools import groupby
import pandas as pd #pip install pandas

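# group 1: the comment body; group 2 (inside the lookahead): the name of
# the #cmakedefine on the line right after the closing "*/"
pat = r"/\*\*\n(.*?)\*/\n(?=#cmakedefine (\w+))"

with open("input.cmake", "r") as f:
    # name -> comment lines with the leading " * " stripped; empty strings
    # mark the blank " *" lines between paragraphs
    d = {
        m.group(2): [
            comment.strip("* ")
            for comment in m.group(1).split("\n")
        ] for m in re.finditer(pat, f.read(), re.DOTALL)
    }

# join consecutive non-empty lines into one string per paragraph
for cmake,com in d.items():
    d[cmake] = [
        " ".join(g) for k, g in groupby(com, key=lambda x: x!= "") if k
    ]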

# making the [TC]SV
(
    pd.DataFrame.from_dict(d, orient="index")
        .apply(lambda x: x.add('"').radd('"')) # could be optional
        .to_csv("output.csv", sep=",", header=False, quoting=3, escapechar="\\")
) # use `sep="\t"` and change the extension of the file if you need a TSV

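Regex: [demo]

Output (in tabular format):

	0	1	2
SOMETHING	1st Multi-line brief description of what the following cmakedefine does.	Second more complicated multi-line full description, of SOMETHING to be enabled in the configuration.	Possibly additional lines of full description.
SOMETHING_ELSE	Brief description	Long Description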

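Thanks! Very cool, especially the regex demo site. I noticed that your script skips the `<c>` part, which is OK, but I'm wondering why that happens. Also, I have updated the OP to provide a more complete example. I'm not asking you to handle that, but perhaps you could describe your code in more detail, as I'm not very good at brain-compiling lambdas. How can I get all the additional lines as a single Long description?

1 upvote Timeless 8/25/2023

For the output shown in my answer I used `df.to_markdown()`, which is why the `<c>` tags are hidden, but don't worry, the tags are preserved in the csv/tsv (you can also try editing the answer to show them). As for your second question, you can slightly tweak the listcomp in the first block by using `" ".join(comment.strip("* ")...` and removing the second part that starts with `for cmake,com in d.items():..`. You'll then have a single column that contains all the descriptions.

0 upvotes not2qubit 8/25/2023

No, that doesn't quite work. I want column 0 (as shown above) to stay as it is, and column 1 to contain what is now in columns 1-N. Your suggested modification puts (almost!) everything into column 0 and only a few items into column 1. Or maybe I put the closing `)` of the join in the wrong place?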
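One possible way to get the layout asked for here (brief in one column, everything else merged into a single long description) is to keep the `groupby` step and then join all paragraphs after the first. This is only a sketch: the helper name `brief_and_long` is hypothetical, and it assumes its input is the list of already stripped comment lines, as stored per name in `d` after the first block of the answer.

```python
from itertools import groupby

def brief_and_long(comment_lines):
    """Split stripped comment lines into [brief, long]."""
    # Runs of non-empty lines form paragraphs; empty strings separate them.
    paragraphs = [
        " ".join(group)
        for nonempty, group in groupby(comment_lines, key=bool)
        if nonempty
    ]
    brief = paragraphs[0] if paragraphs else ""
    long_desc = " ".join(paragraphs[1:])   # all remaining paragraphs merged
    return [brief, long_desc]

# Hypothetical stripped lines for one #cmakedefine.
lines = [
    "1st brief line",
    "continues here.",
    "",
    "Long paragraph 1.",
    "",
    "Long paragraph 2,",
    "second line.",
    "",
]
print(brief_and_long(lines))
```

Applying this to every value of `d` in place of the second loop would give each name exactly two description fields, which `pd.DataFrame.from_dict(d, orient="index")` then turns into two columns.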

0 upvotes not2qubit 8/25/2023

Also, I just noticed that it skips the first `single comment + #cmakedefine SIMPLE` from my updated example.
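A likely reason for the skipped `SIMPLE` entry is that its closing `*/` line ends with a trailing space in the updated example, while the pattern in the answer requires `\*/\n` exactly. The variant below is a sketch of one possible adjustment, not the answerer's code: it tolerates trailing blanks, stops the comment body at the first `*/` so that an unrelated comment cannot be swallowed into the next one, and accepts several `#cmakedefine` lines under one comment. The names `PAT`, `extract`, and `SAMPLE` are hypothetical.

```python
import re

PAT = re.compile(
    r"/\*\*[ \t]*\n"                  # opening "/**" line
    r"((?:(?!\*/).)*)"                # body, up to the first "*/"
    r"\*/[ \t]*\n"                    # closing "*/", trailing blanks allowed
    r"((?:#cmakedefine[ \t]+\w+[ \t]*(?:\n|$))+)",  # defines directly below
    re.DOTALL,
)

def extract(text):
    """Map each #cmakedefine name to its stripped comment lines."""
    d = {}
    for m in PAT.finditer(text):
        lines = [ln.strip("* \t") for ln in m.group(1).split("\n")]
        for name in re.findall(r"#cmakedefine[ \t]+(\w+)", m.group(2)):
            d[name] = lines
    return d

# Hypothetical sample with explicit "\n" so the trailing spaces are visible.
SAMPLE = (
    "/**\n"
    ' * Only a "Brief" comment\n'
    " */ \n"
    "#cmakedefine SIMPLE\n"
    "\n"
    "/**\n"
    " * Some useless unrelated comment\n"
    " */ \n"
    "\n"
    "/**\n"
    " * Shared comment\n"
    " */\n"
    "#cmakedefine SOMETHING_ELSE\n"
    "#cmakedefine ANOTHER_SOMETHING\n"
)

for name, comment_lines in extract(SAMPLE).items():
    print(name, comment_lines)
```

The empty strings left in each list still mark paragraph breaks, so the lists can be passed through the same `groupby` step as before.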
