Is there a Python parsing library that can parse a TOML-like format that specifies nested fields with [ParentHeader_ChildSection]?

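Asked by: markfickett  Asked: 8/19/2023  Updated: 8/20/2023  Views: 63

Question:

I would like to parse an externally defined (and undocumented) file format in Python. It looks somewhat similar to TOML, but the text style is different and there is no quoting. For example: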

[Schedule_Step122]
m_nMaxCurrent=0
m_szAddIn=Relay OFF
m_szLabel=06 - End Charge
m_uLimitNum=2

[Schedule_Step122_Limit0]
Equation0_szCompareSign=>=
Equation0_szRight=F_05_Charge_Capacity
Equation0_szLeft=PV_CHAN_Charge_Capacity
m_bStepLimit=1
m_szGotoStep=End Test


[Schedule_Step122_Limit1]
Equation0_szCompareSign=>=
Equation0_szLeft=PV_CHAN_Voltage
Equation0_szRight=3
m_bStepLimit=1
m_szGotoStep=End Test

(This is Arbin's test schedule format.)

The structure I would like to parse it into is something like this:

"steps": [
  {
    "max_current": 0,
    "add_in": RELAY_OFF,
    "label": "06 - End Charge",
    "limits": [
      {
        "equations": [
          {
            "left": PV_CHAN_CHARGE_CAPACITY,
            "compare_sign": ">=",
            "right": F_05_CHARGE_CAPACITY
          }
        ],
        "step_limit": 1,
        "goto_step": END_TEST
      },
      {
        "equations": [
          {
            "left": PV_CHAN_VOLTAGE,
            "compare_sign": ">=",
            "right": 3
          }
        ],
        "step_limit": 1,
        "goto_step": END_TEST
      }
    ]
  }
]

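Superficially, the format seems similar to TOML, including some of the nesting, but the string handling is different. I would also like to capture certain values as named constants.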
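
One way to picture the named-constant idea is a lookup from raw strings to members of an Enum. This is only a minimal sketch: the class name `Const`, the helper `to_constant`, and the three members are hypothetical, since the real set of constant values in the format is undocumented.

```python
from enum import Enum


class Const(Enum):
    """Hypothetical named constants; the real value set is an assumption."""
    RELAY_OFF = "Relay OFF"
    END_TEST = "End Test"
    PV_CHAN_VOLTAGE = "PV_CHAN_Voltage"


def to_constant(raw):
    """Return the Const member whose value equals raw, else raw unchanged."""
    try:
        # Enum lookup by value raises ValueError when no member matches.
        return Const(raw)
    except ValueError:
        return raw


print(to_constant("Relay OFF"))
print(to_constant("06 - End Charge"))
```

Falling back to the raw string keeps free-text values such as labels intact while known values become constants.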

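I have also looked into defining a context-free grammar and using a lexer/parser such as ANTLR, PLY, pyparsing, or Lark. I am familiar with reading grammars in documentation, but I have never written one or used a parser before. However, I don't know how to represent the nested structure (such as Schedule_Step122_Limit0 being a member of Schedule_Step122) or the lack of guaranteed order among related keys (such as Equation0_szCompareSign, Equation0_szLeft, and so on).

Is there a generic parsing tool that I can write a definition for and that will give me parsed, structured output? Or is writing custom parsing logic the best approach here?

python parsing ply lark-parser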

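0 votes Nick ODell 8/19/2023

It looks a bit like the format handled by configparser in the standard library.

0 votes MegaIng 8/19/2023

You could use something like configparser or a custom grammar, but to translate the result into the exact format you want you will need post-processing; no off-the-shelf library handles this style of namespaced names. For Lark, take a look at Transformers or Visitors.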
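
To illustrate what configparser would and would not do here, the following is a minimal sketch that reads a shortened copy of the sample into a flat dictionary of sections. The helper name `read_flat_sections` and the `SAMPLE` text are hypothetical, and the settings (split only on "=", no interpolation, case-preserving keys) are assumptions about what the undocumented format needs.

```python
import configparser

SAMPLE = """\
[Schedule_Step122]
m_nMaxCurrent=0
m_szLabel=06 - End Charge

[Schedule_Step122_Limit0]
Equation0_szCompareSign=>=
Equation0_szLeft=PV_CHAN_Charge_Capacity
"""


def read_flat_sections(text):
    """Parse text into {section_name: {key: raw_string_value}}."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # only "=" separates key and value
        interpolation=None,  # treat "%" in values literally
    )
    parser.optionxform = str  # keep key case; the default lowercases keys
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


flat = read_flat_sections(SAMPLE)
for section, fields in flat.items():
    print(section, fields)
```

The result is still flat: the section names and key prefixes such as Equation0 have to be split and nested in a separate post-processing step, which is the part no library does for you.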

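Answers:

0 votes Michael Dyck 8/20/2023 #1

Tools like ANTLR, PLY, pyparsing, or Lark will buy you almost nothing here. configparser might help, but I suspect it would be more trouble than it's worth.

The following code gets close to what you want. You will need to adjust it based on what you discover about the input format and what you need from the output structure.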

import re, json

def main():
    obj = parse('input.txt')
    print(json.dumps(obj, indent=2))

def parse(filename):
    root_object = {}
    current_object = None
    for line in open(filename):
        # trim trailing whitespace:
        line = line.rstrip()

        if line == '':
            # blank line
            pass

        elif mo := re.fullmatch(r'\[(\w+)\]', line):
            # header line
            # This identifies, via a 'path' from the root object,
            # the object that subsequent name-value lines are talking about.
            header_path = mo.group(1)
            header_pieces = header_path.split('_')
            current_object = get_nested_object(root_object, header_pieces)

        elif mo := re.fullmatch(r'([^=]+)=(.*)', line):
            # name-value line
            (name_part, value_str) = mo.groups()
            # The {name_part} identifies a field in {current_object}
            # or some object nested within {current_object}.
            # The {value_str} encodes the value to be assigned to that field.
            name_pieces = name_part.split('_')
            prefix_pieces = name_pieces[:-1]
            field_name_piece = name_pieces[-1]

            if prefix_pieces == ['m']:
                # This is an 'immediate' field of {current_object}
                obj_w_field = current_object
            else:
                # This is a field of some object nested within {current_object}
                obj_w_field = get_nested_object(current_object, prefix_pieces)

            mo = re.fullmatch(r'([a-z]+)([A-Z][a-zA-Z]*)', field_name_piece)
            # Expect a lowercase type prefix plus a Pascal-cased name, e.g. "szLabel"
            assert mo, line
            (type_indicator, field_name_pc) = mo.groups()

            field_name = to_snake_case(field_name_pc)
            field_value = value_str

            obj_w_field[field_name] = field_value

        else:
            assert 0, line
    return root_object

def get_nested_object(base_object, header_pieces):
    if header_pieces == []:
        return base_object
    else:
        prefix_pieces = header_pieces[:-1]
        last_piece = header_pieces[-1]

        obj = get_nested_object(base_object, prefix_pieces)

        if mo := re.fullmatch(r'[A-Za-z]+', last_piece):
            # e.g. "Schedule"
            # This identifies a field/property/member of {obj}
            field_name = to_snake_case(last_piece)
            # That field might or might not exist already.
            if field_name not in obj:
                # It doesn't exist yet.
                # We assume that the value of the field is an object
                obj[field_name] = {}
            return obj[field_name]
            
        elif mo := re.fullmatch(r'([A-Za-z]+)(\d+)', last_piece):
            # e.g., "Step122", "Limit0"
            # This identifies an element of an array that is a field of {obj}
            # e.g., "Step122" implies that {obj} has a field named "steps",
            # whose value is an array,
            # and this identifies the element at index 122 in that array.
            (array_field_name_pc, index_str) = mo.groups()

            array_field_name = to_snake_case(array_field_name_pc) + 's'
            index = int(index_str)

            if array_field_name not in obj:
                obj[array_field_name] = {}
                # In practice, you might want to make this a list.
            array = obj[array_field_name]

            if index not in array:
                array[index] = {}
            return array[index]

        else:
            assert 0, last_piece

        assert 0

# "_pc" suffix denotes a Pascal-cased name, e.g. "MaxCurrent"

def to_snake_case(name_pc):
    assert '_' not in name_pc
    def replfunc(mo):
        cap_letter = mo.group(0)
        low_letter = cap_letter.lower()
        if mo.start() == 0:
            return low_letter
        else:
            return '_' + low_letter
    return re.sub(r'[A-Z]', replfunc, name_pc)

main()

For the sample input, it prints:

{
  "schedule": {
    "steps": {
      "122": {
        "max_current": "0",
        "add_in": "Relay OFF",
        "label": "06 - End Charge",
        "limit_num": "2",
        "limits": {
          "0": {
            "equations": {
              "0": {
                "compare_sign": ">=",
                "right": "F_05_Charge_Capacity",
                "left": "PV_CHAN_Charge_Capacity"
              }
            },
            "step_limit": "1",
            "goto_step": "End Test"
          },
          "1": {
            "equations": {
              "0": {
                "compare_sign": ">=",
                "left": "PV_CHAN_Voltage",
                "right": "3"
              }
            },
            "step_limit": "1",
            "goto_step": "End Test"
          }
        }
      }
    }
  }
}
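
The output above keys array elements by index and leaves every value as a string. As one possible adjustment, the sketch below turns index-keyed dictionaries into lists and converts all-digit strings to integers. The function names `tidy` and `is_index` are hypothetical, and the blanket digit conversion is an assumption: it would also convert a purely numeric label, so using the type prefixes (n, u, b, sz) may turn out to be more reliable.

```python
def is_index(key):
    """True for int keys and for all-digit string keys such as "0"."""
    return isinstance(key, int) or (isinstance(key, str) and key.isdecimal())


def tidy(node):
    """Recursively convert index-keyed dicts to lists and digit strings to ints."""
    if isinstance(node, dict):
        cleaned = {key: tidy(value) for key, value in node.items()}
        if cleaned and all(is_index(key) for key in cleaned):
            # Order elements by numeric index. Gaps in the numbering are
            # closed up, so list positions may differ from the file's indices.
            return [cleaned[key] for key in sorted(cleaned, key=int)]
        return cleaned
    if isinstance(node, str) and node.isdecimal():
        return int(node)
    return node


parsed = {
    "limits": {
        1: {"right": "3", "goto_step": "End Test"},
        0: {"right": "F_05_Charge_Capacity", "goto_step": "End Test"},
    }
}
print(tidy(parsed))
```

Accepting both integer and string keys lets the same function work on the dictionary returned by parse() and on data reloaded from the printed JSON.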

0 votes markfickett 9/7/2023

Thanks for confirming that existing parsers aren't useful here.
