Need to parse a file and create a data structure out of it [closed]

Q:

We want to parse a file and create some kind of data structure from it for later use (in Python). The contents of the file look like this:

plan HELLO
   feature A 
       measure X :
          src = "Type ,N ame"
       endmeasure //X

       measure Y :
        src = "Type ,N ame"
       endmeasure //Y

       feature Aa
           measure AaX :
              src = "Type ,N ame"
           endmeasure //AaX

           measure AaY :
              src = "Type ,N ame"
           endmeasure //AaY
           
           feature Aab
              .....
           endfeature // Aab
         
       endfeature //Aa
 
   endfeature // A
   
   feature B
     ......
   endfeature //B
endplan

plan HOLA
endplan //HOLA

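So the file contains one or more plans, and each plan contains one or more features. Each feature contains measures that carry information (src, type, name), and a feature can in turn contain further features.

We need to parse the file and create a data structure that looks like this: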

                     plan (HELLO) 
            ------------------------------
             ↓                          ↓ 
          Feature A                  Feature B
  ----------------------------          ↓
   ↓           ↓             ↓           ........
Measure X    Measure Y    Feature Aa
                         ------------------------------
                            ↓           ↓             ↓ 
                       Measure AaX   Measure AaY   Feature Aab
                                                        ↓
                                                        .......

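I am trying to parse the file line by line and build a list of lists holding plan -> features -> measures, features.

python list data-structures readlines fileparse

You will have to build a custom parser here. It is not very complicated, but a major source of problems is the lexical part: can keywords (feature, endfeature, and so on) appear inside quoted strings? Can a quoted string contain quotes through some escape syntax? If the answer is yes, then you really have to detect the start and end of quoted strings carefully before analyzing anything else. Tools such as pyparsing or PLY can help with that. Of course, if the syntax is much simpler (strings hold only a limited set of elements and nothing that resembles a keyword), you can build the parser by hand.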
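As a rough illustration of the lexical concern raised in the comment above, here is a minimal hand-written tokenizer sketch. It assumes double-quoted strings with backslash escapes and `//` line comments; the question does not specify either rule, so the `TOKEN` pattern and the `tokenize` helper are hypothetical. The point is only that quoted strings are matched before words, so a keyword inside quotes is never treated as a block marker.

```python
import re

# Hypothetical token pattern (not from the question): the quoted-string
# alternative comes first, so keywords inside quotes stay inside the string.
TOKEN = re.compile(r'''
    (?P<string>"(?:[^"\\]|\\.)*")   # double-quoted string, backslash escapes assumed
  | (?P<comment>//[^\n]*)           # trailing // comment, assumed to run to end of line
  | (?P<word>\w+)                   # keyword or identifier
  | (?P<punct>[:=])                 # punctuation used by the sample format
  | (?P<skip>\s+)                   # whitespace
''', re.VERBOSE)


def tokenize(text):
    """Yield (kind, value) pairs, dropping whitespace and comments."""
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        if m.lastgroup not in ("skip", "comment"):
            yield m.lastgroup, m.group()


sample = 'measure X :\n  src = "endfeature, plan"\nendmeasure //X\n'
for kind, value in tokenize(sample):
    print(kind, value)
```

Here `"endfeature, plan"` comes out as a single `string` token, so a parser built on these tokens only sees `endmeasure` as a closing keyword.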

A:

-1 votes gog 10/1/2023 #1

For quick and dirty parsing, you can do a few regex substitutions, for example:

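import re

# `text` holds the contents of the file.
# Opening lines: "plan HELLO" becomes <plan name="HELLO">
text = re.sub(
    r'(?mx)^ \s* (plan|feature|measure) \s+ (\w+) .*',
    r'<\1 name="\2">',
    text)
# Closing lines: "endplan //HELLO" becomes </plan>
text = re.sub(
    r'(?mx)^ \s* end (plan|feature|measure) .*',
    r'</\1>',
    text)
# Assignments: key = value becomes <key>value</key>
text = re.sub(
    r'(?mx)^ \s* (\w+) \s*=\s* (.*)',
    r'<\1>\2</\1>',
    text)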

This converts it to XML, which you can then parse with built-in tools such as ETree.

0 votes trincot 10/1/2023 #2

Here is a function that converts the string into a dictionary:
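To show how the XML route might continue, here is a sketch that applies the same three substitutions and then loads the result with `xml.etree.ElementTree`. Two things are assumptions on my part rather than part of the answer: the fragments are wrapped in a made-up `<root>` element (a file with several plans otherwise has no single root), and values are assumed not to contain XML special characters such as `<` or `&`. The helper name `to_xml_tree` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def to_xml_tree(text):
    """Rewrite the plan/feature/measure text as XML and parse it."""
    text = re.sub(r'(?mx)^ \s* (plan|feature|measure) \s+ (\w+) .*',
                  r'<\1 name="\2">', text)
    text = re.sub(r'(?mx)^ \s* end (plan|feature|measure) .*',
                  r'</\1>', text)
    text = re.sub(r'(?mx)^ \s* (\w+) \s*=\s* (.*)',
                  r'<\1>\2</\1>', text)
    # Assumption: wrap everything in a synthetic <root> so that several
    # top-level plans still form one well-formed document.
    return ET.fromstring(f"<root>{text}</root>")


sample = '''plan HELLO
   feature A
       measure X :
          src = "Type, Name"
       endmeasure //X
   endfeature // A
endplan

plan HOLA
endplan //HOLA
'''

root = to_xml_tree(sample)
for plan in root.findall("plan"):
    print(plan.get("name"))

# The quotes are still part of the element text, so strip them here.
src = root.find("./plan/feature/measure/src")
print(src.text.strip(' "'))
```

Elided lines such as `.....` in the original sample would simply end up as text inside the enclosing element.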

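def getplans(s):
    # stack[0] is the root; stack[-1] is the block currently being filled
    stack = [{}]
    for line in s.splitlines():
        if "=" in line:  # leaf: key = "value"
            key, value = line.split("=", 1)
            stack[-1][key.strip()] = value.strip(' "')
        elif line.strip()[:3] == "end":  # endplan / endfeature / endmeasure
            stack.pop()
        elif line.strip():  # opening line: "<kind> <name> ..."
            collection, name, *_ = line.split()
            stack.append({})
            # register the new block under e.g. "features" in its parent
            stack[-2].setdefault(collection + "s", {})[name] = stack[-1]
    return stack[0]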

Here is an example call:

s = """plan HELLO
   feature A 
       measure X :
          src = "Type, Name"
       endmeasure //X

       measure Y :
        src = "Type, Name"
       endmeasure //Y

       feature Aa
           measure AaX :
              src = "Type, Name"
           endmeasure //AaX

           measure AaY :
              src = "Type, Name"
           endmeasure //AaY
           
           feature Aab
                measure Car :
                  src = "Model, Make"
               endmeasure //car
           endfeature // Aab
         
       endfeature //Aa
 
   endfeature // A
   
   feature B
       measure Hotel :
          src = "Stars, Reviews"
       endmeasure //Hotel
    endfeature //B
endplan

plan HOLA
endplan //HOLA
"""

import json
print(json.dumps(getplans(s), indent=4))

Output:

{
    "plans": {
        "HELLO": {
            "features": {
                "A": {
                    "measures": {
                        "X": {
                            "src": "Type, Name"
                        },
                        "Y": {
                            "src": "Type, Name"
                        }
                    },
                    "features": {
                        "Aa": {
                            "measures": {
                                "AaX": {
                                    "src": "Type, Name"
                                },
                                "AaY": {
                                    "src": "Type, Name"
                                }
                            },
                            "features": {
                                "Aab": {
                                    "measures": {
                                        "Car": {
                                            "src": "Model, Make"
                                        }
                                    }
                                }
                            }
                        }
                    }
                },
                "B": {
                    "measures": {
                        "Hotel": {
                            "src": "Stars, Reviews"
                        }
                    }
                }
            }
        },
        "HOLA": {}
    }
}

If your input has other syntax that is not covered in your question, you may need to adapt the script further to handle it.
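Since the question asks for a structure "for later use", here is one possible way to consume a dictionary of the shape shown above. This is only a sketch: `iter_measures` is a hypothetical helper, and it assumes the nested `"plans"`, `"features"`, and `"measures"` keys exactly as they appear in the output.

```python
def iter_measures(node, path=()):
    """Yield (path, fields) for every measure nested anywhere under node."""
    for name, fields in node.get("measures", {}).items():
        yield path + (name,), fields
    # Plans and features are both containers, so recurse into each of them.
    for kind in ("plans", "features"):
        for name, child in node.get(kind, {}).items():
            yield from iter_measures(child, path + (name,))


# A small hand-written dictionary with the same shape as the output above.
tree = {
    "plans": {
        "HELLO": {
            "features": {
                "A": {
                    "measures": {"X": {"src": "Type, Name"}},
                    "features": {
                        "Aa": {"measures": {"AaX": {"src": "Type, Name"}}}
                    },
                }
            }
        },
        "HOLA": {},
    }
}

for path, fields in iter_measures(tree):
    print("/".join(path), "->", fields["src"])
```

Each measure is reported together with the chain of plan and feature names that leads to it, which mirrors the tree drawn in the question.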
