Incorrect incrementation of the 'lvl' variable when parsing Markdown files to generate a table of contents in Python

Asked by: Charles2a | Asked: 9/27/2023 | Updated: 9/27/2023 | Views: 28

Q:

I'm building a documentation-hosting application similar to GitBook. To keep it interoperable, I use a SUMMARY.md file to build my file tree, like this:

SUMMARY.md:


# Table of contents

* [Introduction](./)

## Active Directory <a href="#ad" id="ad"></a>

* [Category X](path/to/category-x/README.md)
  * [Subcategory 1](path/to/category-x/subcategory-1.md)
  * [Subcategory 2](path/to/category-x/subcategory-2.md)
  * [Subcategory 3](path/to/category-x/subcategory-3.md)
  * [🛠️ Tool A](path/to/category-x/tool-a.md)
  * [Subcategory 4](path/to/category-x/subcategory-4.md)
  * [Subcategory 5](path/to/category-x/subcategory-5.md)
  * [Subcategory 6](path/to/category-x/subcategory-6/README.md)
    * [Sub-subcategory 1](path/to/category-x/subcategory-6/sub-subcategory-1.md)
    * [Sub-subcategory 2](path/to/category-x/subcategory-6/sub-subcategory-2.md)
  * [Category Y](path/to/category-y/README.md)
    * [Subcategory 7](path/to/category-y/subcategory-7.md)
    * [Subcategory 8](path/to/category-y/subcategory-8.md)
    * [Subcategory 9](path/to/category-y/subcategory-9.md)
    * [🛠️ Tool B](path/to/category-y/tool-b.md)

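Basically, I need to wrap each category that has subcategories in a div with the class "cat-title". The div contains the category title and an icon; clicking it uses an event listener to expand the sub-articles and subcategories, as shown below.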

<div class="cat-title">
  <li><a href="/path/to/category-x/README.md">Category X</a><div class="arrow">→</div></li>
</div>
<ul class="subcategory1">
  <li><a href="/path/to/category-x/subcategory-1.md">Subcategory 1</a></li>
  <li><a href="/path/to/category-x/subcategory-2.md">Subcategory 2</a></li>
  <li><a href="/path/to/category-x/subcategory-3.md">Subcategory 3</a></li>
  <li><a href="/path/to/category-x/tool-a.md">🛠️ Tool A</a></li>
  <li><a href="/path/to/category-x/subcategory-4.md">Subcategory 4</a></li>
  <li><a href="/path/to/category-x/subcategory-5.md">Subcategory 5</a></li>
  <li><a href="/path/to/category-x/subcategory-6/README.md">Subcategory 6</a></li>
  <ul class="subcategory2">
    <li><a href="/path/to/category-x/subcategory-6/sub-subcategory-1.md">Sub-subcategory 1</a></li>
    <li><a href="/path/to/category-x/subcategory-6/sub-subcategory-2.md">Sub-subcategory 2</a></li>
  </ul>
  <li><a href="/path/to/category-y/README.md">Category Y</a></li>
  <ul class="subcategory1">
    <li><a href="/path/to/category-y/subcategory-7.md">Subcategory 7</a></li>
    <li><a href="/path/to/category-y/subcategory-8.md">Subcategory 8</a></li>
    <li><a href="/path/to/category-y/subcategory-9.md">Subcategory 9</a></li>
    <li><a href="/path/to/category-y/tool-b.md">🛠️ Tool B</a></li>
  </ul>
</ul>

To get this result, I use this function:

import re


def summary():
    regex_link_name = re.compile(r'\[(.*)\]\((.*)\)')
    regex_title = re.compile(r'##\s+([^<]+)')

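    # Read as UTF-8 explicitly: the file contains emoji, and the output is written as UTF-8 too
    with open('SUMMARY.md', 'r', encoding='utf-8') as f: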
        lines = f.readlines()

    html = ''
    lvl = 0
    i = 0
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            i += 1
            continue
        if line.strip().startswith('*'):
            space_count = line.split('*')[0].count('  ')
            current_lvl = space_count
            if current_lvl > lvl:
                # Unclear why lvl is already 1 (not 0) at the last subcategory of a
                # category; it turns the last subcategory1 into a subcategory2.
                # TODO: find out
                lvl += 1
                html += f'<ul class="subcategory{lvl}">\n'
            elif current_lvl < lvl:
                html += '</ul>\n'
                lvl -= 1

            matched = regex_link_name.findall(line)
            name = matched[0][0]
            link = matched[0][1]

            # Check if this line is followed by a sublist
            is_subcategory = False
            if i + 1 < len(lines):
                next_line = lines[i + 1]
                if next_line.strip().startswith('*') and next_line.split('*')[0].count('  ') > space_count:
                    is_subcategory = True

            if is_subcategory:
                html += f'<div class="cat-title">\n<li><a href="/{link}">{name}</a><div class="arrow">→</div></li>\n</div>\n'
            else:
                html += f'<li><a href="/{link}">{name}</a></li>\n'
            print(name, lvl)
        if line.strip().startswith('#'):
            html += '</ul>\n' * lvl
            lvl = 0

            titles = regex_title.findall(line)
            if titles:
                title = titles[0]
                html += f'<span>{title}</span>\n'
        i += 1

    with open('output.html', 'w', encoding='utf-8') as out_file:
        out_file.write(html)
    return html
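
For reference, the depth calculation inside the function can be isolated and checked on its own. This is only a sketch: it assumes two spaces of indentation per level, as in the sample SUMMARY.md, and `indent_depth` is a hypothetical helper name that does not appear in the original code.

```python
def indent_depth(line: str) -> int:
    """Return the nesting depth of a Markdown bullet line.

    Mirrors the expression used in summary(): count the two-space
    pairs in the text that precedes the first '*'. str.count counts
    non-overlapping matches, so four spaces give a depth of 2.
    """
    return line.split('*')[0].count('  ')


# Sample lines indented the same way as the SUMMARY.md shown above.
samples = [
    '* [Category X](path/to/category-x/README.md)',
    '  * [Subcategory 6](path/to/category-x/subcategory-6/README.md)',
    '    * [Sub-subcategory 1](path/to/category-x/subcategory-6/sub-subcategory-1.md)',
]
depths = [indent_depth(s) for s in samples]
print(depths)  # [0, 1, 2]
```

Tab-indented lines, or an odd number of spaces, would not map cleanly onto levels with this rule.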

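The problem is that whenever I reach the last level-1 subcategory in a category (which should get the class subcategory1), my debugger shows the lvl variable being incremented one time too many.

I have no idea why this happens, since I don't otherwise manipulate the lvl variable.

It only happens when the subcategory is the last one in its category, as in this example: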

* [Category X](path/to/category-x/README.md)
  * [🛠️ Tool A](path/to/tool-a.md)
  * [Category Y](path/to/category-y.md)
  * [Category Z](path/to/category-z.md)
  * [Category W](path/to/category-w.md)
  * [Category V](path/to/category-v.md)
  * [Category U](path/to/category-u/README.md)
    * [Subcategory 1](path/to/category-u/subcategory-1.md)
    * [Subcategory 2](path/to/category-u/subcategory-2.md)
  * [Category T](path/to/category-t/README.md)
    * [Subcategory 3](path/to/category-t/subcategory-3.md)
    * [Subcategory 4](path/to/category-t/subcategory-4.md)
    * [Subcategory 5](path/to/category-t/subcategory-5.md)
    * [🛠️ Tool B](path/to/category-t/tool-b.md)

## Web services <a href="#web" id="web"></a>

As you can see, the subcategory tags are not even created:

<div class="cat-title">
  <li><a href="/path/to/category-x/README.md">Category X</a><div class="arrow">→</div></li>
</div>
<li><a href="/path/to/category-x/tool-a.md">🛠️ Tool A</a></li>
<li><a href="/path/to/category-x/category-y.md">Category Y</a></li>
<li><a href="/path/to/category-x/category-z.md">Category Z</a></li>
<li><a href="/path/to/category-x/category-w.md">Category W</a></li>
<li><a href="/path/to/category-x/category-v.md">Category V</a></li>

I have tried many things, such as looking for a misplaced closing tag or a bad increment of the lvl variable, but I found nothing.
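
One way to check what the parser actually sees is to print the depth of every bullet. The sketch below is a reading of the sample under stated assumptions, not a confirmed diagnosis: it uses a shortened copy of the list, and `depth_of` and `render_list` are hypothetical names. In that copy, `Category T` is itself indented by two spaces, so it sits at depth 1 under `Category X` and its children land at depth 2. The loop also opens or closes as many `<ul>` tags as the depth change requires, rather than one per line. The "cat-title" wrapper is left out for brevity.

```python
import re

LINK = re.compile(r'\[(.*?)\]\((.*?)\)')

# Shortened copy of the example list, with the same indentation.
SAMPLE = """\
* [Category X](path/to/category-x/README.md)
  * [Category U](path/to/category-u/README.md)
    * [Subcategory 1](path/to/category-u/subcategory-1.md)
  * [Category T](path/to/category-t/README.md)
    * [Subcategory 3](path/to/category-t/subcategory-3.md)
"""


def depth_of(line):
    # Same rule as in summary(): two spaces before '*' per level.
    return line.split('*')[0].count('  ')


def render_list(text):
    """Render bullet lines as nested <ul> lists and record each item's depth."""
    html = []
    trace = []
    lvl = 0
    for line in text.splitlines():
        if not line.strip().startswith('*'):
            continue
        match = LINK.search(line)
        if match is None:
            continue  # skip bullets that carry no link
        name, link = match.groups()
        current = depth_of(line)
        trace.append((name, current))
        while lvl < current:  # open lists until deep enough
            lvl += 1
            html.append(f'<ul class="subcategory{lvl}">')
        while lvl > current:  # close lists until shallow enough
            html.append('</ul>')
            lvl -= 1
        html.append(f'<li><a href="/{link}">{name}</a></li>')
    html.extend(['</ul>'] * lvl)  # close whatever is still open at the end
    return '\n'.join(html), trace


output, trace = render_list(SAMPLE)
for name, depth in trace:
    print(depth, name)
print(output)
```

If the real file is indented like this copy, `subcategory2` for the children of the last category is what the depth rule produces. Getting `subcategory1` there would mean moving that category to the top level in SUMMARY.md.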

Tags: python, regex, parsing, markdown

A: No answers yet.