Iteration in Parsing Script: Value Remains the Same

Question:

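I'm currently developing a parser that iterates over .txt files containing earnings call transcripts. The goal is to extract the parts spoken by the CEO. The code snippet below is part of a larger script that also extracts other information, such as the call date and the company. You can find the full transcript, including the regex, here: https://regex101.com/r/mhKevB/1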

    import re

    presentation_part = """
    --------------------------------------------------------------------------------
    Inge G. Thulin,  3M Company - Chairman, CEO & President    [3]
    --------------------------------------------------------------------------------

              Thank you, Bruce, and good morning, everyone. Coming off a strong 2017, our team opened the new year with broad-based organic growth across all business groups. We expanded margins and posted a double-digit increase in earnings per share while continuing to invest in our business and return cash to our shareholders.
    """

    ceos_lname_clean = ['Thulin', 'Davis']
    coos_lname_clean = []  # assumed placeholder: filled in elsewhere in the full script


    try:
        ceos_speaches_pres = []
        if len(ceos_lname_clean) != 0:
            for lname in ceos_lname_clean:
                # Alternative pattern that matches the CEO's name in addition to the term "CEO"
                ceo_pattern = fr'(?m){lname}.*?(?:CEO|Chief Executive Officer)\b(?:(?!\n-+$).)*?\[\d+\]\s+^-+\s+((?s:.*?))(?=\s+^-+|\Z)'
                ceo_textparts_pres = re.findall(ceo_pattern, presentation_part, re.DOTALL | re.IGNORECASE)
                ceo_speech_presentation = " ".join(ceo_textparts_pres)
                ceos_speaches_pres.append(ceo_speech_presentation)
            #Overall_dict[folder][comp_path]["CEO Presentation Speech"] = ceos_speaches_pres ##Add the text to a dict

        else: ##try for COO in case ceos_lname_clean is empty
            coos_speaches_pres = []
            for coo_lname in coos_lname_clean:
                # Alternative pattern that matches the COO's name in addition to the term "COO"
                coo_pattern = fr'(?m){coo_lname}.*?(?:COO|Chief Operating Officer)\b(?:(?!\n-+$).)*?\[\d+\]\s+^-+\s+((?s:.*?))(?=\s+^-+|\Z)'
                coo_textparts_pres = re.findall(coo_pattern, presentation_part, re.DOTALL | re.IGNORECASE)
                coo_speech_presentation = " ".join(coo_textparts_pres)
                coos_speaches_pres.append(coo_speech_presentation)
            #Overall_dict[folder][comp_path]["COO Presentation Speech"] = coos_speaches_pres ##Add the text to a dict
    except Exception as exc:
        # Print the actual error instead of hiding it
        print("PROBLEM:", exc)

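The snippet successfully extracts the text spoken by Thulin. However, once it is integrated into the full script, a problem appears: ceo_textparts_pres keeps the value from the previous iteration. In other words, for Davis, ceo_textparts_pres should stay empty, but it still holds the text spoken by Thulin.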
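A quick way to test the "carried over" theory, sketched on an assumed toy text in which "Davis" appears a few lines above Thulin's CEO header: `re.findall` returns a new list on every call, so a non-empty Davis entry means the pattern genuinely matched again. `re.finditer` with `Match.span()` shows where each match begins and ends. The helper names `ceo_pattern` and `match_spans` are illustrative only.

```python
import re

# Assumed toy text: "Davis" appears above Thulin's CEO header but is not a speaker.
text = """
today Davis
met miss Thulin
they were both CEO
on day number [3]
- bad case"""


def ceo_pattern(lname):
    # The question's pattern, unchanged.
    return rf"(?m){lname}.*?(?:CEO|Chief Executive Officer)\b(?:(?!\n-+$).)*?\[\d+\]\s+^-+\s+((?s:.*?))(?=\s+^-+|\Z)"


def match_spans(transcript, lname):
    """Return the (start, end) offsets of every match for lname."""
    flags = re.DOTALL | re.IGNORECASE
    return [m.span() for m in re.finditer(ceo_pattern(lname), transcript, flags)]


spans = {lname: match_spans(text, lname) for lname in ["Thulin", "Davis"]}
for lname, found in spans.items():
    for start, end in found:
        # The first characters of each match show where it began.
        print(lname, start, end, repr(text[start:start + 20]))
```

In this toy text, the Davis match starts at the word "Davis" and ends at the same offset as Thulin's match. That means it runs through Thulin's header and captures the same speech, which is a fresh match rather than a leftover value.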

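I've spent an entire day on this without success and am getting more and more frustrated. Unfortunately, the full script is too long to post here, but even the smallest hint or suggestion about what could cause this would be greatly appreciated.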

Thanks in advance for your help.

Tags: python, regex

Answer:

    import re

    # Toy text: "Davis" and the CEO header are on different lines
    presentation_part = """
    today Davis
    met miss Thulin
    they were both CEO
    on day number [3]
    - bad case"""

    ceos_lname_clean = ["Thulin", "Davis"]


    ceos_speaches_pres = []
    for lname in ceos_lname_clean:
        # Pattern from the question: under re.DOTALL the ".*?" after the name can cross line breaks
        ceo_pattern = rf"(?m){lname}.*?(?:CEO|Chief Executive Officer)\b(?:(?!\n-+$).)*?\[\d+\]\s+^-+\s+((?s:.*?))(?=\s+^-+|\Z)"
        # Variant: (?-s:.*?) keeps the name and the title on one line; with it the list is ['', '']
        # ceo_pattern = rf"(?m){lname}(?-s:.*?)(?:CEO|Chief Executive Officer)\b(?:(?!\n-+$).)*?\[\d+\]\s+^-+\s+((?s:.*?))(?=\s+^-+|\Z)"
        ceo_textparts_pres = re.findall(
            ceo_pattern, presentation_part, re.DOTALL | re.IGNORECASE
        )
        ceo_speech_presentation = " ".join(ceo_textparts_pres)
        ceos_speaches_pres.append(ceo_speech_presentation)

    print(ceos_speaches_pres)  # ['bad case', 'bad case']
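A likely cause, consistent with the demo above but not checked against the full transcripts: because `re.DOTALL` is passed, the `.*?` directly after `{lname}` can run across line breaks. Any occurrence of "Davis" that comes before a CEO header (for instance, a hypothetical analyst remark) is then enough to reach Thulin's "CEO ... [3]" line and capture his speech. That looks exactly like a value left over from the previous iteration. The sketch below compares that gap with the commented-out `(?-s:.*?)` variant, which switches DOTALL off for the gap only, so the surname and the title must share one line. Scoped inline flags need Python 3.6 or later. The helper `extract`, its `same_line` switch and the use of `re.escape` are additions of this sketch.

```python
import re

# Toy text from the demo: "Davis" and the CEO header are on different lines.
demo_text = """
today Davis
met miss Thulin
they were both CEO
on day number [3]
- bad case"""

# Header and first sentence of the 3M excerpt from the question.
real_text = """
--------------------------------------------------------------------------------
Inge G. Thulin,  3M Company - Chairman, CEO & President    [3]
--------------------------------------------------------------------------------

          Thank you, Bruce, and good morning, everyone.
"""

# Everything after the surname is the question's pattern, unchanged.
TAIL = r"(?:CEO|Chief Executive Officer)\b(?:(?!\n-+$).)*?\[\d+\]\s+^-+\s+((?s:.*?))(?=\s+^-+|\Z)"


def extract(text, lname, same_line):
    """Join all speech sections captured for lname."""
    # ".*?" crosses newlines under re.DOTALL; "(?-s:.*?)" turns DOTALL off for
    # this gap only, so the surname and the title must be on the same header line.
    gap = r"(?-s:.*?)" if same_line else r".*?"
    pattern = rf"(?m){re.escape(lname)}{gap}{TAIL}"
    return " ".join(re.findall(pattern, text, re.DOTALL | re.IGNORECASE))


for lname in ["Thulin", "Davis"]:
    loose = extract(demo_text, lname, same_line=False)
    strict = extract(demo_text, lname, same_line=True)
    print(lname, repr(loose), repr(strict))

print(repr(extract(real_text, "Thulin", same_line=True)))
```

On the demo text, the original gap returns "bad case" for both names. The same-line gap returns empty strings, while still extracting Thulin's remarks from the real 3M header. This assumes every speaker header keeps the name and the title on a single line, as in the excerpt. `[^\n]*?` would behave the same way as `(?-s:.*?)` here.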
