How can I extract a sentence containing a keyword from a block of text

Asked by: Col  Asked: 7/9/2023  Last edited by: MarkCol  Updated: 7/9/2023  Views: 64

Q:

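My goal is to write a script that searches a folder of log files for a specific keyword and writes the following to a results .txt file: the file name, the line number in each file that contains the keyword, the index at which the keyword starts, and the full line of text containing the keyword.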

I have written some code that does this, but it has a problem with examples such as this one:

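Hi everyone, we've planned some overnight maintenance this weekend so that means you will not be able to use any device on the network between 7pm and 10am on this coming Friday evening/ Saturday morning (23/06/23 to 24/06/23). We apologise for the inconvenience this will cause but it is unavoidable. Please ensure you have logged out of the network by 6.30pm on Friday evening (23/06/23).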

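It correctly identifies the keyword "device" as being on line 1 and starting at character 114, and, quite rightly, it outputs the whole block of text as the line containing "device". What I want instead is for it to show only the sentence in which the keyword appears.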

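I was thinking of either:

  • for each "device", finding the text after the previous full stop and before the next full stop, or
  • taking the n characters before and after "device".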
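Both ideas can be tried on a single string before wiring them into the file loop. This is a minimal sketch, not the asker's code: the function names sentences_containing and context_window are hypothetical, and it assumes a sentence ends at '.', '!' or '?' followed by whitespace, so the '.' in "6.30pm" is not treated as a sentence break.

```python
import re


def sentences_containing(text, keyword):
    """Idea 1: return every sentence that contains keyword.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace,
    so "6.30pm" is not split in two.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if keyword in s]


def context_window(text, keyword, n=40):
    """Idea 2: return up to n characters either side of each occurrence of keyword."""
    snippets = []
    start = text.find(keyword)
    while start != -1:
        lo = max(0, start - n)
        hi = min(len(text), start + len(keyword) + n)
        snippets.append(text[lo:hi])
        start = text.find(keyword, start + 1)  # look for the next occurrence
    return snippets


text = ("Hi everyone, we've planned some overnight maintenance this weekend so "
        "that means you will not be able to use any device on the network between "
        "7pm and 10am on this coming Friday evening/ Saturday morning "
        "(23/06/23 to 24/06/23). We apologise for the inconvenience this will "
        "cause but it is unavoidable. Please ensure you have logged out of the "
        "network by 6.30pm on Friday evening (23/06/23).")

print(sentences_containing(text, "device"))
print(context_window(text, "device", n=20))
```

The sentence approach gives clean output for prose like this notice; the fixed window is simpler but usually cuts words in half at both ends.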

Here is the code I have written so far:

# Import the os module
import os

# Raw string, so the backslashes in the Windows path are not read as escape sequences
fname2 = r"D:\X250\Python_Scripts\Search_File_for_Keyword_and_Print_Line\Results.txt"

# Ask for the folder, the file type and the string to search for
search_path = input("Enter directory path to search : ")
file_type = input("File Type : ")
search_str = input("Enter the search string : ")

# If the path does not exist, search the current directory instead
if not os.path.exists(search_path):
    search_path = "."

# Create the output file; "with" closes it when the block ends (the original never closed it)
with open(fname2, "w") as fw:
    # Repeat for each file in the directory
    for fname in os.listdir(search_path):
        # os.path.join adds the directory separator when needed
        full_path = os.path.join(search_path, fname)

        # Apply the file type filter and skip sub-directories
        if not (fname.endswith(file_type) and os.path.isfile(full_path)):
            continue

        # Open the file for reading; it is closed automatically at the end of the block
        with open(full_path) as fo:
            # enumerate() numbers the lines starting from 1
            for line_no, line in enumerate(fo, start=1):
                # Search for the string in the line (-1 means not found)
                index = line.find(search_str)
                if index != -1:
                    text = line.rstrip("\n")
                    print(fname, "[", line_no, ",", index, "] ", text, sep="")
                    # Write the match to the output file, one result per line
                    fw.write(fname + " " + str(line_no) + " " + str(index) + "  " + text + "\n")
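One further limitation of the loop above: line.find(search_str) returns only the first occurrence on each line, so a line that mentions the keyword twice produces a single result. A minimal sketch of a helper (the name find_all is hypothetical) that yields every starting index by resuming the search one character past the previous hit:

```python
def find_all(line, search_str):
    """Yield the 0-based start index of every occurrence of search_str in line."""
    index = line.find(search_str)
    while index != -1:
        yield index
        # Resume one character past the previous hit, so overlapping matches count too
        index = line.find(search_str, index + 1)


print(list(find_all("a device and another device", "device")))  # [2, 21]
```

Inside the loop, "for index in find_all(line, search_str):" would then take the place of the single "if index != -1:" check.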
Tags: python, regex, file, text

A:

0 votes  Mark  7/9/2023  #1

Something like this should work:

import re

text = "Hi everyone, we've planned some overnight maintenance this weekend so that means you will not be able to use any device on the network between 7pm and 10am on this coming Friday evening/ Saturday morning (23/06/23 to 24/06/23). We apologise for the inconvenience this will cause but it is unavoidable. Please ensure you have logged out of the network by 6.30pm on Friday evening (23/06/23)."

# Split the text into sentences at every period (note: this also splits "6.30pm")
sentences = text.split(".")

# Keep only the sentences that contain "device"
sentences_with_device = [sentence.strip() for sentence in sentences if "device" in sentence]
print(sentences_with_device)

# Alternatively, using a regex. The pattern matches, in order:
# 1. any characters other than a period (.), zero or more times
# 2. the word "device"
# 3. any characters other than a period (.), zero or more times
# 4. a period (.)
sentences_with_device = re.findall(r'([^.]*?device[^.]*\.)', text)
print(sentences_with_device)
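Putting the pieces together, the sketch below plugs sentence extraction into the question's directory loop, so each result row holds the file name, line number, index and only the sentence containing the keyword. It is a sketch under stated assumptions, not the answerer's code: sentence_around and search_directory are hypothetical names, a sentence is assumed to end at '.', '!' or '?' followed by whitespace or the end of the line, and the demo uses a throwaway folder with a made-up file name (sample.log) holding the question's example text.

```python
import os
import re
import tempfile

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace or the
# end of the line, so the '.' in "6.30pm" is not treated as a sentence break.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def sentence_around(line, index):
    """Return the sentence of line that contains character position index."""
    start = 0
    for m in SENTENCE_END.finditer(line):
        if m.end() <= index:
            start = m.end()  # this sentence ended before the keyword
        else:
            return line[start:m.end()].strip()
    return line[start:].strip()  # no terminator after the keyword: take the rest


def search_directory(search_path, file_type, search_str):
    """Yield (file name, line number, index, sentence) for every match."""
    for fname in sorted(os.listdir(search_path)):
        full_path = os.path.join(search_path, fname)
        if not (fname.endswith(file_type) and os.path.isfile(full_path)):
            continue
        with open(full_path, encoding="utf-8", errors="replace") as fo:
            for line_no, line in enumerate(fo, start=1):
                index = line.find(search_str)
                while index != -1:  # report every occurrence on the line
                    yield fname, line_no, index, sentence_around(line, index)
                    index = line.find(search_str, index + 1)


# Demo: a throwaway folder holding one made-up log file with the question's text.
notice = ("Hi everyone, we've planned some overnight maintenance this weekend so "
          "that means you will not be able to use any device on the network between "
          "7pm and 10am on this coming Friday evening/ Saturday morning "
          "(23/06/23 to 24/06/23). We apologise for the inconvenience this will "
          "cause but it is unavoidable. Please ensure you have logged out of the "
          "network by 6.30pm on Friday evening (23/06/23).")

with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "sample.log"), "w", encoding="utf-8") as f:
        f.write(notice + "\n")
    results = list(search_directory(demo_dir, ".log", "device"))

    # Write rows in the same "name line index  text" layout as the question's script.
    with open(os.path.join(demo_dir, "Results.txt"), "w", encoding="utf-8") as fw:
        for fname, line_no, index, sentence in results:
            fw.write(f"{fname} {line_no} {index}  {sentence}\n")

for fname, line_no, index, sentence in results:
    print(f"{fname}[{line_no},{index}] {sentence}")
```

To run it on real logs, call search_directory() with the path, file type and search string the original script reads via input(), and write each tuple to Results.txt as in the demo. Running the whole block as one script defines every name before it is used, which avoids a NameError like the one reported in the comment below.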

Comments:

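0 votes  Col  7/26/2023
Apologies for the delay in acknowledging your reply; I've been on holiday. I tried your suggestion, but it raises the following error: NameError: name 'sentences' is not defined. I don't have enough expertise to fix this. Could you help? Thanks.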