How can I extract a sentence containing a keyword from a block of text

Question:

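My goal is to come up with a script that searches a folder of log files for a specific keyword and writes the following to a results .txt file: the file name, the line number of each line containing the keyword, the index at which the keyword starts, and the entire line of text containing the keyword.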

I have written some code that does this, but it has a problem with examples such as:

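Hi everyone, we've planned some overnight maintenance this weekend so that means you will not be able to use any device on the network between 7pm and 10am on this coming Friday evening/ Saturday morning (23/06/23 to 24/06/23). We apologise for the inconvenience this will cause but it is unavoidable. Please ensure you have logged out of the network by 6.30pm on Friday evening (23/06/23).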

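It correctly identifies the keyword "device" as being on line 1 and starting at character 114, and quite rightly shows the entire block of text as containing the keyword "device", whereas I would like it to show only the sentence in which the keyword appears.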

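I was thinking of either:

For each "device", look for the text after the previous full stop and before the next full stop, or
Get the n characters before and after "device"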
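
As a rough illustration of these two ideas (this is not the asker's code; the function names `sentence_around` and `context_around` are made up for this sketch), both can be written with plain string methods. The sketch assumes a full stop always ends a sentence, so a time such as "6.30pm" would cut a sentence short.

```python
def sentence_around(text, keyword):
    """Return the text between the full stops surrounding the first
    occurrence of keyword, or None if keyword is absent."""
    index = text.find(keyword)
    if index == -1:
        return None
    # rfind returns -1 when there is no earlier full stop, so +1 gives 0.
    begin = text.rfind(".", 0, index) + 1
    end = text.find(".", index)
    # No later full stop: run to the end of the text instead.
    end = len(text) if end == -1 else end + 1
    return text[begin:end].strip()


def context_around(text, keyword, n=20):
    """Return keyword plus up to n characters on either side of its
    first occurrence, or None if keyword is absent."""
    index = text.find(keyword)
    if index == -1:
        return None
    # max() keeps the slice start from going negative near the beginning.
    return text[max(0, index - n): index + len(keyword) + n]
```

The first helper gives a readable result when the punctuation is clean; the second never depends on punctuation but can cut words in half at either end.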

Here is the code I have written so far:

#Import os module
import os
# Raw string so the backslashes in the Windows path are not treated as escapes
fname2 = r"D:\X250\Python_Scripts\Search_File_for_Keyword_and_Print_Line\Results.txt"

# String to search
search_path = input("Enter directory path to search : ")
file_type = input("File Type : ")
search_str = input("Enter the search string : ")

# Create output file
fw = open(fname2, 'w')

# Append a directory separator if not already present
if not (search_path.endswith("/") or search_path.endswith("\\") ): 
        search_path = search_path + "/"
                                                          
# If path does not exist, set search path to current directory
if not os.path.exists(search_path):
        search_path ="."

# Repeat for each file in the directory  
for fname in os.listdir(path=search_path):

   # Apply file type filter   
   if fname.endswith(file_type):

        # Open file for reading
        fo = open(search_path + fname)

        # Read the first line from the file
        line = fo.readline()

        # Initialize counter for line number
        line_no = 1

        # Loop until EOF
        while line != '' :
                # Search for string in line
                index = line.find(search_str)
                if ( index != -1) :
                    print(fname, "[", line_no, ",", index, "] ", line, sep="")
                    #Write Output File
                    fw.write(fname + " " + str(line_no) + " " + str(index)+"  ")
                    fw.write(line)

               

                # Read next line
                line = fo.readline()  

                # Increment line counter
                line_no += 1

                

        # Close the input file
        fo.close()

# Close the output file once every file has been searched
fw.close()

Tags: python, regex, file, text

Answer:

text = "Hi everyone, we've planned some overnight maintenance this weekend so that means you will not be able to use any device on the network between 7pm and 10am on this coming Friday evening/ Saturday morning (23/06/23 to 24/06/23). We apologise for the inconvenience this will cause but it is unavoidable. Please ensure you have logged out of the network by 6.30pm on Friday evening (23/06/23)."

# split text into sentences (naive: this also splits at the full stop in "6.30pm")
sentences = text.split(".")

# filter to only sentences with "device" in them 
sentences_with_device = [sentence for sentence in sentences if "device" in sentence]

# using regex
import re
# this looks for, in order, all of the following:
# 1. anything that is not a period (.) 0 or more times
# 2. the word "device"
# 3. anything that is not a period (.) 0 or more times
# 4. a period (.)
sentences_with_device = re.findall(r'([^.]*?device[^.]*\.)', text)
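
To connect this back to the question's script, here is a minimal sketch of how sentence extraction might be combined with the folder scan. It is an illustration under stated assumptions, not the answerer's method: the names `iter_sentences`, `keyword_hits` and `search_folder` are hypothetical, and a sentence is assumed to end at `.`, `!` or `?` followed by whitespace. That rule leaves "6.30pm" intact but will still misfire on abbreviations such as "e.g. this".

```python
import os
import re

# Assumed sentence boundary: whitespace that follows ., ! or ?
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(text):
    """Yield (start_index, sentence) pairs for each sentence in text."""
    start = 0
    for match in SENTENCE_BREAK.finditer(text):
        yield start, text[start:match.start()]
        start = match.end()
    if start < len(text):
        # Trailing text with no closing punctuation.
        yield start, text[start:]


def keyword_hits(text, keyword):
    """Yield (index_in_text, sentence) for each sentence containing keyword.

    Only the first occurrence within each sentence is reported.
    """
    for start, sentence in iter_sentences(text):
        pos = sentence.find(keyword)
        if pos != -1:
            yield start + pos, sentence


def search_folder(folder, suffix, keyword, out_path):
    """Write 'file line_no index  sentence' for every hit in folder."""
    with open(out_path, "w", encoding="utf-8") as out:
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            # Skip other file types, and the results file itself.
            if not name.endswith(suffix):
                continue
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue
            with open(path, encoding="utf-8", errors="replace") as log:
                for line_no, line in enumerate(log, start=1):
                    for index, sentence in keyword_hits(line, keyword):
                        out.write(f"{name} {line_no} {index}  {sentence}\n")
```

A call might look like `search_folder("logs", ".txt", "device", "Results.txt")`, where the folder and file names are placeholders. The reported index is still measured from the start of the line, as in the original script, so only the text written after it changes from the whole line to the matching sentence.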
