Keep second regex lines in first regex lines matches

Question:

I have a large number of txt list files in the E:\Desktop\Linux_distro\asliiiii directory. The following is a sample of one of my files:

95
ROSA
139
96
Chakra
137
97
AV Linux
135
98
LibreELEC
134
99
Simplicity
131
100
Kodachi
130
20200301020449
79776361952441

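Now I need a script that first finds the lines matching the regex \d{14} and then, among the lines it found, keeps only those that match the regex 20(?:0[0-9]|1[0-9]|20)[0-1][0-9].
This means it must give me the following result: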

95
ROSA
139
96
Chakra
137
97
AV Linux
135
98
LibreELEC
134
99
Simplicity
131
100
Kodachi
130
20200301020449

I wrote the following Python script, but I don't know why it doesn't work properly on my lists!

import os
import re

def process_file(file_path):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    # Find lines matching \d{14}
    regex_pattern_1 = re.compile(r'\d{14}')
    matching_lines = [line.strip() for line in lines if regex_pattern_1.search(line)]

    # Keep only matches of the second regex in the found lines
    regex_pattern_2 = re.compile(r'20(?:0[0-9]|1[0-9]|20)[0-1][0-9]\d{8}')
    filtered_lines = []
    for line in matching_lines:
        matches = regex_pattern_2.findall(line)
        filtered_lines.extend(matches)

    # Write the filtered lines back to the file
    with open(file_path, 'w') as file:
        file.write('\n'.join(filtered_lines))

def process_files_in_directory(directory_path):
    for filename in os.listdir(directory_path):
        if filename.endswith('.txt'):
            file_path = os.path.join(directory_path, filename)
            process_file(file_path)

if __name__ == "__main__":
    directory_path = r'E:\Desktop\Linux_distro\asliiiii'
    process_files_in_directory(directory_path)
    print("Processing complete.")

But this script gives me the following result!

20200301020449

Where is the problem in this script?

python regex

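def process_file(fn):
    # Write the result to a new file next to the input instead of overwriting it.
    with open(fn) as fin, open(fn + '.out', 'w') as fout:
        # Copy every line up to and including the first 14-digit line.
        for line in fin:
            line = line.strip()
            print(line, file=fout)
            if len(line) == 14 and line.isdigit():
                break

        # After that, keep only 14-digit lines that start with '20'.
        for line in fin:
            line = line.strip()
            if len(line) == 14 and line.isdigit() and line.startswith('20'):
                print(line, file=fout)

process_file('x.txt')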

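Here I assumed that checking for "a 14-digit number that starts with '20'" is enough to find your timestamps, but if you really need to look for valid dates, you can use a regex here.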

Note that I copy to a new file with a special name. If you want, you can do a delete and a rename at the end.
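For what it's worth, here is a minimal sketch of the regex variant together with the delete-and-rename step. It assumes the pattern from the question followed by eight more digits (14 digits in total), and that any other 14-digit line should simply be dropped. The names filter_lines and rewrite_file are made up for illustration, and os.replace stands in for a separate delete plus rename.

```python
import os
import re

# The asker's date pattern followed by eight more digits: 14 digits in total.
TIMESTAMP = re.compile(r'20(?:0[0-9]|1[0-9]|20)[0-1][0-9]\d{8}')
FOURTEEN_DIGITS = re.compile(r'\d{14}')


def filter_lines(lines):
    """Drop 14-digit lines that are not timestamps; keep everything else."""
    kept = []
    for raw in lines:
        line = raw.strip()
        # fullmatch requires the whole line to match, not just a part of it.
        if FOURTEEN_DIGITS.fullmatch(line) and not TIMESTAMP.fullmatch(line):
            continue
        kept.append(line)
    return kept


def rewrite_file(path):
    """Write the filtered lines to a temporary file, then swap it in."""
    with open(path) as fin:
        kept = filter_lines(fin)
    tmp_path = path + '.out'  # hypothetical temporary name
    with open(tmp_path, 'w') as fout:
        fout.write('\n'.join(kept) + '\n')
    # Replaces the original file with the filtered copy in one call.
    os.replace(tmp_path, path)
```

Calling rewrite_file on each .txt file in the directory would then overwrite the originals, so it is probably worth trying it on a copy first.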

import os
import re

def process_file(file_path):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    # Keep lines that match the second regex or do not match any regex
    regex_pattern_2 = re.compile(r'20(?:0[0-9]|1[0-9]|20)[0-1][0-9]\d{8}')
    filtered_lines = [line.strip() for line in lines if regex_pattern_2.search(line) or not re.search(r'\d{14}', line)]

    # Write the filtered lines back to the file
    with open(file_path, 'w') as file:
        file.write('\n'.join(filtered_lines))

def process_files_in_directory(directory_path):
    for filename in os.listdir(directory_path):
        if filename.endswith('.txt'):
            file_path = os.path.join(directory_path, filename)
            process_file(file_path)

if __name__ == "__main__":
    directory_path = r'E:\Desktop\Linux_distro\asliiiii'
    process_files_in_directory(directory_path)
    print("Processing complete.")
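As for where the original script goes wrong, the small comparison below may help. It runs both filters on the tail of the sample file, held in a list here (an assumption made only to keep the example self-contained). The script from the question first narrows the input down to the 14-digit lines, so every other line is already lost before the second regex is applied. The version above only drops a line when it has 14 digits and is not a timestamp.

```python
import re

# Last lines of the sample file from the question.
sample = ['100', 'Kodachi', '130', '20200301020449', '79776361952441']

fourteen = re.compile(r'\d{14}')
timestamp = re.compile(r'20(?:0[0-9]|1[0-9]|20)[0-1][0-9]\d{8}')

# Question's approach: select the 14-digit lines first, then keep the
# timestamp matches among them. All other lines were discarded in step one.
matching = [line for line in sample if fourteen.search(line)]
original_result = [m for line in matching for m in timestamp.findall(line)]

# Answer's approach: keep a line if it is a timestamp or has no 14-digit run.
fixed_result = [line for line in sample
                if timestamp.search(line) or not fourteen.search(line)]

print(original_result)  # ['20200301020449']
print(fixed_result)     # ['100', 'Kodachi', '130', '20200301020449']
```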
