Regex to pull log lines from a log file

Asked by: Eamon M, asked: 11/14/2023, last edited by: Eamon M, updated: 11/14/2023, views: 83

Q:

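I have a log file that I want to pull a specific log line from. I am not familiar with regex and have had only limited success with the format below. What regex should I use to find the log lines containing the text "= START BACKUP DETAILS =" and "= START BACKUP DETAILS END ="?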

06/11/2023 13:41 LocalFileCacheHashDb filterExisting, items contains 577 items
06/11/2023 13:41 LocalFileCacheHashDb < filterExisting(15)
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS ===============
06/11/2023 13:41 BackUpLauncher startBackup called with 68
06/11/2023 13:41 BackUpLauncher startBackup called with isScheduledSync = true
06/11/2023 13:41 BackUpLauncher startBackup called with isApplicationForeground = false
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS END ===============
06/11/2023 13:41 BackUpHelper prepareSyncData
06/11/2023 13:42 AssetScannerSdkManager getAndFilterPhotoVideoFolderItem, descriptionItemList contains 570 items
06/11/2023 13:42 AssetScannerSdkManager getLocalMusic, descriptionItemList contains 1 items
06/11/2023 13:42 AssetScannerSdkManager getLocalDocs, descriptionItemList contains 6 items
06/11/2023 13:42 AssetScannerSdkManager getAssets, isRestore = false, descriptionItemList contains 577 items

import re
with open('cached_logs.txt', 'r') as text_file:
    text_file=text_file.read()
    pattern = r'([M-c])'
    matches = re.findall(pattern, text_file)
with open('cacheOut.txt', 'w') as out:
    out.write('\n'.join(matches))

python regex

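Comments

0 upvotes user19077881 11/14/2023
Regex is not essential. You can simply test each text line to check whether a particular substring is in the line (e.g. START BACKUP DETAILS), using if "START BACKUP DETAILS" in text_line:, and build logic on such tests to get the lines you want.
0 upvotes sln 11/15/2023
Why is this not a duplicate of "How to learn regex"?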
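
The substring test suggested in the first comment could be sketched in Python roughly as follows. This is only an illustration: the helper name lines_containing and the shortened sample list are assumptions made for this example, not part of the original post.

```python
# Hypothetical helper (not from the original post): keep only the lines
# that contain a given substring, using a plain `in` test instead of a regex.
def lines_containing(lines, needle="START BACKUP DETAILS"):
    return [line.rstrip("\n") for line in lines if needle in line]


# Shortened sample taken from the log excerpt in the question.
sample = [
    "06/11/2023 13:41 LocalFileCacheHashDb < filterExisting(15)\n",
    "06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS ===============\n",
    "06/11/2023 13:41 BackUpLauncher startBackup called with 68\n",
    "06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS END ===============\n",
]

marker_lines = lines_containing(sample)
for line in marker_lines:
    print(line)
```

Because an open file object yields one line per iteration, the same helper should also accept a file directly, for example lines_containing(open('cached_logs.txt')).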

Answers:

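0 upvotes Olivier Dulac 11/14/2023 #1

I believe you want to extract the details that are printed between the two marker lines?

I am not good enough at Python, so I will give you the general idea and provide a simple awk implementation:

  • Go through the log file line by line.
  • When the script meets a line containing "= START BACKUP DETAILS =": set printing to 1, but do not print that line.
  • When the script meets a line containing "= START BACKUP DETAILS END =": set printing to 0, and do not print that line.
  • While "printing" is set to 1: print the current line.

A basic awk implementation: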

awk '
/= START BACKUP DETAILS =/ { printing=1 ; next }
/= START BACKUP DETAILS END =/ { printing=0 ; next }
( printing == 1 )
' < logfile

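If you do want to see the 2 marker lines as well: remove the "next" from the first rule, and print the END line before resetting the flag in the second rule (for example { print ; printing=0 ; next }).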
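
Since the question is tagged Python, the flag-based idea of the awk script above could be sketched in Python as follows. This is a minimal sketch under assumptions: the function name lines_between and the inline sample are hypothetical and chosen only for illustration.

```python
# Marker substrings, matching the two awk patterns above.
START = "= START BACKUP DETAILS ="
END = "= START BACKUP DETAILS END ="


def lines_between(lines, start=START, end=END):
    """Yield the lines found between a start marker line and an end marker line.

    Hypothetical helper that mirrors the awk logic: a flag is switched on
    at the start marker, switched off at the end marker, and the marker
    lines themselves are skipped.
    """
    printing = False
    for line in lines:
        if start in line:
            printing = True
            continue  # same role as awk's `next`
        if end in line:
            printing = False
            continue
        if printing:
            yield line.rstrip("\n")


# Shortened sample taken from the log excerpt in the question.
sample = [
    "06/11/2023 13:41 LocalFileCacheHashDb < filterExisting(15)\n",
    "06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS ===============\n",
    "06/11/2023 13:41 BackUpLauncher startBackup called with 68\n",
    "06/11/2023 13:41 BackUpLauncher startBackup called with isScheduledSync = true\n",
    "06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS END ===============\n",
    "06/11/2023 13:41 BackUpHelper prepareSyncData\n",
]

details = list(lines_between(sample))
for line in details:
    print(line)
```

The start substring does not occur in the END marker line (there "DETAILS" is followed by " END"), so the order of the two checks does not matter here.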

0 upvotes Andrej Kesely 11/14/2023 #2

Here is a Python example using re (regex101):

import re

text = """\
06/11/2023 13:41 LocalFileCacheHashDb filterExisting, items contains 577 items
06/11/2023 13:41 LocalFileCacheHashDb < filterExisting(15)
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS ===============
06/11/2023 13:41 BackUpLauncher startBackup called with 68
06/11/2023 13:41 BackUpLauncher startBackup called with isScheduledSync = true
06/11/2023 13:41 BackUpLauncher startBackup called with isApplicationForeground = false
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS END ===============
06/11/2023 13:41 BackUpHelper prepareSyncData
06/11/2023 13:42 AssetScannerSdkManager getAndFilterPhotoVideoFolderItem, descriptionItemList contains 570 items
06/11/2023 13:42 AssetScannerSdkManager getLocalMusic, descriptionItemList contains 1 items
06/11/2023 13:42 AssetScannerSdkManager getLocalDocs, descriptionItemList contains 6 items
06/11/2023 13:42 AssetScannerSdkManager getAssets, isRestore = false, descriptionItemList contains 577 items
"""

pat = r"(?<=START BACKUP DETAILS ===============\n).*?(?=\s*^[^\n]+START BACKUP DETAILS END)"

for block in re.findall(pat, text, flags=re.S | re.M):
    print("-" * 80)
    print(block)
    print("-" * 80)

Prints:

--------------------------------------------------------------------------------
06/11/2023 13:41 BackUpLauncher startBackup called with 68
06/11/2023 13:41 BackUpLauncher startBackup called with isScheduledSync = true
06/11/2023 13:41 BackUpLauncher startBackup called with isApplicationForeground = false
--------------------------------------------------------------------------------
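
To apply the same pattern to a file, as in the question, one possible arrangement is sketched below. The function names extract_blocks and write_blocks are assumptions made for this example, the comments describe how the pattern appears to work, and the file names in the usage note are simply those from the question.

```python
import re

# Same pattern as in the answer above, compiled once:
#   (?<=...\n)  lookbehind: the match starts right after the START marker line
#   .*?         lazy match; with re.S the dot also matches newlines
#   (?=...)     lookahead: stop before the line containing the END marker;
#               with re.M, ^ matches at the start of every line
PAT = re.compile(
    r"(?<=START BACKUP DETAILS ===============\n).*?(?=\s*^[^\n]+START BACKUP DETAILS END)",
    flags=re.S | re.M,
)


def extract_blocks(text):
    """Return every block of text found between the two marker lines."""
    return PAT.findall(text)


def write_blocks(src_path, dst_path):
    """Hypothetical wrapper: read a log file and write the extracted blocks out."""
    with open(src_path, "r") as src:
        blocks = extract_blocks(src.read())
    with open(dst_path, "w") as out:
        out.write("\n".join(blocks))
    return len(blocks)


# Shortened sample taken from the log excerpt in the question.
sample = (
    "06/11/2023 13:41 LocalFileCacheHashDb < filterExisting(15)\n"
    "06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS ===============\n"
    "06/11/2023 13:41 BackUpLauncher startBackup called with 68\n"
    "06/11/2023 13:41 BackUpLauncher startBackup called with isScheduledSync = true\n"
    "06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS END ===============\n"
    "06/11/2023 13:41 BackUpHelper prepareSyncData\n"
)

blocks = extract_blocks(sample)
print(blocks[0])
```

With the files from the question, the call would be write_blocks('cached_logs.txt', 'cacheOut.txt'). Note that the lookbehind relies on the exact run of 15 "=" characters after the START marker, so a log with a different marker width would need the pattern adjusted.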