Asked by: Eamon M Asked: 11/14/2023 Last edited by: Eamon M Updated: 11/14/2023 Views: 83
Regex to pull log lines from a log file
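Q:
I have a log file from which I want to extract specific log lines. I am not familiar with regex and have had only limited success with the attempt below. What regex should I use to find the log lines containing the text "= START BACKUP DETAILS =" and "= START BACKUP DETAILS END ="?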
06/11/2023 13:41 LocalFileCacheHashDb filterExisting, items contains 577 items
06/11/2023 13:41 LocalFileCacheHashDb < filterExisting(15)
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS ===============
06/11/2023 13:41 BackUpLauncher startBackup called with 68
06/11/2023 13:41 BackUpLauncher startBackup called with isScheduledSync = true
06/11/2023 13:41 BackUpLauncher startBackup called with isApplicationForeground = false
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS END ===============
06/11/2023 13:41 BackUpHelper prepareSyncData
06/11/2023 13:42 AssetScannerSdkManager getAndFilterPhotoVideoFolderItem, descriptionItemList contains 570 items
06/11/2023 13:42 AssetScannerSdkManager getLocalMusic, descriptionItemList contains 1 items
06/11/2023 13:42 AssetScannerSdkManager getLocalDocs, descriptionItemList contains 6 items
06/11/2023 13:42 AssetScannerSdkManager getAssets, isRestore = false, descriptionItemList contains 577 items
import re
with open('cached_logs.txt', 'r') as text_file:
    text_file = text_file.read()
pattern = r'([M-c])'
matches = re.findall(pattern, text_file)
with open('cacheOut.txt', 'w') as out:
    out.write('\n'.join(matches))
A:
0 votes
Olivier Dulac
11/14/2023
#1
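I believe you want to extract the details printed between the two marker lines?
I am not good enough at Python, so I will give you the general idea and a simple awk implementation:
- Go through the log file line by line.
- When the script encounters a line containing "= START BACKUP DETAILS =": set printing to 1, but do not print that line.
- When the script encounters a line containing "= START BACKUP DETAILS END =": set printing to 0, and do not print that line.
- When "printing" is set to 1: print the line.
Basic awk implementation: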
awk '
/= START BACKUP DETAILS =/ { printing=1 ; next }
/= START BACKUP DETAILS END =/ { printing=0 ; next }
( printing == 1 )
' < logfile
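If you do want to see the 2 marker lines as well: remove the "next" from the first rule, and in the second rule print the line before resetting the flag (for example { print ; printing=0 ; next }), since dropping "next" there alone would still leave the END line unprinted.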
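Since the question is in Python, the same flag-based steps can be sketched there too. This is only a minimal sketch, not part of the original answer: the function name extract_backup_details and the shortened sample text are made up for illustration, and the markers are matched as plain substrings rather than regexes.

```python
def extract_backup_details(lines):
    """Return the lines found between the START and END marker lines."""
    start_marker = "= START BACKUP DETAILS ="
    end_marker = "= START BACKUP DETAILS END ="
    printing = False
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if end_marker in line:
            printing = False   # END marker: stop collecting, skip the marker itself
        elif start_marker in line:
            printing = True    # START marker: begin collecting, skip the marker itself
        elif printing:
            kept.append(line)
    return kept


# Shortened sample taken from the log in the question.
sample = """\
06/11/2023 13:41 LocalFileCacheHashDb < filterExisting(15)
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS ===============
06/11/2023 13:41 BackUpLauncher startBackup called with 68
06/11/2023 13:41 BackUpLauncher startBackup called with isScheduledSync = true
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS END ===============
06/11/2023 13:41 BackUpHelper prepareSyncData
"""

for line in extract_backup_details(sample.splitlines()):
    print(line)
```

Because the function only iterates over lines, an open file object can be passed directly, for example with open('cached_logs.txt') as f: extract_backup_details(f), using the file name from the question.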
0 votes
Andrej Kesely
11/14/2023
#2
Here is a Python example using re (regex101):
import re
text = """\
06/11/2023 13:41 LocalFileCacheHashDb filterExisting, items contains 577 items
06/11/2023 13:41 LocalFileCacheHashDb < filterExisting(15)
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS ===============
06/11/2023 13:41 BackUpLauncher startBackup called with 68
06/11/2023 13:41 BackUpLauncher startBackup called with isScheduledSync = true
06/11/2023 13:41 BackUpLauncher startBackup called with isApplicationForeground = false
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS END ===============
06/11/2023 13:41 BackUpHelper prepareSyncData
06/11/2023 13:42 AssetScannerSdkManager getAndFilterPhotoVideoFolderItem, descriptionItemList contains 570 items
06/11/2023 13:42 AssetScannerSdkManager getLocalMusic, descriptionItemList contains 1 items
06/11/2023 13:42 AssetScannerSdkManager getLocalDocs, descriptionItemList contains 6 items
06/11/2023 13:42 AssetScannerSdkManager getAssets, isRestore = false, descriptionItemList contains 577 items
"""
pat = r"(?<=START BACKUP DETAILS ===============\n).*?(?=\s*^[^\n]+START BACKUP DETAILS END)"
for block in re.findall(pat, text, flags=re.S | re.M):
    print("-" * 80)
    print(block)
print("-" * 80)
Prints:
--------------------------------------------------------------------------------
06/11/2023 13:41 BackUpLauncher startBackup called with 68
06/11/2023 13:41 BackUpLauncher startBackup called with isScheduledSync = true
06/11/2023 13:41 BackUpLauncher startBackup called with isApplicationForeground = false
--------------------------------------------------------------------------------
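To tie this back to the file-based script in the question, the pattern could be wrapped in small helper functions. The following is a sketch under assumptions: extract_blocks and extract_to_file are hypothetical names, and the pattern still expects the marker line to end in exactly fifteen "=" characters followed directly by a newline, as in the sample log.

```python
import re

# Same pattern as in the answer above, compiled once with the same flags.
PAT = re.compile(
    r"(?<=START BACKUP DETAILS ===============\n).*?(?=\s*^[^\n]+START BACKUP DETAILS END)",
    flags=re.S | re.M,
)


def extract_blocks(text):
    """Return every block of text between a START and an END marker line."""
    return PAT.findall(text)


def extract_to_file(src_path, dst_path):
    """Read the log at src_path and write the extracted blocks to dst_path."""
    with open(src_path, "r") as src:
        blocks = extract_blocks(src.read())
    with open(dst_path, "w") as dst:
        dst.write("\n".join(blocks))
    return len(blocks)  # number of blocks found
```

With the file names from the question, the call would be extract_to_file('cached_logs.txt', 'cacheOut.txt'). If the real log has trailing spaces after the marker, the lookbehind would not match and the pattern would need adjusting.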
Comments
if "START BACKUP DETAILS" in text_line:
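The fragment in the comment above is incomplete, so its intended use is not known. One possible reading, sketched here as an assumption, is a plain substring test that keeps only the two marker lines themselves, which is what the question literally asks for. The helper name marker_lines and the shortened sample are made up for illustration.

```python
def marker_lines(lines):
    """Return only the lines that contain the text START BACKUP DETAILS."""
    # The END marker also contains "START BACKUP DETAILS", so one test catches both.
    return [line.rstrip("\n") for line in lines if "START BACKUP DETAILS" in line]


# Shortened sample taken from the log in the question.
sample = """\
06/11/2023 13:41 LocalFileCacheHashDb < filterExisting(15)
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS ===============
06/11/2023 13:41 BackUpLauncher startBackup called with 68
06/11/2023 13:41 BackUpLauncher =============== START BACKUP DETAILS END ===============
06/11/2023 13:41 BackUpHelper prepareSyncData
"""

for text_line in marker_lines(sample.splitlines()):
    print(text_line)
```

No regex is needed for this case; a regex only becomes useful when the markers vary or when the text between them has to be captured.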