Asked by: humbleStrength Asked: 11/2/2023 Last edited by: humbleStrength Updated: 11/3/2023 Views: 53
How to use awk to capture multiple patterns in text file into several blocks of text and print each block to a new file
Q:
I have this BIND DNS stats text file, sample_data.txt:
+++ Statistics Dump +++ (1698804161)
++ Incoming Requests ++
34199522 QUERY
2 STATUS
12 UPDATE
++ Incoming Queries ++
2 RESERVED0
19539834 A
203203 NS
239215 CNAME
25636 SOA
235650 PTR
96 HINFO
922800 MX
616897 TXT
5 RP
13 AFSDB
8 SIG
7 KEY
9112095 AAAA
15 LOC
18 EID
339894 SRV
75 NAPTR
7 KX
11 CERT
232 A6
55 DNAME
5 APL
2172 DS
14 SSHFP
6 IPSECKEY
35 RRSIG
183 NSEC
135429 DNSKEY
3 DHCID
8 NSEC3
6 NSEC3PARAM
196 TLSA
27 TYPE53
21 HIP
28 TYPE59
20 TYPE60
28 TYPE61
3 TYPE62
73 TYPE63
156 TYPE64
2815625 TYPE65
2297 SPF
7 TYPE108
11 TYPE109
752 AXFR
1115 ANY
4 DLV
5530 Others
++ Outgoing Queries ++
[View: default]
[View: _bind]
++ Name Server Statistics ++
34199536 IPv4 requests received
33035183 requests with EDNS(0) received
1433 requests with TSIG received
74232 TCP requests received
20645922 auth queries rejected
4604 recursive queries rejected
730 transfer requests rejected
12 update requests rejected
34199536 responses sent
71843 truncated responses sent
33035183 responses with EDNS(0) sent
1433 responses with TSIG sent
24625387 queries resulted in successful answer
33852582 queries resulted in authoritative answer
135913 queries resulted in non authoritative answer
135913 queries resulted in referral answer
3911181 queries resulted in nxrrset
2 queries resulted in SERVFAIL
5316014 queries resulted in NXDOMAIN
210273 other query failures
++ Zone Maintenance Statistics ++
234 IPv4 notifies sent
++ Resolver Statistics ++
[Common]
[View: default]
[View: _bind]
++ Cache DB RRsets ++
[View: default]
[View: _bind (Cache: _bind)]
++ Socket I/O Statistics ++
27 UDP/IPv4 sockets opened
3 TCP/IPv4 sockets opened
25 UDP/IPv4 sockets closed
74330 TCP/IPv4 sockets closed
74338 TCP/IPv4 connections accepted
42 TCP/IPv4 recv errors
++ Per Zone Query Statistics ++
[sampledomain1.com]
1898118 auth queries rejected
77 recursive queries rejected
16 transfer requests rejected
12 update requests rejected
5125667 queries resulted in successful answer
10890351 queries resulted in authoritative answer
79163 queries resulted in non authoritative answer
79163 queries resulted in referral answer
2997088 queries resulted in nxrrset
2767596 queries resulted in NXDOMAIN
[sampledomain2.com]
18026742 auth queries rejected
1945 recursive queries rejected
10 transfer requests rejected
18773892 queries resulted in successful answer
20863228 queries resulted in authoritative answer
56644 queries resulted in non authoritative answer
56644 queries resulted in referral answer
778332 queries resulted in nxrrset
1311004 queries resulted in NXDOMAIN
--- Statistics Dump --- (1698804161)
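What I'm trying to do is use awk to capture the block of text between each [anydomainname] record separator, excluding the separator itself, and write that block to a new file. So the new files file1.txt and file2.txt would contain:
file1.txt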
1898118 auth queries rejected
77 recursive queries rejected
16 transfer requests rejected
12 update requests rejected
5125667 queries resulted in successful answer
10890351 queries resulted in authoritative answer
79163 queries resulted in non authoritative answer
79163 queries resulted in referral answer
2997088 queries resulted in nxrrset
2767596 queries resulted in NXDOMAIN
file2.txt
18026742 auth queries rejected
1945 recursive queries rejected
10 transfer requests rejected
18773892 queries resulted in successful answer
20863228 queries resulted in authoritative answer
56644 queries resulted in non authoritative answer
56644 queries resulted in referral answer
778332 queries resulted in nxrrset
1311004 queries resulted in NXDOMAIN
respectively.
This is what I have so far:
awk '/^\[[[:lower:]]/ {p=1; next};
/^\[[[:lower:]]/ {p=0};
{if (p==1) {print last} {last=$0}}' sample_data.txt | tail -n+2
which gives me this:
1898118 auth queries rejected
77 recursive queries rejected
16 transfer requests rejected
12 update requests rejected
5125667 queries resulted in successful answer
10890351 queries resulted in authoritative answer
79163 queries resulted in non authoritative answer
79163 queries resulted in referral answer
2997088 queries resulted in nxrrset
2767596 queries resulted in NXDOMAIN
18026742 auth queries rejected
1945 recursive queries rejected
10 transfer requests rejected
18773892 queries resulted in successful answer
20863228 queries resulted in authoritative answer
56644 queries resulted in non authoritative answer
56644 queries resulted in referral answer
778332 queries resulted in nxrrset
1311004 queries resulted in NXDOMAIN
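But as you can see, I have two problems.
- I still need to split each block into its respective domain section.
- Then I need to write each block of text to a new file.
Could I do this by extending my current awk command, using BEGIN and for conditionals, and then printing each block to a file? I just don't know whether awk can do this the way I'm imagining. TIA.
Edit: expanding my question to also ask how to output a third file containing the block of text before the ++ Per Zone Query Statistics ++ line, that is, everything before the first [anydomain] pattern match, which would now be the second entry point, for the per-domain text blocks.
file3.txt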
+++ Statistics Dump +++ (1698804161)
++ Incoming Requests ++
34199522 QUERY
2 STATUS
12 UPDATE
++ Incoming Queries ++
2 RESERVED0
19539834 A
203203 NS
239215 CNAME
25636 SOA
235650 PTR
96 HINFO
922800 MX
616897 TXT
5 RP
13 AFSDB
8 SIG
7 KEY
9112095 AAAA
15 LOC
18 EID
339894 SRV
75 NAPTR
7 KX
11 CERT
232 A6
55 DNAME
5 APL
2172 DS
14 SSHFP
6 IPSECKEY
35 RRSIG
183 NSEC
135429 DNSKEY
3 DHCID
8 NSEC3
6 NSEC3PARAM
196 TLSA
27 TYPE53
21 HIP
28 TYPE59
20 TYPE60
28 TYPE61
3 TYPE62
73 TYPE63
156 TYPE64
2815625 TYPE65
2297 SPF
7 TYPE108
11 TYPE109
752 AXFR
1115 ANY
4 DLV
5530 Others
++ Outgoing Queries ++
[View: default]
[View: _bind]
++ Name Server Statistics ++
34199536 IPv4 requests received
33035183 requests with EDNS(0) received
1433 requests with TSIG received
74232 TCP requests received
20645922 auth queries rejected
4604 recursive queries rejected
730 transfer requests rejected
12 update requests rejected
34199536 responses sent
71843 truncated responses sent
33035183 responses with EDNS(0) sent
1433 responses with TSIG sent
24625387 queries resulted in successful answer
33852582 queries resulted in authoritative answer
135913 queries resulted in non authoritative answer
135913 queries resulted in referral answer
3911181 queries resulted in nxrrset
2 queries resulted in SERVFAIL
5316014 queries resulted in NXDOMAIN
210273 other query failures
++ Zone Maintenance Statistics ++
234 IPv4 notifies sent
++ Resolver Statistics ++
[Common]
[View: default]
[View: _bind]
++ Cache DB RRsets ++
[View: default]
[View: _bind (Cache: _bind)]
++ Socket I/O Statistics ++
27 UDP/IPv4 sockets opened
3 TCP/IPv4 sockets opened
25 UDP/IPv4 sockets closed
74330 TCP/IPv4 sockets closed
74338 TCP/IPv4 connections accepted
42 TCP/IPv4 recv errors
A:
1 vote
anubhava
11/2/2023
#1
This awk should work for you:
awk -v hdr="file3.txt" '
/^\+\+ Per Zone Query Statistics/ {
   hdr = ""
}
hdr {
   print > hdr
}
/^\[[[:lower:]]/ {            # indicates start domain [...]
   close(fn)
   fn = "file" (++f) ".txt"   # construct output filename `fn`
   next
}
/^[^[:blank:]]/ {             # indicates end of block
   fn = ""
}
fn {
   print > fn                 # prints each record to fn
}' file
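For anyone who wants to try the command before pointing it at a real dump, here is a small sketch that runs the same awk program on a trimmed, made-up sample. The file name sample_small.txt and the scratch directory are assumptions for illustration only. Note that the counter lines in the sample are indented on purpose: the end-of-block rule /^[^[:blank:]]/ treats any line that starts in column one as the end of a domain block, so unindented counter lines would never be written.

```shell
#!/bin/sh
# Work in a scratch directory so the generated files do not clutter anything.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Trimmed, made-up sample in the same layout as the dump in the question.
# Counter lines are indented; section and [domain] lines start in column one.
cat > sample_small.txt <<'EOF'
+++ Statistics Dump +++ (1698804161)
++ Incoming Requests ++
    34199522 QUERY
++ Per Zone Query Statistics ++
[sampledomain1.com]
    77 recursive queries rejected
    16 transfer requests rejected
[sampledomain2.com]
    1945 recursive queries rejected
--- Statistics Dump --- (1698804161)
EOF

awk -v hdr="file3.txt" '
/^\+\+ Per Zone Query Statistics/ { hdr = "" }  # header block ends here
hdr { print > hdr }                              # still inside the header block
/^\[[[:lower:]]/ {                               # "[domain]" line starts a new block
    close(fn)
    fn = "file" (++f) ".txt"
    next
}
/^[^[:blank:]]/ { fn = "" }                      # any other unindented line ends the block
fn { print > fn }                                # inside a domain block
' sample_small.txt

# Show what ended up in each output file.
for f in file1.txt file2.txt file3.txt; do
    echo "== $f"
    cat "$f"
done
```

With this sample, file3.txt should hold the three lines above the Per Zone header, and file1.txt and file2.txt the indented counters of each domain, without the [domain] lines and without the closing --- Statistics Dump --- line.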
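Comments
0 votes
humbleStrength
11/2/2023
This works! Could you extend it to also output another file containing the block of text before the first domain match? Or is that more complicated? So, everything above the ++ Per Zone Query Statistics ++ record separator?
0 votes
anubhava
11/2/2023
It can be done. Could you edit your question and show the expected content of this additional file?
0 votes
anubhava
11/2/2023
Check my updated answer.
0 votes
humbleStrength
11/2/2023
Works! Will accept as the solution. Would you mind explaining the difference between how awk prints the text above the matching pattern /^\+\+ Per Zone Query Statistics/ (as in the file3 output) and how it prints the text after the matching pattern /^\[[[:lower:]]/ (as in the file1 and file2 output)? I'm trying to understand this better. I don't see how awk captures the text above one pattern, while for the other parts it captures the text that follows the match.
1 vote
anubhava
11/2/2023
Because we set the variable hdr to file3.txt on the command line and keep printing until we hit the ++ Per Zone Query Statistics line. For the other files, we start printing when we find a line matching the pattern ^\[[[:lower:]] and stop printing when we find a line that starts with a non-blank character.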
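The two behaviours described in that last comment come down to where the flag variable starts out. Below is a minimal sketch of the idiom in isolation, with a made-up five-line input and a hypothetical variable named flag (neither is from the answer). A flag that starts non-empty and is cleared at the marker prints what comes before it; a flag that starts empty and is set at the marker prints what comes after it.

```shell
# Made-up input: two lines before a marker line, two lines after it.
input=$(printf '%s\n' 'a1' 'a2' 'MARK' 'b1' 'b2')

# "Print until": flag is set with -v before the first line is read,
# then cleared when the marker is seen. A bare `flag` pattern prints
# the current line whenever flag is non-empty and non-zero.
before=$(printf '%s\n' "$input" | awk -v flag=1 '/^MARK$/ { flag = "" } flag')

# "Print after": flag starts out empty (unset) and is switched on at
# the marker; `next` skips the marker line itself.
after=$(printf '%s\n' "$input" | awk '/^MARK$/ { flag = 1; next } flag')

printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
```

In the answer, hdr plays the first role (its value doubles as the output file name) and fn plays the second, with the extra twist that fn is cleared again by the next unindented line.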
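0 votes
potong
11/3/2023
#2
This might work for you (GNU csplit):
csplit -f file -b '%d.txt' --sup file '/^\[\w\+\.\w\+\]$/' '{*}'
Split the file on lines that start with [, end with ], and contain at least one dot (.).
Name the output files filen.txt, where n starts from 0.
Note that the first file (file0.txt) will contain all the lines before the first domain.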
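Here is a small sketch of the same csplit call on a made-up sample, assuming GNU csplit is installed (the \w and \+ in the pattern and the --suppress-matched option, spelled out here instead of --sup, are GNU features). The prefix part and the file name sample_small.txt are placeholders chosen for this example; -s only silences the byte counts csplit normally prints.

```shell
#!/bin/sh
# Work in a scratch directory so the pieces do not clutter anything.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Made-up sample: one header line, two domain blocks, and the closing line.
cat > sample_small.txt <<'EOF'
++ Per Zone Query Statistics ++
[sampledomain1.com]
    77 recursive queries rejected
[sampledomain2.com]
    1945 recursive queries rejected
--- Statistics Dump --- (1698804161)
EOF

# Split at every "[name.tld]" line and drop the matched line itself.
csplit -s -f part -b '%d.txt' --suppress-matched sample_small.txt \
    '/^\[\w\+\.\w\+\]$/' '{*}'

# Show each piece.
for f in part*.txt; do
    echo "== $f"
    cat "$f"
done
```

One thing to be aware of: csplit has no notion of where a domain block ends, so the last piece runs to the end of the input and, on a full dump, would also contain the closing --- Statistics Dump --- line.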