How to search for one or more strings in a file using regex, and count the number of each string separately?

Q:

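I'm trying to find one or more strings in each line of a file and count how many times each string appears in the file overall. Some lines contain only one of the target strings, but others may contain several, if that makes sense. I'm trying to do this with regex.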

What I've tried is the following (the file has already been read and split into lines with .readlines):

import re

count1 = 0
count2 = 0
count3 = 0

pattern = r'(?i)(\bString1\b)|(\bString2\b)|(\bString3\b)'

i = 0
while i != len(lines):
    # re.search returns only the first match in the line
    match = re.search(pattern, lines[i])

    if match:
        if match.group(1):
            count1 = count1 + 1
        elif match.group(2):
            count2 = count2 + 1
        elif match.group(3):
            count3 = count3 + 1
    i = i + 1

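This works when a line doesn't contain more than one match, but when it does, it obviously only counts the first match and then moves on. Is there a way to scan the whole line? I know re.findall finds all matches, but it puts them into a list, and I don't know how to reliably count the matches for each word, because the matches returned by findall sit at different indices on each pass through the loop.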

Tags: python, regex, search, string matching

import re

data = "String1 String2 String2 String3\nString1 String1\nString3"
Pattern = r'(?i)(\bString1\b)|(\bString2\b)|(\bString3\b)'
lines = data.split('\n')

# Collect the group tuples from every line, e.g. ('String1', '', '')
all_matches = []
for line in lines:
    all_matches.extend(re.findall(Pattern, line))

# Note: these comparisons are case-sensitive even though the pattern uses (?i),
# so only matches spelled exactly like this are counted.
count1 = len([el for el in all_matches if el[0] == 'String1'])
count2 = len([el for el in all_matches if el[1] == 'String2'])
count3 = len([el for el in all_matches if el[2] == 'String3'])

print(count1, count2, count3)

Note: findall will return a list of tuples, where the first item of each tuple corresponds to the first group, and so on.

all_matches will be a list of tuples, each shaped like (matched item for string1, matched item for string2, matched item for string3), with '' in the position of any group that did not match:

[('String1', '', ''), ('', 'String2', ''), ('', 'String2', ''), ...]

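For example, when computing count1, we build a list of the elements that matched String1 (the condition here is that the first element of the tuple equals 'String1'), like this: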

first_group = [el for el in all_matches if el[0] == 'String1']

Then we take the length of that list as the value of count1:

count1 = len(first_group)
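Because the pattern uses (?i), a match such as 'string1' would land in the first slot of its tuple but would fail an exact comparison with 'String1'. One possible way around that, sketched here under the assumption that you only care which group matched and not how it was spelled, is to test whether each slot is non-empty. The sample text below is a variant of the answer's data with mixed case, made up for illustration.

```python
import re

# Variant of the sample text above with mixed case (made up for illustration).
data = "String1 string2 STRING2 String3\nstring1 String1\nString3"
pattern = re.compile(r'(?i)(\bString1\b)|(\bString2\b)|(\bString3\b)')

counts = [0, 0, 0]
for line in data.split('\n'):
    for groups in pattern.findall(line):
        # Exactly one slot in each tuple is non-empty: the group that matched.
        for idx, text in enumerate(groups):
            if text:
                counts[idx] += 1

print(counts)  # [3, 2, 2]
```

Here counts[0] plays the role of count1, counts[1] of count2, and so on.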

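If you need to extract matches that may contain variants, such as r"c[ei]*ling" or r"\d+", you can't use the matched string as the Counter key, because that would treat each unique string as a separate entity; you would get "12 occurrences of 123" and "1 occurrence of 234" rather than "13 occurrences of \d+". In that case I would probably try named subgroups.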

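import re
from collections import Counter

count = Counter()
for line in lines:  # lines: the lines read from the file, as in the question
    for match in re.finditer(r"(?P<ceiling>c[ei]*ling)|(?P<number>\d+)", line):
        for key, value in match.groupdict().items():
            if value is not None:
                # count.update(key) would count each character of the name
                count[key] += 1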
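As a self-contained sketch of the named-subgroup idea: when the alternatives are top-level named groups, as they are here, Match.lastgroup gives the name of the one group that matched, so it can serve directly as the Counter key. The sample lines are hypothetical and only there to make the snippet runnable.

```python
import re
from collections import Counter

# Hypothetical sample lines, made up for illustration.
lines = ["the ceiling is 123 cm high", "celing 234", "cling 7 ceiling"]
pattern = re.compile(r"(?P<ceiling>c[ei]*ling)|(?P<number>\d+)")

count = Counter()
for line in lines:
    for match in pattern.finditer(line):
        # lastgroup is the name of the single named group that matched here.
        count[match.lastgroup] += 1

print(count["ceiling"], count["number"])  # 4 3
```

All spelling variants ("ceiling", "celing", "cling") are tallied under the one key "ceiling", and every number under "number".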

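I've got the first example working in my code, thanks! I think it's actually a combination of the two! I changed count.update(match.group(0)) to count.update(match.groups()), which works for updating the count for each match, and then just added if k is not None: to the for loop that prints the results!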

1 upvote Armali 4/19/2023 #3

Just another variant is to use numpy and its count_nonzero method. Since there's no need to split the data into lines, let's assume it's all in data:

import re
import numpy as np
# Pattern and data are defined as in the earlier answer
# count non-empty strings along axis 0 (the matches for each word)
count = np.count_nonzero(np.array(re.findall(Pattern, data)), 0)
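If numpy isn't available, a similar column-wise count can be sketched with the standard library alone. This is not the answerer's code, just one way to get the same per-group totals: zip(*matches) transposes the list of tuples so that each column holds everything captured by one group, and the non-empty entries in a column are that word's matches. The data and pattern are copied from the earlier answer.

```python
import re

# Sample text and pattern copied from the earlier answer.
data = "String1 String2 String2 String3\nString1 String1\nString3"
pattern = r'(?i)(\bString1\b)|(\bString2\b)|(\bString3\b)'

matches = re.findall(pattern, data)  # one 3-tuple per match, over the whole text
# zip(*matches) turns rows into one column per capture group.
counts = [sum(1 for text in column if text) for column in zip(*matches)]

print(counts)  # [3, 2, 2]
```

Note that if the text contains no matches at all, zip(*matches) yields nothing and counts ends up as an empty list rather than [0, 0, 0].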
