Test if a string contains repeated characters

Question:

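I'm trying to find the lightest way to determine whether a string has any repeated characters. I've searched for similar questions but couldn't find any. It also needs to be as short as possible, since I'll be checking quite a lot of strings (I can handle putting it into a loop, etc.).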

For example:

a = "12348546478"
#code to check multiple characters
print(result)

Result: 8 is repeated, 4 is repeated

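The code should check for repeated characters and print out the ones that repeat. I don't need to know how many times a character repeats, only whether it does.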

python-3.x set

Comments

0 votes Armageddon80 8/19/2015

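I've tried it myself, but I'm just not very good at Python; I'm a beginner. My code had too many steps, which made the loop very large. I was looking for a shorter, more pythonic way.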

Answers:

5 votes Mazdak 8/19/2015 #1

You can use collections.Counter:

>>> from collections import Counter
>>> [i for i,j in Counter(a).items() if j>1]
['4', '8']

Or you can use a custom function:

>>> def finder(s):
...    seen,yields=set(),set()
...    for i in s:
...      if i in seen:
...         if i not in yields:
...            yield i
...            yields.add(i)
...         else :
...            yields.add(i)
...      else:
...          seen.add(i)
... 
>>> list(finder(a))
['4', '8']
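Because finder is a generator, you do not need list() if you only want to know whether any character repeats: next() stops at the first repeated character, which gives the short-circuit behaviour the question asks for. A minimal sketch; the generator is repeated here (with yields renamed to yielded) so the snippet runs on its own, and the None default for the no-repeat case is an assumption of this sketch:

```python
def finder(s):
    """Yield each character the first time it turns up a second time."""
    seen, yielded = set(), set()
    for ch in s:
        if ch in seen:
            if ch not in yielded:
                yielded.add(ch)
                yield ch
        else:
            seen.add(ch)


a = "12348546478"

# next() pulls only the first value, so scanning stops at the first repeat
first_repeat = next(finder(a), None)
print(first_repeat)                  # prints 4 (its second occurrence is at index 6)

# With no repeated characters the generator is exhausted and the default is returned
print(next(finder("abcdef"), None))  # prints None
```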

Or a set-comprehension approach using str.count:

>>> set(i for i in a if a.count(i)>1)
{'8', '4'}

A benchmark of all the approaches, comparing the last 2 (the custom function and the set comprehension) against Counter:

from timeit import timeit


s1="""
a = "12348546478"
[i for i,j in Counter(a).items() if j>1]

"""
s2="""
def finder(s):
    seen,yields=set(),set()
    for i in s:
      if i in seen:
         if i not in yields:
            yield i
            yields.add(i)
         else :
            yields.add(i)
      else:
          seen.add(i)

a = "12348546478"
list(finder(a))

"""

s3="""
a = "12348546478"
set(i for i in a if a.count(i)>1)
"""

print('1st: ', timeit(stmt=s1, number=100000, setup="from collections import Counter"))
print('2nd: ', timeit(stmt=s2, number=100000))
print('3rd: ', timeit(stmt=s3, number=100000))

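Results (the third call in the run that produced these numbers used stmt=s2, so the "3rd" line re-times the custom function, not the str.count version):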

1st:  0.726881027222
2nd :  0.265578985214
3rd :  0.26243185997

I also tried a long string (a = "12348546478"*10000) and still got the same result:

1st:  25.5780302721341
2nd :  11.8482989001177
3rd :  11.926538944245

Either way, my recommendation is the more pythonic set comprehension:

set(i for i in a if a.count(i)>1)
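Note that this calls a.count once for every character of a, and each call scans the whole string, so the work grows roughly quadratically with the string length. A variant (not part of the original answer) is to call count only for the distinct characters, which bounds the number of scans by the number of distinct characters instead of the length:

```python
a = "12348546478"

# Form from the answer: one str.count scan per character of a (len(a) scans)
dupes_all = {c for c in a if a.count(c) > 1}

# Variant: one str.count scan per distinct character (len(set(a)) scans)
dupes_distinct = {c for c in set(a) if a.count(c) > 1}

print(sorted(dupes_distinct))  # prints ['4', '8']
```

Both produce the same set; the variant only changes how many times the string is scanned.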

Comments

0 votes muddyfish 8/19/2015

Instead of set(i for i in a if a.count(i)>1), I would use a set comprehension: {i for i in a if a.count(i)>1}

0 votes Mazdak 8/19/2015

@muddyfish That doesn't make any difference!

0 votes Daniel Hao 2/26/2021

There's an updated performance measurement; see the later post below.

30 votes muddyfish 8/19/2015 #2

Or you could do

len(set(x)) == len(x)

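This returns True if the string has no repeated characters, and False otherwise.

A set cannot contain duplicates, so when the string is converted to a set it is split into its characters and each distinct character is kept only once. The difference in length, len(x) - len(set(x)), tells you how many surplus occurrences there are (but not which characters they are).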
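To make the length difference concrete: for a = "12348546478", len(a) is 11 and len(set(a)) is 8, so the difference of 3 counts surplus occurrences ('4' appears twice more, '8' once more), not the two distinct repeated characters. Building the set always walks the entire string; if short-circuiting matters, a loop that returns at the first repeat is one option. A minimal sketch; the function name has_repeat is hypothetical:

```python
def has_repeat(s):
    """Return True as soon as any character is seen for the second time."""
    seen = set()
    for ch in s:
        if ch in seen:
            return True  # short-circuit: the rest of s is never examined
        seen.add(ch)
    return False


a = "12348546478"
print(len(a) - len(set(a)))   # prints 3 (surplus occurrences, not distinct repeats)
print(len(set(a)) == len(a))  # prints False: a has duplicates
print(has_repeat(a))          # prints True
print(has_repeat("abc"))      # prints False
```

has_repeat(s) is always the negation of len(set(s)) == len(s); it just stops early when a duplicate appears near the start.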

Comments

0 votes Armageddon80 8/19/2015

That's a good answer, but it isn't what I need in my case.

0 votes muddyfish 8/19/2015

For anyone else who might want this: my answer will be faster than @Kasramvd's, but theirs is more flexible.

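0 votes Mazdak 8/19/2015

This doesn't answer the question at all.

3 votes Abhi 8/19/2015 #3

You can also use a dictionary to count each character, since the keys of a dictionary are always unique.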

import collections

d = collections.defaultdict(int)
for c in a:
    d[c] += 1

d will contain {'1': 1, '3': 1, '2': 1, '5': 1, '4': 3, '7': 1, '6': 1, '8': 2}
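To get the repeated characters themselves (what the question asks for) from d, keep the keys whose count is above 1. A short sketch reusing the same loop; in Python 3.7+ dicts keep insertion order, so the keys come out in order of first appearance:

```python
import collections

a = "12348546478"

d = collections.defaultdict(int)
for c in a:
    d[c] += 1  # a missing key starts at int() == 0

repeated = [c for c, n in d.items() if n > 1]
print(repeated)  # prints ['4', '8']
```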

The answer given by Kasramvd is a good approach.

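1 vote ASB 9/30/2018 #4

You can use the following function to check for repeated characters. It returns True if there are no repeated characters, and False otherwise.

Python code: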

def isThereRepitition(x):
    for char in x:      # the loop walks the original string; rebinding x below does not change it
        x = x[1:]       # x now holds only the characters after char
        if char in x:   # char occurs again later in the string
            return False
    return True         # no character occurred twice

0 votes ambati charishma 9/30/2018 #5

import collections

a = "12348546478"
countOfWords = collections.Counter(a)  # maps each character to its number of occurrences
result = [i for i in countOfWords if countOfWords[i] > 1]
print(result)  # prints ['4', '8']

Try this.

0 votes Daniel Hao 1/27/2021 #6

From the `future` (January 26, 2021).

Just out of curiosity, I reran this test tonight in Python 3.8 with 2 changes and got very different results:

Change 1 - from collections import Counter  # import this first
Change 2 - make a bigger number string: a = "123485464781233299345355234234355234458"

Results:
1st:  0.4764095
2nd :  0.6692353
3rd :  0.6512726000000002
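If you want to rerun the comparison yourself, keep in mind that the s2 string in answer #1 defines finder inside the timed statement, so every iteration also pays for the def. A sketch that defines all three approaches once and times only the calls; the dictionary keys, the lambda wrappers and the repetition count are arbitrary choices of this sketch, and no timings are claimed since they depend on the machine and Python version:

```python
from collections import Counter
from timeit import timeit

a = "123485464781233299345355234234355234458"


def finder(s):
    """Generator from answer #1: yield each character on its first repeat."""
    seen, yielded = set(), set()
    for ch in s:
        if ch in seen:
            if ch not in yielded:
                yielded.add(ch)
                yield ch
        else:
            seen.add(ch)


# The three approaches being compared, each defined once outside the timing
approaches = {
    "Counter": lambda s: [c for c, n in Counter(s).items() if n > 1],
    "finder": lambda s: list(finder(s)),
    "str.count": lambda s: {c for c in s if s.count(c) > 1},
}

if __name__ == "__main__":
    for name, fn in approaches.items():
        # timeit accepts a zero-argument callable; only the call itself is timed
        elapsed = timeit(lambda: fn(a), number=10_000)
        print(f"{name:<10} {elapsed:.4f} s")
```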

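Comments

1 vote 1/28/2021

So it looks like Counter is faster. Loading the library first does make a difference. Thanks for the update.

1 vote Subham 2/26/2021 #7

Simplifying @Kasramvd's second answer:

First approach: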

def finder(s):
    seen, yields = set(), set()
    for i in s:
        if i not in seen:
            seen.add(i)
        elif i not in yields:
            yield i
            yields.add(i)

a = "12348546478"
print(list(finder(a)))

Second approach:

def finder(s):
    seen, yields = set(), set()
    for i in s:
        if i in seen and i not in yields:
            yield i
            yields.add(i)
        else:
            seen.add(i)

a = "12348546478"
print(list(finder(a)))

Third approach:

def finder(s):
    yield from {i for i, v in enumerate(s) if v in s[i+1:]}

a = "12348546478"
print(list(set(a[i] for i in finder(a))))

All of them produce the repeated characters:

['4', '8']


@muddyfish's answer is the simplest way to check whether there are any duplicates.

0 votes schotti 11/29/2021 #8

Or with numpy's unique:

import numpy as np
chars, times = np.unique(list("12348546478"), return_counts = True)
chars[times > 1]
