Python list comprehension sometimes slow

Asked by: Luchian Grigore Asked: 2/12/2021 Last edited by: Luchian Grigore Updated: 2/17/2021 Views: 205

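Q:

Some Python code I wrote a while ago has come back to haunt me. It runs slowly, and I have isolated the problem to the creation of lists. I am working with some fairly large lists, and before I undertake a major refactoring (which may never get done), I would like to know whether there is anything the experts would recommend.

How can I improve the performance of this code?

Full ideone code: https://ideone.com/KX39t2

The code looks like this: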

OBJECTS_NUM = 200

if __name__ == "__main__":  
    allLists = []
    for i in range(0, 500):
        starttime = currentTimeMicro()

        newlist = [Obj() for k in range(0, OBJECTS_NUM)]

        endtime = currentTimeMicro()
        elapsed = endtime - starttime
        print('Elapsed ' + str(elapsed))
        allLists.append(newlist)

A snippet of the output:

Elapsed 242
Elapsed 280
Elapsed 286
Elapsed 292
Elapsed 301
Elapsed 295
Elapsed 287
Elapsed 236
Elapsed 303
Elapsed 282
Elapsed 278
Elapsed 902
Elapsed 8909
Elapsed 167
Elapsed 129
Elapsed 164
Elapsed 183
Elapsed 160
Elapsed 166
Elapsed 159
Elapsed 158
Elapsed 127
Elapsed 158
Elapsed 158
Elapsed 157
Elapsed 169
Elapsed 538
Elapsed 155
Elapsed 128
Elapsed 169
Elapsed 156
Elapsed 157
Elapsed 156
Elapsed 161
Elapsed 157
Elapsed 127
Elapsed 168
Elapsed 158
Elapsed 172
Elapsed 154
Elapsed 546
Elapsed 156
Elapsed 128
Elapsed 159

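So most of the time creating a list takes about 200-300 µs, but occasionally it spikes to 500 or even 8900 µs.

I assume this is some kind of memory-related behavior, but I am far from proficient enough in Python to pinpoint the problem.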


python optimization

Comments

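3 upvotes shahkalpesh 2/12/2021
What does Obj() do?
0 upvotes user202729 2/12/2021
Indeed, the garbage collector in Python may run from time to time, but what do you expect from Python?
0 upvotes Luchian Grigore 2/12/2021
@shahkalpesh See the linked ideone snippet - it creates a moderately sized object (not too small, not too large)
0 upvotes user202729 2/12/2021
You know the drill about minimal reproducible examples... -- Does disabling the garbage collector help? (It may cause other problems, but if you need real-time performance it may be the only way) - You could also try not allocating new objects.
0 upvotes Luchian Grigore 2/12/2021
@user202729 Is there any way to verify that it is the GC? (The minimal reproducible example is linked on ideone)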
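
One way to check whether the spikes coincide with garbage collection is to register a callback in `gc.callbacks`, which CPython invokes at the start and stop of each collection. The sketch below is only an illustration, not the code from the question: `Obj` is a simplified stand-in for the class in the linked ideone snippet, and the names `gc_events`, `on_gc` and `build_lists` are made up for this example.

```python
import gc
import time

gc_events = []       # (generation, duration in microseconds) for each collection
_gc_start = [0.0]    # start time of the collection currently in progress


def on_gc(phase, info):
    """Called by CPython at the start and stop of every collection."""
    if phase == "start":
        _gc_start[0] = time.perf_counter()
    elif phase == "stop":
        elapsed_us = (time.perf_counter() - _gc_start[0]) * 1e6
        gc_events.append((info["generation"], elapsed_us))


class Obj:
    """Simplified stand-in for the question's class (an assumption)."""
    def __init__(self):
        self.dummy = 0
        self.testList = [66, 55, 13, 31]


def build_lists(rounds=500, objects_num=200):
    """Repeat the question's loop; return the rounds during which the GC ran."""
    all_lists = []
    gc_rounds = []
    for i in range(rounds):
        events_before = len(gc_events)
        start = time.perf_counter()
        all_lists.append([Obj() for _ in range(objects_num)])
        elapsed_us = (time.perf_counter() - start) * 1e6
        if len(gc_events) > events_before:
            gc_rounds.append((i, elapsed_us, gc_events[events_before:]))
    return gc_rounds


gc.callbacks.append(on_gc)
try:
    rounds_with_gc = build_lists()
finally:
    gc.callbacks.remove(on_gc)

# Show the first few rounds in which at least one collection happened.
for i, elapsed_us, events in rounds_with_gc[:5]:
    print(f"round {i}: {elapsed_us:.0f} us, collections: {events}")
```

If the rounds reported here line up with the large `Elapsed` values, the collector is a likely cause. Wrapping the allocation loop in `gc.disable()` / `gc.enable()` and comparing the timings is another quick check.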

Answers:

0 upvotes Lohith 2/12/2021 #1

If the code just creates identical Obj instances, consider replacing:

    newlist = [Obj() for k in range(0, OBJECTS_NUM)]

with:

newlist = [copy.deepcopy(Obj)]*OBJECTS_NUM

deepcopy() is meant to produce independent objects, so that changing one object does not affect the others. With this change:

import time
import copy
import gc
gc.disable()
def currentTimeMicro(): return int(round(time.time() * 1000000))


x = 0


class Obj(object):
    def __init__(self):
        self.dummy = 0
        self.dumb = 42
        self.dumber = 'ftw'
        self.dummy1 = 0
        self.dumb1 = 42
        self.dumber1 = 'ftw'
        self.dummy2 = 0
        self.dumb2 = 42
        self.dumber2 = 'ftw'
        self.testList = [66, 55, x, 13, 31, 55, x, 13, 31, 55]
        #x += 1


OBJECTS_NUM = 200

if __name__ == "__main__":
    allLists = []
    for i in range(0, 500):
        starttime = currentTimeMicro()
        #newlist = [Obj() for k in range(0, OBJECTS_NUM)]
        newlist = [copy.deepcopy(Obj)]*OBJECTS_NUM
        endtime = currentTimeMicro()
        elapsed = endtime - starttime
        print('Elapsed ' + str(elapsed))
        allLists.append(newlist)
    print(str(len(allLists)))

Output:

Elapsed 18
Elapsed 5
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 3
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 6
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 2
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 9
Elapsed 4
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 14
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 6
Elapsed 4
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 7
Elapsed 4
Elapsed 5
Elapsed 4
Elapsed 7
Elapsed 3
Elapsed 4
Elapsed 6
Elapsed 4
Elapsed 6
Elapsed 3
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 6
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 4
Elapsed 7
Elapsed 4
Elapsed 6
Elapsed 3
Elapsed 3
Elapsed 14
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 3
Elapsed 4
Elapsed 4
Elapsed 8
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 4
Elapsed 6
Elapsed 3
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 8
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 14
Elapsed 4
Elapsed 14
Elapsed 3
Elapsed 3
Elapsed 13
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 4
Elapsed 5
Elapsed 4
Elapsed 4
Elapsed 7
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 6
Elapsed 4
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 4
Elapsed 6
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 4
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 6
Elapsed 3
Elapsed 3
Elapsed 7
Elapsed 3
Elapsed 4
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 12
Elapsed 6
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 2
Elapsed 5
Elapsed 3
Elapsed 7
Elapsed 4
Elapsed 3
Elapsed 6
Elapsed 4
Elapsed 4
Elapsed 4
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 6
Elapsed 4
Elapsed 4
Elapsed 5
Elapsed 4
Elapsed 6
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 6
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 4
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 3
Elapsed 7
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 7
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 12
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 7
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 5
Elapsed 2
Elapsed 3
Elapsed 5
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 7
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 2
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 2
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 6
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 7
Elapsed 3
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 2
Elapsed 2
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 3
Elapsed 4
Elapsed 7
Elapsed 3
Elapsed 5
Elapsed 2
Elapsed 2
Elapsed 4
Elapsed 2
Elapsed 4
Elapsed 3
Elapsed 2
Elapsed 5
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 4
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 4
Elapsed 5
Elapsed 3
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 6
Elapsed 3
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 7
Elapsed 4
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 5
Elapsed 3
Elapsed 4
Elapsed 3
Elapsed 3
Elapsed 3
500

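Comments

1 upvote Alain T. 2/16/2021
This creates a list containing a reference to the class, repeated OBJECTS_NUM times. I believe the OP is looking for distinct instances of the class.
0 upvotes Lohith 2/17/2021
Thanks for pointing that out; I have updated the code to use deepcopy.
2 upvotes Alain T. 2/16/2021 #2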
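
The difference described in the comment above can be checked directly. The sketch below uses a trimmed-down `Obj` (an assumption standing in for the answer's class) to show that `[copy.deepcopy(Obj)] * n` repeats a single reference to the class itself, because `copy.deepcopy` returns classes unchanged, while a comprehension produces `n` separate instances.

```python
import copy


class Obj:
    """Trimmed-down stand-in for the class used in the answer (an assumption)."""
    def __init__(self):
        self.dumb = 42
        self.testList = [66, 55, 13]


n = 5

# deepcopy treats classes as atomic, so this is the class object itself,
# and list multiplication repeats that one reference n times.
repeated = [copy.deepcopy(Obj)] * n
print(repeated[0] is Obj)                    # True
print(len({id(item) for item in repeated}))  # 1

# Multiplying a list that holds one instance has the same aliasing problem.
aliased = [Obj()] * n
aliased[0].dumb = 0
print(aliased[1].dumb)                       # 0

# A comprehension evaluates Obj() once per element: n independent objects.
distinct = [Obj() for _ in range(n)]
distinct[0].dumb = 0
print(distinct[1].dumb)                      # 42
print(len({id(item) for item in distinct}))  # 5
```

This would also explain the very low timings in the answer's output: no `Obj.__init__` call happens inside the timed region.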

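When you measure very small time intervals, you are subject to interference from the operating system's process scheduling. Your program is never the only thing running on the system, and it is frequently interrupted so that time can be given to other processes (however briefly). To get a better basis for comparison, you need to measure something that takes at least a few milliseconds.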
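
A minimal sketch of that advice, assuming a simplified `Obj` and a hypothetical helper `time_batches`: each measurement builds many lists so that it lasts milliseconds rather than microseconds, uses `time.perf_counter()` (a monotonic, high-resolution clock) instead of `time.time()`, and reports the minimum and median over several repeats so that a single interrupted run does not dominate.

```python
import statistics
import time


class Obj:
    """Simplified stand-in for the question's class (an assumption)."""
    def __init__(self):
        self.dummy = 0
        self.testList = [66, 55, 13, 31]


def time_batches(lists_per_batch=50, objects_num=200, repeats=20):
    """Time the creation of whole batches of lists; return durations in ms."""
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        batch = [[Obj() for _ in range(objects_num)]
                 for _ in range(lists_per_batch)]
        durations_ms.append((time.perf_counter() - start) * 1e3)
        del batch  # release the objects before the next repeat
    return durations_ms


durations = time_batches()
print(f"min    {min(durations):.3f} ms")
print(f"median {statistics.median(durations):.3f} ms")
print(f"max    {max(durations):.3f} ms")
```

The minimum approximates the cost without interference, and the gap between median and maximum gives a rough idea of how much scheduling or garbage collection adds.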

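To speed up list creation, you can spread out the initialization cost by deferring it until each object instance in the list is first used. This can be done with a list class that instantiates objects "just in time", the first time they are referenced.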

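class ObjectList(list):
    """List that creates each element lazily, on first access by index."""

    def __init__(self, aClass, count):
        self.aClass = aClass
        self[:] = [None] * count      # placeholders; nothing is instantiated yet

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Resolve the slice to concrete indexes so each element is created on demand.
            return [self[i] for i in range(len(self))[index]]
        item = super().__getitem__(index)
        if item is None:
            # First access: instantiate the object and store it in place.
            self[index] = item = self.aClass()
        return item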

Usage:

X = ObjectList(Obj,1000)

print(X[500])      # <__main__.Obj object at 0x7fa7ac805550>
print(X[502])      # <__main__.Obj object at 0x7fa7ac805748>
print(X[499:504])
# [<__main__.Obj object at 0x7fa7aaee2da0>,
#  <__main__.Obj object at 0x7fa7ac805550>,
#  <__main__.Obj object at 0x7fa7ac864a58>,
#  <__main__.Obj object at 0x7fa7ac805748>,
#  <__main__.Obj object at 0x7fa7ac864a90>]

Performance (roughly 60 times faster in this example):

No = 1000000

from timeit import timeit

t = timeit(lambda:[Obj() for _ in range(No)],number=1)
print("comprehension",t) # 0.886055106

t = timeit(lambda:ObjectList(Obj,No),number=1)
print("ObjectList",t)  # 0.013847651000000072

Note that this may have some undesirable side effects if the order in which the objects are created matters.
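
To illustrate that caveat, here is a small sketch in which each instance records a creation counter. `Counted` and `LazyList` are hypothetical names; `LazyList` is a reduced version of the lazy list idea, without slice support. With an eager comprehension the serial numbers follow list order, whereas with lazy creation they follow access order. Plain iteration over a `list` subclass also bypasses `__getitem__`, so it yields the `None` placeholders.

```python
import itertools


class Counted:
    """Records the order in which instances are created."""
    _serial = itertools.count()

    def __init__(self):
        self.serial = next(Counted._serial)


class LazyList(list):
    """Reduced version of the lazy list idea: integer indexes only."""
    def __init__(self, factory, count):
        super().__init__([None] * count)
        self.factory = factory

    def __getitem__(self, index):
        item = super().__getitem__(index)
        if item is None:
            item = self.factory()
            self[index] = item
        return item


# Eager creation: serial numbers match the list positions.
eager = [Counted() for _ in range(3)]
print([obj.serial for obj in eager])          # [0, 1, 2]

# Lazy creation: serial numbers follow the order of first access.
lazy = LazyList(Counted, 3)
print(lazy[2].serial, lazy[0].serial)         # 3 4

# Iteration does not go through __getitem__, so index 1 is still a placeholder.
print([item is None for item in lazy])        # [False, True, False]
```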