Asked by: jespern | Asked: 11/23/2008 | Last edited by: Mateen Ulhaq | Updated: 10/21/2023 | Views: 1,621,072
How do I split a list into equally-sized chunks?
Answers:
Here is a generator that yields evenly-sized chunks:
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
import pprint
pprint.pprint(list(chunks(list(range(10, 75)), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
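For Python 2, use xrange instead of range:
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]
Below is a list comprehension one-liner. The method above is preferable, though, since using named functions makes code easier to understand. For Python 3: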
[lst[i:i + n] for i in range(0, len(lst), n)]
For Python 2:
[lst[i:i + n] for i in xrange(0, len(lst), n)]
If you know the list size:
def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]
If you don't (an iterator):
def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion
In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of the given size (i.e. there is no incomplete last chunk).
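One possible reading of that "more beautiful" rephrasing, as a minimal sketch only: the helper name iter_full_chunks is made up for illustration, and the code assumes the input length really is an exact multiple of the chunk size. If that assumption fails, the leftover items are silently dropped.

```python
def iter_full_chunks(sequence, chunk_size):
    """Yield chunk_size-long tuples; assumes no incomplete last chunk."""
    it = iter(sequence)
    # zip pulls from the SAME iterator chunk_size times to build each tuple.
    return zip(*[it] * chunk_size)

print(list(iter_full_chunks(range(6), 3)))  # [(0, 1, 2), (3, 4, 5)]
```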
Here is a generator that works on arbitrary iterables:
import itertools
def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))
Example:
>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
Heh, a one-line version
In [48]: chunk = lambda ulist, step: map(lambda i: ulist[i:i+step], xrange(0, len(ulist), step))
In [49]: chunk(range(1,100), 10)
Out[49]:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
[41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
[51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
[61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
[71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
[81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
[91, 92, 93, 94, 95, 96, 97, 98, 99]]
Directly from the (old) Python documentation (recipes for itertools):
from itertools import izip, chain, repeat
def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
The current version, as suggested by J.F.Sebastian:
#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)
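def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
I guess Guido's time machine works, worked, will work, will have worked, was working again.
These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, so each such zip round-robin generates one tuple of n items.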
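To make the shared-iterator point concrete, here is a small sketch (Python 3, variable names chosen only for illustration) contrasting n references to one iterator with n independent iterators:

```python
from itertools import zip_longest

it = iter("abcdefg")
same = [it] * 3                       # three references to ONE iterator
assert same[0] is same[1] is same[2]

# Each output tuple is built by calling next() on the shared iterator 3 times.
groups = list(zip_longest(*same, fillvalue="x"))
print(groups)  # [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')]

# Contrast: three independent iterators simply walk the input in lockstep.
independent = list(zip_longest(*[iter("abcdefg") for _ in range(3)]))
print(independent[0])  # ('a', 'a', 'a')
```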
Python ≥3.12
def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop
Usage:
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for piece in split_seq(seq, 3):
    print piece
def chunk(lst):
    out = []
    for x in xrange(2, len(lst) + 1):
        if not len(lst) % x:
            factor = len(lst) / x
            break
    while lst:
        out.append([lst.pop(0) for x in xrange(factor)])
    return out
>>> def f(x, n, acc=[]): return f(x[n:], n, acc+[(x[:n])]) if x else acc
>>> f("Hallo Welt", 3)
['Hal', 'lo ', 'Wel', 't']
>>>
If you are into brackets: I picked up a book on Erlang :)
Something super simple:
def chunks(xs, n):
    n = max(1, n)
    return (xs[i:i+n] for i in range(0, len(xs), n))
For Python 2, use xrange() instead of range().
Without calling len(), which is good for large lists:
def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]
And this is for iterables:
from itertools import islice
def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))
The functional flavour of the above:
from itertools import islice, repeat, takewhile
def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                      for start in repeat(iter(l))))
OR:
from itertools import imap, islice, repeat  # Python 2
def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next,())
OR:
from itertools import imap, islice, repeat, takewhile  # Python 2
def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool,imap(tuple, continuous_slices))
def chunk(input, size):
    return map(None, *([iter(input)] * size))
Simple yet elegant
L = range(1, 1000)
print [L[x:x+10] for x in xrange(0, len(L), 10)]
or if you prefer:
def chunks(L, n): return [L[x: x+n] for x in xrange(0, len(L), n)]
chunks(L, 10)
If you had a chunk size of 3, for example, you could do:
zip(*[iterable[i::3] for i in range(3)])
Source: http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/
I would use this when my chunk size is a fixed number I can type, e.g. '3', and would never change.
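One caveat worth seeing in action: plain zip stops at the shortest slice, so leftover items are dropped when the length is not a multiple of 3. A small Python 3 sketch (the variable names are illustrative only), with itertools.zip_longest shown as one way to keep the tail:

```python
from itertools import zip_longest

items = list(range(8))
slices = [items[i::3] for i in range(3)]   # [0, 3, 6], [1, 4, 7], [2, 5]

triples = list(zip(*slices))
print(triples)  # [(0, 1, 2), (3, 4, 5)]  -> 6 and 7 are lost

padded = list(zip_longest(*slices))
print(padded)   # [(0, 1, 2), (3, 4, 5), (6, 7, None)]
```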
Consider using matplotlib.cbook pieces
For example:
import numpy as np
import matplotlib.cbook as cbook
segments = cbook.pieces(np.arange(20), 3)
for s in segments:
    print s
def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break
g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
'<generator object chunks at 0x0337B9B8>'
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'
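I realise this question is old (I stumbled over it on Google), but surely something like the following is far simpler and clearer than any of the huge complex suggestions, and it only uses slicing:
def chunker(iterable, chunksize):
    for i,c in enumerate(iterable[::chunksize]):
        yield iterable[i*chunksize:(i+1)*chunksize]
>>> for chunk in chunker(range(0,100), 10):
...     print list(chunk)
...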
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
... etc ...
See this reference
>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>>
Python 3
I know this is kind of old, but nobody has yet mentioned numpy.array_split:
import numpy as np
lst = range(50)
np.array_split(lst, 5)
Result:
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
- Works with any iterable
- Inner data is a generator object (not a list)
- One-liner
In [259]: get_in_chunks = lambda itr,n: ( (v for _,v in g) for _,g in itertools.groupby(enumerate(itr),lambda (ind,_): ind/n))
In [260]: list(list(x) for x in get_in_chunks(range(30),7))
Out[260]:
[[0, 1, 2, 3, 4, 5, 6],
 [7, 8, 9, 10, 11, 12, 13],
 [14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27],
 [28, 29]]
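I like the Python doc's version proposed by tzot and J.F.Sebastian a lot, but it has two shortcomings:
- it is not very explicit
- I usually don't want a fill value in the last chunk
I use this one a lot in my code:
from itertools import islice
def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        chunk = tuple(islice(iterable, n))
        if not chunk:
            return  # source exhausted
        yield chunk
UPDATE: A lazy chunks version:
from itertools import chain, islice
def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        try:
            first = next(iterable)
        except StopIteration:
            return  # PEP 479: StopIteration must not escape a generator
        yield chain([first], islice(iterable, n-1))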
The toolz library has the partition function for this:
from toolz.itertoolz.core import partition
list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]
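How do you split a list into evenly sized chunks?
To me, "evenly sized chunks" implies that they are all the same length or, barring that option, of minimal variance in length. E.g. 5 baskets for 21 items could have the following results: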
>>> import statistics
>>> statistics.variance([5,5,5,5,1])
3.2
>>> statistics.variance([5,4,4,4,4])
0.19999999999999998
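A practical reason to prefer the latter result: if you are using these functions to distribute work, the first layout builds in the prospect of one worker likely finishing well before the others, so it would sit around doing nothing while the others continue working hard.
Critique of other answers here
When I originally wrote this answer, none of the other answers produced evenly sized chunks: they all leave a runt chunk at the end, so they are not well balanced and have a higher than necessary variance of lengths.
For example, the current top answer ends with: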
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
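Others, like list(grouper(3, range(7))) and chunk(range(7), 3), both return [(0, 1, 2), (3, 4, 5), (6, None, None)]. The None values are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.
Why can't we divide these better?
Cycle Solution
A high-level balanced solution using itertools.cycle, which is the way I might do it today. Here's the setup: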
from itertools import cycle
items = range(10, 75)
number_of_baskets = 10
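Now we need our lists into which to populate the elements:
baskets = [[] for _ in range(number_of_baskets)]
Finally, we zip the elements we're going to allocate together with a cycle of the baskets until we run out of elements, which, semantically, is exactly what we want:
for element, basket in zip(items, cycle(baskets)):
    basket.append(element)
Here's the result: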
>>> from pprint import pprint
>>> pprint(baskets)
[[10, 20, 30, 40, 50, 60, 70],
[11, 21, 31, 41, 51, 61, 71],
[12, 22, 32, 42, 52, 62, 72],
[13, 23, 33, 43, 53, 63, 73],
[14, 24, 34, 44, 54, 64, 74],
[15, 25, 35, 45, 55, 65],
[16, 26, 36, 46, 56, 66],
[17, 27, 37, 47, 57, 67],
[18, 28, 38, 48, 58, 68],
[19, 29, 39, 49, 59, 69]]
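To productionize this solution, we write a function and provide the type annotations:
from itertools import cycle
from typing import List, Any
def cycle_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    baskets = [[] for _ in range(min(maxbaskets, len(items)))]
    for item, basket in zip(items, cycle(baskets)):
        basket.append(item)
    return baskets
In the above, we take our list of items and the max number of baskets. We create a list of empty lists, to which we append each element in round-robin style.
Slices
Another elegant solution is to use slices, specifically the less commonly used step argument to slices. I.e.: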
start = 0
stop = None
step = number_of_baskets
first_basket = items[start:stop:step]
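This is especially elegant in that slices don't care how long the data are: the result, our first basket, is only as long as it needs to be. We only need to increment the starting point for each basket.
In fact this could be a one-liner, but we'll go multiline for readability and to avoid an overlong line of code:
from typing import List, Any
def slice_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    n_baskets = min(maxbaskets, len(items))
    return [items[i::n_baskets] for i in range(n_baskets)]
And islice from the itertools module will provide a lazily iterating approach, like that which was originally asked for in the question.
I don't expect most use cases to benefit very much, as the original data is already fully materialized in a list, but for large datasets it could save nearly half the memory usage.
from itertools import islice
from typing import List, Any, Generator
def yield_islice_baskets(items: List[Any], maxbaskets: int) -> Generator[List[Any], None, None]:
    n_baskets = min(maxbaskets, len(items))
    for i in range(n_baskets):
        yield islice(items, i, None, n_baskets)
View the results: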
from pprint import pprint
items = list(range(10, 75))
pprint(cycle_baskets(items, 10))
pprint(slice_baskets(items, 10))
pprint([list(s) for s in yield_islice_baskets(items, 10)])
Updated prior solutions
Here's another balanced solution, adapted from a function I've used in production in the past, that uses the modulo operator:
def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in range(maxbaskets)]
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return list(filter(None, baskets))  # drop empty baskets; list() so it prints in Python 3
And I created a generator that does the same if you put it into a list:
def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in range(baskets):
        yield [items[y_i] for y_i in range(x_i, item_count, baskets)]
And finally, since I see that none of the above functions return elements in contiguous order (as they were given):
def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing a iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in range(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [next(items) for _ in range(length)]
Output
To test them out:
print(baskets_from(range(6), 8))
print(list(iter_baskets_from(range(6), 8)))
print(list(iter_baskets_contiguous(range(6), 8)))
print(baskets_from(range(22), 8))
print(list(iter_baskets_from(range(22), 8)))
print(list(iter_baskets_contiguous(range(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(range(26), 5))
print(list(iter_baskets_from(range(26), 5)))
print(list(iter_baskets_contiguous(range(26), 5)))
Which prints out:
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]
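Notice that the contiguous generator provides chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as one may divide a list of discrete elements.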
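For sequences that support slicing, the same contiguous, balanced layout can also be sketched with divmod. This is only an illustrative variant, not the code from the answer above, and the name balanced_contiguous is made up here:

```python
def balanced_contiguous(items, maxbaskets):
    """Split a sliceable sequence into at most maxbaskets contiguous, near-equal slices."""
    n_baskets = min(maxbaskets, len(items))
    if n_baskets == 0:
        return []
    # The first `extra` baskets get one additional item each.
    floor, extra = divmod(len(items), n_baskets)
    result, start = [], 0
    for i in range(n_baskets):
        stop = start + floor + (1 if i < extra else 0)
        result.append(items[start:stop])
        start = stop
    return result

print(balanced_contiguous(list("ABCDEFG"), 3))
# [['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
```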
I'm surprised nobody has thought of using iter's two-argument form:
from itertools import islice
def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())
Demo:
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
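This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:
from itertools import islice, chain, repeat
def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)
Demo: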
>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:
_no_padding = object()
def chunk(it, size, padval=_no_padding):
    if padval is _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)
Demo:
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
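I believe this is the shortest chunker proposed that offers optional padding.
As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here's a final variation that works around that problem in a reasonable way:
_no_padding = object()
def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval is _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))
Demo: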
>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
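I wrote a small library expressly for this purpose, available here. The library's chunked function is particularly efficient because it is implemented as a generator, so a substantial amount of memory can be saved in certain situations. It also doesn't rely on slice notation, so any arbitrary iterator can be used.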
import iterlib
print list(iterlib.chunked(xrange(1, 1000), 10))
# prints [(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), (11, 12, 13, 14, 15, 16, 17, 18, 19, 20), ...]
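Like @AaronHall, I got here looking for roughly evenly sized chunks. There are different interpretations of that. In my case, if the desired size is N, I would like each group to be of size >= N. Thus, the orphans created in most of the above should be redistributed to other groups.
This can be done using:
def nChunks(l, n):
    """ Yield n successive chunks from l.
    Works for lists, pandas dataframes, etc
    """
    newn = int(1.0 * len(l) / n + 0.5)
    for i in xrange(0, n-1):
        yield l[i*newn:i*newn+newn]
    yield l[n*newn-newn:]
(from Splitting a list into N parts of approximately equal length) by simply calling it as nChunks(l,l/n) or nChunks(l,floor(l/n))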
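A different and simpler way to meet the "every group has size >= N" requirement is to fold a short tail into the last full chunk. This is a hedged sketch of that alternative, not the nChunks approach above; the name chunks_at_least is hypothetical, and an input shorter than N necessarily still yields one short chunk.

```python
def chunks_at_least(seq, n):
    """Return slices of length n, folding a short tail into the last full chunk."""
    if n < 1:
        raise ValueError("n must be >= 1")
    out = [seq[i:i + n] for i in range(0, len(seq), n)]
    # If the final chunk is an orphan, merge it into its predecessor.
    if len(out) > 1 and len(out[-1]) < n:
        tail = out.pop()
        out[-1] = out[-1] + tail
    return out

print(chunks_at_least(list(range(10)), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```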
Let r be the chunk size and L be the initial list; you can do:
chunkL = [ [i for i in L[r*k:r*(k+1)] ] for k in range(len(L)//r)]
Using a list comprehension:
l = [1,2,3,4,5,6,7,8,9,10,11,12]
k = 5 #chunk size
print [tuple(l[x:y]) for (x, y) in [(x, x+k) for x in range(0, len(l), k)]]
Another more explicit version.
def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists
    that have a length equals to chunkSize.
    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3))
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList
I saw the most awesome Python-ish answer in a duplicate of this question:
from itertools import zip_longest
a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]
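You can create n-tuples for any n. If a = range(1, 15), then the result will be:
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]
If the list divides evenly, you can replace zip_longest with zip; otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.
The answer above (by koffein) has a small problem: the list is always split into an equal number of splits, not an equal number of items per partition. This is my version. The "// chs + 1" takes into account that the number of items may not be exactly divisible by the partition size, so the last partition will only be partially filled. Note that when the length is an exact multiple of chs, this expression produces one extra, empty partition at the end.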
# Given 'l' is your list
chs = 12 # Your chunksize
partitioned = [ l[i*chs:(i*chs)+chs] for i in range((len(l) // chs)+1) ]
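If the trailing empty partition for exact multiples is unwanted, ceiling division gives the exact number of partitions. A minimal sketch, with the function name partition_ceil made up for illustration:

```python
def partition_ceil(l, chs):
    """Split l into consecutive slices of chs items, with no empty tail."""
    n_parts = (len(l) + chs - 1) // chs   # ceiling of len(l) / chs
    return [l[i * chs:(i + 1) * chs] for i in range(n_parts)]

print(len(partition_ceil(list(range(24)), 12)))  # 2
print(len(partition_ceil(list(range(25)), 12)))  # 3
```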
Code:
def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list
a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print split_list(a_list, 3)
Result:
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) / CHUNK )]
I came up with the following solution, which avoids creating a temporary list object and should work with any iterable object. Please note that this version is for Python 2.x:
def chunked(iterable, size):
    stop = []
    it = iter(iterable)
    def _next_chunk():
        try:
            for _ in xrange(size):
                yield next(it)
        except StopIteration:
            stop.append(True)
            return
    while not stop:
        yield _next_chunk()
for it in chunked(xrange(16), 4):
    print list(it)
Output:
[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15]
[]
As you can see, if len(iterable) % size == 0, then we get an additional empty iterator object. But I don't think that is a big problem.
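If the extra empty iterator does matter, one way to avoid it is to pull the first item of each chunk before yielding. This is a Python 3 sketch with a made-up name (chunked_no_empty), and it assumes each chunk is fully consumed before the next one is requested; otherwise the chunks become misaligned.

```python
from itertools import chain, islice

def chunked_no_empty(iterable, size):
    """Yield lazy chunks of up to `size` items, never an empty one."""
    it = iter(iterable)
    for first in it:  # the loop ends cleanly once the source is exhausted
        # Re-attach the item we already pulled, then take up to size - 1 more.
        yield chain([first], islice(it, size - 1))

for chunk in chunked_no_empty(range(16), 4):
    print(list(chunk))
```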
Since I had to do something like this, here's my solution given a generator and a batch size:
def pop_n_elems_from_generator(g, n):
    elems = []
    try:
        for idx in xrange(0, n):
            elems.append(g.next())
        return elems
    except StopIteration:
        return elems
At this point, I think we need a recursive generator, just in case...
In Python 2:
def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
In Python 3:
def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)
Also, in case of a massive alien invasion, a decorated recursive generator might come in handy:
def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen
@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
At this point, I think we need the obligatory anonymous recursive function.
Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
chunks = Y(lambda f: lambda n: [n[0][:n[1]]] + f((n[0][n[1]:], n[1])) if len(n[0]) > 0 else [])
Here's a one-liner:
[AA[i:i+SS] for i in range(len(AA))[::SS]]
Details: AA is the array and SS is the chunk size. For example:
>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3
To expand the ranges in py3, do
(py3) >>> [list(AA[i:i+SS]) for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
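According to this answer, the top-voted answer leaves a "runt" at the end. Here's my solution to really get chunks that are about as evenly sized as possible, with no runts. It basically tries to pick exactly the fractional spot where it should split the list, but just rounds it off to the nearest integer:
from __future__ import division  # not needed in Python 3
def n_even_chunks(l, n):
    """Yield n as even chunks as possible from l."""
    last = 0
    for i in range(1, n+1):
        cur = int(round(i * (len(l) / n)))
        yield l[last:cur]
        last = cur
Demonstration: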
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66],
[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77],
[78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88],
[89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
[55, 56, 57, 58, 59, 60, 61, 62, 63],
[64, 65, 66, 67, 68, 69, 70, 71, 72],
[73, 74, 75, 76, 77, 78, 79, 80, 81],
[82, 83, 84, 85, 86, 87, 88, 89, 90],
[91, 92, 93, 94, 95, 96, 97, 98, 99]]
Compare to the top-voted chunks answer:
>>> pprint.pprint(list(chunks(list(range(100)), 100//9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
[55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
[66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76],
[77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87],
[88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98],
[99]]
>>> pprint.pprint(list(chunks(list(range(100)), 100//11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59, 60, 61, 62],
[63, 64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79, 80],
[81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98],
[99]]
Since everybody here is talking about iterators: boltons has the perfect method for that, called iterutils.chunked_iter.
from boltons import iterutils
list(iterutils.chunked_iter(list(range(50)), 11))
Output:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49]]
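But if you don't need to go easy on memory, you can do it the old way and store the full list up front with iterutils.chunked.
You can use numpy's array_split function, e.g. np.array_split(np.array(data), 20), to split into 20 nearly equal-sized chunks.
To make sure chunks are exactly equal in size, use np.split.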
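I have one solution below which does work, but more important than that solution are a few comments on other approaches. First, a good solution shouldn't require that one loop through the sub-iterators in order. If I run
g = paged_iter(list(range(50)), 11)
i0 = next(g)
i1 = next(g)
list(i1)
list(i0)
The appropriate output for the last command is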
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
not
[]
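as most of the itertools-based solutions here return. This isn't just the usual boring restriction about accessing iterators in order. Imagine a consumer trying to clean up poorly entered data which reversed the appropriate order of blocks of 5, i.e. the data looks like [B5, A5, D5, C5] and should look like [A5, B5, C5, D5] (where A5 is just five elements, not a sublist). This consumer would look at the claimed behavior of the grouping function and not hesitate to write a loop like
i = 0
out = []
for it in paged_iter(data, 5):
    if (i % 2 == 0):
        swapped = it
    else:
        out += list(it)
        out += list(swapped)
    i = i + 1
This will produce mysteriously wrong results if you sneakily assume that sub-iterators are always fully used in order. It gets even worse if you want to interleave elements from the chunks.
Second, a decent number of the suggested solutions implicitly rely on the fact that iterators have a deterministic order (they don't for, e.g., a set), and while some of the solutions using islice may be OK, it worries me.
Third, the itertools grouper approach works, but the recipe relies on internal behavior of the zip_longest (or zip) functions that isn't part of their published behavior. In particular, the grouper function only works because in zip_longest(i0...in) the next function is always called in the order next(i0), next(i1), ... next(in) before starting over. As grouper passes n copies of the same iterator object, it relies on this behavior.
Finally, while the solution below could be improved if one made the assumption criticized above (that sub-iterators are accessed in order and fully consumed), without that assumption one MUST implicitly (via call chain) or explicitly (via deques or other data structures) store the elements for each sub-iterator somewhere. So don't bother wasting time (as I did) assuming one could get around this with some clever trick.
import collections
def paged_iter(iterat, n):
    itr = iter(iterat)
    deq = None
    try:
        while(True):
            deq = collections.deque(maxlen=n)
            for q in range(n):
                deq.append(next(itr))
            yield (i for i in deq)
    except StopIteration:
        yield (i for i in deq)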
You may also use the get_chunks function of the utilspie library as:
>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]
You can install utilspie via pip:
sudo pip install utilspie
Disclaimer: I am the creator of the utilspie library.
Here's an idea using itertools.groupby:
import itertools
def chunks(l, n):
    c = itertools.count()
    return (it for _, it in itertools.groupby(l, lambda x: next(c)//n))
This returns a generator of generators. If you want a list of lists, just replace the last line with
    return [list(it) for _, it in itertools.groupby(l, lambda x: next(c)//n)]
Example returning a list of lists:
>>> chunks('abcdefghij', 4)
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j']]
(So yes, this suffers from the "runt problem", which may or may not be a problem in a given situation.)
Yet another solution
def make_chunks(data, chunk_size):
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk
>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
...     print chunk
...
[1, 2]
[3, 4]
[5, 6]
[7]
>>>
This works in v2/v3, is inlineable, generator-based, and uses only the standard library:
import itertools
def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))
No magic, but simple and correct:
def chunks(iterable, n):
    """Yield successive n-sized chunks from iterable."""
    values = []
    for i, item in enumerate(iterable, 1):
        values.append(item)
        if i % n == 0:
            yield values
            values = []
    if values:
        yield values
I don't think I saw this option, so just to add another one :)):
def chunks(iterable, chunk_size):
    i = 0
    while i < len(iterable):
        yield iterable[i:i+chunk_size]
        i += chunk_size
I was curious about the performance of the different approaches, and here it is:
Tested on Python 3.5.1
import time
batch_size = 7
arr_len = 298937
#---------slice-------------
print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break
    tmp = arr[0:batch_size]
    arr = arr[batch_size:-1]
print(time.time() - start)
#-----------index-----------
print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)
#----------batches 1------------
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]
print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)
#----------batches 2------------
from itertools import islice, chain
def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([next(batchiter)], batchiter)
print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)
#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)
#-----------grouper-----------
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)
def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)
Results:
slice
31.18285083770752
index
0.02184295654296875
batches 1
0.03503894805908203
batches 2
0.22681021690368652
chunks
0.019841909408569336
grouper
0.006506919860839844
I don't like the idea of splitting elements by chunk size; e.g. a script can divide 101 items into 3 chunks as [50, 50, 1]. For my needs I had to split proportionally and keep the order the same. First I wrote my own script, which works fine and is very simple. But later I saw this answer, where the script is better than mine, and I recommend it. Here's my script:
def proportional_dividing(N, n):
    """
    N - length of array (bigger number)
    n - number of chunks (smaller number)
    output - arr, containing N numbers, diveded roundly to n chunks
    """
    arr = []
    if N == 0:
        return arr
    elif n == 0:
        arr.append(N)
        return arr
    r = N // n
    for i in range(n-1):
        arr.append(r)
    arr.append(N-r*(n-1))
    last_n = arr[-1]
    # last number always will be r <= last_n < 2*r
    # when last_n == r it's ok, but when last_n > r ...
    if last_n > r:
        # ... and if difference too big (bigger than 1), then
        if abs(r-last_n) > 1:
            #[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7] # N=29, n=12
            # we need to give unnecessary numbers to first elements back
            diff = last_n - r
            for k in range(diff):
                arr[k] += 1
            arr[-1] = r
            # and we receive [3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
    return arr
def split_items(items, chunks):
    arr = proportional_dividing(len(items), chunks)
    splitted = []
    for chunk_size in arr:
        splitted.append(items[:chunk_size])
        items = items[chunk_size:]
    print(splitted)
    return splitted
items = [1,2,3,4,5,6,7,8,9,10,11]
chunks = 3
split_items(items, chunks)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm'], 3)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm', 'n'], 3)
split_items(range(100), 4)
split_items(range(99), 4)
split_items(range(101), 4)
And the output:
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'g', 'k', 'l', 'm']]
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'g'], ['k', 'l', 'm', 'n']]
[range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 99)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 101)]
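The size computation can also be sketched compactly with divmod. This is an illustrative alternative rather than the script above: the name proportional_sizes is made up, n_chunks is assumed to be positive, and unlike the script it always puts the larger parts first (so 13 items in 3 chunks gives [5, 4, 4] rather than [4, 4, 5]).

```python
def proportional_sizes(total, n_chunks):
    """Sizes of n_chunks near-equal parts of `total`, larger parts first."""
    base, extra = divmod(total, n_chunks)
    # `extra` parts get one additional item; the rest get `base`.
    return [base + 1] * extra + [base] * (n_chunks - extra)

print(proportional_sizes(29, 12))  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
print(proportional_sizes(11, 3))   # [4, 4, 3]
```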
Don't reinvent the wheel.
UPDATE: A complete solution is found in Python 3.12+ itertools.batched.
Given
import itertools as it
import collections as ct
import more_itertools as mit
iterable = range(11)
n = 3
Code
list(it.batched(iterable, n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10)]
Details
Prior to Python 3.12, the following non-native approaches were suggested:
list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]
list(mit.grouper(iterable, n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
list(mit.chunked_even(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
(or DIY, if you want)
The Standard Library
list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)
list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)
list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
References
- more_itertools.chunked (related post)
- more_itertools.sliced
- more_itertools.grouper (related post)
- more_itertools.windowed (see also stagger, zip_offset)
- more_itertools.chunked_even
- zip_longest (related post, related post)
- setdefault (ordered results require Python 3.6+)
- collections.defaultdict (ordered results require Python 3.6+)
+ A third-party library that implements itertools recipes and more. > pip install more_itertools
++ Included in the Python Standard Library 3.12+. batched is similar to more_itertools.chunked.
Lazy loading version
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[range(10, 20),
 range(20, 30),
 range(30, 40),
 range(40, 50),
 range(50, 60),
 range(60, 70),
 range(70, 75)]
Compare the result of this implementation with the example usage result of the accepted answer.
Many of the above functions assume that the length of the whole iterable is known up front, or at least is cheap to calculate.
For some streamed objects, that would mean loading the full data into memory first (e.g. downloading the whole file) to get the length information.
If, however, you don't know the full size yet, you can use this code instead:
def chunks(iterable, size):
    """
    Yield successive chunks from iterable, being `size` long.
    https://stackoverflow.com/a/55776536/3423324
    :param iterable: The object you want to split into pieces.
    :param size: The size each of the resulting pieces should have.
    """
    i = 0
    while True:
        sliced = iterable[i:i + size]
        if len(sliced) == 0:
            # to suppress stuff like `range(max, max)`.
            break
        # end if
        yield sliced
        if len(sliced) < size:
            # our slice is not the full length, so we must have passed the end of the iterator
            break
        # end if
        i += size  # so we start the next chunk at the right place.
    # end while
# end def
This works because the slice command will return fewer or no elements if you have passed the end of the iterable:
"abc"[0:2] == 'ab'
"abc"[2:4] == 'c'
"abc"[4:6] == ''
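We now use that result of the slice and calculate the length of the generated chunk. If it is less than what we expect, we know we can end the iteration.
That way the iterator will not be executed unless it is accessed.
The Python pydash package could be a good choice.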
from pydash.arrays import chunk
ids = ['22', '89', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '1']
chunk_ids = chunk(ids,5)
print(chunk_ids)
# output: [['22', '89', '2', '3', '4'], ['5', '6', '7', '8', '9'], ['10', '11', '1']]
For more, check out the Pydash chunk list
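This question reminds me of the Raku (formerly Perl 6) .comb(n) method. It breaks up strings into n-sized chunks. (There's more to it than that, but I'll leave out the details.)
It's easy enough to implement a similar function in Python3 as a lambda expression: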
comb = lambda s,n: (s[i:i+n] for i in range(0,len(s),n))
Then you can call it like this:
some_list = list(range(0, 20))  # creates a list of 20 elements
generator = comb(some_list, 4)  # creates a generator that will generate lists of 4 elements
for sublist in generator:
    print(sublist)  # prints a sublist of four elements, as it's generated
Of course, you don't have to assign the generator to a variable; you can just loop over it directly like this:
for sublist in comb(some_list, 4):
    print(sublist)  # prints a sublist of four elements, as it's generated
As a bonus, this comb() function also operates on strings:
list( comb('catdogant', 3) ) # returns ['cat', 'dog', 'ant']
An old-school approach that does not require itertools but still works with arbitrary generators:
def chunks(g, n):
    """divide a generator 'g' into small chunks
    Yields:
        a chunk that has 'n' or less items
    """
    n = max(1, n)
    buff = []
    for item in g:
        buff.append(item)
        if len(buff) == n:
            yield buff
            buff = []
    if buff:
        yield buff
With assignment expressions in Python 3.8 it becomes quite nice:
import itertools
def batch(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        yield item
This works on an arbitrary iterable, not just a list.
>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
Update
Starting with Python 3.12, essentially this implementation is available as itertools.batched (which yields tuples rather than lists).
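def main():
    print(chunkify([1, 2, 3, 4, 5, 6], 2))

def chunkify(lst, n):  # parameter renamed from `list` to avoid shadowing the built-in
    chunks = []
    for i in range(0, len(lst), n):
        chunks.append(lst[i:i+n])
    return chunks

main()
I think this is simple and gives you the chunks of an array.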
A generic chunker for any iterable, which lets the user choose how to handle a partial chunk at the end.
Tested on Python 3.
chunker.py
from enum import Enum

class PartialChunkOptions(Enum):
    INCLUDE = 0
    EXCLUDE = 1
    PAD = 2
    ERROR = 3

class PartialChunkException(Exception):
    pass

def chunker(iterable, n, on_partial=PartialChunkOptions.INCLUDE, pad=None):
    """
    A chunker yielding n-element lists from an iterable, with various options
    about what to do about a partial chunk at the end.

    on_partial=PartialChunkOptions.INCLUDE (the default):
        include the partial chunk as a short (<n) element list
    on_partial=PartialChunkOptions.EXCLUDE
        do not include the partial chunk
    on_partial=PartialChunkOptions.PAD
        pad to an n-element list
        (also pass pad=<pad_value>, default None)
    on_partial=PartialChunkOptions.ERROR
        raise a PartialChunkException if a partial chunk is encountered
    """
    on_partial = PartialChunkOptions(on_partial)
    iterator = iter(iterable)
    while True:
        vals = []
        for i in range(n):
            try:
                vals.append(next(iterator))
            except StopIteration:
                # Input exhausted: deal with a partial chunk, if there is one.
                if vals:
                    if on_partial == PartialChunkOptions.INCLUDE:
                        yield vals
                    elif on_partial == PartialChunkOptions.EXCLUDE:
                        pass
                    elif on_partial == PartialChunkOptions.PAD:
                        yield vals + [pad] * (n - len(vals))
                    elif on_partial == PartialChunkOptions.ERROR:
                        raise PartialChunkException
                    return
                return
        yield vals
test.py
import chunker

chunk_size = 3
for it in (range(100, 107),
           range(100, 109)):
    print("\nITERABLE TO CHUNK: {}".format(it))
    print("CHUNK SIZE: {}".format(chunk_size))
    for option in chunker.PartialChunkOptions.__members__.values():
        print("\noption {} used".format(option))
        try:
            for chunk in chunker.chunker(it, chunk_size, on_partial=option):
                print(chunk)
        except chunker.PartialChunkException:
            print("PartialChunkException was raised")
    print("")
Output of test.py
ITERABLE TO CHUNK: range(100, 107)
CHUNK SIZE: 3
option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106]
option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, None, None]
option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
PartialChunkException was raised
ITERABLE TO CHUNK: range(100, 109)
CHUNK SIZE: 3
option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]
option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]
option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]
option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]
In the abstract, it would be:
l = [1,2,3,4,5,6,7,8,9]
n = 3
outList = []
for i in range(n, len(l) + n, n):
    outList.append(l[i-n:i])
print(outList)
This prints:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
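I created these two fancy one-liners, which are efficient and lazy. Both input and output are iterables, and they don't depend on any module:
The first one-liner is totally lazy, meaning it returns an iterator producing iterators (i.e. each chunk produced is an iterator over that chunk's elements). This version is a good fit when chunks are very large, or when elements are produced slowly one by one and should be available as soon as they are produced: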
chunk_iters = lambda it, n: ((e for i, g in enumerate(((f,), cit)) for j, e in zip(range((1, n - 1)[i]), g)) for cit in (iter(it),) for f in cit)
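The second one-liner returns an iterator producing lists. Each list is produced as soon as all elements of the chunk become available through the input iterator, or when the very last element of the last chunk is reached. Use this version if the input elements are produced quickly or are all available immediately. Otherwise, use the first, lazier one-liner.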
chunk_lists = lambda it, n: (l for l in ([],) for i, g in enumerate((it, ((),))) for e in g for l in (l[:len(l) % n] + [e][:1 - i],) if (len(l) % n == 0) != i)
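I also provide a multi-line version of the first one-liner, chunk_iters, which returns an iterator producing further iterators (each going through the elements of one chunk):
def chunk_iters(it, n):
    cit = iter(it)
    def one_chunk(f):
        yield f
        for i, e in zip(range(n - 1), cit):
            yield e
    for f in cit:
        yield one_chunk(f)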
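One caveat with an iterator of iterators like chunk_iters: every inner iterator shares the same underlying input iterator, so each chunk should be consumed before the next one is requested. The sketch below shows what happens otherwise (chunk_iters is repeated here so the block runs on its own).

```python
def chunk_iters(it, n):
    cit = iter(it)

    def one_chunk(f):
        yield f
        for i, e in zip(range(n - 1), cit):
            yield e

    for f in cit:
        yield one_chunk(f)


# Consuming each chunk before asking for the next gives the expected result.
in_order = [list(chunk) for chunk in chunk_iters(range(7), 3)]
print(in_order)  # [[0, 1, 2], [3, 4, 5], [6]]

# Collecting the chunk iterators first drains the shared input one item at a
# time, so every "chunk" ends up holding a single element.
collected_first = [list(chunk) for chunk in list(chunk_iters(range(7), 3))]
print(collected_first)  # [[0], [1], [2], [3], [4], [5], [6]]
```

This is the usual trade-off of fully lazy chunking; the list-producing variant does not have this constraint.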
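A simple solution
The OP asked for "equally-sized chunks". I understand "equally sized" as "balanced" sizes: we are looking for groups of items of approximately the same size when equal sizes are not possible (e.g., 23/5).
The inputs here are:
- the list of items: input_list (e.g., a list of 23 numbers)
- the number of groups to split those items into: n_groups (e.g., 5)
Input: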
input_list = list(range(23))
n_groups = 5
Groups of contiguous elements:
approx_sizes = len(input_list)/n_groups
groups_cont = [input_list[int(i*approx_sizes):int((i+1)*approx_sizes)]
               for i in range(n_groups)]
Groups of "every N-th" elements:
groups_leap = [input_list[i::n_groups]
               for i in range(n_groups)]
Results
print(len(input_list))
print('Contiguous elements lists:')
print(groups_cont)
print('Leap every "N" items lists:')
print(groups_leap)
This outputs:
23
Contiguous elements lists:
[[0, 1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16, 17], [18, 19, 20, 21, 22]]
Leap every "N" items lists:
[[0, 5, 10, 15, 20], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18], [4, 9, 14, 19]]
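The contiguous version above computes its boundaries with float arithmetic. As an alternative sketch (not part of the answer), divmod gives the same kind of balanced split with integers only, putting the larger groups first. The function name balanced_groups is hypothetical.

```python
def balanced_groups(items, n_groups):
    """Split items into n_groups contiguous groups whose sizes differ by at most 1."""
    base, extra = divmod(len(items), n_groups)
    groups = []
    start = 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)  # the first `extra` groups get one more item
        groups.append(items[start:start + size])
        start += size
    return groups


groups = balanced_groups(list(range(23)), 5)
print([len(g) for g in groups])  # [5, 5, 5, 4, 4]
```

If n_groups exceeds the number of items, the trailing groups simply come out empty.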
This task can easily be done with the generator from the accepted answer. I'm adding a class implementation that also implements the length method, which may be useful to somebody. I needed to know the progress (with tqdm), so the generator had to report the number of chunks.
class ChunksIterator(object):
    def __init__(self, data, n):
        self._data = data
        self._l = len(data)
        self._n = n

    def __iter__(self):
        for i in range(0, self._l, self._n):
            yield self._data[i:i + self._n]

    def __len__(self):
        rem = 1 if self._l % self._n != 0 else 0
        return self._l // self._n + rem
Usage:
it = ChunksIterator([1,2,3,4,5,6,7,8,9], 2)
print(len(it))
for i in it:
    print(i)
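The chunk count computed in __len__ is a ceiling division. A minimal sketch of two equivalent ways to write it follows; the function name count_chunks is hypothetical.

```python
def count_chunks(length, n):
    """Number of chunks of size n needed to hold `length` items."""
    full, remainder = divmod(length, n)
    return full + (1 if remainder else 0)


print(count_chunks(9, 2))  # 5
print(-(-9 // 2))          # 5, the same ceiling division written with floor division
```

Both forms stay in integer arithmetic, so they avoid the rounding concerns of math.ceil(length / n) on very large values.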
A one-liner version of senderle's answer:
from itertools import islice
from functools import partial
seq = [1,2,3,4,5,6,7]
size = 3
result = list(iter(partial(lambda it: tuple(islice(it, size)), iter(seq)), ()))
assert result == [(1, 2, 3), (4, 5, 6), (7,)]
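Assuming the list is lst:
import math
lst = list(range(10))  # example input (an assumption; use your own list)
size = 3               # example chunk size (an assumption)
ln = len(lst)          # length of the list
for num in range(math.ceil(ln / size)):
    start, end = num * size, min((num + 1) * size, ln)
    print(lst[start:end])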
Simply using zip() to produce the full-length chunks and then yielding the remaining elements of lst (those that cannot form a "whole" sublist) should do the trick.
def chunkify(lst, n):
    for tup in zip(*[iter(lst)]*n):
        yield tup
    rest = tuple(lst[len(lst)//n*n: ])
    if rest:
        yield rest

list(chunkify(range(7), 3)) # [(0, 1, 2), (3, 4, 5), (6,)]
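For readers unfamiliar with the zip(*[iter(lst)]*n) idiom, here is a short sketch of why it works and why the leftover elements have to be taken from lst itself. The variable names are only for illustration.

```python
lst = list(range(7))
n = 3

it = iter(lst)
# [it] * n holds the SAME iterator n times, so zip() pulls n consecutive
# items from it to build each tuple.
full_chunks = list(zip(*[it] * n))
print(full_chunks)  # [(0, 1, 2), (3, 4, 5)]

# While trying to build a third tuple, zip() already pulled 6 and dropped it,
# so the iterator is now empty and the tail must be sliced from the list.
leftover_in_iterator = list(it)
tail = lst[len(lst) // n * n:]
print(leftover_in_iterator)  # []
print(tail)                  # [6]
```

This is also why chunkify needs an input that supports len() and slicing (a list, tuple or range) rather than a one-shot generator.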
Since Python 3.12, the same functionality is available in the standard library as the batched function in itertools. For example:
from itertools import batched
list(batched(range(7), 3)) # [(0, 1, 2), (3, 4, 5), (6,)]
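Both approaches are at least as memory-efficient as any function in the other answers on this page that does the same thing (peak memory usage is the size of one batch), and they are also the fastest. The table below shows runtimes for chunking a list of 1,000,000 elements (the first column is for chunk size 3, the second for chunk size 910; see footnote 1).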
Chunk size 3 910
Functions
cottontail 20.1ms 7.5ms
it_batched 22.1ms 8.3ms
NedBatchelder 72.8ms 8.4ms
nirvana_msu 140.4ms 18.8ms
pylang1 173.7ms 19.0ms
senderle 184.6ms 15.7ms
One-liner version (Python >=3.8):
list(map(list, zip(*[iter(lst)]*n))) + ([rest] if (rest:=lst[len(lst)//n*n : ]) else [])
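1 Code used to produce the table. Only the following functions were considered, because the functions defined in the answers by @NedBatchelder, @oremj, @RianRizvi, @Mars and @atzz are the same, and those by @MarkusJarderot, @nirvana_msu and @RaymondHettinger are the same, so only one was picked from each group. Tested on Python 3.12.0.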
from timeit import repeat

setup = """
import itertools
import more_itertools as mit

def cottontail(lst, n):
    for tup in zip(*[iter(lst)]*n): tup
    rest = tuple(lst[len(lst)//n*n: ])
    if rest: rest

def it_batched(it, n):
    for x in itertools.batched(it, n): x

def NedBatchelder(lst, n):
    for i in range(0, len(lst), n): lst[i:i + n]

def pylang1(iterable, n):
    for x in mit.chunked(iterable, n): x

def senderle(it, size):
    it = iter(it)
    for x in iter(lambda: tuple(itertools.islice(it, size)), ()): x

def nirvana_msu(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        item

lst = list(range(1_000_000))
"""

out = {}
for f in ("NedBatchelder", "pylang1", "senderle",
          "nirvana_msu", "cottontail", "it_batched"):
    for k in (3, 910):
        tm = min(repeat(f"{f}(lst, {k})", setup, number=100))
        out.setdefault(f, {})[k] = tm*10
out = dict(sorted(out.items(), key=lambda xy: xy[1][3]))
print(' Chunk size 3 910\nFunctions')
for func, val in out.items():
    print("{:<15} {:>5.1f}ms {:>5.1f}ms".format(func, val[3], val[910]))
You can use more_itertools.chunked_even together with math.ceil. Probably the easiest to reason about?
from math import ceil
import more_itertools as mit
from pprint import pprint
pprint([*mit.chunked_even(range(19), ceil(19 / 5))])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18]]
pprint([*mit.chunked_even(range(20), ceil(20 / 5))])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19]]
pprint([*mit.chunked_even(range(21), ceil(21 / 5))])
# [[0, 1, 2, 3, 4],
# [5, 6, 7, 8],
# [9, 10, 11, 12],
# [13, 14, 15, 16],
# [17, 18, 19, 20]]
pprint([*mit.chunked_even(range(3), ceil(3 / 5))])
# [[0], [1], [2]]
The recipes in the itertools module provide two ways to do this, depending on how you want to handle a final odd-sized lot (keep it, pad it with a fill value, ignore it, or raise an exception):
from itertools import islice, zip_longest  # zip_longest is the Python 3 name of izip_longest

def batched(iterable, n):
    "Batch data into tuples of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    it = iter(iterable)
    while True:
        batch = tuple(islice(it, n))
        if not batch:
            return
        yield batch

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')
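To make the three grouper modes concrete, here is a small sketch that calls the underlying zip variants directly. The helper same_iterator is hypothetical; zip(strict=True) needs Python 3.10 or later, so that part is guarded.

```python
import sys
from itertools import zip_longest


def same_iterator(iterable, n):
    """Return n references to one shared iterator (the trick grouper relies on)."""
    return [iter(iterable)] * n


# 'fill': the last group is padded with the fill value.
filled = list(zip_longest(*same_iterator("ABCDEFG", 3), fillvalue="x"))
print(filled)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]

# 'ignore': the incomplete last group is silently dropped.
ignored = list(zip(*same_iterator("ABCDEFG", 3)))
print(ignored)  # [('A', 'B', 'C'), ('D', 'E', 'F')]

# 'strict': an incomplete last group raises ValueError (Python 3.10+).
strict_raised = None
if sys.version_info >= (3, 10):
    try:
        list(zip(*same_iterator("ABCDEFG", 3), strict=True))
        strict_raised = False
    except ValueError:
        strict_raised = True
print(strict_raised)
```

Note that none of the grouper modes keeps a short final group as is; that case is what batched covers.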
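To split a list into equally-sized chunks, we can loop over the list and, on each iteration, extract a portion of it with a slice (the lst[i:i+size] notation, which is equivalent to indexing with slice(i, i+size)).
def chunkify(lst, size):
    """Split a list into equally-sized chunks."""
    chunks = []
    for i in range(0, len(lst), size):
        chunks.append(lst[i:i+size])
    return chunks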
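Here lst is the list to split and size is the size of each chunk.
range() generates the sequence of start indices at which to slice the list. The slice lst[i:i+size] extracts the portion of the list from index i up to, but not including, index i+size.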
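The answer mentions slice() while its code uses bracket notation. As a brief sketch of how the two relate (the values below are arbitrary examples):

```python
lst = list(range(10))
i, size = 3, 4

# Bracket notation and an explicit slice object select the same elements.
window = slice(i, i + size)
print(lst[i:i + size])  # [3, 4, 5, 6]
print(lst[window])      # [3, 4, 5, 6]

# A slice that runs past the end is clamped to the list length.
last = slice(8, 12)
print(last.indices(len(lst)))  # (8, 10, 1)
print(lst[last])               # [8, 9]
```

The clamping is what lets the final chunk come out shorter without any explicit bounds check.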
Here is a short and readable answer (unlike all the previous ones):
- No packages
- Works when the list cannot be split evenly into n chunks
- Easy to turn into a generator
import math
def chunk(lst, n):
    chunk_size = math.ceil(len(lst) / n)
    return [lst[i: min(i+chunk_size, len(lst))] for i in range(0, len(lst), chunk_size)]
Examples:
chunk(lst=list(range(9)), n=3)
gives [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
chunk(lst=list(range(10)), n=3)
gives [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
chunk(lst=list(range(11)), n=3)
gives [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
def devideChunks(x, n):
    newList = []
    for i in range(0, len(x), n):
        newList.append(x[i:i + n])
    print(newList)
In Python 3.12, itertools.batched now supports this natively:
from itertools import batched
flattened_data = ['roses', 'red', 'violets', 'blue', 'sugar']
unflattened = list(batched(flattened_data, 2))
assert unflattened == [('roses', 'red'), ('violets', 'blue'), ('sugar',)]
This is fully lazy: the input iterator is only consumed far enough to fill the current chunk.
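The laziness can be observed with a generator that records what has been pulled from it. This is a sketch only: the source generator and the consumed list are made up for the demonstration, and the fallback definition is an assumption added so the block also runs on Python versions before 3.12.

```python
try:
    from itertools import batched  # available from Python 3.12
except ImportError:
    from itertools import islice

    def batched(iterable, n):
        """Fallback sketch for older Pythons: yield tuples of up to n items."""
        it = iter(iterable)
        while chunk := tuple(islice(it, n)):
            yield chunk

consumed = []


def source():
    """Hypothetical input that logs every item it hands out."""
    for i in range(10):
        consumed.append(i)
        yield i


batches = batched(source(), 2)
before = list(consumed)   # nothing has been pulled yet
first = next(batches)
after = list(consumed)    # only the items of the first batch were pulled

print(before)  # []
print(first)   # (0, 1)
print(after)   # [0, 1]
```

Each further call to next(batches) pulls just enough items for one more batch.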