提问人:Ferdi 提问时间:10/29/2023 最后编辑:tripleeeFerdi 更新时间:10/30/2023 访问量:108
如何从数组/列表中切片字节以获取位大小的条目?
How to slice bytes from an array/list to get bit-sized entries instead?
问:
有非常长的数组,其条目大小为字节。我想使每个字节的每个位都成为自己的数组条目。
我想做的一个例子是操作以下数组:
[0b00001111, 0b01010101, 0b11110000]
要获取此数组,请执行以下操作:
[0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
我知道可以通过循环来做到这一点,但我想先看看是否有更快的方法(时间复杂度更低)。
答:
我认为你所说的使用循环的意思是这样的:
l = [0b00001111, 0b01010101, 0b11110000]
l_expanded = []
for n in l:
l_expanded.extend(format(n, "08b"))
l_expanded = [int(n) for n in l_expanded]
print(l_expanded)
输出:
[0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
你可以像这样在一行中做到这一点:
l = [0b00001111, 0b01010101, 0b11110000]
l = list(map(int, (b for num in l for b in format(num, "08b"))))
print(l)
但是,如果速度是你最关心的问题,你可能应该使用 numpy:
import numpy as np
arr = np.array([0b00001111, 0b01010101, 0b11110000], dtype=np.uint8)
arr = np.unpackbits(arr)
print(arr)
输出:
array([0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0,
0, 0], dtype=uint8)
您可能还想重塑 ndarray 的形状,以便每个数字都有自己的行:
import numpy as np
arr = np.array([0b00001111, 0b01010101, 0b11110000], dtype=np.uint8)
arr = np.unpackbits(arr).reshape(-1, 8)
print(arr)
输出:
array([[0, 0, 0, 0, 1, 1, 1, 1],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 1, 1, 1, 0, 0, 0, 0]], dtype=uint8)
评论
map(int, iterable)
map()
l = list(map(int, (b for num in l for b in format(num, "08b"))))
具有 10^5 个随机字节的基准测试:
3.89 ± 0.08 ms bits_Kelly_juanpa
5.84 ± 0.06 ms bits_Kelly_4
6.24 ± 0.16 ms bits_Kelly_3
6.44 ± 0.05 ms bits_Kelly_2
6.64 ± 0.06 ms bits_Kelly_1
177.09 ± 1.33 ms bits_sbottingota_1
177.28 ± 1.83 ms bits_sbottingota_3
183.16 ± 2.39 ms bits_sbottingota_2
Python: 3.11.4 (main, Sep 9 2023, 15:09:21) [GCC 13.2.1 20230801]
基准测试代码:
from timeit import timeit
from statistics import mean, stdev
from random import shuffle
import numpy as np
import os
import sys
def bits_Kelly_1(a):
bitss = [[*map(int, f'{i:08b}')] for i in range(256)]
bits = []
for byte in a:
bits += bitss[byte]
return bits
def bits_Kelly_2(a):
bitss = [[]]
for _ in range(8):
bitss = [bits + [bit] for bits in bitss for bit in (0, 1)]
bits = []
for byte in a:
bits += bitss[byte]
return bits
bitss = [[]]
for _ in range(8):
bitss = [bits + [bit] for bits in bitss for bit in (0, 1)]
def bits_Kelly_3(a):
bitss_ = bitss
bits = []
for byte in a:
bits += bitss_[byte]
return bits
def bits_Kelly_4(a):
i = int.from_bytes(bytes(a))
s = f'{i:0{len(a)*8}b}'
s = s.translate(s.maketrans('01', '\0\1'))
return list(s.encode())
# Using unpackbits() from juanpa's answer
def bits_Kelly_juanpa(a):
a = np.frombuffer(bytes(a), dtype=np.uint8)
return list(np.unpackbits(a).tobytes())
def bits_sbottingota_1(l):
l_expanded = []
for n in l:
l_expanded.extend(format(n, "08b"))
return [int(n) for n in l_expanded]
def bits_sbottingota_2(l):
l = map(lambda n: format(n, "08b"), l)
return [int(item) for sublist in l for item in sublist]
def bits_sbottingota_3(l):
return list(map(int, (b for num in l for b in format(num, "08b"))))
funcs = [bits_Kelly_1, bits_Kelly_2, bits_Kelly_3, bits_Kelly_4, bits_Kelly_juanpa, bits_sbottingota_1, bits_sbottingota_2, bits_sbottingota_3]
expect = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
for f in funcs:
result = f([0b00001111, 0b01010101, 0b11110000])
print(result == expect or result, f.__name__)
a = os.urandom(10**5)
times = {f: [] for f in funcs}
def stats(f):
ts = [t * 1e3 for t in sorted(times[f])[:5]]
return f'{mean(ts):6.2f} ± {stdev(ts):4.2f} ms '
for _ in range(50):
shuffle(funcs)
for f in funcs:
t = timeit(lambda: f(a), number=1)
times[f].append(t)
for f in sorted(funcs, key=stats):
print(stats(f), f.__name__)
print('\nPython:', sys.version)
使用 Kelly 的程序,我添加了一些 sbottingota 的变体
def bits_sbottingota_4(l):
table = [format(num, "08b") for num in range(256)]
return list(map(int, (b for num in l for b in table[num])))
def bits_sbottingota_5(l):
table = [list(map(int, format(num, "08b"))) for num in range(256)]
result = []
for b in l:
result.extend(table[b])
return result
def bits_sbottingota_5f(l):
table = [list(map(int, f'{num:08b}')) for num in range(256)]
result = []
for b in l:
result.extend(table[b])
return result
bitstable = [list(map(int, format(num, "08b"))) for num in range(256)]
def bits_sbottingota_6(l):
result = []
for b in l:
result.extend(bitstable[b])
return result
def bits_sbottingota_7(l):
result = []
for b in l:
result += bitstable[b]
return result
def bits_juanpa_1(l):
return [b for b in np.unpackbits(np.array(list(l), dtype=np.uint8))]
def bits_juanpa_2(l):
return list(np.unpackbits(np.array(list(l), dtype=np.uint8)).tobytes())
我机器上的结果(比凯利的慢一点)
快速版本慢 4 倍,但慢速版本与凯利的差不多快
12.91 ± 0.19 ms bits_juanpa_2
23.30 ± 0.35 ms bits_Kelly_3
23.59 ± 0.33 ms bits_Kelly_2
23.80 ± 0.20 ms bits_Kelly_1
24.38 ± 0.12 ms bits_sbottingota_7
25.50 ± 0.34 ms bits_sbottingota_5f
26.23 ± 0.10 ms bits_sbottingota_5
26.35 ± 0.31 ms bits_sbottingota_6
75.57 ± 0.43 ms bits_juanpa_1
142.38 ± 0.75 ms bits_sbottingota_4
170.92 ± 0.59 ms bits_sbottingota_2
172.71 ± 0.62 ms bits_sbottingota_3
184.91 ± 0.98 ms bits_sbottingota_1
花费大部分时间的不是从数字到位()的转换,而是对列表的操作。format()
使用 Kelly 的方法,查找表的构造更加清晰。list(map(int, format(num, "08b")))
f 字符串比 快。format()
我尝试过是否为查找表创建本地引用会使事情变得更快,但它在几次运行中有所不同,因此很可能可以忽略不计,就像局部变量名称的长度一样。Python 已经知道这是全局的,因为它使用指令。它不会首先在当地人中寻找。全局字典比本地字典大得多,在名称冲突的情况下,这可能会对查找产生很小的影响。bitstable
LOAD_GLOBAL
我很惊讶它比(见拆卸)更快list::+=
list::extend
对于大小为 100'000 的数组,是否预先计算为全局变量并不重要bitstable
numpy版本(juanpa)是最快的,如果你知道numpy(偷看Kelly的代码:to_bytes()
)
字节码 :result.extend(bitstable[b])
10 12 LOAD_FAST 1 (result)
14 LOAD_METHOD 0 (extend)
16 LOAD_GLOBAL 1 (bitstable)
18 LOAD_FAST 2 (b)
20 BINARY_SUBSCR
22 CALL_METHOD 1
24 POP_TOP
字节码 :result += bitstable[b]
16 12 LOAD_FAST 1 (result)
14 LOAD_GLOBAL 0 (bitstable)
16 LOAD_FAST 2 (b)
18 BINARY_SUBSCR
20 INPLACE_ADD
22 STORE_FAST 1 (result)
看起来列表比 快,或者是指令数 3 与 2 相比。INPLACE_ADD
CALL_METHOD (extend)
评论
+=
extend
+=
所以,我只想指出,如果性能在这里真的很重要,那么使用numpy可能是最好的服务。您应该从一个无符号的 8 位整数数组开始:
>>> import numpy as np
>>> arr = np.array([0b00001111, 0b01010101, 0b11110000], dtype=np.uint8)
然后,您可以执行以下操作:
>>> np.unpackbits(arr)
array([0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0,
0, 0], dtype=uint8)
艺术
>>> np.unpackbits(arr).reshape(-1, 8)
array([[0, 0, 0, 0, 1, 1, 1, 1],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 1, 1, 1, 0, 0, 0, 0]], dtype=uint8)
或者也许更有效:
>>> np.unpackbits(arr[..., None], axis=1)
array([[0, 0, 0, 0, 1, 1, 1, 1],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 1, 1, 1, 0, 0, 0, 0]], dtype=uint8)
评论