How to use torch quantization on float values to reduce the number of bits from FP64 to 8 bits?

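Question:

I am trying to use the quantization library in the torch.quantization module (quantization link) to reduce the precision of floating-point values. My example array is created with numpy, and this is my code: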

import numpy as np
import torch
import torch.quantization
dtype = torch.qint8
test1 = np.array([0.23999573, 0.04214323, 0.03814219, 0.13811627, 0.5416026])
print(test1)
print(test1.dtype)

t = torch.from_numpy(test1)
print(t)
print(t.dtype)
t.to(dtype=dtype)

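I first convert the vector to a torch tensor, import the quantization libraries, and then try to change the dtype to torch.qint8. This is the output and the error I get: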

[0.23999573 0.04214323 0.03814219 0.13811627 0.5416026 ]
float64
tensor([0.2400, 0.0421, 0.0381, 0.1381, 0.5416], dtype=torch.float64)
torch.float64

    ---------------------------------------------------------------------------
    
    RuntimeError                              Traceback (most recent call last)
    
    <ipython-input-17-7ac6609713a1> in <cell line: 22>()
         20 print(t)
         21 print(t.dtype)
    ---> 22 t.to(dtype=dtype)
    
    RuntimeError: empty_strided not supported on quantized tensors yet see https://github.com/pytorch/pytorch/issues/74540
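
One likely reason for this error is that a quantized dtype also needs a scale and a zero point, which t.to(dtype=...) has no way to supply. Below is a minimal sketch, assuming per-tensor affine quantization through torch.quantize_per_tensor. The way scale and zero_point are derived from the data range is an assumption made for illustration, not something taken from the question.

```python
import numpy as np
import torch

test1 = np.array([0.23999573, 0.04214323, 0.03814219, 0.13811627, 0.5416026])

# quantize_per_tensor works on float32 input, so cast down from float64 first.
t = torch.from_numpy(test1).to(torch.float32)

# Assumed parameter choice: stretch the range to include 0 so that the
# zero point stays inside the qint8 range [-128, 127].
t_min = min(t.min().item(), 0.0)
t_max = max(t.max().item(), 0.0)
scale = (t_max - t_min) / 255.0
zero_point = int(round(-128 - t_min / scale))

q = torch.quantize_per_tensor(t, scale=scale, zero_point=zero_point, dtype=torch.qint8)

print(q.dtype)         # quantized dtype of the tensor
print(q.int_repr())    # underlying int8 codes
print(q.dequantize())  # float32 approximation of the original values
```

The quantized tensor stores one byte per element plus the shared scale and zero point. dequantize() recovers the values only approximately, with an error of roughly half a quantization step.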

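My original arrays are in numpy, and the main code is written in tensorflow. I converted the array to torch because I could not find any quantization library that converts to a lower precision and saves storage space.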

I am wondering whether there is a library I could use that implements lower-precision data types for numpy, such as Float16 or an 8-bit type.
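
For the Float16 case, numpy itself has a half-precision dtype, so a plain cast may already be enough. A minimal sketch, reusing the example array from above:

```python
import numpy as np

test1 = np.array([0.23999573, 0.04214323, 0.03814219, 0.13811627, 0.5416026])

# Cast float64 (8 bytes per value) down to float16 (2 bytes per value).
half = test1.astype(np.float16)

print(half, half.dtype)
print(test1.nbytes, "bytes ->", half.nbytes, "bytes")
```

numpy has no built-in 8-bit floating-point dtype, so going to 8 bits generally means storing integer codes (np.int8 or np.uint8) together with a scale factor.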

Any ideas on how to solve this?

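Bonus question: in many research papers I have seen that the number of bits in a neural network can be reduced to 8 bits or even lower, down to 2 bits. Is there a way to take an existing vector and reduce the number of bits per floating-point number (that is, apply quantization to the data) to a lower bit width, such as 4 bits or 2 bits?

Reference paper: Quantization paper (page 20 of the paper)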
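
To make the bonus question concrete, here is a rough sketch of uniform (affine) quantization of a vector to an arbitrary bit width in plain numpy. The helper names quantize_uniform and dequantize_uniform are hypothetical, and this min/max scheme is only one simple option, not necessarily the method used in the referenced paper.

```python
import numpy as np

def quantize_uniform(x, num_bits):
    """Map x onto 2**num_bits evenly spaced levels between x.min() and x.max().

    Hypothetical helper; assumes num_bits <= 8 so the codes fit in uint8.
    """
    levels = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.clip(np.round((x - lo) / scale), 0, levels).astype(np.uint8)
    return codes, scale, lo

def dequantize_uniform(codes, scale, lo):
    """Turn integer codes back into approximate float64 values."""
    return codes.astype(np.float64) * scale + lo

test1 = np.array([0.23999573, 0.04214323, 0.03814219, 0.13811627, 0.5416026])

for bits in (8, 4, 2):
    codes, scale, lo = quantize_uniform(test1, bits)
    approx = dequantize_uniform(codes, scale, lo)
    print(bits, "bits:", codes, "max abs error:", np.abs(approx - test1).max())
```

Each code here still occupies a full byte. To actually save storage at 4 or 2 bits, several codes would have to be packed into one byte, which this sketch leaves out.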

python numpy pytorch precision quantization
