Asked by: Daniel Rohrbach · Asked: 7/3/2023 · Last edited by: Robert Crovella, Daniel Rohrbach · Updated: 7/11/2023 · Views: 111
Arrayfire C++ sparse matrix multiplication causes access violation
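Q:
I am trying to use sparse matrices in ArrayFire, but I get an access violation somewhere inside the ArrayFire DLL. I have tried several examples and always get the same result. I can see the error on both the CUDA and CPU backends.
Code:
Here is the code I am trying to run: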
af::info();
af_print(af::randu(5, 4));
float v[] = {5, 8, 3, 6};
int r[] = {0, 0, 2, 3, 4};
int c[] = {0, 1, 2, 1};
const int M = 4, N = 4, nnz = 4;
af::array vals = af::array(af::dim4(nnz), v);
af::array row_ptr = af::array(af::dim4(M + 1), r);
af::array col_idx = af::array(af::dim4(nnz), c);
af_print(vals);
af_print(row_ptr);
af_print(col_idx);
// Create sparse array (CSR) from af::arrays containing values,
// row pointers, and column indices.
auto sparseM = af::sparse(M, N, vals, row_ptr, col_idx, AF_STORAGE_CSR);
af_print(sparseM);
auto res = sparseM * sparseM;
af_print(res);
My configuration:
Running on Windows 11. Compiled with CMake in VSCode using Visual Studio Build Tools 2022. (Details about the system can be found in the ArrayFire output below.) My system has two GPUs, one NVIDIA and one Intel.
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DependencyModule.cpp(104) ] Found: forge.dll
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(217) ] Found 5 OpenCL platforms
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(229) ] Found 1 devices on platform NVIDIA CUDA
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(234) ] Found device NVIDIA GeForce GTX 1050 Ti with Max-Q Design on platform NVIDIA CUDA
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(229) ] Found 1 devices on platform Intel(R) OpenCL HD Graphics
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(234) ] Found device Intel(R) UHD Graphics 630 on platform Intel(R) OpenCL HD Graphics
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(229) ] Found 1 devices on platform Intel(R) OpenCL
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(234) ] Found device Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz on platform Intel(R) OpenCL
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(229) ] Found 1 devices on platform Intel(R) FPGA Emulation Platform for OpenCL(TM)
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(234) ] Found device Intel(R) FPGA Emulation Device on platform Intel(R) FPGA Emulation Platform for OpenCL(TM)
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(229) ] Found 0 devices on platform Intel(R) FPGA SDK for OpenCL(TM)
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(239) ] Found 4 OpenCL devices
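It fails on the second-to-last line, auto res = sparseM * sparseM;. It also fails when multiplying with another, non-sparse matrix.
Here is the output of the code above: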
[unified][1688317011][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(146) ] Found: afcuda.dll
Loaded 'C:\Windows\System32\DriverStore\FileRepository\nvdmi.inf_amd64_893ed8ff453738db\nvcuda64.dll'.
[unified][1688317012][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(153) ] Device Count: 1.
[unified][1688317012][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(208) ] AF_DEFAULT_BACKEND: cuda
[platform][1688317031][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DependencyModule.cpp(101) ] Attempting to load: forge.dll
[platform][1688317031][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DependencyModule.cpp(104) ] Found: forge.dll
[mem][1688317031][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DefaultMemoryManager.cpp(128) ] memory[0].max_bytes: 14.7 GB
ArrayFire v3.8.3 (CPU, 64-bit Windows, build 987d5675a)
[0] Intel: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz[mem][1688317044][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cpu\memory.cpp(147) ] nativeAlloc: 1 KB 0x2182dabc380
2
3
4
col_idx
[4 1 1 1]
0
1
2
1
sparseM
Storage Format : AF_STORAGE_CSR
[4 4 1 1]
sparseM: Values
[4 1 1 1]
5.0000
8.0000
3.0000
6.0000
sparseM: RowIdx
[5 1 1 1]
2
0
2
3
4
sparseM: ColIdx
[4 1 1 1]
0
1
2
1
2
Exception thrown at 0x00007FFEF118ADCC (af.dll) in EigenSim.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.
Looking at the debug stack trace, I get:
af.dll!af_get_last_error(char * * str, __int64 * len) Line 46 (f:\buildbot\worker\win10-cuda-installer\build\src\api\unified\error.cpp:46)
af.dll!af::operator*(const af::array & lhs, const af::array & rhs) Line 939 (f:\buildbot\worker\win10-cuda-installer\build\src\api\cpp\array.cpp:939)
EigenSim.exe!testBackend() Line 27 (c:\Development\Eigensim\src\main\RunTestEigenSim.cpp:27)
EigenSim.exe!main() Line 37 (c:\Development\Eigensim\src\main\RunTestEigenSim.cpp:37)
EigenSim.exe!invoke_main() Line 79
This is the output when using the CUDA backend:
[unified][1688317714][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(146) ] Found: afcuda.dll
Loaded 'C:\Windows\System32\DriverStore\FileRepository\nvdmi.inf_amd64_893ed8ff453738db\nvcuda64.dll'.
[unified][1688317714][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(153) ] Device Count: 1.
[unified][1688317714][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(208) ] AF_DEFAULT_BACKEND: cuda
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DependencyModule.cpp(101) ] Attempting to load: forge.dll
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DependencyModule.cpp(104) ] Found: forge.dll
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(494) ] CUDA Driver supports up to CUDA 12.2.0 ArrayFire CUDA Runtime 12.0.0
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(479) ] CUDA driver version(12.2.0) not part of the CudaToDriverVersion array. Please create an issue or a pull request on the ArrayFire repository to update the CudaToDriverVersion variable with this version of the CUDA runtime.
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(562) ] Found 1 CUDA devices
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(590) ] Found device: NVIDIA GeForce GTX 1050 Ti with Max-Q Design (sm_61) (4 GB | ~2076.416015625 GFLOPs | 6 SMs)
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(625) ] AF_CUDA_DEFAULT_DEVICE:
[platform][1688317722][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(644) ] Default device: 0(NVIDIA GeForce GTX 1050 Ti with Max-Q Design)
ArrayFire v3.8.3 (CUDA, 64-bit Windows, build 987d5675a)
Platform: CUDA Runtime 12.0, Driver: 12020
[0] NVIDIA GeForce GTX 1050 Ti with Max-Q Design, 4096 MB, CUDA Compute 6.1
[mem][1688317723][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DefaultMemoryManager.cpp(128) ] memory[0].max_bytes: 3 GB
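Things I have tried
I tried all backends (CUDA, CPU and OpenCL) without getting any good results. Using different storage formats for the sparse matrix changes the behavior slightly: sometimes it fails when creating the sparse matrix, sometimes when performing an operation on it.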
A:
3 upvotes
Edwin Solis
7/11/2023
#1
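From your wording, I assume your intent is to perform a matrix-matrix multiplication.
If that is the case, the operation should be done with the function af::matmul. For sparse matrices, that operation is only supported with a sparse matrix as the first argument and a dense matrix as the second argument. You can read more about this in the ArrayFire documentation. This means that what you are trying to achieve is currently not possible. You have to make the second operand a dense matrix, something like this: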
auto res = af::matmul(sparseM, af::dense(sparseM));
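Your code fails because the * operator performs element-wise multiplication, not matrix multiplication, and element-wise multiplication is not supported for sparse matrices.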
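To make the difference concrete, here is a minimal sketch in plain C++ (no ArrayFire) of what a sparse-times-dense matrix product computes for the CSR arrays in the question. The names CsrMatrix, to_dense, spmm and question_matrix are hypothetical helpers invented for this illustration, and the row-major dense layout is an assumption of the sketch. It is not how ArrayFire implements af::matmul.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch; not part of ArrayFire.
struct CsrMatrix {
    int rows;
    int cols;
    std::vector<float> values;  // non-zero values, stored row by row
    std::vector<int> row_ptr;   // rows + 1 offsets into values / col_idx
    std::vector<int> col_idx;   // column index of each non-zero value
};

// Expand a CSR matrix into a dense row-major buffer of size rows * cols.
std::vector<float> to_dense(const CsrMatrix& m) {
    std::vector<float> d(static_cast<std::size_t>(m.rows) * m.cols, 0.0f);
    for (int i = 0; i < m.rows; ++i) {
        for (int k = m.row_ptr[i]; k < m.row_ptr[i + 1]; ++k) {
            d[static_cast<std::size_t>(i) * m.cols + m.col_idx[k]] = m.values[k];
        }
    }
    return d;
}

// Matrix product C = A * B, with A sparse (CSR) and B dense row-major
// of shape (A.cols x b_cols). Returns C as dense row-major (A.rows x b_cols).
std::vector<float> spmm(const CsrMatrix& a, const std::vector<float>& b, int b_cols) {
    assert(b.size() == static_cast<std::size_t>(a.cols) * static_cast<std::size_t>(b_cols));
    std::vector<float> c(static_cast<std::size_t>(a.rows) * b_cols, 0.0f);
    for (int i = 0; i < a.rows; ++i) {
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            const float v = a.values[k];
            const std::size_t b_row = static_cast<std::size_t>(a.col_idx[k]) * b_cols;
            for (int j = 0; j < b_cols; ++j) {
                c[static_cast<std::size_t>(i) * b_cols + j] += v * b[b_row + j];
            }
        }
    }
    return c;
}

// The 4x4 matrix from the question: v = {5, 8, 3, 6},
// r = {0, 0, 2, 3, 4}, c = {0, 1, 2, 1}.
CsrMatrix question_matrix() {
    return CsrMatrix{4, 4,
                     {5.0f, 8.0f, 3.0f, 6.0f},
                     {0, 0, 2, 3, 4},
                     {0, 1, 2, 1}};
}
```

Calling spmm(a, to_dense(a), 4) with a = question_matrix() gives the matrix product, whose row 1 is (40, 64, 0, 0). An element-wise product would instead square each stored value, giving (25, 64, 0, 0) in that row.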
Comments
0 upvotes
Daniel Rohrbach
7/12/2023
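Thank you very much for getting back to me. What you say makes sense. In fact, you are right: I wanted a matmul with a dense vector, and I got that working. I think the bigger problem is that I did not get an error message from ArrayFire or the backend; it simply crashed with an access violation. I was able to reproduce this with other conditions that are apparently not allowed, but that crash ArrayFire with an access violation on my configuration.
1 upvote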
Edwin Solis
7/12/2023
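To avoid the crash, you should handle exceptions properly. When ArrayFire encounters an error, it throws an exception. Use a try/catch block to catch the error and avoid the crash: try {} catch(const af::exception&) {}
2 upvotes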
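A fuller sketch of that pattern, reusing the arrays from the question and the sparse-times-dense form suggested in the answer, might look like the program below. It assumes ArrayFire is installed and linked; the message format and the exit code are illustrative choices. A C++ catch block only intercepts C++ exceptions. In the stack trace from the question the access violation is raised inside af_get_last_error, so on an affected build this handler may never be reached.

```cpp
#include <arrayfire.h>
#include <cstdio>

int main() {
    try {
        float v[] = {5, 8, 3, 6};
        int r[] = {0, 0, 2, 3, 4};
        int c[] = {0, 1, 2, 1};
        const int M = 4, N = 4, nnz = 4;

        af::array vals    = af::array(af::dim4(nnz), v);
        af::array row_ptr = af::array(af::dim4(M + 1), r);
        af::array col_idx = af::array(af::dim4(nnz), c);

        af::array sparseM =
            af::sparse(M, N, vals, row_ptr, col_idx, AF_STORAGE_CSR);

        // Sparse matrix as the first argument, dense matrix as the second.
        af::array res = af::matmul(sparseM, af::dense(sparseM));
        af_print(res);
    } catch (const af::exception& e) {
        // Report the ArrayFire error instead of letting it terminate the program.
        std::fprintf(stderr, "ArrayFire error: %s\n", e.what());
        return 1;
    }
    return 0;
}
```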
arrayfire
7/20/2023
The same issue is being tracked on GitHub here: github.com/arrayfire/arrayfire/issues/3460