Asked by: phinz | Asked: 4/25/2022 | Last edited by: phinz | Updated: 4/26/2022 | Views: 185
Multithreaded fft convolution in arrayfire
Q:
I am trying to parallelize FFT convolutions in arrayfire across multiple CPU threads:
#include <arrayfire.h>
#include <algorithm>
#include <cstdio>
#include <iostream>
#include <vector>
#include <omp.h>
using namespace af;

// Print at most the first N entries of f, comma-separated.
void printarray(const std::vector<float>& f, size_t N=10)
{
    const size_t bound=std::min(N,f.size());
    for (size_t i=0; i< bound; ++i){
        using namespace std;
        cout<<f[i];
        if (i+1<bound) cout<<", ";
        else cout<<endl;
    }
}

using namespace std;

int main() {
    std::vector<float> vec{2.0,1.0};
    cout<<"vec: "<<endl;
    printarray(vec);
    std::vector<float> kernel(10000000,5.0);
    cout<<"kernel: "<<endl;
    printarray(kernel);
    try {
        // Every thread runs the same convolution on its own af::array objects.
        #pragma omp parallel
        {
            #pragma omp master
            {
                cout<<"Threads: "<<omp_get_num_threads()<<endl;
            }
            af::array af_in(vec.size(), vec.data());
            af::array af_kernel(kernel.size(), kernel.data());
            af::array tmp = af::fftConvolve(af_in, af_kernel, AF_CONV_EXPAND);
            // Copy the result back into a host-side vector.
            std::vector<float> out;
            float *h = tmp.host<float>();
            size_t entries = tmp.bytes() / sizeof(float);
            for (size_t i = 0; i < entries; ++i) {
                out.push_back(h[i]);
            }
            af::freeHost(h);
            int thr_num=omp_get_thread_num();
            cout<<"Thread "<<thr_num<<" finished"<<endl;
        }
    } catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
    }
    return 0;
}
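This minimal example program, example.cpp, can be compiled with g++ example.cpp -lafcpu -fopenmp. However, it somehow runs the convolutions only sequentially rather than in parallel. This can be checked, for example, by observing the CPU load while the program is running or by measuring the run time: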
$ time OMP_NUM_THREADS=1 ./a.out
vec:
2, 1
kernel:
5, 5, 5, 5, 5, 5, 5, 5, 5, 5
Threads: 1
Thread 0 finished
real 0m1,745s
user 0m1,654s
sys 0m0,069s
$ time OMP_NUM_THREADS=8 ./a.out
vec:
2, 1
kernel:
5, 5, 5, 5, 5, 5, 5, 5, 5, 5
Threads: 8
Thread 2 finished
Thread 5 finished
Thread 6 finished
Thread 1 finished
Thread 0 finished
Thread 3 finished
Thread 7 finished
Thread 4 finished
real 0m11,944s
user 0m14,552s
sys 0m0,544s
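A finer-grained check than the process-wide time output is to record when each thread starts and finishes its work: overlapping intervals point to parallel execution, back-to-back intervals point to serialization. The following is only a minimal sketch of that idea. The names Interval, busy_work, time_threads and any_overlap are hypothetical helpers introduced for illustration, and busy_work is a stand-in workload that would have to be replaced by the actual af::fftConvolve call.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Start and end of one thread's work, in seconds since a common origin.
struct Interval {
    double start;
    double end;
};

// Hypothetical stand-in for the real per-thread work (assumption: replace
// this with the convolution that should be measured).
inline double busy_work(std::size_t n) {
    volatile double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc = acc + static_cast<double>(i) * 1e-9;
    }
    return acc;
}

// Runs busy_work once per OpenMP thread and records each thread's interval.
// Without -fopenmp the pragmas are ignored and one interval is returned.
inline std::vector<Interval> time_threads(std::size_t n) {
    using steady = std::chrono::steady_clock;
    const auto origin = steady::now();
    std::vector<Interval> intervals;
    #pragma omp parallel
    {
        const double t0 =
            std::chrono::duration<double>(steady::now() - origin).count();
        busy_work(n);
        const double t1 =
            std::chrono::duration<double>(steady::now() - origin).count();
        // Serialize only the bookkeeping, not the measured work.
        #pragma omp critical
        intervals.push_back({t0, t1});
    }
    return intervals;
}

// True if any two recorded intervals overlap in time.
inline bool any_overlap(const std::vector<Interval>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        for (std::size_t j = i + 1; j < v.size(); ++j) {
            if (v[i].start < v[j].end && v[j].start < v[i].end) return true;
        }
    }
    return false;
}
```

Calling any_overlap(time_threads(n)) from main, compiled with -fopenmp and with the stand-in replaced by the convolution, would indicate whether the per-thread calls actually overlap in time.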
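I guess there must be some locking mechanism inside the af::fftConvolve function that prevents parallel execution, even though I construct separate af::array variables in each thread.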
How can I parallelize these af::fftConvolve convolutions on the CPU?
A: No answers yet
Comments
af::fftConvolve
libarrayfire-cpu-dev