Parallelizing Backward Propagation in CUDA Thrust: Refactoring a Neural Network Implementation

Question:

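Description: I am parallelizing a neural network implementation with CUDA Thrust and have run into a problem integrating backward propagation into my Unit class, which is the core of the implementation. The code builds and runs without the backward-propagation functionality, but once it is included the build fails with no clear error message.

Current implementation: I have a Unit class that represents a neural network unit, with the fields value, grad, reverse (an anonymous function used for the backward pass), and children (a vector that tracks dependencies in the computation graph). However, integrating these features into a CUDA Thrust environment causes the build to fail.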


class Unit {
public:
    std::function<void()> reverse;
    double value;
    mutable double grad;
    thrust::host_vector<Unit*> children;

    __host__ __device__ Unit(double val = 0.0, thrust::host_vector<Unit*> chi = {}) : reverse([]()
    {
    }), value(val), grad(0), children(std::move(chi)) {}

    __host__ __device__ Unit operator+(Unit& other)
    {
        Unit result(value + other.value, { this, &other });

        result.reverse = [this, &other, &result]() {
            for (const auto& child : children)
            {
                child->grad += result.grad;
            }
            };
        return result;
    }
    __host__ __device__ Unit operator*(const Unit& other) const 
    {
        Unit result(value * other.value, { this, &other });
        result.reverse = [this, &result, &other]() {
            this->grad += result.grad * other.value;
            other.grad += result.grad * this->value;
            };
        return result;
    }

// other operators ... 
    __host__ __device__ void backward(){
        this->grad = 1.0;
        std::queue<Unit*> q;
        q.push(this);
        while (!q.empty()) {
            Unit* current = q.front();
            q.pop();
            current->reverse();
            if (!current->children.empty()) {
                for (Unit* child : current->children) {
                    q.push(child);
                }
            }
        }
    }
};

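Challenges faced:

Incorporating the Unit class functionality, especially the children vector and the anonymous functions, into CUDA Thrust.

Understanding the compatibility limitations of custom classes with CUDA Thrust.

Exploring potential refactorings or alternative approaches that achieve efficient parallelization without sacrificing functionality.

What I have tried: I attempted to modify the Unit class by replacing std::vector with thrust::host_vector and thrust::device_vector. However, this did not resolve the build failure, and I am unsure of the next steps.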

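Seeking guidance on:

Suggestions for refactoring the Unit class, or alternative approaches that better fit CUDA Thrust's requirements for parallel processing.

Additional information:

I have already implemented a Tensor class for basic matrix operations, and it works on its own in the CUDA Thrust environment. However, integrating the more complex functionality of the Unit class seems to cause compatibility problems.

I would appreciate any insights, suggestions, or guidance on resolving these integration issues and on optimizing my neural network implementation for parallel execution with CUDA Thrust. Thank you!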

C++, neural network, CUDA, Thrust, automatic differentiation

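Comment: Your design seems to be centered on the object-oriented programming paradigm, which is generally hard to port to GPUs. Even if you get it to work, it will probably perform poorly. You may want to look into data-oriented programming and, more specifically, the Thrust examples, to see what is possible and how it works.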
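To make the data-oriented suggestion concrete, here is a minimal host-only sketch of one possible layout. It is not the asker's code and not a Thrust API: the names Op, Tape, leaf, add, mul, and backward are all hypothetical. The idea is to replace the std::function closures with an operation tag and the Unit* children with integer indices, so that every node is described by plain values in flat arrays. Arrays of this kind could in principle be stored in thrust::device_vector, but this sketch uses std::vector and does not run on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical operation tag that stands in for the per-node closure.
enum class Op : int { Leaf, Add, Mul };

// Hypothetical structure-of-arrays tape: node i is element i of each array.
struct Tape {
    std::vector<double> value;
    std::vector<double> grad;
    std::vector<Op>     op;
    std::vector<int>    lhs;  // index of the first operand, -1 for leaves
    std::vector<int>    rhs;  // index of the second operand, -1 for leaves

    // Append a node and return its index.
    int push(Op o, double v, int a, int b) {
        const int n = static_cast<int>(value.size());
        assert(a < n && b < n);  // operands must already be on the tape
        value.push_back(v);
        grad.push_back(0.0);
        op.push_back(o);
        lhs.push_back(a);
        rhs.push_back(b);
        return n;
    }

    int leaf(double v)    { return push(Op::Leaf, v, -1, -1); }
    int add(int a, int b) { return push(Op::Add, value[a] + value[b], a, b); }
    int mul(int a, int b) { return push(Op::Mul, value[a] * value[b], a, b); }

    // Reverse sweep. Operands always have a smaller index than their result,
    // so walking indices downward visits each node after all of its consumers.
    void backward(int root) {
        std::fill(grad.begin(), grad.end(), 0.0);
        grad[root] = 1.0;
        for (int i = root; i >= 0; --i) {
            const double g = grad[i];
            switch (op[i]) {
            case Op::Add:
                grad[lhs[i]] += g;
                grad[rhs[i]] += g;
                break;
            case Op::Mul:
                grad[lhs[i]] += g * value[rhs[i]];
                grad[rhs[i]] += g * value[lhs[i]];
                break;
            case Op::Leaf:
                break;
            }
        }
    }
};
```

As a usage example, building z = x * y + x with x = 2 and y = 3 (x = t.leaf(2.0), y = t.leaf(3.0), z = t.add(t.mul(x, y), x)) and calling t.backward(z) leaves t.grad[x] at 4 and t.grad[y] at 2. The design choice here is that no node holds a pointer or a callable, which is the kind of plain data that is easier to copy to a device and process in bulk.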

Answer: No answers yet.
