MPI blocks execution after send when different workloads associated to a processor are used

Q:

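I am having some problems with an MPI code (written by me to test another program in which different workloads are associated with different processors). When I use a number of processors other than 1 or arraySize (4 in this case), the program blocks during MPI_Send. Specifically, when I run mpirun -np 2 MPItest, the program blocks during that call. I am not using a debugger for now; I just want to understand why it works with 1 and 4 processors but not with 2 processors (each processor owning 2 spots in the array). The code is the following: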

#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    int rank, size;
    const int arraySize = 4;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // every processor have a different workload (1 or more spots on the array to send to the other processors)
    // every processor sends to every other processor its designated spots


    int* sendbuf = new int[arraySize];
    int* recvbuf = new int[arraySize];

    int istart = arraySize/size * rank;
    int istop = (rank == size) ? arraySize : istart + arraySize/size;

    for (int i = istart; i < istop; i++) {
        sendbuf[i] = i;
    }

    std::cout << "Rank " << rank << " sendbuf :" << std::endl;
    //print the sendbuf before receiving its other values
    for (int i = 0; i < arraySize; i++) {
        std::cout << sendbuf[i] << ", ";
    }
    std::cout << std::endl;

    // sending designated spots of sendbuf to other processors
    for(int i = istart; i < istop; i++){
        for(int j = 0; j < size; j++){
            MPI_Send(&sendbuf[i], 1, MPI_INT, j, i, MPI_COMM_WORLD);
        }
    }

    // receiving the full array
    for(int i = 0; i < arraySize ; i++){
        int recvRank = i/(arraySize/size);
        MPI_Recv(&recvbuf[i], 1, MPI_INT, recvRank, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }


    // print the recvbuf after receiving its other values
    std::cout << "Rank " << rank << " recvbuf :" << std::endl;
    for (int i = 0; i < arraySize; i++) {
        std::cout << recvbuf[i] << ", ";
    }
    std::cout << std::endl;

    delete[] sendbuf;
    delete[] recvbuf;

    MPI_Finalize();
    return 0;
}

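I am using tags to distinguish the different spots in the array (maybe that is the problem?).

I tried different numbers of processors: with 1 processor the program works, with 4 processors it also works, with 3 processors it crashes, and with 2 processors it blocks. I also tried MPI_Isend, but that does not work either (the flag stays 0). The modified code with MPI_Isend is the following: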

#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    int rank, size;
    const int arraySize = 4;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // every processor have a different workload (1 or more spots on the array to send to the other processors)
    // every processor sends to every other processor its designated spots


    int* sendbuf = new int[arraySize];
    int* recvbuf = new int[arraySize];

    int istart = arraySize/size * rank;
    int istop = (rank == size) ? arraySize : istart + arraySize/size;

    for (int i = istart; i < istop; i++) {
        sendbuf[i] = i;
    }

    std::cout << "Rank " << rank << " sendbuf :" << std::endl;
    //print the sendbuf before receiving its other values
    for (int i = 0; i < arraySize; i++) {
        std::cout << sendbuf[i] << ", ";
    }
    std::cout << std::endl;

    // sending designated spots of sendbuf to other processors
    for(int i = istart; i < istop; i++){
        for(int j = 0; j < size; j++){
            MPI_Request request;
            //MPI_Send(&sendbuf[i], 1, MPI_INT, j, i, MPI_COMM_WORLD);
            MPI_Isend(&sendbuf[i], 1, MPI_INT, j, i, MPI_COMM_WORLD, &request);
            // control if the send is completed
            int flag = 0;
            MPI_Test(&request, &flag, MPI_STATUS_IGNORE);
            const int numberOfRetries = 10;
            if(flag == 0){ // operation not completed
                std::cerr << "Error in sending, waiting" << std::endl;
                for(int k = 0; k < numberOfRetries; k++){
                    MPI_Test(&request, &flag, MPI_STATUS_IGNORE);
                    if(flag == 1){
                        break;
                    }
                }
                if(flag == 0){
                    std::cerr << "Error in sending, aborting" << std::endl;
                    MPI_Abort(MPI_COMM_WORLD, 1);
                }
                
            }
        }
    }

    // receiving the full array
    for(int i = 0; i < arraySize ; i++){
        int recvRank = i/(arraySize/size);
        MPI_Recv(&recvbuf[i], 1, MPI_INT, recvRank, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }


    // print the recvbuf after receiving its other values
    std::cout << "Rank " << rank << " recvbuf :" << std::endl;
    for (int i = 0; i < arraySize; i++) {
        std::cout << recvbuf[i] << ", ";
    }
    std::cout << std::endl;

  
    //MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    delete[] sendbuf;
    delete[] recvbuf;

    MPI_Finalize();
    return 0;
}

With this code, -np 4 does not work either.
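On the crash reported with 3 processes: a likely contributor is the index arithmetic rather than MPI itself. The test `rank == size` is never true, because ranks run from 0 to size - 1, so when arraySize is not divisible by size the leftover elements are never assigned to anyone, and `i/(arraySize/size)` can yield a rank that does not exist (rank 3 for i = 3 with 3 processes). The following is a minimal sketch of a block partition that gives the remainder to the last rank; the names `Range`, `ownedRange` and `ownerOf` are hypothetical helpers introduced only for this illustration, and the sketch assumes 0 < size <= arraySize.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper type: half-open index range [start, stop) owned by one rank.
struct Range {
    int start;
    int stop;
};

// Block partition of arraySize elements over size ranks.
// Assumes 0 < size <= arraySize; the last rank absorbs the remainder.
Range ownedRange(int rank, int size, int arraySize) {
    assert(size > 0 && size <= arraySize);
    assert(rank >= 0 && rank < size);
    const int chunk = arraySize / size;
    const int start = chunk * rank;
    // Compare against size - 1: ranks are numbered 0 .. size - 1.
    const int stop = (rank == size - 1) ? arraySize : start + chunk;
    return {start, stop};
}

// Rank that owns element i. The result is clamped to size - 1 so that
// leftover elements map to the last rank, matching ownedRange above.
int ownerOf(int i, int size, int arraySize) {
    assert(size > 0 && size <= arraySize);
    assert(i >= 0 && i < arraySize);
    const int chunk = arraySize / size;
    return std::min(i / chunk, size - 1);
}
```

With arraySize = 4 and 3 processes, `ownedRange(2, 3, 4)` covers indices 2 and 3, and `ownerOf(3, 3, 4)` is rank 2, so every receive names a valid source rank.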

C++ MPI workload

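@VictorEijkhout Maybe I have misunderstood how MPI works or how I need to send messages, but if I send multiple messages with different tags to a single target (after calling MPI_Irecv, since other answers on stackoverflow say to call the asynchronous Irecv before calling send), the program blocks. I wrote another program to test two processes sending doubles to each other. There is no problem when each calls a single MPI_Send to the other, but when I call two or more consecutive MPI_Send to the same process with different tags, the program blocks. Am I doing something wrong?

0 upvotes Davis Herring 11/18/2023

MPI_Send can block; you have to use MPI_Bsend or MPI_Isend if you want to do all the sends and then all the receives.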
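The "all sends, then all receives" pattern from the comment above can be sketched as follows. This is an illustration under stated assumptions, not the asker's final code: it reuses arraySize = 4 and the tag-per-index scheme from the question, assumes the number of processes is at most arraySize, and gives leftover elements to the last rank. It starts every send with MPI_Isend, posts the blocking receives, and only then completes the sends with MPI_Waitall. Polling MPI_Test a fixed number of times and aborting, as in the question's second listing, can fire before the matching receive has been posted, which is one plausible reason that version fails. The sketch does not address the network-interface issue discussed further down the thread.

```cpp
#include <mpi.h>
#include <algorithm>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int arraySize = 4;
    if (size > arraySize) {
        // Assumption of this sketch: every rank owns at least one element.
        if (rank == 0) {
            std::cerr << "Run with at most " << arraySize << " processes" << std::endl;
        }
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    // Block partition; the last rank also takes the remainder.
    const int chunk = arraySize / size;
    const int istart = chunk * rank;
    const int istop = (rank == size - 1) ? arraySize : istart + chunk;

    std::vector<int> sendbuf(arraySize, -1);
    std::vector<int> recvbuf(arraySize, -1);
    for (int i = istart; i < istop; ++i) {
        sendbuf[i] = i;
    }

    // Start all sends (including the send to self) without waiting on them.
    std::vector<MPI_Request> requests((istop - istart) * size);
    int next = 0;
    for (int i = istart; i < istop; ++i) {
        for (int j = 0; j < size; ++j) {
            MPI_Isend(&sendbuf[i], 1, MPI_INT, j, i, MPI_COMM_WORLD, &requests[next++]);
        }
    }

    // Receive every element from the rank that owns it; the tag is the index.
    for (int i = 0; i < arraySize; ++i) {
        const int owner = std::min(i / chunk, size - 1);
        MPI_Recv(&recvbuf[i], 1, MPI_INT, owner, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    // Complete the sends only after the receives have been posted.
    MPI_Waitall(static_cast<int>(requests.size()), requests.data(), MPI_STATUSES_IGNORE);

    std::cout << "Rank " << rank << " recvbuf :";
    for (int value : recvbuf) {
        std::cout << " " << value;
    }
    std::cout << std::endl;

    MPI_Finalize();
    return 0;
}
```

The send buffers are left untouched until MPI_Waitall returns, which the standard requires for non-blocking sends. MPI_Bsend is the other option named in the comment; it needs a buffer attached with MPI_Buffer_attach and is not shown here.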

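A:

0 upvotes josura 11/18/2023 #1

Since I have not received any answer to the question, I want to add some insight into my problem, to help anyone who finds themselves in the same situation.

I tested another code to see whether OpenMPI was working properly on my laptop, because too many things that are not wrong according to the standard, and even code examples from the internet, were not working on my laptop. I tested the following code, a very simple program that sends part of an array between two processes: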

#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    int rank, size;
    const int arraySize = 5;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // initialize sendbuf
    int* sendbuf = new int[arraySize];
    for(int iteration = 0; iteration < 3; iteration++){

        if(rank){
            std::cout << "Rank " << rank << " sendbuf :" << std::endl;
            for (int i = 0; i < arraySize; i++) {
                std::cout << sendbuf[i] << ", ";
            }
            std::cout << std::endl;
        }

        // first process send first three elements to second process
        if(rank == 0){
            for(int i = 0; i < 3; i++){
                sendbuf[i] = i;
            }
            MPI_Send(&sendbuf[0], 3, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else {
            for(int i = 3; i < 5; i++){
                sendbuf[i] = i;
            }
        }

        // receive the full array with MPI_Wait
        if(rank){
            // second process receive the first three elements from first process
            MPI_Recv(&sendbuf[0], 3, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        // print the full array
        if(rank){
            std::cout << "Rank " << rank << " sendbuf after:" << std::endl;
            for (int i = 0; i < arraySize; i++) {
                std::cout << sendbuf[i] << ", ";
            }
            std::cout << std::endl;
        }

        // reset MPI requests and buffers
        for(int i = 0; i < arraySize; i++){
            sendbuf[i] = -1;
        }
        
    }

    MPI_Finalize();


}

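I wanted to see whether a single send and a single receive would work in a loop on my laptop, and to my surprise (after two days of trying), this turned out to be a problem with my laptop and its OpenMPI installation. I tested this code on a cluster I have access to, where the MPI implementation works, to see whether it was a problem with my hardware. The code works on the cluster but not on my laptop.

To sum up, this is the hardware that I have:

Kernel: 6.6.1-arch1-1
Arch: x86_64
Bits: 64
Compiler: gcc
Model: Lenovo Legion 7 16IAX7
Processor: 12th Gen Intel(R) Core(TM) i7-12800HX
OpenMPI version: 4.1.5-5

This is not a solution, but it answers my question of why the code does not work.

As pointed out by @GillesGouaillardet, this appears to be a problem with the default network interface used by mpirun; specifying a network interface that has no firewall rules appears to be the solution.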

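@GillesGouaillardet Hi, I tried running the commands below, but it does not even run (I get an error saying that the mca parameter pml was specified more than once). What is the correct syntax to try?
mpirun --mca pml ob1,sm,self -np 2 a.out
mpirun --mca pml sm,self -np 2 a.out
mpirun --mca pml ob1 --mca pml sm,self -np 2 a.out

0 upvotes Gilles Gouaillardet 11/18/2023

My bad, try mpirun --mca pml ob1 --mca btl sm,self -np 2 a.out and, if it does not work, try mpirun --mca pml ob1 --mca btl tcp,self -np 2 a.out

0 upvotes josura 11/18/2023

@GillesGouaillardet OK, I tried: mpirun --mca pml ob1 --mca btl sm,self -np 2 a.out does not run, while mpirun --mca pml ob1 --mca btl tcp,self -np 2 a.out starts but blocks at the second iteration (the same behavior as when I run mpirun -np 2 a.out without any other parameters), so no luck yet.

1 upvote josura 11/19/2023

Good! Now it seems to work with mpirun --mca pml ob1 --mca btl tcp,self --mca btl_tcp_if_include <interface> -np 2 a.out. So it seems to be a problem with the network interface; I will edit the answer with this information. @GillesGouaillardet thank you very much, I will let you know if anything else comes up! Also, if you want to write another answer, I can accept it as the solution (if you want to give more information about the problem, since I know very little about how MPI works under the hood). Thanks again!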
