Asked by: Ajay  Asked: 2/13/2023  Updated: 2/13/2023  Views: 234
mpirun running job serially with only one core
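Q:
I installed mpich4.1 on an Ubuntu machine using the GNU compilers. At first I successfully ran a job on 36 cores, but now when I try to run the same job it runs serially on only one core. The output of the mpirun command is now: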
mpirun -np 36 ./wrf.exe
starting wrf task 0 of 1
starting wrf task 0 of 1
starting wrf task 0 of 1
starting wrf task 0 of 1
starting wrf task 0 of 1
starting wrf task 0 of 1
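Every line above reports "task 0 of 1", so each process sees itself as rank 0 in a world of size 1 rather than as one of 36 ranks. A minimal sketch for summarising such output, assuming the "starting wrf task R of N" format shown above (the function name summarize_wrf_tasks is hypothetical, not part of WRF or MPICH):

```shell
# Hypothetical helper: read launcher output on stdin and summarise the
# "starting wrf task R of N" lines by the world size N each process reports.
summarize_wrf_tasks() {
    awk '/starting wrf task [0-9]+ of [0-9]+/ {
             sizes[$NF]++          # last field on the line is the world size N
         }
         END {
             for (n in sizes)
                 printf "%d process(es) report a world size of %s\n", sizes[n], n
         }'
}
```

It could be used as `mpirun -np 36 ./wrf.exe 2>&1 | summarize_wrf_tasks`. A run in which all 36 processes joined one communicator would report a single world size of 36.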
mpivars gives the error:
Abort(470406415): Fatal error in internal_Init_thread: Other MPI error, error stack:
internal_Init_thread(67): MPI_Init_thread(argc=0x7fff8044f34c, argv=0x7fff8044f340, required=0, provided=0x7fff8044f350) failed
MPII_Init_thread(222)...: gpu_init failed
But this machine has no GPU. The MPI version command gives:
HYDRA build details:
Version: 4.1
Release Date: Fri Jan 27 13:54:44 CST 2023
CC: gcc
Configure options: '--disable-option-checking' '--prefix=/home/MODULES' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS=' 'LIBS=' 'CPPFLAGS= -DNETMOD_INLINE=__netmod_inline_ofi__ -I/home/MODULES/mpich-4.1/src/mpl/include -I/home/MODULES/mpich-4.1/modules/json-c -D_REENTRANT -I/home/MODULES/mpich-4.1/src/mpi/romio/include -I/home/MODULES/mpich-4.1/src/pmi/include -I/home/MODULES/mpich-4.1/modules/yaksa/src/frontend/include -I/home/MODULES/mpich-4.1/modules/libfabric/include'
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge manual persist
Topology libraries available: hwloc
Resource management kernels available: user slurm ll lsf sge pbs cobalt
Demux engines available: poll select
What could be the possible reason?
Thanks in advance.
A: No answers yet
Comments
Which libmpi.so is wrf.exe using (ldd wrf.exe), and which mpirun is being picked up (type -p mpirun)?
libmpi.so points to libmpi.so -> libmpi.so.12.2.4, while ldd wrf.exe points to
libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40
When I add this path to $LD_LIBRARY_PATH and run mpirun -np 36 .wrf.exe, it gives the error:
[mpiexec@NM] match_arg (lib/utils/args.c:166): unrecognized argument x
[mpiexec@NM] HYDU_parse_array (lib/utils/args.c:181): argument matching returned error
[mpiexec@NM] parse_args (mpiexec/get_parameters.c:315): error parsing input array ...
Abort(671732751): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(66)....: MPI_Init(argc=(nil), argv=(nil)) failed
MPII_Init_thread(222): gpu_init failed
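The comparison in this comment comes down to which libmpi the dynamic loader resolves for wrf.exe: here libmpi.so.40 under /lib/x86_64-linux-gnu, not the libmpi.so.12.2.4 of the MPICH install. A small sketch that pulls just that information out of ldd output, assuming the usual "name => path (address)" line format (mpi_lib_paths is a hypothetical name):

```shell
# Hypothetical helper: read `ldd` output on stdin and print the resolved
# path of every libmpi.so* entry it lists.
mpi_lib_paths() {
    awk '$1 ~ /^libmpi\.so/ && $2 == "=>" { print $3 }'
}
```

It could be used as `ldd ./wrf.exe | mpi_lib_paths`, run once before and once after changing $LD_LIBRARY_PATH to see whether the resolved path actually changes.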
The libmpi.so of wrf.exe is
libmpi.so.12 => /home/MODULES/lib/libmpi.so.12
and type -p mpirun gives
/home/MODULES/bin/mpirun
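Both paths reported here sit under /home/MODULES, meaning the launcher and the library come from the same install tree. A sketch of that check, assuming the conventional <prefix>/bin and <prefix>/lib layout (same_mpi_prefix is a hypothetical name; it compares path strings only and does not resolve symlinks):

```shell
# Hypothetical helper: succeed (exit status 0) if a launcher path and a
# library path share an installation prefix, assuming the layout
# <prefix>/bin/<launcher> and <prefix>/lib/<library>.
same_mpi_prefix() {
    launcher_prefix=$(dirname "$(dirname "$1")")   # e.g. /home/MODULES/bin/mpirun -> /home/MODULES
    library_prefix=$(dirname "$(dirname "$2")")    # e.g. /home/MODULES/lib/libmpi.so.12 -> /home/MODULES
    [ "$launcher_prefix" = "$library_prefix" ]
}
```

For example, `same_mpi_prefix /home/MODULES/bin/mpirun /home/MODULES/lib/libmpi.so.12 && echo match` prints "match", while passing /lib/x86_64-linux-gnu/libmpi.so.40 as the second argument does not.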
apt-cache search MPI | grep -w MPI | awk '{print $1}' | xargs dpkg -l 2>/dev/null
shows
ii intel-oneapi-mpi-2021.8.0 2021.8.0-25329 amd64 Intel® MPI Library
ii intel-oneapi-mpi-devel 2021.8.0-25329 amd64 Intel® MPI Library
ii intel-oneapi-mpi-devel-2021.8.0 2021.8.0-25329 amd64 Intel® MPI Library
ii mpi-default-bin 1.13 amd64 Standard MPI
ii mpich 3.3.2-2build1 amd64 Implementation of the MPI Message
ii python3-mpi4py 3.0.3-4build2 amd64 bindings of the Message Passing
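The listing shows more than one MPI stack installed through packages (Intel MPI and a distribution mpich 3.3.2), in addition to the self-built MPICH 4.1 under /home/MODULES. A minimal sketch for reducing dpkg -l output to installed package names, assuming the standard column order of status, name, version, architecture (installed_packages is a hypothetical name):

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the name of
# every package whose status column is "ii" (installed).
installed_packages() {
    awk '$1 == "ii" { print $2 }'
}
```

It could be appended to the pipeline above, as in `... | xargs dpkg -l 2>/dev/null | installed_packages`, to get one package name per line.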