Crash with MPI_Reduce( MPI_IN_PLACE, ...) when destination rank > 0 and two processes on same host #6540
Description
Crash with MPI_Reduce( MPI_IN_PLACE, ... )
The crash occurs when the MPI_Reduce destination (root) rank is not 0 and both processes run on the same host.
MPICH 3.2.1 (and Open MPI) works as expected.
MPICH 4.0.2 and 4.1.1 crash.
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x3e6f)
0 0x0000000000153d69 __memcpy_ssse3_back() :0
1 0x0000000000288d86 MPIR_Typerep_unpack() :0
2 0x00000000002abb8a do_localcopy() utils.c:0
3 0x00000000002abcab MPIR_Localcopy() :0
4 0x00000000001e06bc MPIR_Reduce_intra_reduce_scatter_gather() :0
5 0x000000000025ae8a MPIR_Reduce_allcomm_auto() :0
6 0x000000000025afe4 MPIR_Reduce_impl() :0
7 0x000000000025b936 MPIR_Reduce() :0
8 0x00000000000fadd7 PMPI_Reduce() ???:0
9 0x000000000040150d main() /home/lauri/cpp/mpi_test/cleaned.cpp:29
10 0x00000000000223d5 __libc_start_main() ???:0
11 0x0000000000401239 _start() ???:0
TO REPRODUCE:
mpic++ -O0 -g -o mpi_reduce_test mpi_reduce_test.cpp
mpirun -n 2 ./mpi_reduce_test
// mpi_reduce_test.cpp
#include <vector>
#include <iostream>
#include <sstream>
#include <cstring>
#include <unistd.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int err = MPI_Init( &argc, &argv );
    int NoOfProcess, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &NoOfProcess);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Print the MPI library version once, from rank 0
    if ( rank == 0 ) {
        char buf[1024 * 8];
        int len;
        err = MPI_Get_library_version(buf, &len);
        std::cout << buf << std::endl;
    }

    // Each rank contributes a buffer filled with (1.0 + rank)
    size_t count = 2000;
    std::vector<double> buffer(count, (1.0 + (double)rank));
    char strBuf[1024];

    // MPI_Reduce to process #dst
    for ( int dst = 0; dst < 2; dst++ ) {
        if ( rank == dst ) {
            // Root: its own buffer is both the contribution and the result
            std::cerr << "In-place MPI_Reduce" << std::endl;
            err = MPI_Reduce( MPI_IN_PLACE,
                              buffer.data(),
                              count,
                              MPI_DOUBLE,
                              MPI_SUM,
                              dst,
                              MPI_COMM_WORLD );
        } else {
            // Non-root: the receive buffer is ignored, so nullptr is passed
            err = MPI_Reduce( buffer.data(),
                              nullptr,
                              count,
                              MPI_DOUBLE,
                              MPI_SUM,
                              dst,
                              MPI_COMM_WORLD );
        }

        // Funnel all output through rank 0 so the lines do not interleave
        if ( rank == 0 ) {
            std::cerr << "After MPI_REDUCE: rank=" << rank << " buffer[0:2]=" << buffer[0] << " " << buffer[1] << std::endl;
            MPI_Recv( strBuf, 1024, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );
            std::cerr << strBuf;
            std::cerr << "***********" << std::endl;
        } else {
            std::stringstream ss;
            ss << "After MPI_REDUCE: rank=" << rank << " buffer[0:2]=" << buffer[0] << " " << buffer[1] << std::endl;
            std::string str = ss.str();
            // +1 sends the terminating '\0' so rank 0 can print strBuf safely
            MPI_Send( str.c_str(), static_cast<int>( str.length() + 1 ), MPI_CHAR, 0, 0, MPI_COMM_WORLD );
        }
    }

    err = MPI_Finalize();
    return 0;
}

MPI version details:
MPICH Version: 4.0.2
MPICH Release date: Thu Apr 7 12:34:45 CDT 2022
MPICH ABI: 14:2:2
MPICH Device: ch4:ucx
MPICH configure: --prefix=/vols/mmsimP4_t1b_006/ws/lauri/lauri2310_mpi/mmsimMPI/4.0.2/install/64 --with-pm=hydra --with-device=ch4:ucx --disable-checkpointing --disable-libudev --enable-strict --enable-fast=O3 --disable-fortran
MPICH CC: gcc -Wall -Wextra -Wstrict-prototypes -Wmissing-prototypes -DGCC_WALL -Wno-unused-parameter -Wshadow -Wmissing-declarations -Wundef -Wpointer-arith -Wbad-function-cast -Wwrite-strings -Wno-sign-compare -Wold-style-definition -Wnested-externs -Winvalid-pch -Wvariadic-macros -Wtype-limits -Werror-implicit-function-declaration -Wstack-usage=262144 -fno-var-tracking -Wno-unused-label -O2 -std=c99 -D_STDC_C99= -D_POSIX_C_SOURCE=200112L -O3
MPICH CXX: g++ -O3
MPICH F77:
MPICH FC: