mpich: use libfabric #73062

Closed
iMichka wants to merge 1 commit into Homebrew:master from iMichka:mpich

@iMichka
Member

iMichka commented Mar 12, 2021

Fixes (on Linux):
configure: error: no ch4 netmod selected

  • Have you followed the guidelines for contributing?
  • Have you checked that there aren't other open pull requests for the same formula update/change?
  • Have you built your formula locally with brew install --build-from-source <formula>, where <formula> is the name of the formula you're submitting?
  • Is your test running fine brew test <formula>, where <formula> is the name of the formula you're submitting?
  • Does your build pass brew audit --strict <formula> (after doing brew install <formula>)?

iMichka added the "linux to homebrew-core" label (Migration of linuxbrew-core to homebrew-core) Mar 12, 2021
@iMichka
Member Author

iMichka commented Mar 12, 2021

I am a little bit confused as to why this is not an issue on Mac.
See https://lists.mpich.org/pipermail/discuss/2021-January/006092.html

We could also use ucx instead of libfabric; I have no clue which is the more sensible choice, so I took the first option.

@carlocab
Member

@bourdin, would you happen to know anything about this?

@carlocab
Member

carlocab commented Mar 12, 2021

I think you also need this?

  Configure will use an embedded copy of libfabric or ucx if one is
  not found in the user environment. An installation can be specified
  by adding

    --with-libfabric=<path/to/install> or --with-ucx=<path/to/install>

I guess it wasn't an issue because it used an embedded copy. Which we don't like.

@iMichka
Member Author

iMichka commented Mar 15, 2021

I did not have to set the above flag to specify the path, it picked up the brewed version right away.
And it linked properly against it:

➜  homebrew-core git:(master) brew linkage mpich
System libraries:
  /lib/x86_64-linux-gnu/libc.so.6
  /lib/x86_64-linux-gnu/libm.so.6
  /lib/x86_64-linux-gnu/libpthread.so.0
  /lib/x86_64-linux-gnu/librt.so.1
Homebrew libraries:
  /home/linuxbrew/.linuxbrew/lib/gcc/10/libgcc_s.so.1 (gcc)
  /home/linuxbrew/.linuxbrew/lib/gcc/10/libquadmath.so.0 (gcc)
  /home/linuxbrew/.linuxbrew/lib/gcc/10/libstdc++.so.6 (gcc)
  /home/linuxbrew/.linuxbrew/lib/libgfortran.so.5 (gcc)
  /home/linuxbrew/.linuxbrew/lib/libfabric.so.1 (libfabric)
  /home/linuxbrew/.linuxbrew/Cellar/mpich/3.4.1/lib/libmpi.so.12 (mpich)
  /home/linuxbrew/.linuxbrew/lib/libmpi.so.12 (mpich)
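
A minimal sketch of checking that result mechanically, assuming output in the same format as the listing above. `parse_linkage` is a hypothetical helper, not part of Homebrew, and the sample text is a shortened copy of the output shown here.

```ruby
# Sketch: group `brew linkage`-style output by its section headers.
# `parse_linkage` is a hypothetical helper, not a Homebrew API.
def parse_linkage(text)
  sections = Hash.new { |hash, key| hash[key] = [] }
  current = nil
  text.each_line do |line|
    line = line.rstrip
    next if line.empty?

    if line.end_with?(":") && !line.start_with?(" ")
      # Unindented line ending in a colon: a section header.
      current = line.chomp(":")
    elsif current
      # Indented line: a library entry belonging to the current section.
      sections[current] << line.strip
    end
  end
  sections
end

# Shortened copy of the output above.
SAMPLE = <<~OUT
  System libraries:
    /lib/x86_64-linux-gnu/libc.so.6
  Homebrew libraries:
    /home/linuxbrew/.linuxbrew/lib/libfabric.so.1 (libfabric)
    /home/linuxbrew/.linuxbrew/lib/libmpi.so.12 (mpich)
OUT

# True when libfabric is listed among the Homebrew-provided libraries.
linked = parse_linkage(SAMPLE)["Homebrew libraries"].any? { |entry| entry.end_with?("(libfabric)") }
puts linked
```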

@carlocab
Member

Maybe it's less smart about library detection on macOS.

iMichka force-pushed the mpich branch 2 times, most recently from 88b9b78 to 010deb8 March 15, 2021 17:59
@Bo98
Member

Bo98 commented Mar 15, 2021

> Maybe it's less smart about library detection on macOS.

It probably finds it via pkg-config, but pkg-config isn't a dependency here.

@iMichka
Member Author

iMichka commented Mar 15, 2021

I added the flag explicitly.

@Bo98
Member

Bo98 commented Mar 15, 2021

Yeah, the default behaviour seems to be changing in 4.0 so probably safer to have it explicit.

@bourdin
Contributor

bourdin commented Mar 16, 2021

@wence also has a very valid point that I switched the C compiler from clang to gcc in #72985
This should be reverted. Do you want to do another MR once this one is merged or do it in this one?

@carlocab
Member

Maybe you need to specify -L and -l flags for libfabric in the test? Just a stab in the dark.

SMillerDev previously approved these changes Mar 25, 2021
@carlocab
Member

carlocab commented Apr 6, 2021

> Maybe you need to specify -L and -l flags for libfabric in the test? Just a stab in the dark.

I was about to suggest this again. Then noticed I had already said it 😂

It seems to be linking with libfabric just fine:

❯ otool -L libmpi.dylib
libmpi.dylib:
        @@HOMEBREW_PREFIX@@/opt/mpich/lib/libmpi.12.dylib (compatibility version 14.0.0, current version 14.10.0)
        @@HOMEBREW_CELLAR@@/mpich/3.4.1_2/lib/libpmpi.12.dylib (compatibility version 14.0.0, current version 14.10.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.60.1)
        @@HOMEBREW_PREFIX@@/opt/libfabric/lib/libfabric.1.dylib (compatibility version 17.0.0, current version 17.0.0)
        /System/Library/Frameworks/OpenCL.framework/Versions/A/OpenCL (compatibility version 1.0.0, current version 1.0.0)

@carlocab
Member

carlocab commented Apr 7, 2021

This also builds its own hwloc, when it should use the brewed one.

@iMichka
Member Author

iMichka commented Apr 15, 2021

Just to be clear, I am not trying to fix a Mac issue. It's just that the Linux build asked me to add these flags and add a path to libfabric.

I think that depending on the brewed libfabric is a good thing, especially because it looks like mpich and libfabric conflict right now (and nobody noticed). You can try brew install libfabric && brew install mpich and you'll see that it's already broken.

@carlocab
Member

Using a vendored hwloc is a Linux issue too.

@iMichka
Member Author

iMichka commented Apr 22, 2021

I added the brewed hwloc as well, and mpich now links against it.

The error we are getting is the following:

2021-04-20T22:28:16.6698030Z /usr/bin/sandbox-exec -f /private/tmp/homebrew20210420-49101-1b9cxxb.sb ruby -W1 -- /usr/local/Homebrew/Library/Homebrew/test.rb /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/mpich.rb --verbose
2021-04-20T22:28:16.6700650Z ==> /usr/local/Cellar/mpich/3.4.1_2/bin/mpicc hello.c -o hello
2021-04-20T22:28:16.6702510Z ==> ./hello
2021-04-20T22:28:16.6703910Z Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in MPI_Init: Other MPI error, error stack:
2021-04-20T22:28:16.6705260Z MPIR_Init_thread(152).......: 
2021-04-20T22:28:16.6706220Z MPID_Init(597)..............: 
2021-04-20T22:28:16.6707260Z MPIDI_OFI_mpi_init_hook(674): 
2021-04-20T22:28:16.6708640Z create_vni_context(964).....: OFI resource bind failed (ofi_init.c:964:create_vni_context:No message available on STREAM)
2021-04-20T22:28:16.6711560Z [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1615247
2021-04-20T22:28:16.6712800Z :
2021-04-20T22:28:16.6713970Z system msg for write_line failure : Bad file descriptor
2021-04-20T22:28:16.6715760Z Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in MPI_Init: Other MPI error, error stack:
2021-04-20T22:28:16.6717170Z MPIR_Init_thread(152).......: 
2021-04-20T22:28:16.6718270Z MPID_Init(597)..............: 
2021-04-20T22:28:16.6719190Z MPIDI_OFI_mpi_init_hook(674): 
2021-04-20T22:28:16.6720910Z create_vni_context(964).....: OFI resource bind failed (ofi_init.c:964:create_vni_context:No message available on STREAM)
2021-04-20T22:28:16.6724080Z [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1615247
2021-04-20T22:28:16.6725140Z :
2021-04-20T22:28:16.6726030Z system msg for write_line failure : Bad file descriptor

My proposal is the following: apply the change I made here on Linux only, which fixes the Linux build. Then open an issue upstream and ask them for help, since I have no clue how to solve this issue.

Does this strategy seem reasonable to you?
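
A minimal sketch of what the Linux-only change could look like, written as plain Ruby rather than as the actual formula. `extra_mpich_args` and its parameters are hypothetical names; the only configure flag used is the `--with-libfabric=<path/to/install>` option quoted earlier in this thread. In a real formula the condition would presumably be `OS.linux?` and the path `Formula["libfabric"].opt_prefix`, but that is an assumption about how the change was written.

```ruby
# Sketch only: plain Ruby standing in for part of a formula's install method.
# `extra_mpich_args`, `linux:` and `libfabric_prefix:` are hypothetical names.
def extra_mpich_args(linux:, libfabric_prefix:)
  args = []
  if linux
    # Point configure at the brewed libfabric explicitly instead of relying on
    # autodetection or on the embedded copy.
    args << "--with-libfabric=#{libfabric_prefix}"
  end
  args
end

# Linux: the explicit path is passed. macOS: nothing is added for now.
puts extra_mpich_args(linux: true, libfabric_prefix: "/home/linuxbrew/.linuxbrew/opt/libfabric")
puts extra_mpich_args(linux: false, libfabric_prefix: "/usr/local/opt/libfabric").inspect
```

Keeping the flag behind the platform check leaves the macOS build untouched until the MPI_Init failure above is understood.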

@carlocab
Member

Sounds good to me!

@carlocab
Member

Just realised this, sorry: let's add appropriate conflicts_with lines for libfabric and mpich in an on_macos block, since they actually do conflict while mpich is using its own libfabric.

@iMichka
Member Author

iMichka commented May 3, 2021

Done. I also added a link to the email I sent to upstream.

This was referenced May 3, 2021
@BrewTestBot
Contributor

:shipit: @iMichka has triggered a merge.

@carlocab
Member

carlocab commented May 3, 2021

Oops. I rebased #74843 thinking this had already been merged earlier. Now I need to rebase again 😄

iMichka deleted the mpich branch May 3, 2021 17:06
carlocab added a commit to carlocab/homebrew-core that referenced this pull request May 3, 2021
carlocab mentioned this pull request May 3, 2021
iMichka pushed a commit that referenced this pull request May 3, 2021
# https://lists.mpich.org/pipermail/discuss/2020-January/005863.html
args << "FFLAGS=-fallow-argument-mismatch"
args << "CXXFLAGS=-Wno-deprecated"
args << "CFLAGS=-fgnu89-inline -Wno-deprecated"
Contributor

@iMichka I am tracking down a downstream issue building a C++ CMake project using FindMPI.cmake against Brew's MPICH on macOS-10.15.

Since this update, it looks like this line's -fgnu89-inline for some reason ends up on the downstream C++ compiler command line, resulting in:

[  3%] Building CXX object CMakeFiles/<project>.dir/src/<file>.cpp.o
  clang: warning: -framework -std=c++14: 'linker' input unused [-Wunused-command-line-argument]
  error: invalid argument '-fgnu89-inline' not allowed with 'C++'

I posted an issue & reproducer in #80465

github-actions bot added the "outdated" label (PR was locked due to age) Aug 2, 2021
github-actions bot locked as resolved and limited conversation to collaborators Aug 2, 2021

Labels

linux to homebrew-core (Migration of linuxbrew-core to homebrew-core)
outdated (PR was locked due to age)

7 participants