Add new PMIX_GROUP_FINAL_MEMBERSHIP_ORDER attribute#3716

Merged
rhc54 merged 1 commit into openpmix:master from rhc54:topic/ord
Nov 5, 2025

rhc54 commented Nov 5, 2025

The PMIx_Group_construct API itself is not order sensitive in the provided proc array - i.e., we ignore that order when executing the operation. This allows each caller to provide the proc array in arbitrary order - which is advantageous for most libraries.

However, some users may need us to return a specific order of the procs in the final membership. Allow them to specify the order in a new attribute. This can be specified by individual processes or by namespace (with a wildcard rank). If multiple participants provide this attribute, then the values must match - i.e., the desired final membership order must be identical.
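The "values must match" rule can be illustrated with a small standalone sketch. It does not use the PMIx headers: `demo_proc_t`, `demo_order_t`, `DEMO_RANK_WILDCARD` and `demo_orders_consistent` are hypothetical stand-ins invented for this illustration, and what the library actually does on a mismatch is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real pmix_proc_t or PMIX_RANK_WILDCARD. */
#define DEMO_NSPACE_MAX 64
#define DEMO_RANK_WILDCARD UINT32_MAX

typedef struct {
    char nspace[DEMO_NSPACE_MAX];
    uint32_t rank;
} demo_proc_t;

/* The ordering one participant supplied; procs == NULL means it gave none. */
typedef struct {
    const demo_proc_t *procs;
    size_t n;
} demo_order_t;

/* Two orderings match only if they have the same entries in the same sequence. */
static int demo_same_order(const demo_order_t *a, const demo_order_t *b)
{
    if (a->n != b->n) {
        return 0;
    }
    for (size_t i = 0; i < a->n; i++) {
        if (strcmp(a->procs[i].nspace, b->procs[i].nspace) != 0 ||
            a->procs[i].rank != b->procs[i].rank) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 when every participant that supplied an ordering supplied the
 * same one; participants that supplied nothing are skipped. */
static int demo_orders_consistent(const demo_order_t *specs, size_t n_specs)
{
    assert(specs != NULL || n_specs == 0);
    const demo_order_t *first = NULL;
    for (size_t i = 0; i < n_specs; i++) {
        if (specs[i].procs == NULL) {
            continue;
        }
        if (first == NULL) {
            first = &specs[i];
        } else if (!demo_same_order(first, &specs[i])) {
            return 0;
        }
    }
    return 1;
}
```

In this model, a participant that omits the attribute never causes a conflict; only two differing orderings do.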

Signed-off-by: Ralph Castain <rhc@pmix.org>
@rhc54 rhc54 merged commit 06a3e48 into openpmix:master Nov 5, 2025
26 checks passed
@rhc54 rhc54 deleted the topic/ord branch November 5, 2025 23:59

rhc54 commented Nov 6, 2025

@sonjahapp I understand that MPI may have a requirement to control the ordering of procs in the final membership. I added this to help provide that control - see the accompanying PRRTE change openpmix/prrte#2303.

Occurred to me that if you specify an ordering using an nspace/wildcard combination, then you might want any specific ranks to appear in rank order - true? If so, that is easily provided over in PRRTE.

Let me know what you think.

@sonjahapp

Hi Ralph!

> Occurred to me that if you specify an ordering using an nspace/wildcard combination, then you might want any specific ranks to appear in rank order - true?

I would say yes. For any other rank order, the nspace/wildcard form cannot be used; the explicit order must be provided instead.

A few (perhaps stupid) questions: Does the process order in the membership array have any influence on how OpenPMIx or PRRTE treat the PMIx Group? Are there any benefits or downsides for PMIx or the host server implementation that could originate from a specific proc order in the membership array? Or is this feature "just" for convenience on PMIx client side - so that the clients would not have to sort the processes by themselves?

I would assume that for MPI this feature is useful for optimization, but not really strictly required, or is it? As long as the group membership array is identical in all procs of the group, MPI should be able to deal with it and sort the procs in whatever order is required.


rhc54 commented Nov 6, 2025

> Does the process order in the membership array have any influence on how OpenPMIx or PRRTE treat the PMIx Group?

None whatsoever.

> Are there any benefits or downsides for PMIx or the host server implementation that could originate from a specific proc order in the membership array?

Not that I am aware of - we don't care about the order.

> Or is this feature "just" for convenience on PMIx client side - so that the clients would not have to sort the processes by themselves?

The question arose in terms of efficiency. According to @abouteiller, MPI requires that the order of the procs in the eventual communicator be set based on how connect/accept is called. If we want to use PMIx_Group_construct to assemble the procs into a group, and then translate that group to the MPI communicator, then the final group membership should be in the required communicator order. True, you could inject a sorting algorithm into the mix, but it might be nice to simplify the procedure.

The PMIx API treats the provided membership array as order insensitive. This was done because the OMPI folks hit a problem where the participating procs were calling the API with different orders - basically, they would have had to inject a collective to agree upon the order prior to calling the API, but the procs don't have each other's endpoint info until after the PMIx operation completes. Bit of a chicken/egg problem.

So we made the membership input order insensitive and then ensure that the final membership array returned to each participant is identical. Note that other programming models don't care about order, and so this is strictly an MPI problem - and may be even more strictly an OMPI problem. Not entirely sure.
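One simple way to get that property (order-insensitive input, identical output for everyone) is to reduce each proc array to a canonical form. The sketch below sorts by namespace and then rank and drops duplicates. It is an illustration only, not necessarily how OpenPMIx or PRRTE does it, and `demo_proc_t`, `demo_proc_cmp` and `demo_canonicalize` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in, NOT the real pmix_proc_t. */
#define DEMO_NSPACE_MAX 64

typedef struct {
    char nspace[DEMO_NSPACE_MAX];
    uint32_t rank;
} demo_proc_t;

/* qsort comparator: namespace first, then rank. */
static int demo_proc_cmp(const void *pa, const void *pb)
{
    const demo_proc_t *a = pa;
    const demo_proc_t *b = pb;
    int c = strcmp(a->nspace, b->nspace);
    if (c != 0) {
        return c;
    }
    return (a->rank > b->rank) - (a->rank < b->rank);
}

/* Sorts the array in place and removes duplicate entries.
 * Returns the number of unique procs left at the front of the array. */
static size_t demo_canonicalize(demo_proc_t *procs, size_t n)
{
    assert(procs != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    qsort(procs, n, sizeof(procs[0]), demo_proc_cmp);
    size_t out = 1;
    for (size_t i = 1; i < n; i++) {
        if (demo_proc_cmp(&procs[out - 1], &procs[i]) != 0) {
            procs[out++] = procs[i];
        }
    }
    return out;
}
```

Two callers that pass the same set of procs in different orders end up with byte-for-byte identical arrays after this step, which is the guarantee described above.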

With this new feature, at least the OMPI connect/accept can simply call the group construct API and pass the new attribute to indicate that members from nspace A should come first, followed by members from nspace B. Other programming models might get more atomistic and specify individual ordering. However, I do admit that one could instead sort the final membership array to get whatever you wanted.
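The "nspace A first, then nspace B" case can be modeled with a short standalone sketch. It assumes that each member takes the position of the first ordering entry it matches (exact rank or wildcard), that members sharing a wildcard entry appear in rank order, and that members matching no entry go last. Those rules, and the names `demo_proc_t`, `DEMO_RANK_WILDCARD` and `demo_sort_membership`, are assumptions made for illustration and are not taken from the PMIx or PRRTE sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real pmix_proc_t or PMIX_RANK_WILDCARD. */
#define DEMO_NSPACE_MAX 64
#define DEMO_RANK_WILDCARD UINT32_MAX

typedef struct {
    char nspace[DEMO_NSPACE_MAX];
    uint32_t rank;
} demo_proc_t;

/* Index of the first ordering entry that matches p, either by exact rank or
 * by nspace/wildcard. Returns n_order when nothing matches. */
static size_t demo_order_slot(const demo_proc_t *p,
                              const demo_proc_t *order, size_t n_order)
{
    for (size_t i = 0; i < n_order; i++) {
        if (strcmp(p->nspace, order[i].nspace) != 0) {
            continue;
        }
        if (order[i].rank == DEMO_RANK_WILDCARD || order[i].rank == p->rank) {
            return i;
        }
    }
    return n_order;
}

/* Compare by ordering slot first, then by namespace and rank, so members
 * that share a wildcard entry come out in rank order. */
static int demo_cmp(const demo_proc_t *a, const demo_proc_t *b,
                    const demo_proc_t *order, size_t n_order)
{
    size_t sa = demo_order_slot(a, order, n_order);
    size_t sb = demo_order_slot(b, order, n_order);
    if (sa != sb) {
        return sa < sb ? -1 : 1;
    }
    int c = strcmp(a->nspace, b->nspace);
    if (c != 0) {
        return c;
    }
    return (a->rank > b->rank) - (a->rank < b->rank);
}

/* Insertion sort of the final membership according to the requested order. */
static void demo_sort_membership(demo_proc_t *members, size_t n,
                                 const demo_proc_t *order, size_t n_order)
{
    assert(members != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        demo_proc_t key = members[i];
        size_t j = i;
        while (j > 0 && demo_cmp(&members[j - 1], &key, order, n_order) > 0) {
            members[j] = members[j - 1];
            j--;
        }
        members[j] = key;
    }
}
```

With an ordering of (A, wildcard) followed by (B, wildcard), a membership of B:1, A:2, B:0, A:0 comes out as A:0, A:2, B:0, B:1. Listing an explicit rank ahead of a wildcard for the same namespace pulls that rank to the front while the rest stay in rank order.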

Ultimately a question for you folks: is this useful? 🤷‍♂️ Easy enough to remove if not.


rhc54 commented Nov 6, 2025

Oh, just FWIW: note that the host (and not the PMIx library) assumes the responsibility for performing the sort as it is the only one that knows the final membership.


bosilca commented Nov 6, 2025

MPI connect/accept is order agnostic. Moreover, all MPI dynamic process functionality is order agnostic, as none of it creates an intracommunicator (where all processes are part of the same group and the order is important) but rather an intercommunicator (where each side of the connect/accept is in its own group with its own original ranking). The only MPI function that needs to know the ranking is MPI_Intercomm_merge, which collapses an intercomm into an intracomm, but at that point we already have the intercommunicator to do MPI-level collectives.


rhc54 commented Nov 6, 2025

Interesting - @abouteiller could you please reiterate your concerns about ordering of the final membership? Perhaps I'm not stating them correctly.
