Skip to content

Flush namespace sinks' residuals before destroying namespace#3711

Merged
rhc54 merged 2 commits intoopenpmix:masterfrom
Matthew-Whitlock:iof_sink_flush
Oct 29, 2025
Merged

Flush namespace sinks' residuals before destroying namespace#3711
rhc54 merged 2 commits intoopenpmix:masterfrom
Matthew-Whitlock:iof_sink_flush

Conversation

@Matthew-Whitlock
Copy link
Contributor

I was getting intermittent hangs during PMIx_server_finalize when running MPI jobs with the --output file=<filename> option. The problem is that sinks get destroyed when the namespace is destroyed, but any residuals on those sinks keep a pointer to the pmix_iof_write_event_t object in those sinks. Flushing the residuals during finalize causes PMIx to add invalid events to the event base, which then hangs when being closed by libevent.

@rhc54
Copy link
Contributor

rhc54 commented Oct 28, 2025

Afraid I don't understand - how is this new function any different than the one right above it (pmix_iof_flush_residuals)? Other than releasing the members of the list (which we currently do as a separate step), it appears effectively identical. It looks to me (minus some differences in function signature) like the server already flushes the residuals and then destructs the list. Is the problem perhaps the timing of when that happens versus your proposed change?

Just trying to understand why we would want to flush the residuals twice after this change, and if perhaps this isn't really the source of the hang you observed.

@Matthew-Whitlock
Copy link
Contributor Author

Yeah, it's about the timing. The namespace gets destroyed in PMIx_server_deregister_nspace before PMIx_server_finalize is called to invoke pmix_iof_flush_residuals. When the namespace is destroyed, if any of those residuals have pointers to the destroyed namespace's sinks (which are destroyed by the namespace's destructor), the residual adds an invalid event pointer to the event base (pmix_iof_sink_t.wev.ev) when it is later written/flushed.

The changed function signature lets us flush only the specific sinks' residuals, so that any continuing namespaces' outputs don't risk getting mangled by printing unfinished lines.

@rhc54
Copy link
Contributor

rhc54 commented Oct 28, 2025

So you are flushing the residuals relating to a specific nspace when that nspace is deregistered. At server finalize, we flush all remaining residuals from the global list. This implies that some namespaces were not deregistered prior to finalizing the server. Is that what you are seeing? Or is the residual list empty?

I'm wondering if the complete fix would include replacing the residual flush in server finalize with a pass through the list of registered namespaces to deregister those that remain when we finalize the server? This would ensure we cleaned up everything associated with the namespace, including the residuals. In fact, we do that just a few lines below:

    PMIX_LIST_FOREACH (ns, &pmix_globals.nspaces, pmix_namespace_t) {
        /* ensure that we do the specified cleanup - if this is an
         * abnormal termination, then the nspace object may not be
         * at zero refcount */
        pmix_execute_epilog(&ns->epilog);
    }

Perhaps we need to revise that loop a little bit to basically call "deregister_nspace" - that would cover the epilog as well as all the other required cleanup. I know we are finalizing, but right now it feels like we are forgetting that the host may continue on without us, meaning that the library needs to fully cleanup.

Not saying you (or this PR) have to do all that - just wanting to see if we completely grok the problem.

@Matthew-Whitlock
Copy link
Contributor Author

All namespaces are deregistered before finalizing (to my knowledge, w/ my tests, etc etc), that's not the problem. The problem is that when the namespace is deregistered, it just needs a tiny bit of extra cleanup.

The namespace currently cleans up any of its own pmix_iof_sink_t objects (allocated just for that namespace, because the app was run with --output file=<filename>), but does not check to make sure there are no dangling pointers to those sinks in the residuals list.

Any residuals with dangling pointers end up sitting in the residuals list 'forever' (because that namespace is finalized, so it won't get any more iof messages that would finish a residual line). The bug manifests in a deadlock during finalize only because that's when the server finally tries to clean up those residuals, but the actual bug is during the namespace deregister step.

@Matthew-Whitlock
Copy link
Contributor Author

The safest approach would probably be to put these lines:

pmix_iof_flush_sink(sink);
pmix_iof_static_dump_output(sink);

Inside of the pmix_iof_sink_t class destructor. That way you don't have to worry about making sure you always clean up the residuals before destroying a sink, since the sink handles it itself.

I just didn't want to go verify that it would be safe to access the residuals list and do IO at every existing location where sinks can be destroyed.

@Matthew-Whitlock
Copy link
Contributor Author

So you are flushing the residuals relating to a specific nspace when that nspace is deregistered. At server finalize, we flush all remaining residuals from the global list. This implies that some namespaces were not deregistered prior to finalizing the server. Is that what you are seeing? Or is the residual list empty?

I just realized what you were asking here. I think the residual list might not be empty even if all namespaces are deregistered, because there could be residuals for the pmix_client_globals.iof_stdout and pmix_client_globals.iof_stderr sinks.

@rhc54
Copy link
Contributor

rhc54 commented Oct 29, 2025

I just realized what you were asking here. I think the residual list might not be empty even if all namespaces are deregistered, because there could be residuals for the pmix_client_globals.iof_stdout and pmix_client_globals.iof_stderr sinks.

Thanks - that indeed was my question.

I just didn't want to go verify that it would be safe to access the residuals list and do IO at every existing location where sinks can be destroyed.

Scanning the code, there currently is only one place where sinks are released - and that is in the pmix_namespace_t destructor. Running a destructor on a namespace object can only occur in the PMIx library's event progress thread or after that progress thread has been stopped (e.g., in PMIx_server_finalize) - a requirement of thread safety. So I think you are correct and we should add those two lines to the sink destructor.

However, I'm willing to take this as-is and make the change later. While I agree that PRRTE will deregister known namespaces (i.e., namespaces that are directly "registered" by its host), there are other mechanisms by which namespace objects get added to the server (not part of the minimal "mpirun" operation). The current server finalize doesn't ensure everything associated with any lingering namespace objects gets properly cleaned up, and that should also be fixed.

Revising this in that second stage would be fine with me - make sense?

@rhc54 rhc54 force-pushed the iof_sink_flush branch 2 times, most recently from cbcd4e4 to afa8bba Compare October 29, 2025 19:23
Matthew-Whitlock and others added 2 commits October 29, 2025 13:49
Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>
As per the discussion on the PR, move the sink cleanup
calls to the sink destructor.

Signed-off-by: Ralph Castain <rhc@pmix.org>
@rhc54
Copy link
Contributor

rhc54 commented Oct 29, 2025

I went ahead and added it here. Had to add a little protection because clients don't have residuals (only servers do), but this better encapsulates the cleanup.

@rhc54 rhc54 merged commit f04adb2 into openpmix:master Oct 29, 2025
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants