Skip to content

[event_engine] Implement fork support in Posix Event Engine#38980

Closed
eugeneo wants to merge 17 commits intogrpc:masterfrom
eugeneo:fork/11-can-tests-pass
Closed

[event_engine] Implement fork support in Posix Event Engine#38980
eugeneo wants to merge 17 commits intogrpc:masterfrom
eugeneo:fork/11-can-tests-pass

Conversation

@eugeneo
Copy link
Copy Markdown
Contributor

@eugeneo eugeneo commented Mar 12, 2025

No description provided.

@eugeneo eugeneo added the release notes: no Indicates if PR should not be in release notes label Mar 12, 2025
@eugeneo eugeneo self-assigned this Mar 12, 2025
@eugeneo eugeneo changed the title [fork] Epoll1 poller support [event_engine] Event engine fork support Mar 17, 2025
@eugeneo eugeneo force-pushed the fork/11-can-tests-pass branch 2 times, most recently from 40106d9 to d5d2b1b Compare March 20, 2025 23:24
@eugeneo eugeneo marked this pull request as ready for review March 24, 2025 20:00
@eugeneo eugeneo requested review from apolcyn and gnossen as code owners March 24, 2025 20:00
Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting closer!

@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla bot commented Apr 28, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

Copy link
Copy Markdown
Member

@drfloob drfloob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are fairly minor. I'm glad to see the fork tests passing! Can you confirm that the fork tests pass say 100k times each with opt and asan builds, just to be sure? You can run them with 2000 RBE jobs, it should take a few minutes.

Copy link
Copy Markdown
Contributor Author

@eugeneo eugeneo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. Addressed the comments.

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting close!

Please let me know if you have any questions. Thanks!

Copy link
Copy Markdown
Contributor Author

@eugeneo eugeneo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review! I addressed the comments. Please take another look.

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a couple of minor nits, this looks great to me!

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

Let's make sure everyone's happy with the rollback story before we merge this.

eugeneo and others added 6 commits May 20, 2025 10:12
- Moves fork handling from behind the event_engine_fork experiment
  - No experiment (parity with original fork implementation):
    - Thread pool and timer manager will go through the fork as expected
    - Epoll poller fd will be recreated in the child
    - EventHandles are closed in the child process.
  - Behind experiment (new functionality):
    - File descriptors are tracked in the collection
    - FD generation is tracked and checked on every Posix call.
    - AresResolver is reset on fork
- Fixes to address race between PosixEngine destructor and fork
  handling.
  - Thread pool will properly handle fork even if the parent PosixEngine
    instance entered the destructor before the fork call
  - Thread pool fork handler will wait for the parent event engine
    destructor to finish before shutting down threads.
Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a couple of minor nits here, otherwise looks great!

absl::StrCat("getsockname:", grpc_core::StrError(errno)));
}
return EventEngine::ResolvedAddress(addr.address(), len);
return grpc_event_engine::experimental::LocalAddress(fd.fd());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the grpc_event_engine::experimental:: prefix, since we're already inside of that namespace.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to inline that function now, I only had it separate for internal code. Internal code now uses EventEnginePosixInterface too.

for (const auto& engine : LockEventEngines()) {
engine->BeforeFork();
for (const auto& [event_engine, executor] : LockForkHandlers()) {
if (event_engine) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For pointer types, please prefer to say != nullptr rather than evaluating them as booleans.

Same thing on throughout.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants