Skip to content

test EOF behavior and fix missing EPOLLRDHUP events for TCP sockets#3574

Merged
sporksmith merged 3 commits intoshadow:mainfrom
sporksmith:epollrdhup-test
Apr 28, 2025
Merged

test EOF behavior and fix missing EPOLLRDHUP events for TCP sockets#3574
sporksmith merged 3 commits intoshadow:mainfrom
sporksmith:epollrdhup-test

Conversation

@sporksmith
Copy link
Copy Markdown
Contributor

@sporksmith sporksmith commented Apr 25, 2025

While debugging
https://gitlab.torproject.org/tpo/core/arti/-/issues/1972, it became apparent that the tokio runtime seems to rely on EPOLLRDHUP to know when a socket is closed, at least in some circumstances.

This adds tests for this and several variations of EOF behavior.

A second commit adds EPOLLRDHUP support for TCP sockets and enables running the corresponding tests under shadow.

@github-actions github-actions bot added the Component: Testing Unit and integration tests and frameworks label Apr 25, 2025
@github-actions github-actions bot added Component: Main Composing the core Shadow executable Component: Documentation In-repository documentation, under docs/ labels Apr 25, 2025
@sporksmith sporksmith changed the title test_epoll: test EOF behavior test EOF behavior and fix missing EPOLLRDHUP events for sockets Apr 25, 2025
@sporksmith sporksmith changed the title test EOF behavior and fix missing EPOLLRDHUP events for sockets test EOF behavior and fix missing EPOLLRDHUP events for TCP sockets Apr 25, 2025
@sporksmith sporksmith requested a review from stevenengler April 25, 2025 16:17
@sporksmith
Copy link
Copy Markdown
Contributor Author

The epoll changes are definitely a bit cargo-culted. A critical eye from someone more familiar with this code is more than welcome :)

@sporksmith sporksmith requested a review from robgjansen April 25, 2025 16:26
@sporksmith
Copy link
Copy Markdown
Contributor Author

The epoll changes are definitely a bit cargo-culted. A critical eye from someone more familiar with this code is more than welcome :)

On that note adding @robgjansen too; if you have time to look I think you might be the most familiar with this code?

Copy link
Copy Markdown
Member

@robgjansen robgjansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The epoll cargo-culting seems appropriately cargo-culted to me :) We should definitely have Steve review too though.

It was pleasant to see how much easier it was to add this to the Rust epoll module than the C module.

I couldn't convince myself why exactly we need support in both epoll modules... Do we? AFAIK, when the managed program creates an Epoll, we manage that state with the Rust mod, but when Shadow's syscall handler runs into a blocking syscall, we manage that wait task with the C mod. I guess a RDHUP might interact in both of the above paths?

Ultimately, the tests pass so I can't argue with that :D

@sporksmith
Copy link
Copy Markdown
Contributor Author

I couldn't convince myself why exactly we need support in both epoll modules... Do we? AFAIK, when the managed program creates an Epoll, we manage that state with the Rust mod, but when Shadow's syscall handler runs into a blocking syscall, we manage that wait task with the C mod. I guess a RDHUP might interact in both of the above paths?

Hmm, yeah I was and am a bit confused about the relationship between the two. There was definitely a bit of "keep adding to places where it looks like it should be until it works". I'll see if I can trace through a bit more; would definitely be preferable to not add dead code that might further confuse our future selves :)

@stevenengler
Copy link
Copy Markdown
Contributor

Cool! This looks like a difficult one to debug, so I'm glad you were able to narrow it down to this.

@stevenengler
Copy link
Copy Markdown
Contributor

I couldn't convince myself why exactly we need support in both epoll modules... Do we? AFAIK, when the managed program creates an Epoll, we manage that state with the Rust mod, but when Shadow's syscall handler runs into a blocking syscall, we manage that wait task with the C mod. I guess a RDHUP might interact in both of the above paths?

Hmm, yeah I was and am a bit confused about the relationship between the two. There was definitely a bit of "keep adding to places where it looks like it should be until it works". I'll see if I can trace through a bit more; would definitely be preferable to not add dead code that might further confuse our future selves :)

I think the C epoll code is only used for the C select() and poll() syscall handlers. The rust epoll syscall handlers should only be using the rust epoll code. But poll() does have a POLLRDHUP that we could probably wire up to the C epoll if we needed to.

While debugging
https://gitlab.torproject.org/tpo/core/arti/-/issues/1972, it became
apparent that the tokio runtime seems to rely on EPOLLRDHUP to know when
a socket is closed, at least in some circumstances.

This adds tests for this and several variations of EOF behavior.
shadow fails the tests for socket file descriptors with EPOLLRDHUP.
@sporksmith sporksmith enabled auto-merge April 28, 2025 18:29
@sporksmith sporksmith merged commit 5db659c into shadow:main Apr 28, 2025
25 checks passed
@sporksmith sporksmith deleted the epollrdhup-test branch April 28, 2025 18:53
zydou pushed a commit to zydou/arti that referenced this pull request May 2, 2025
This reverts commit 5f6bd6c.

This workaround is no longer needed since the underlying bugs have been
fixed in shadow (See shadow/shadow#3100 and
shadow/shadow#3574), and the CI has been bumped
to use a version of shadow with these fixes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Documentation In-repository documentation, under docs/ Component: Main Composing the core Shadow executable Component: Testing Unit and integration tests and frameworks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants