Keep track and release Reactor Netty 4 Transport accepted Http Channels during the Node shutdown#20106
Conversation
Signed-off-by: Sergei Ustimenko <fdesu@proton.me>
WalkthroughAdded channel tracking and release functionality for Reactor Netty 4 HTTP transport during node shutdown. Introduced integration tests to validate HTTP channel lifecycle management under concurrent requests, modified channel closure to handle async context coordination, and updated request consumers to register accepted channels with the transport. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Transport as ReactorNetty4HttpServerTransport
participant Consumer as ReactorNetty4StreamingRequestConsumer
participant Channel as HttpChannel
participant CloseContext
Client->>Transport: Incoming HTTP request
Transport->>Consumer: Create consumer (pass transport)
Consumer->>Transport: serverAcceptedChannel(channel)
Transport->>Channel: Register accepted channel
Consumer->>Consumer: Process request
Consumer->>Channel: Close on completion
alt closeContext not completed
Channel->>CloseContext: addListener(close future)
CloseContext->>Channel: Complete → triggers close
else closeContext already completed
Channel->>Channel: Close immediately
end
Note over Transport,Channel: On node shutdown
Transport->>Transport: Track and release all<br/>accepted channels
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
Comment |
Signed-off-by: Sergei Ustimenko <fdesu@proton.me>
…ky-msearch-error-reporting-smoke-tests Signed-off-by: Sergei Ustimenko <fdesu@proton.me>
|
Thanks for the quick follow up here, @fdesu! Why does this channel not get closed? Manually closing the client seems to violate the intent of the test framework per this comment: |
|
@andrross oh, good point, wasn’t aware of that! I’m not entirely sure why the channel doesn’t get closed automatically but it just stays open since the deregistration doesn’t fire up which would have caused the RestCancellableNodeClient housekeeping to kick in. Maybe it’s somehow related to the channel type which is Reactor Netty4 kind of channel but I’m not really sure. Another thought that I had in mind was that the test(s) don’t fully consume the Response body and hence the receiving side just keeps the channel alive but it doesn’t seem to be the case since we use EntityUtils.toString which tries-with-resources the underlying InputStream. I’ll look deeper and would really appreciate if anyone can point me in the right direction. |
|
❌ Gradle check result for 25166b1: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@coderabbitai do you find @andrross comment above helpful to improve your review? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 933 🏁 Script executed: Length of output: 693 🏁 Script executed: Length of output: 17682 🏁 Script executed: Length of output: 3789 @peterzhuamazon Yes, @andrross's comment is very helpful and raises a valid architectural concern that should influence the review. After examining the framework code, I can confirm that the test framework is explicitly designed to:
The framework documentation at lines 2381-2382 explicitly states: "The returned client gets automatically closed when needed, it shouldn't be closed as part of tests otherwise it cannot be reused by other tests anymore." The Core Issue: The current fix uses try-with-resources to close the client in individual test methods, which:
What Needs Investigation: The real question is: Why doesn't the HttpChannel close automatically through the normal framework lifecycle? As @fdesu noted, it could be related to:
Recommendation: Rather than working around the framework's design, I'd suggest:
This would be a more robust solution that maintains the framework's architecture and benefits all tests using the framework. |
|
I mean, I don't think it's the test harness's design that causes the issue, it's something else. My expectation is, after the Note that the:
|
@fdesu I don't think you're missing anything. I think there might be a latent bug here with the Reactor Netty implementation. The channel gets added to a collection in AbstractHttpServerTransport#serverAcceptedChannel, which will guarantee that on node shutdown those channels get closed here. However, I don't see that |
Signed-off-by: Sergei Ustimenko <fdesu@proton.me>
…ky-msearch-error-reporting-smoke-tests Signed-off-by: Sergei Ustimenko <fdesu@proton.me>
Signed-off-by: Sergei Ustimenko <fdesu@proton.me>
|
❌ Gradle check result for 4abed82: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
|
@andrross @reta great point, lots of thanks for this hint! I can confirm the source of failures was randomness in what transport gets picked for the smoke tests: when I've added a channel init lifecycle hook for the |
Signed-off-by: Sergei Ustimenko <fdesu@proton.me>
...etty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransport.java
Outdated
Show resolved
Hide resolved
...etty4/src/main/java/org/opensearch/http/reactor/netty4/ReactorNetty4HttpServerTransport.java
Outdated
Show resolved
Hide resolved
…r proper context access Signed-off-by: Sergei Ustimenko <fdesu@proton.me>
|
@fdesu I just figured out that I think we do have much simpler solution for the problem, by modifying The We don't need wrappers and initializers in this case |
Signed-off-by: Sergei Ustimenko <fdesu@proton.me>
|
@reta interesting, I assume the streaming request consumer would need to do the same for the I'm not entirely sure but the |
…ls during the Node shutdown (opensearch-project#20106) * Close RestClient after MSearch API calls to reliably close HttpChannels Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Add the CHANGELOG entry Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Keep track of Reactor Netty 4 Transport accepted Http Channels Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Update CHANGELOG Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Add missing visibility modifier to RestCancellableNodeClient#getNumTasks Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Extract channel wrapper into a separate class and extend the get() for proper context access Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Add missing javadocs Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Hook into AbstractHttpServerTransport directly from Reactor Request Consumers Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Make sure the CloseContext is always called even if the channel is disposed already Signed-off-by: Andriy Redko <drreta@gmail.com> * Remove extra code Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Add more coverage for streaming and non-streaming netty connections Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Revert visibility modifier of the RestCancellableNodeClient#getNumTasks Signed-off-by: Sergei Ustimenko <fdesu@proton.me> --------- Signed-off-by: Sergei Ustimenko <fdesu@proton.me> Signed-off-by: Andriy Redko <drreta@gmail.com> Co-authored-by: Andriy Redko <drreta@gmail.com>
…ls during the Node shutdown (opensearch-project#20106) * Close RestClient after MSearch API calls to reliably close HttpChannels Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Add the CHANGELOG entry Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Keep track of Reactor Netty 4 Transport accepted Http Channels Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Update CHANGELOG Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Add missing visibility modifier to RestCancellableNodeClient#getNumTasks Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Extract channel wrapper into a separate class and extend the get() for proper context access Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Add missing javadocs Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Hook into AbstractHttpServerTransport directly from Reactor Request Consumers Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Make sure the CloseContext is always called even if the channel is disposed already Signed-off-by: Andriy Redko <drreta@gmail.com> * Remove extra code Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Add more coverage for streaming and non-streaming netty connections Signed-off-by: Sergei Ustimenko <fdesu@proton.me> * Revert visibility modifier of the RestCancellableNodeClient#getNumTasks Signed-off-by: Sergei Ustimenko <fdesu@proton.me> --------- Signed-off-by: Sergei Ustimenko <fdesu@proton.me> Signed-off-by: Andriy Redko <drreta@gmail.com> Co-authored-by: Andriy Redko <drreta@gmail.com>
Description
It appears that after executing Multi Search API, the HttpChannel stays registered with the
RestCancellableNodeClientas the underlying channel may not always get closed.The junit rule that supposed to close the Rest Client (the
OpenSearchTestClusterRule) during the cleanup phase doesn't always close the server-side channel which then pollutes theRestCancellableNodeClientand potentially causes more havoc.The solution is rather simple - close the Rest Client within the test case and reliably release all underlying resources. Running relevant tests multiple times with the fix applied seems to eliminate the issue.
Related Issues
Resolves #20034
Check List
[ ] API changes companion pull request created, if applicable.[ ] Public documentation issue/PR created, if applicable.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.