Use QuicheMemSlice constructor with releasor.#31167
Use QuicheMemSlice constructor with releasor.#31167RyanTheOptimist merged 10 commits intoenvoyproxy:mainfrom
Conversation
The existing model of forwarding to the quiche::MemSliceImpl constructor is an abstraction violation and makes it hard to swap out the underlying MemSlice platform implementation. As such, we've recently added a new constructor to QuicheMemSlice (and the Impl API) that allows for an arbitrary custom releasor. This commit is a little wonky in that it only really uses a lambda releasor for its capture list to manage object lifetime, and so the lambda body is intentionally empty. Since we're storing a unique_ptr in the capture list (allowable since C++17), we can't store this in a std::function which requires copyability. As such, we use absl::AnyInvocable. Signed-off-by: Steve Wang <wangsteve@google.com>
Signed-off-by: Steve Wang <wangsteve@google.com>
The custom deleter's lifetime is bound to the BufferFragmentImpl, not the Apparently when I provided the sample Envoy implementation, I had inadvertently reverted the buffer_impl changes to use absl::AnyInvocable, which were necessary to make this compile... Signed-off-by: Steve Wang <wangsteve@google.com>
|
There is one failing test (test/integration/quic_protocol_integration_test) which seems to be a segfault when closing QUIC connections. Given that the focus of this PR involves QUIC connection lifetime management, I would not be surprised if it's a breakage that I need to track down. |
|
/assign @danzh2010 |
The crash stack might be a test setup issue irrelevant to this PR, but I couldn't repro it locally under opt build with ASAN in a clean branch. If you are able to repro and need help to look into the debug log, I'd be happy to help. |
|
I couldn't reproduce it running locally in this branch with --compilation_mode=opt, but I did look at the stack trace in the test logs (quoted below for convenience). It didn't obviously seem related to this PR, but it still makes me nervous. stack_decode.py didn't seem to consistently identify line numbers, even when building with --config=-gmlt to preserve debug symbols (after the first two lines, it started claiming Envoy code was coming from things like |
| // If it turns out to be expensive, add a new function to free data in the middle in buffer | ||
| // interface and re-design QuicheMemSliceImpl. | ||
| quic_slices.emplace_back(quiche::QuicheMemSlice::InPlace(), data, slice.len_); | ||
| auto single_slice_buffer = std::make_unique<Buffer::OwnedImpl>(); |
There was a problem hiding this comment.
This change I would love a way to fallback in case it goes wrong. RUNTIME_GUARD isn't an option here because we want to compile out the old way internally. Would #ifdef SOME_PREPROCESSOR be acceptable? @alyssawilk
There was a problem hiding this comment.
do we need to compile out the old way overnight? Can we do the usual flip-wait-deprecate?
There was a problem hiding this comment.
flip-wait-deprecate takes 6 months, and preferably we don't want to wait for that long.
There was a problem hiding this comment.
ah so for functional changes you need to wait 6 months for operators to update and test, but for "I want to guard this in case it causes crashes but really things look fine" changes you can remove at your own pace. Does that help any?
There was a problem hiding this comment.
As Dan mentions, we'd like to be able to change the compile-time dependencies (in particular so we no longer depend on a custom constructor in Envoy platform's QuicheMemSliceImpl).
I'm happy to leave the default behavior as-is and protect this new behavior behind a compile-time guard so everyone who doesn't care about swapping out this dependency is unaffected, but guarding it behind a runtime flag means we can't actually swap out the underlying implementation until the old way is removed.
There was a problem hiding this comment.
I'm not sure how we get a fallback and keep the code in, so I was trying to offer a less painful timeline to get both in sequence.
There was a problem hiding this comment.
Right. I don't think a runtime fallback is feasible for what I want to do.
There was a problem hiding this comment.
@RyanTheOptimist FYI. I think we want to use a runtime guard with an ifdef. Something like...
#ifdef USE_QUICHE_MEM_SLICE_RELEASOR_API_EXCLUSIVELY
if (!RUNTIME_GUARD(use_quiche_mem_slice_releasor)) {
quic_slices.emplace_back(quiche::QuicheMemSlice::InPlace(), data, slice.len_);
} else {
#endif
auto single_slice_buffer = std::make_unique<Buffer::OwnedImpl>();
// ...
#ifdef USE_QUICHE_MEM_SLICE_RELEASOR_API_EXCLUSIVELY
}
#endif
(Rough pseudocode. I'll send a commit along these lines later today.)
We want to be able to compile out the existing API. Once we've put some mileage on this implementation, we can set this to be default true for other Envoy users. Signed-off-by: Steve Wang <wangsteve@google.com>
We'll manage this with a short-term patch for our internal import. Signed-off-by: Steve Wang <wangsteve@google.com>
RyanTheOptimist
left a comment
There was a problem hiding this comment.
nit: Please mention the runtime guard in the PR description.
Did we get to the bottom of the test failures that were seen earlier?
| single_slice_buffer->move(data, slice.len_); | ||
| quic_slices.emplace_back( | ||
| reinterpret_cast<char*>(single_slice_buffer->frontSlice().mem_), slice.len_, | ||
| [single_slice_buffer = std::move(single_slice_buffer)](const char*) {}); |
There was a problem hiding this comment.
nit: Can we add a comment here to explain what's going on. At first glance, it looks like we move the data into a new unique_ptr, and then pass a pointer to the data as the first argument to this function and then the local variable goes out of scope and deletes the data. But of course, thats' not what happens because in the lambda capture, we std::move() that unique ptr into a captured variable which lives until that invokable goes out of scope when the slice is destructed. Do I have that right?
In other words, the only purpose of the callback is to capture the unique_ptr, not actually to do anything when executed (The action happens when the callback is destroyed)? We could also explicitly clear single_slice_buffer in the body of the callback if we wanted to be a bit more explicit, though I'm not sure how useful that is.
There was a problem hiding this comment.
I think clearing that explicitly is inherently more readable and makes it easier to reason about the lifetime (since then it's tied to the invocation of the releasor, as opposed to the destructor). That's not a bad idea.
|
/assign @RyanTheOptimist |
PR description updated. I never got to the bottom of the test failure, as I could never reproduce it, and the stack trace didn't point to buffer management being the culprit. I see a few other failures:
|
This aligns lifetime better with the intent of the releasor API. Signed-off-by: Steve Wang <wangsteve@google.com>
These are duplicated from other tests. (If it's preferable, I can instead parameterize the entire test suite, but that seemed more intrusive.) Signed-off-by: Steve Wang <wangsteve@google.com>
|
Looks like real test failures: https://github.com/envoyproxy/envoy/actions/runs/7143515779/job/19455192451?pr=31167 |
|
But only under gcc? Bizarre. |
Also windows. :/ Wonder if there are some compiler shenanigans going on here |
This may explain why we only see failing tests on client streams, and not on the server streams. I haven't been able to reproduce the failures locally but am still looking into them. Signed-off-by: Steve Wang <wangsteve@google.com>
|
I looked at compiler support for init-captures in lambdas, but that came about in C++14 and should have been supported by both gcc and msvc for 5+ years (and should certainly be in the versions that Envoy uses). Anyways, I'm pushing a new commit that should cause envoy_quic_server_stream_test to start failing in the same way, if this is reproducible. |
Head branch was pushed to by a user without write access
Signed-off-by: Steve Wang <wangsteve@google.com>
| auto single_slice_buffer = std::make_unique<Buffer::OwnedImpl>(); | ||
| single_slice_buffer->move(data, slice.len_); | ||
| quic_slices.emplace_back( | ||
| reinterpret_cast<char*>(single_slice_buffer->frontSlice().mem_), slice.len_, |
There was a problem hiding this comment.
bazel -c opt --config=gcc seems to complain about nullptr here. I guess this may be due to an issue with function parameter evaluation order?
It's likely safer to just use slice.mem_ here. (That may or may not be the segfault / crash I'm seeing.)
If the lambda is created before we access frontSlice().mem_, then this may result in a nullptr dereference. This seems to be implementation-defined. Signed-off-by: Steve Wang <wangsteve@google.com>
|
/wait |
|
Failed TSAN test: UpstreamProtocols/DownstreamProtocolIntegrationTest.ManyLargeRequestHeadersAccepted/IPv4_Http2Downstream_Http3UpstreamHttpParserNghttp2NoDeferredProcessingLegacy I don't expect this to be related, since the new logic should be runtime-guard-disabled in most tests. The other previously-failing presubmits are now passing (gcc, windows, coverage), so I'm inclined to guess this is a flaky test. |
The existing model of forwarding to the quiche::MemSliceImpl constructor is an abstraction violation and makes it hard to swap out the underlying MemSlice platform implementation. As such, we've recently added a new constructor to QuicheMemSlice (and the Impl API) that allows for an arbitrary custom releasor.
This PR uses the new constructor, guarded behind the runtime flag
envoy.reloadable_features.quiche_use_mem_slice_releasor_api(disabled by default).Since we're storing a unique_ptr in the capture list (allowable since C++17), we can't store this in a std::function which requires copyability. As such, we use absl::AnyInvocable.
(I've marked this as medium risk since it interacts with Envoy's memory management model for QUIC streams and so deserves some scrutiny.)
Commit Message: Use QuicheMemSlice constructor with releasor.
Additional Description:
Risk Level: Medium
Testing: This is a refactor, so it should be covered by existing tests (test/common/buffer:buffer_test, test/common/quic:envoy_quic_{client,server}_stream_test)
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a