Skip to content

ratelimit: support detach request after sent limit request#40958

Merged
yanavlasov merged 7 commits intoenvoyproxy:mainfrom
wbpcode:dev-resolve-lifetime-problem-of-limit
Sep 24, 2025
Merged

ratelimit: support detach request after sent limit request#40958
yanavlasov merged 7 commits intoenvoyproxy:mainfrom
wbpcode:dev-resolve-lifetime-problem-of-limit

Conversation

@wbpcode
Copy link
Copy Markdown
Member

@wbpcode wbpcode commented Sep 3, 2025

Commit Message: ratelimit: support detach request after sent limit request
Additional Description:

To close #40892. In the previous implementation, at stream complete phase, the substitution formatter couldn't works as expected for limit request. This is because no related context is provided to the grpc client to avoid potential lifetime problem.

This new implementation add a detach() method at Grpc::AsyncRequest which will clean up the context (parent span, stream info, and so on). And then, at the rate limit filter, at stream complete phase, the stream info will be provided to the grpc client and then be cleaned up by the detach() to avoid potential dangling reference.

Risk Level: low.
Testing: unit.
Docs Changes: n/a.
Release Notes: added.

@wbpcode wbpcode marked this pull request as draft September 3, 2025 14:13
@wbpcode wbpcode force-pushed the dev-resolve-lifetime-problem-of-limit branch 2 times, most recently from 70306a5 to f1aece4 Compare September 6, 2025 07:45
@wbpcode wbpcode marked this pull request as ready for review September 6, 2025 07:45
@wbpcode wbpcode force-pushed the dev-resolve-lifetime-problem-of-limit branch from f1aece4 to e3c02c7 Compare September 6, 2025 09:45
@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Sep 6, 2025

Finally, this should be ready for review.

@wbpcode wbpcode requested a review from zuercher as a code owner September 6, 2025 11:30
Signed-off-by: WangBaiping <wbphub@gmail.com>
@wbpcode wbpcode force-pushed the dev-resolve-lifetime-problem-of-limit branch from b43f70d to 4cb2fe6 Compare September 7, 2025 16:16
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait-any

Comment on lines +37 to +45

/**
* Detach the pending request. This will not cancel the request but will clean up
* all context associated with downstream request to avoid dangling references.
* NOTE: the callbacks that registered to take the response will be kept to handle
* the response when it arrives. The caller is responsible for ensuring that the
* callbacks have enough lifetime to handle the response.
*/
virtual void detach() PURE;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH this is very confusing. I don't really understand what we are clearing and why, and we are also not clearing response callbacks. What is the case that this is protecting against? Per your other PR can the lifetimes just be fixed externally in a more clear way?

Copy link
Copy Markdown
Member Author

@wbpcode wbpcode Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is most to address the send-and-forget case. We send an async side request and never cancel it even the downstream main request is ended.

This will clear the downstream stream info, side stream water mark callbacks, parent span and so on to ensure after the in-flight request cannot access these because they maybe dangling pointers/references.

But the response callbacks will be kept by design then the caller could do some cleanup or operations after the response. And when send-and-forget mode is used, the caller self should ensure the response callbacks' lifetime, and it is pretty easy to provide a long living response callbacks.


Per your other PR can the lifetimes just be fixed externally in a more clear way?

It's pretty hard to do that for these. The lifetime of stream info (and the parent span and so on) are bound with the downstream main request. To longer their lifetime is super huge refactor because the stream info and parent span (and so on) will also store raw pointer or reference to other objects.
This is also why we cannot let the developer to resolve the lifetime problem by themselves. It's hard even for us.

Do a cleanup (to these potential dangling references) with a detach() would much simpler. I can try to update the comment make it more clear.

Signed-off-by: WangBaiping <wbphub@gmail.com>
@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Sep 11, 2025

/retest

@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Sep 12, 2025

Friendly ping again cc @mattklein123

Copy link
Copy Markdown
Contributor

@yanavlasov yanavlasov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait-any

return AsyncStreamImpl::streamInfo();
}

void AsyncRequestImpl::detach() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is semantically the same as the AsyncStreamImpl::waitForRemoteCloseAndDelete() It should be possible to just call the waitForRemoteCloseAndDelete since it handles stuck requests with a timeout.

Copy link
Copy Markdown
Member Author

@wbpcode wbpcode Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanavlasov I have checked the waitForRemoteCloseAndDelete() and they are different actually for now. The waitForRemoteCloseAndDelete() skipped some of cleanup logic (but the timeout is a good design). And part of this PR also resolve some lifetime problem like metadata criteria in route.

But I agree they have similar goal and should could be merged with another PR.

WDYT?

@mattklein123
Copy link
Copy Markdown
Member

OK this makes sense to me with the improved comment, but will defer to @yanavlasov who is more recent in the code.

@mattklein123 mattklein123 removed their assignment Sep 14, 2025
Signed-off-by: WangBaiping <wbphub@gmail.com>
@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Sep 15, 2025

/retest

1 similar comment
@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Sep 16, 2025

/retest

Copy link
Copy Markdown
Contributor

@yanavlasov yanavlasov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait-any

absl::nullopt, getHitAddend());
callbacks_->streamInfo(), getHitAddend());
// If the limit() call fails directly then the detach() will be no-op.
shared_client->detach();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't the callbacks_ pointer in the shared_client become dangling at this point?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. The callback pointer has as long lifetime as the whole request. It will clean up itself when the request is complete (no matter success or failure).

Copy link
Copy Markdown
Contributor

@yanavlasov yanavlasov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait-any

// when calling the limit() function. To make sure we can call the detach() function
// safely, we convert the client_ to a shared_ptr.

std::shared_ptr<Filters::Common::RateLimit::Client> shared_client = std::move(client_);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this dance with the shared_ptr though. It looks like the OnStreamDoneCallBack will be the only owner. Is it to call the detach() after the limit call?

Copy link
Copy Markdown
Member Author

@wbpcode wbpcode Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. It's possible the client may be deleted directly once the request failed directly and we have no way to know it out of the client. Keep a shared pointer to help us could always keep correct behavior. If the request failed directly, the detach() do nothing, and if the request still here, the detach() will do our expected cleanup.

@yanavlasov yanavlasov merged commit b314c6b into envoyproxy:main Sep 24, 2025
25 checks passed
@wbpcode wbpcode deleted the dev-resolve-lifetime-problem-of-limit branch September 24, 2025 02:16
@brightbyte
Copy link
Copy Markdown

Thank you very much @wbpcode!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

variable in initial_metadata do not get evaluated in rate_limit_service when used with apply_on_stream_done

4 participants