http ratelimit: option to reduce budget on stream done #37548
wbpcode merged 30 commits into envoyproxy:main
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to
api/envoy/extensions/filters/http/ratelimit/v3/rate_limit.proto
I guess the implementation could be fairly large, so I might do that in separate PRs. Anyway, I will think about it after the API gets approved.
Wow, we have a similar requirement internally, and I ended up with a similar approach. I am surprised and very happy to see this.
Cool, glad to hear that you came to a similar idea!
…both for clarity Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
…nd for future extension Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@wbpcode thank you for the valuable feedback offline! I think I will go ahead and try implementing the idea. I don't think the change will be that huge.
api/envoy/extensions/filters/http/ratelimit/v3/rate_limit.proto
wbpcode left a comment
Thanks for the update. It's much better now. I have added some more comments. Also, please keep the route and the route-specific config in the filter from the beginning of the request. Otherwise, a route refresh may result in the encoding phase using a different configuration from the decoding phase.
Meanwhile, I am working on the integration tests now; some cases are failing.
wbpcode left a comment
LGTM. Thanks for the contribution.
cc @mattklein123 cc @tyxia for any additional comments.
Commit Message: ratelimit: option to execute action on stream done
Additional Description:
This adds a new option apply_on_stream_done to the rate limit
policy corresponding to each descriptor. This allows descriptors to be
configured so that they are executed in a response-content-aware way and
do not enforce the rate limit (in other words, "fire-and-forget"). Since
the addend can currently be controlled via metadata per descriptor,
another filter, for example Lua or Ext Proc, can be used to set the
value to reflect its intent there.
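As a rough illustration of the fire-and-forget semantics described above, here is a minimal Python sketch. It is a toy model, not Envoy's or the rate limit service's implementation: the Budget class and its method names are hypothetical, and it assumes the request-time check only inspects the remaining budget while the deduction happens once the stream is done.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Toy per-descriptor budget (hypothetical; not Envoy's data model)."""

    remaining: int

    def allow_request(self) -> bool:
        # Enforced check at request time: reject once the budget is used up.
        # Assumption: this check does not consume any budget by itself.
        return self.remaining > 0

    def deduct_on_stream_done(self, addend: int) -> None:
        # Fire-and-forget: the response has already been sent, so the
        # deduction never rejects anything; it only affects later requests.
        self.remaining -= addend


budget = Budget(remaining=100)
first = budget.allow_request()      # budget is still positive
budget.deduct_on_stream_done(250)   # the response turned out to cost 250
second = budget.allow_request()     # the next request sees the overdraft
```

In this model the first request is admitted even though it ends up costing more than the remaining budget, and only the following request is rejected.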
This use case arises from LLM API services, which usually return
the usage statistics in the response body. More specifically,
they have "streaming" APIs whose response is a line-by-line event
stream where the very last line of the response contains the
usage statistics. The lazy nature of this action is perfectly fine
because in these use cases the rate limit takes effect as "you are
forbidden from the next time".
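To make the streaming case concrete, the following Python sketch shows how a helper filter might derive the addend from such an event stream. The event format is an assumption made for illustration: each line is taken to be of the form "data: <JSON>", with the usage reported under a "usage" object containing "total_tokens", and a non-JSON sentinel line possibly following it. The function name is hypothetical and real services may differ.

```python
import json
from typing import Iterable, Optional


def total_tokens_from_stream(lines: Iterable[str]) -> Optional[int]:
    """Return total_tokens from the last event that carries usage, if any."""
    total = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip non-JSON payloads such as a "[DONE]" sentinel
        usage = event.get("usage") if isinstance(event, dict) else None
        if isinstance(usage, dict) and "total_tokens" in usage:
            total = int(usage["total_tokens"])
    return total


# Assumed example stream; the field names are illustrative only.
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 12, '
    '"completion_tokens": 30, "total_tokens": 42}}',
    "data: [DONE]",
]
addend = total_tokens_from_stream(stream)
```

The resulting value is what another filter would write into metadata as the addend, so that the descriptor marked with apply_on_stream_done is charged that amount once the stream completes.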
Besides the LLM-specific case, I've also encountered this use case in
data center resource allocation, where the operators want to
"block the computation from the next time since you used this many
resources in this request".
Ref: envoyproxy/gateway#4756
Risk Level: low
Testing: done
Docs Changes: done
Release Notes: TODO
Platform Specific Features: n/a