Skip to content

http ratelimit: option to reduce budget on stream done #37548

Merged
wbpcode merged 30 commits intoenvoyproxy:mainfrom
mathetake:actionapplyondone
Dec 19, 2024
Merged

http ratelimit: option to reduce budget on stream done #37548
wbpcode merged 30 commits intoenvoyproxy:mainfrom
mathetake:actionapplyondone

Conversation

@mathetake
Copy link
Copy Markdown
Member

@mathetake mathetake commented Dec 6, 2024

Commit Message: ratelimit: option to excute action on stream done

Additional Description:
This adds a new option apply_on_stream_done to the rate limit
policy corresponding to each descriptor. This basically allows to configure
descriptors to be executed in a response content-aware way and do not
enforce the rate limit (in other words "fire-and-forget"). Since addend
can be currently controlled via metadata per descriptor,
another filter can be used to set the value to reflect their intent there,
for example, by using Lua or Ext Proc filters.

This use case arises from the LLM API services which usually return
the usage statistics in the response body. More specifically,
they have "streaming" APIs whose response is a line-by-line event
stream where the very last line of the response line contains the
usage statistics. The lazy nature of this action is perfectly fine
as in these use cases, the rate limit happens like "you are forbidden
from the next time".

Besides the LLM specific, I've also encountered the use case from the
data center resource allocation case where the operators want to
"block the computation from the next time since you used this much
resources in this request".

Ref: envoyproxy/gateway#4756

Risk Level: low
Testing: done
Docs Changes: done
Release Notes: TODO
Platform Specific Features: n/a

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@repokitteh-read-only
Copy link
Copy Markdown

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #37548 was opened by mathetake.

see: more, trace.

@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @wbpcode
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #37548 was opened by mathetake.

see: more, trace.

@mathetake mathetake marked this pull request as ready for review December 6, 2024 22:10
@mathetake
Copy link
Copy Markdown
Member Author

mathetake commented Dec 6, 2024

i guess the impl can be a bit large, so I might do that in separate PRs - anyways will think about it after the API gets approved

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@wbpcode
Copy link
Copy Markdown
Member

wbpcode commented Dec 7, 2024

wow, we have a similar requirement internally and I finally figured out a similar way. It is super surprised and happy to see this.

@mathetake
Copy link
Copy Markdown
Member Author

cool glad to hear that you came to the similar idea!

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
…both for clarity

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
…nd for future extension

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake
Copy link
Copy Markdown
Member Author

@wbpcode thank you for the valuable feedback offline! I think I will go ahead and try implementing the idea - i don't think the change won't be that huge

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake mathetake changed the title http ratelimit: option to excute action on stream done ratelimit: option to excute action on stream done Dec 9, 2024
@mathetake

This comment was marked as outdated.

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake mathetake changed the title ratelimit: option to excute action on stream done http ratelimit: option to excute action on stream done Dec 9, 2024
@mathetake

This comment was marked as outdated.

@mathetake mathetake changed the title http ratelimit: option to excute action on stream done http ratelimit: option to reduce budget on stream done Dec 9, 2024
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake mathetake requested a review from wbpcode December 18, 2024 00:48
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Copy link
Copy Markdown
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. It's much better now. Some more comments are added. And please keep the route and route specific config in the filter at the begin of the requet. Or the route refreshment may results in that the encoding phase has different configuration with the decoding phase.

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake
Copy link
Copy Markdown
Member Author

meanwhile i am working on the integration tests now ... some cases are failing

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Copy link
Copy Markdown
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the contribution.

@wbpcode wbpcode enabled auto-merge (squash) December 18, 2024 11:51
@wbpcode wbpcode disabled auto-merge December 18, 2024 11:55
@wbpcode
Copy link
Copy Markdown
Member

wbpcode commented Dec 18, 2024

cc @mattklein123 cc @tyxia for any additional comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants