Fix: RateLimit requests were not released when a streaming generation exception occurred #11540

Merged: crazywoola merged 2 commits into langgenius:main from liuzhenghua:fix/rate-limit on Dec 11, 2024
Conversation

laipz8200 approved these changes on Dec 11, 2024.

iamjoel pushed a commit that referenced this pull request on Dec 16, 2024:
… exception occurred (#11540)
AlwaysBluer added a commit to AlwaysBluer/dify that referenced this pull request on Dec 18, 2024:
…m-vdb * 'lindorm-vdb' of github.com:AlwaysBluer/dify:
- Fix/pdf preview in build (langgenius#11621)
- feat(devcontainer): add alias to stop Docker containers (langgenius#11616)
- ci: better print version for ruff to check the change (langgenius#11587)
- feat(model): add vertex_ai Gemini 2.0 Flash Exp (langgenius#11604)
- fix: name of llama-3.3-70b-specdec (langgenius#11596)
- Added new models and removed the deleted ones for Groq langgenius#11455 (langgenius#11456)
- [ref] use one method to get boto client for aws bedrock (langgenius#11506)
- chore: translate i18n files (langgenius#11577)
- fix: support mdx files, close langgenius#11557 (langgenius#11565)
- fix: change workflow trace id (langgenius#11585)
- Feat: dark mode for logs and annotations (langgenius#11575)
- Lindorm vdb (langgenius#11574)
- feat: add gemini-2.0-flash-exp (langgenius#11570)
- fix: better opendal tests (langgenius#11569)
- Fix: RateLimit requests were not released when a streaming generation exception occurred (langgenius#11540)
- chore: translate i18n files (langgenius#11545)
- fix: workflow continue on error doc link (langgenius#11554)
jsincorporated pushed a commit to jsincorporated/asaAi that referenced this pull request on Jul 8, 2025:
… exception occurred (langgenius#11540)
Summary
When you set the APP_MAX_ACTIVE_REQUESTS environment variable and call the streaming interface, an exception can cause the request count to not be released, eventually leading to the assistant being rate-limited due to an excessive number of active requests.
For example, when calling the chat-messages interface, setting the query parameter to an empty string and the response_mode to streaming can reproduce this issue.
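The failure mode described above can be sketched as follows. This is a minimal illustration, not Dify's actual `RateLimit` implementation: the class and method names here are hypothetical. The key point matches the fix in this PR: the active-request slot must be released in a `finally` block that also covers exceptions raised while the streaming generator is being consumed, otherwise a mid-stream error (such as an empty `query`) leaks the slot permanently.

```python
import threading


class RateLimit:
    """Illustrative active-request limiter (names are hypothetical)."""

    def __init__(self, max_active_requests: int):
        self.max_active_requests = max_active_requests
        self._active = 0
        self._lock = threading.Lock()

    def enter(self) -> None:
        """Claim a slot; reject the request if the app is at capacity."""
        with self._lock:
            if self._active >= self.max_active_requests:
                raise RuntimeError("too many active requests")
            self._active += 1

    def exit(self) -> None:
        """Release a previously claimed slot."""
        with self._lock:
            self._active -= 1

    def generate(self, stream):
        """Wrap a streaming response so the slot is released no matter how
        iteration ends: normal exhaustion, an exception raised mid-stream,
        or the client closing the connection early."""
        try:
            yield from stream
        finally:
            self.exit()


def broken_stream():
    """Simulates a streaming generation that fails after the first chunk."""
    yield "chunk-1"
    raise ValueError("query is empty")
```

Without the `try/finally` around `yield from stream`, the `ValueError` would propagate to the caller and `exit()` would never run, so each failed streaming call would permanently consume one of the `APP_MAX_ACTIVE_REQUESTS` slots.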
Tip

Close issue syntax: `Fixes #<issue number>` or `Resolves #<issue number>`; see the documentation for more details.

Screenshots
Checklist

Important: Please review the checklist below before submitting your pull request.

Run `dev/reformat` (backend) and `cd web && npx lint-staged` (frontend) to appease the lint gods.