Conversation
|
@copilot Did you read the users last comment |
I don't see any comments from the original issue reporter in the context provided - only your comments asking if PR #2822 and #2823 satisfy requirements. Could you clarify which user comment you're referring to? If there are additional comments on the issue that aren't showing in my context, I'll need to see them to properly address the request. |
|
I see - the user's original comment requested two features:
Both are already fully implemented in PR #2823 (commit 2869cc1). This verification PR is unnecessary since that work is complete and tested. Should I close this PR? Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
@copilot For #2823 , I think we should direct limit the access rate instead of in the "return error" scene. Because most range request will not return error but return empty in our case. And we have found another case which happened long time ago which also needs this limiter. The detail is that when downstream service launches a new release version, all downstream service will do a rolling deployment. When the last pod is deployed, all upstream services watch the change and update their cache simultaneously which means all upstream services will expire their cache at the same time. If we encounter a very high pressure QPS at this moment, the cache penetration issue will still happens. Regarding the handling of rate-limited requests, I believe the strategy should be as follows: If a stale cache entry exists, return it even if it has expired. |
…narios Per @asim's feedback, rate limiting now applies to ALL cache refresh attempts (not just error cases) to prevent registry overload during rolling deployments. Changes: - Rate limit ALL refresh attempts using lastRefreshAttempt tracking - Always return stale cache if available (even if expired) when rate limiting - Only return ErrNotFound when no cache exists during rate limit period - Rate limiting happens inside singleflight to avoid race conditions - Update test to reflect new behavior (no retry when stale cache + rate limit) - Enhanced documentation with rolling deployment scenario examples This addresses the scenario where all upstream services expire their cache simultaneously during a downstream rolling deployment, which would previously cause a stampede to the registry under high QPS. Co-authored-by: asim <17530+asim@users.noreply.github.com>
Implemented the enhanced rate limiting strategy you requested in commit 99a6659. Changes:
The implementation prevents registry overload in both error scenarios and the rolling deployment case you identified, while maintaining efficient concurrent request handling via singleflight. |
|
@copilot can remove the failedAttempts since we do not use it any more |
1 similar comment
|
@copilot can remove the failedAttempts since we do not use it any more |
Per @asim's feedback, these fields are no longer needed since the new rate limiting strategy uses lastRefreshAttempt to track ALL refresh attempts, not just failed ones. Removed: - failedAttempts map[string]time.Time - consecutiveFailures int - All code that sets these fields The rate limiting logic now only uses lastRefreshAttempt, making these fields redundant. All tests continue to pass. Co-authored-by: asim <17530+asim@users.noreply.github.com>
Enhanced Rate Limiting Implementation
Addressed @asim's feedback on PR #2823 to improve cache penetration protection.
Changes Made
Key Improvements
1. Universal Rate Limiting
Before: Throttling only applied when registry returned errors AND no stale cache
After: Throttling applies to ALL refresh attempts, regardless of registry state
2. Stale Cache Strategy
3. Rolling Deployment Protection
New scenario handled: When downstream service does rolling deployment, all upstream
caches expire simultaneously. Under high QPS, this prevents stampede:
4. Code Cleanup
Removed unused
failedAttemptsandconsecutiveFailuresfields since the newrate limiting strategy uses
lastRefreshAttemptfor all tracking.Test Results
All 8 tests passing ✅ (including with race detector)
Documentation
Updated
registry/cache/README.mdwith:Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.