fix(logs): LogRetention resources fail with rate exceeded errors#26858
Merged
mergify[bot] merged 4 commits intomainfrom Aug 24, 2023
Merged
fix(logs): LogRetention resources fail with rate exceeded errors#26858mergify[bot] merged 4 commits intomainfrom
mergify[bot] merged 4 commits intomainfrom
Conversation
rix0rrr
reviewed
Aug 23, 2023
packages/aws-cdk-lib/cloud-assembly-schema/lib/integ-tests/commands/deploy.ts
Show resolved
Hide resolved
Collaborator
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
rix0rrr
approved these changes
Aug 24, 2023
Contributor
|
Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The LogRetention Custom Resource used to be able to handle server-side throttling, when a lot of requests to the CloudWatch Logs service are made at the same time.
Handling of this error case got lost during the migration to SDK v3.
If we have (read: a lot)
LogRetentionCustom Resources in a single Stack, CloudFormation apparently applies some internal breaks to the amount of parallelism. For example it appears that resources are batched in smaller groups that need to be completed before the next group is provisioned. And within the groups there appears to be a ever so slight delay between individual resources. Together this is enough to avoid rate limiting in most circumstances.Therefore, in practice this issues only occurs when multiple stacks are deployed in parallel.
To test this scenario, I have added support for
integ-runnerto deploy all stacks of a test case concurrently.Support for arbitrary command args already existed, but needed to explicitly include the
concurrencyoption.I then create an integration test that deploys 3 stacks à 25 LogRetention resources.
This triggers the error cases described in #26837.
The fix itself is twofold:
maxRetriesprop value to the SDK client to increase the number of attempts of the SDK internal throttling handling. But also enforce a minimum for these retries since they might catch additional retryable failures that our custom outer loop does not account for.ThrottlingExceptionerrors in the outer retry loop.Closes #26837
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license