Skip to content

fix(logs): log retention fails with OperationAbortedException#16083

Merged
mergify[bot] merged 4 commits intoaws:masterfrom
ddl-denis-parnovskiy:fix-15709
Sep 10, 2021
Merged

fix(logs): log retention fails with OperationAbortedException#16083
mergify[bot] merged 4 commits intoaws:masterfrom
ddl-denis-parnovskiy:fix-15709

Conversation

@ddl-denis-parnovskiy
Copy link
Copy Markdown
Contributor

@ddl-denis-parnovskiy ddl-denis-parnovskiy commented Aug 17, 2021

Fixes: #15709

When creating a lambda with log retention, CDK actually creates 2 lambda functions. The second lambda function alters log retention of the log group of the first lambda and the retention of its own log group.

Because log group creation is asynchronous, the log retention lambda tries to pre-create both log groups to guarantee it has an object to work on.

If a normal lambda execution also creates the related log group at the same time, an "OperationAbortedException:... Please retry" error is returned.

The existing code handles this situation for log retention lambda but not for the first lambda.

This fix adds the retry pattern to the general log group creation code.

Also existing code had a bug: if OperationAbortedException is hit, the error is hidden but the retention policy is skipped and not actually applied. This fix addresses this bug as well.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

Fixes #15709

Include OperationAbortedException into retryableErrors to handle race of "another create operation is in progress" when the target lambda creates the log group and this function also tries to create the same log group at the same time.
@gitpod-io
Copy link
Copy Markdown

gitpod-io bot commented Aug 17, 2021

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Aug 17, 2021

Title does not follow the guidelines of Conventional Commits. Please adjust title before merge.

try {
const region = process.env.AWS_REGION;
await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions);
await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See in the old code, if CreateLogGroupSafe throws an error, setRetentionPolicy is skipped.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this?
This behaviour is the same in your updated code as well, i.e., if createLogGroupSafe() throws an error setRetentionPolicy() is skipped.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old code:
CreateLogGroupSafe encounters OperationAbortedException, throws exception, setRetentionPolicy is skipped, the error is eaten by catch all.
Result: retention policy for this function is not set and no error. This is a bug in the old code I'm talking about.

New code:
CreateLogGroupSafe encounters OperationAbortedException, retries, if really fails after several attempts, then the error is floated up the stack.
Result: either log group is created with limited retention, or an exception is thrown.

@ddl-denis-parnovskiy ddl-denis-parnovskiy changed the title Fix for aws#15709 race when creating log groups fix: aws#15709 race when creating log groups Aug 17, 2021
@peterwoodworth peterwoodworth changed the title fix: aws#15709 race when creating log groups fix: race when creating log groups Aug 19, 2021
@peterwoodworth peterwoodworth added @aws-cdk/aws-logs Related to Amazon CloudWatch Logs effort/small Small work item – less than a day of effort p1 labels Aug 19, 2021
@peterwoodworth peterwoodworth changed the title fix: race when creating log groups fix(logs): race when creating log groups Aug 19, 2021
nija-at
nija-at previously requested changes Sep 6, 2021
Copy link
Copy Markdown
Contributor

@nija-at nija-at left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this pull request. Please see some of my comments below.

try {
const region = process.env.AWS_REGION;
await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions);
await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this?
This behaviour is the same in your updated code as well, i.e., if createLogGroupSafe() throws an error setRetentionPolicy() is skipped.

@nija-at nija-at assigned nija-at and unassigned rix0rrr Sep 6, 2021
@nija-at nija-at changed the title fix(logs): race when creating log groups fix(logs): log retention fails with OperationAbortedException Sep 6, 2021
Fixes #15709

Include OperationAbortedException into retryableErrors to handle race of "another create operation is in progress" when the target lambda creates the log group and this function also tries to create the same log group at the same time.
@mergify mergify bot dismissed nija-at’s stale review September 8, 2021 17:52

Pull request has been modified.

Copy link
Copy Markdown
Contributor

@nija-at nija-at left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes. Looks good to me.

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Sep 10, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Copy Markdown
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: 64d2bb7
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot merged commit 3e9f04d into aws:master Sep 10, 2021
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Sep 10, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@ddl-denis-parnovskiy ddl-denis-parnovskiy deleted the fix-15709 branch September 11, 2021 00:20
mergify bot pushed a commit that referenced this pull request Dec 10, 2021
Fixes: #17546

This adds to the fix in #16083 that was addressing the issue where the LogRetention Lambda can be executed concurrently and create a race condition where multiple invocations are trying to create or modify the same log group.

The previous fix addressed the issue if it occurred during log group creation, in the `createLogGroupSafe` method, but did not account for the same problem happening when modifying a log group's retention period in the `setRetentionPolicy` method. This fix applies the same logic from the last fix to the other method.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this pull request Feb 21, 2022
)

Fixes: aws#17546

This adds to the fix in aws#16083 that was addressing the issue where the LogRetention Lambda can be executed concurrently and create a race condition where multiple invocations are trying to create or modify the same log group.

The previous fix addressed the issue if it occurred during log group creation, in the `createLogGroupSafe` method, but did not account for the same problem happening when modifying a log group's retention period in the `setRetentionPolicy` method. This fix applies the same logic from the last fix to the other method.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

@aws-cdk/aws-logs Related to Amazon CloudWatch Logs effort/small Small work item – less than a day of effort p1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

(aws-logs): log retention update can race against the target lambda's log group creation resulting in OperationAbortedException

5 participants