Skip to content

fix(apigateway): stage creation fails due to cloudwatch log role not present#10377

Closed
jcrugzz wants to merge 5 commits intoaws:masterfrom
jcrugzz:fix-restapi-race-condition
Closed

fix(apigateway): stage creation fails due to cloudwatch log role not present#10377
jcrugzz wants to merge 5 commits intoaws:masterfrom
jcrugzz:fix-restapi-race-condition

Conversation

@jcrugzz
Copy link
Copy Markdown

@jcrugzz jcrugzz commented Sep 15, 2020

Stages in RestApi can be configured to have logs enabled.
These logs are sent to CloudWatch Logs using the IAM role
configured in AWS::ApiGateway::Account (via CfnAccount).
When this IAM role is not configured on the account, stage
creation will fail.

This change fixes a race condition when the Stage resource
is created before the Account resource, by adding an explicit
'depends on'.

fixes #10722


What I may need help with is how to go about testing this? I guess I can just check for the presence of a dependency that I added in the case i described?

Looking forward to any notes or feedback as I would like to get rid of the checked in node_modules in my repository.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@jcrugzz jcrugzz changed the title ---- Fix RestApi race condition Sep 15, 2020
@SomayaB SomayaB changed the title Fix RestApi race condition fix(apigateway): RestApi race condition Sep 15, 2020
@jcrugzz
Copy link
Copy Markdown
Author

jcrugzz commented Sep 17, 2020

@nija-at let me know whenever you have a minute to review :).

Copy link
Copy Markdown
Contributor

@nija-at nija-at left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jcrugzz - thanks for submitting this PR.

However, I'm not sure I understand the problem.

Can you state the error message you get and explain why you think this is a race condition between the Stage resource and CfnAccount resource?

@jcrugzz
Copy link
Copy Markdown
Author

jcrugzz commented Sep 23, 2020

@nija-at I don't have the specific error since I fixed it in my local node_modules 5 weeks ago before I went on paternity leave. Let me see if it's still in the account itself to get you the exact error message but let me also try and explain below my understanding of the dependencies within the APIG Stack being created. I'll post my example CDK stack for more clarity below before I explain.

class ApigStack extends Stack {
  /**
   *
   * @param {cdk.Construct} scope
   * @param {string} id
   * @param {cdk.StackProps=} props
   */
  constructor(scope, id, { domainName, cert, definition, ...props }) {
    super(scope, id, props);
    // The code that defines your stack goes here
    const certificate = Certificate.fromCertificateArn(this, `cert-${domainName}`, cert.CertificateArn);

    this.api = new SpecRestApi(this, 'apig', {
      apiDefinition: ApiDefinition.fromInline(definition),
      endpointConfiguration: {
        types: [ EndpointType.REGIONAL ]
      },
      deployOptions: {
        stageName: 'production',
        metricsEnabled: true,
        loggingLevel: MethodLoggingLevel.INFO
      }
    });

    this.domain = new DomainName(this, 'domain', {
      domainName,
      certificate,
      endpointType: EndpointType.REGIONAL,
      securityPolicy: SecurityPolicy.TLS_1_2
    });

    this.domain.addBasePathMapping(this.api, { basePath: '', stage: this.api.deploymentStage });
  }
}

In this APIG stack example, we have metricsEnabled which will create the CfnAccount as well as the IAM roles in order to write metrics to cloudwatch. The Stage created within the APIG depends upon that CfnAccount and those IAM roles for it to be created due to the deployOptions I defined. The error that would occur is that the stage would fail to create because the IAM roles did not finish creating before the Stage was being created. The Stage within the APIG only depends upon the root APIG resource in the generated CFN template from this stack. The change I added ensures that the IAM role for cloudwatch that live under that CfnAccount, are created before the Stage that depends on them is created. Currently that dependency doesn't exist which causes the race condition I am expressing here.

I debugged this with cloudtrail originally with one of my coworkers and I was able to add the dependency locally and fix the problem. The problem would only occur with creating the IAM role and CfnAccount was slower than creating the APIG -> Deploy -> Stage since the resources were created independently and in parallel by CFN.

@nija-at
Copy link
Copy Markdown
Contributor

nija-at commented Sep 25, 2020

Thanks for your response @jcrugzz. I still have a couple of clarifications.

The Stage created within the APIG depends upon that CfnAccount and those IAM roles for it to be created due to the deployOptions I defined.

The change I added ensures that the IAM role for cloudwatch that live under that CfnAccount, are created before the Stage that depends on them is created

Why do you say Stage depends on CfnAccount? As far as I can tell, the only dependency is CfnAccount depending on the RestApi.

Which IAM roles are you referring to here? As far as I can tell, The IAM role used by CfnAccount is not used anywhere else.

Copy link
Copy Markdown
Contributor

@nija-at nija-at left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting this to 'request changes' until a response for the above clarifications.

@jcrugzz
Copy link
Copy Markdown
Author

jcrugzz commented Sep 30, 2020

@nija-at Sorry language gets a little tricky here. In my first statement you pointed out the following and ill include your response

The Stage created within the APIG depends upon that CfnAccount and those IAM roles for it to be created due to the deployOptions I defined.

Why do you say Stage depends on CfnAccount? As far as I can tell, the only dependency is CfnAccount depending on the RestApi.

My change is what adds the explicit depends where Stage depends upon CfnAccount. What I learned through debugging is that in the current Cloudformation template generated by CDK, there is a race condition due to an implicit dependency that I am trying to make explicit with my changes. So when I said depends upon here, its that the actual sequence of creating the Stage component within APIGateway requires an IAM role in order to have cloudwatch logs enabled before the Stage can successfully be created. Prior to my changes, the IAM role for writing to cloudwatch wasn't guaranteed to exist before the Stage is attempted to be created within the Cloudformation template. I add a dependency of Stage -> CfnAccount since CfnAccount is the parent of the explicit IAM Role needed. Maybe its more correct to have Stage depend upon this role instead when cloudwatch logging is enabled? I just figured since it was consumed by the account that that was the correct construct to depend on.

Which IAM roles are you referring to here? As far as I can tell, The IAM role used by CfnAccount is not used anywhere else.

The IAM role linked above is what I am referring to and it is what is used by the Stage in an indirect way since the Stage has ownership over enabling cloudwatch logging within the various components within an APIGateway.

Please let me know if I didn't improve clarity here. I'm happy to make time to hop on a call to help hash this out if this is inadequate. I created 8 APIGateways before even seeing this issue for the first time and thought it was a red herring until I was able to understand the subtle dependency issue when it was persistent in happening in particular regions.

@mergify mergify bot dismissed nija-at’s stale review September 30, 2020 16:24

Pull request has been modified.

@aws-cdk-automation
Copy link
Copy Markdown
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: 721c8c7
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@nija-at nija-at changed the title fix(apigateway): RestApi race condition fix(apigateway): stage creation fails due to cloudwatch log role not present Oct 2, 2020
@nija-at
Copy link
Copy Markdown
Contributor

nija-at commented Oct 2, 2020

Thanks @jcrugzz for the explanation. That explanation made it super clear.
I've gone ahead and re-written the commit title and message based on that. Hope that's ok.

return resource;
}

protected configureDeployment(props: RestApiOptions) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix currently only configures a 'depends on' for the auto-created stage. This will need to be extended to apply to all stages, including the one that users will create.

I would suggest changing to add this dependency within the constructor of Stage and retrieving the CfnAccount using an internal getter.
Take a look at some of the method in restapi.ts that are marked @internal to see how this can be done.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks will check into that.

@ChrisSargent
Copy link
Copy Markdown

This bug still seems to be present - is there anything we can help with to get it fixed / reviewed in more detail?

@nija-at
Copy link
Copy Markdown
Contributor

nija-at commented Jul 16, 2021

@ChrisSargent - please see the one comment I have above. Fixing it, re-syncing with the latest code and master and re-submitting the PR will move this forward.

@nija-at
Copy link
Copy Markdown
Contributor

nija-at commented Jul 19, 2021

Closing this old pull request. Please submit a new one when ready.

@nija-at nija-at closed this Jul 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[apigateway] race condition between Stage and CfnAccount

5 participants