Skip to content

feat(core): context lookup errors are reported to CX app#3772

Merged
rix0rrr merged 12 commits intomasterfrom
huijbers/context-report-error
Sep 20, 2019
Merged

feat(core): context lookup errors are reported to CX app#3772
rix0rrr merged 12 commits intomasterfrom
huijbers/context-report-error

Conversation

@rix0rrr
Copy link
Copy Markdown
Contributor

@rix0rrr rix0rrr commented Aug 23, 2019

Instead of the toolkit failing and stopping when a context provider
lookup fails, the error is reported to the CDK app, which can then
decide what to do with it.

In practice, it will attach a (localized) error to the Construct tree.
This has the following advantages:

  • Dependent context resolution can proceed in a stepwise fashion,
    without the toolkit stopping at the first failure. For example,
    a VPC lookup from a value found in SSM will work.
  • Stacks in multiple accounts that need context lookup can be used
    in a single app with cold context. Without this change they couldn't,
    because you could only have credentials for a single account at a
    time available so lookup for one of the accounts was bound to fail.

Fixes #3654.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

Instead of the toolkit failing and stopping when a context provider
lookup fails, the error is reported to the CDK app, which can then
decide what to do with it.

In practice, it will attach a (localized) error to the Construct tree.
This has the following advantages:

* Dependent context resolution can proceed in a stepwise fashion,
  without the toolkit stopping at the first failure. For example,
  a VPC lookup from a value found in SSM will work.
* Stacks in multiple accounts that need context lookup can be used
  in a single app with cold context. Without this change they couldn't,
  because you could only have credentials for a single account at a
  time available so lookup for one of the accounts was bound to fail.

Fixes #3654.
@rix0rrr
Copy link
Copy Markdown
Contributor Author

rix0rrr commented Aug 23, 2019

TODO: tests and fixing build

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Aug 23, 2019

Pull Request Checklist

  • Testing
  • Unit test added (prefer to add a new test rather than modify existing tests)
  • CLI change? Re-run/add CLI integration tests
  • Documentation
  • Inline docs: make sure all public APIs are documented (copy & paste from official AWS docs)
  • README: update module README
  • Design: for significant features, follow the design process
  • Title uses the format type(scope): text
  • Type: fix, feat, refactor go into CHANGELOG, chore is hidden
  • Scope: name of the module without the aws- or cdk- prefix or postfix (e.g. s3 instead of aws-s3-deployment)
  • Style: use all lower-case, do not end with a period
  • Description
  • Rationale: describe rationale of change and approach taken
  • Issues: Indicate issues fixed via: fixes #xxx or closes #xxx
  • Breaking?: last paragraph: BREAKING CHANGE: <describe what changed + link for details>
  • Sensitive Modules (requires 2 PR approvers)
  • IAM document library (in @aws-cdk/aws-iam)
  • EC2 security groups and ACLs (in @aws-cdk/aws-ec2)
  • Grant APIs (if not based on official documentation with a reference)

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Aug 23, 2019

Codebuild (Continuous Integration) build failed for current commits. Please check log and resolve before PR is merged.

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Aug 26, 2019

Codebuild (Continuous Integration) build failed for current commits. Please check log and resolve before PR is merged.

1 similar comment
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Aug 26, 2019

Codebuild (Continuous Integration) build failed for current commits. Please check log and resolve before PR is merged.

@rix0rrr
Copy link
Copy Markdown
Contributor Author

rix0rrr commented Aug 26, 2019

Before I lock in the error and nosave protocols using tests I want a second pair of eyes on this. No point spending effort locking down this behavior if other people are going to make me change it.

Some motivation/explanation:

  • I want to attach some metadata to context values. The protocol will be that if the value is an object and it has certain magic keys, it will be treated differently. If the value is not an object or does not have the magic keys, it will be treated as a regular, un-annotated value. For now, we don't need anything other than reporting errors, so we don't need to transport a value AND an annotation to the CX app. If we wanted to do that, we would add a key like _contextValue which would contain the actual value. For now, no need for that.
  • _dontSaveContext is purely CLI-internal: it's a protocol between context providers and the toolkit itself, and is an indication that the value should not be persisted between runs (if we don't add this, errors would never be tried to be looked up again upon subsequent runs, which is incorrect behavior). It happens to be transmitted to the CX app (simply because it is not being actively ignored anywhere) but is unused there.
  • _providerError is interpreted by context providers as the error that occurred during lookup and attached as metadata.

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Aug 26, 2019

Codebuild (Continuous Integration) build failed for current commits. Please check log and resolve before PR is merged.

* - The request does not have versioning yet, only the response.
*/
export const CLOUD_ASSEMBLY_VERSION = '0.36.0';
export const CLOUD_ASSEMBLY_VERSION = '1.6.0';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a breaking change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It certainly means you will have to get a newer CLI version to use this functionality. Is that a breaking change? Does that mean we can never evolve the CX protocol ever again?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this constitutes a breaking change as long as there's a good error (and there should be)

*/
function stripTransientValues(obj: any) {
const ret: any = {};
for (const [key, value] of Object.entries(obj)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obj might be null or undefined, in which case Object.entries will raise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It couldn't really, but I think you're telling me to shore up the input type definition.

eladb
eladb previously requested changes Aug 26, 2019
* - The request does not have versioning yet, only the response.
*/
export const CLOUD_ASSEMBLY_VERSION = '0.36.0';
export const CLOUD_ASSEMBLY_VERSION = '1.6.0';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this constitutes a breaking change as long as there's a good error (and there should be)

if (semver.lt(frameworkVersion, toolkitVersion)) {
throw new Error(
`CDK CLI can only be used with apps created by CDK >= ${CLOUD_ASSEMBLY_VERSION}`);
`CDK CLI can only be used with apps created by CDK >= ${toolkitVersion} (got ${frameworkVersion})`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frameworkVersion and toolkitVersion do not represent the CDK version but rather the manifest version. The addition of "got ${x}" will be misleading (it will report the manifest version of the toolkit/framework instead of the actual version). I would recommend not adding it.

Copy link
Copy Markdown
Contributor Author

@rix0rrr rix0rrr Sep 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We purposely used CDK version numbers as manifest version numbers so that we can generate error messages like this. The alternatives would be:

# no information
The assembly is too new, get a newer CLI.

# information that's hard to map to software versions
The assembly has version 5, but this CLI can only read assembly version 3. Get a newer CLI.

Both of which are horrible in terms of what a user should do about it, or what versions they should be looking for.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reservation is only with the got ${frameworkVersion}. It will be confusing because it will not print the framework version but rather the cloud assembly version that the framework presents. Either we should find some other phrasing or avoid this part of the message. The first part is useful.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see what you mean. Agreed.

}

public static getValue(scope: Construct, options: GetContextValueOptions): any {
public static getValue(scope: Construct, options: GetContextValueOptions): GetContextValueResult {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering why did we need to change the return type here. Seems like it doesn't provide any benefit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency with getKey().

const value = await provider.getValue(missingContext.props);
let value;
try {
value = await provider.getValue(missingContext.props);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I am missing something here. Why not just set the key to undefined and let the system stabilize itself?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, because we couldn't tell the difference between a value that hasn't been looked up yet, and a value that we tried to look up but the lookup failed.

Second, because the point is to catch the error and transport it to the CX.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the response. Why do we need to tell the difference?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point I used to tell the difference between failed lookups and untried lookups, that's no longer the case. But we still need to transport the error to the CX, so it can attach a useful error message to the construct tree.

@rix0rrr rix0rrr self-assigned this Aug 26, 2019
@rix0rrr
Copy link
Copy Markdown
Contributor Author

rix0rrr commented Sep 3, 2019

Since I didn't get any substantial feedback on the approach, I'm going to assume you all are okay with the direction of this change, and I will start adding unit tests now.

@mergify mergify bot dismissed eladb’s stale review September 3, 2019 08:32

Pull request has been modified.

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Sep 3, 2019

Codebuild (Continuous Integration) build failed for current commits. Please check log and resolve before PR is merged.

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Sep 3, 2019

Codebuild (Continuous Integration) build failed for current commits. Please check log and resolve before PR is merged.

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Sep 4, 2019

Codebuild (Continuous Integration) build failed for current commits. Please check log and resolve before PR is merged.

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Sep 13, 2019

Thanks so much for taking the time to contribute to the AWS CDK ❤️

We will shortly assign someone to review this pull request and help get it
merged. In the meantime, please take a minute to make sure you follow this
checklist
:

  • PR title type(scope): text
    • type: fix, feat, refactor go into CHANGELOG, chore is hidden
    • scope: name of module without aws- or cdk- prefix or postfix (e.g. s3 instead of aws-s3-deployment)
    • text: use all lower-case, do not end with a period, do not include issue refs
  • PR Description
    • Rationale: describe rationale of change and approach taken
    • Issues: indicate issues fixed via: fixes #xxx or closes #xxx
    • Breaking?: last paragraph: BREAKING CHANGE: <describe what changed + link for details>
  • Testing
    • Unit test added. Prefer to add a new test rather than modify existing tests
    • CLI or init templates change? Re-run/add CLI integration tests
  • Documentation
    • README: update module README to describe new features
    • API docs: public APIs must be documented. Copy from official AWS docs when possible
    • Design: for significant features, follow design process

@aws-cdk-automation
Copy link
Copy Markdown
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@rix0rrr rix0rrr requested a review from shivlaks as a code owner September 16, 2019 11:07
@aws-cdk-automation
Copy link
Copy Markdown
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@rix0rrr rix0rrr added the pr/do-not-merge This PR should not be merged at this time. label Sep 16, 2019
@aws-cdk-automation
Copy link
Copy Markdown
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@rix0rrr rix0rrr added the effort/medium Medium work item – several days of effort label Sep 18, 2019
@aws-cdk-automation
Copy link
Copy Markdown
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@aws-cdk-automation
Copy link
Copy Markdown
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@rix0rrr rix0rrr merged commit b0267e4 into master Sep 20, 2019
@rix0rrr rix0rrr deleted the huijbers/context-report-error branch September 20, 2019 09:14
rix0rrr added a commit that referenced this pull request Sep 23, 2019
Fix breakage introduced by #3782, which changed the wrong string.

Update other expectation to match change introduced in #3772.
eladb pushed a commit that referenced this pull request Sep 23, 2019
Instead of the toolkit failing and stopping when a context provider
lookup fails, the error is reported to the CDK app, which can then
decide what to do with it.

In practice, it will attach a (localized) error to the Construct tree.
This has the following advantages:

* Dependent context resolution can proceed in a stepwise fashion,
  without the toolkit stopping at the first failure. For example,
  a VPC lookup from a value found in SSM will work.
* Stacks in multiple accounts that need context lookup can be used
  in a single app with cold context. Without this change they couldn't,
  because you could only have credentials for a single account at a
  time available so lookup for one of the accounts was bound to fail.

Fixes #3654.
mergify bot pushed a commit that referenced this pull request Sep 24, 2019
* chore: fix CLI integration tests

Fix breakage introduced by #3782, which changed the wrong string.

Update other expectation to match change introduced in #3772.

* Use substitutions in error message

* Rewrite how CLI tests are being run

* Extract environment setup from tests

* Need to not ignore integ test Javascript files
@NGL321 NGL321 added the contribution/core This is a PR that came from AWS. label Sep 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contribution/core This is a PR that came from AWS. effort/medium Medium work item – several days of effort pr/do-not-merge This PR should not be merged at this time.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cdk.context.json: vpc lookup which depends on an ssm lookup fails

5 participants