
(aws-autoscaling): machine ami id not caching to cdk.context.json #12484

@andreialecu

Description


We are running a cdk deploy '*' step as part of CI.

Even when there are no stack changes whatsoever, every 10 days or so cdk starts recreating an autoscaling group, which breaks the ECS cluster and causes downtime until everything is built again.

Here are the relevant logs:

Stack ARN:
arn:aws:cloudformation:*********:<...>:stack/App-AnalyticsApi-ECR-Production/39030a30-03d5-11eb-9455-02ea080ae8b9
App-AnalyticsEcsCluster2-Production
App-AnalyticsEcsCluster2-Production: deploying...
App-AnalyticsEcsCluster2-Production: creating CloudFormation changeset...
 0/3 | 9:02:13 PM | UPDATE_IN_PROGRESS   | AWS::CloudFormation::Stack            | App-AnalyticsEcsCluster2-Production User Initiated
 1/3 | 9:02:19 PM | UPDATE_IN_PROGRESS   | AWS::AutoScaling::LaunchConfiguration | Cluster/Scaling/LaunchConfig (ClusterScalingLaunchConfig3E9D5827) Requested update requires the creation of a new physical resource; hence creating one.
 1/3 | 9:02:20 PM | UPDATE_IN_PROGRESS   | AWS::AutoScaling::LaunchConfiguration | Cluster/Scaling/LaunchConfig (ClusterScalingLaunchConfig3E9D5827) Resource creation Initiated
 1/3 | 9:02:20 PM | UPDATE_COMPLETE      | AWS::AutoScaling::LaunchConfiguration | Cluster/Scaling/LaunchConfig (ClusterScalingLaunchConfig3E9D5827) 
 2/3 | 9:02:24 PM | UPDATE_IN_PROGRESS   | AWS::AutoScaling::AutoScalingGroup    | Cluster/Scaling/ASG (ClusterScalingASGE8638730) Requested update requires the creation of a new physical resource; hence creating one.
 2/3 | 9:02:24 PM | UPDATE_IN_PROGRESS   | AWS::AutoScaling::AutoScalingGroup    | Cluster/Scaling/ASG (ClusterScalingASGE8638730) Resource creation Initiated
 2/3 | 9:02:25 PM | UPDATE_COMPLETE      | AWS::AutoScaling::AutoScalingGroup    | Cluster/Scaling/ASG (ClusterScalingASGE8638730) 
 2/3 | 9:02:27 PM | UPDATE_COMPLETE_CLEA | AWS::CloudFormation::Stack            | App-AnalyticsEcsCluster2-Production 
 2/3 | 9:02:28 PM | DELETE_IN_PROGRESS   | AWS::AutoScaling::AutoScalingGroup    | Cluster/Scaling/ASG (ClusterScalingASGE8638730) 
2/3 Currently in progress: App-AnalyticsEcsCluster2-Production, ClusterScalingASGE8638730
 1/3 | 9:06:02 PM | DELETE_COMPLETE      | AWS::AutoScaling::AutoScalingGroup    | Cluster/Scaling/ASG (ClusterScalingASGE8638730) 
 1/3 | 9:06:03 PM | DELETE_IN_PROGRESS   | AWS::AutoScaling::LaunchConfiguration | Cluster/Scaling/LaunchConfig (ClusterScalingLaunchConfig3E9D5827) 
 1/3 | 9:06:03 PM | DELETE_COMPLETE      | AWS::AutoScaling::LaunchConfiguration | Cluster/Scaling/LaunchConfig (ClusterScalingLaunchConfig3E9D5827) 
 1/3 | 9:06:04 PM | UPDATE_COMPLETE      | AWS::CloudFormation::Stack            | App-AnalyticsEcsCluster2-Production 

 ✅  App-AnalyticsEcsCluster2-Production

It's always the same output.

Note that the cluster uses spot instances, which may be relevant.

Reproduction Steps

Potentially relevant stack code:

import * as cdk from "@aws-cdk/core";
import * as ecs from "@aws-cdk/aws-ecs";
import * as ec2 from "@aws-cdk/aws-ec2";
import { VpcStack } from "./vpc-stack";

export class EcsClusterStack extends cdk.Stack {
  cluster: ecs.Cluster;
  constructor(
    scope: cdk.App,
    id: string,
    props: cdk.StackProps,
    {
      vpcStack,
      instanceType,
      spotPrice,
      subnetType,
    }: {
      vpcStack: VpcStack;
      instanceType: ec2.InstanceType;
      spotPrice: string;
      subnetType?: ec2.SubnetType;
    }
  ) {
    super(scope, id, props);

    // Create an ECS cluster
    this.cluster = new ecs.Cluster(this, "Cluster", {
      vpc: vpcStack.vpc,
    });

    // Add capacity to it
    this.cluster.addCapacity("Scaling", {
      maxCapacity: 2,
      minCapacity: 2,
      //desiredCapacity: 2,
      taskDrainTime: cdk.Duration.seconds(0),
      vpcSubnets: { subnetType },
      instanceType,
      spotPrice,
      // Enable the Automated Spot Draining support for Amazon ECS
      spotInstanceDraining: true,
    });
  }
}

What did you expect to happen?

No stack updates should be issued, and there should be no downtime.

What actually happened?

The stack is updated unnecessarily, causing 10+ minutes of downtime.

Environment

  • CDK CLI Version : 1.71.0
  • Framework Version: 1.71.0

Other

It happens on CircleCI, but also when running locally after it has not been run for a while. I believe the interval is around 10 days, but I'm not completely sure.
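The issue title suggests that the AMI ID is being resolved fresh on each deploy instead of being read back from cdk.context.json. The sketch below is a simplified, hypothetical model of that difference and is not CDK's actual implementation. It assumes the image ID comes from an upstream source (for example an SSM parameter) that changes whenever a new AMI is published; `UpstreamAmiSource`, `resolveUncached`, `resolveCached`, and the context key are illustrative names only.

```typescript
// Simplified, hypothetical model; NOT the real CDK lookup/caching code.

type ContextStore = Map<string, string>;

// Hypothetical stand-in for an upstream value (assumed: something like an
// SSM parameter) that is updated whenever a new AMI is published.
class UpstreamAmiSource {
  private current: string;

  constructor(initial: string) {
    this.current = initial;
  }

  publish(newAmi: string): void {
    this.current = newAmi;
  }

  latest(): string {
    return this.current;
  }
}

// Re-resolved on every deploy: always returns whatever is current upstream.
function resolveUncached(source: UpstreamAmiSource): string {
  return source.latest();
}

// Resolved once, then read back from the store on later runs
// (analogous to a value persisted in cdk.context.json).
function resolveCached(
  source: UpstreamAmiSource,
  store: ContextStore,
  contextKey: string
): string {
  const cached = store.get(contextKey);
  if (cached !== undefined) {
    return cached;
  }
  const value = source.latest();
  store.set(contextKey, value);
  return value;
}

// In this model, the launch configuration only changes when the image ID differs.
function launchConfigChanged(previousAmi: string, nextAmi: string): boolean {
  return previousAmi !== nextAmi;
}

const amiSource = new UpstreamAmiSource("ami-aaaa");
const ctxStore: ContextStore = new Map();
const ctxKey = "hypothetical:ecs-optimized-ami";

// First deploy.
const firstUncached = resolveUncached(amiSource);
const firstCached = resolveCached(amiSource, ctxStore, ctxKey);

// A new AMI is published between deploys.
amiSource.publish("ami-bbbb");

// Second deploy, with no code changes.
const secondUncached = resolveUncached(amiSource);
const secondCached = resolveCached(amiSource, ctxStore, ctxKey);

console.log(launchConfigChanged(firstUncached, secondUncached)); // true
console.log(launchConfigChanged(firstCached, secondCached)); // false
```

Under this model, the uncached lookup only produces a diff when a new AMI happens to be published between two deploys, which would be consistent with the irregular, roughly 10-day cadence described above, while the cached value stays fixed until its stored entry is removed.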


This is 🐛 Bug Report

Labels: bug (This issue is a bug.), documentation (This is a problem with documentation.), p1