cli: ecs service hotswap deployment does not wait for deployment to complete #27882

@tomwwright

Describe the bug

ECS hotswap deployments report success immediately, instead of waiting for the deployment to succeed (or fail).

Example showing complete deploy time of 16s:

> pnpm exec cdk deploy --hotswap --exclusively my-cool-stack

✨  Synthesis time: 4.43s

⚠️ The --hotswap and --hotswap-fallback flags deliberately introduce CloudFormation drift to speed up deployments
⚠️ They should only be used for development - never use them for your production Stacks!

my-cool-stack:  start: Building 763ed553d17755524a692452e0dbdc4aac573b775b6003699d978e3a3c5d9297:current_account-current_region
my-cool-stack:  success: Built 763ed553d17755524a692452e0dbdc4aac573b775b6003699d978e3a3c5d9297:current_account-current_region
my-cool-stack:  start: Publishing 763ed553d17755524a692452e0dbdc4aac573b775b6003699d978e3a3c5d9297:current_account-current_region
my-cool-stack:  success: Published 763ed553d17755524a692452e0dbdc4aac573b775b6003699d978e3a3c5d9297:current_account-current_region
my-cool-stack: deploying... [1/1]

✨ hotswapping resources:
   ✨ ECS Task Definition 'my-cool-stack-api'
   ✨ ECS Service 'my-cool-stack-backendServiceC9D5DD77-jJXtgE5oL9az'
   ✨ ECS Task Definition 'my-cool-stack-frontend'
   ✨ ECS Service 'my-cool-stack-frontendService12C63704-yOwzQjJgpvjX'
✨ ECS Task Definition 'my-cool-stack-frontend' hotswapped!
✨ ECS Service 'my-cool-stack-frontendService12C63704-yOwzQjJgpvjX' hotswapped!
✨ ECS Task Definition 'my-cool-stack-api' hotswapped!
✨ ECS Service 'my-cool-stack-backendServiceC9D5DD77-jJXtgE5oL9az' hotswapped!

 ✅  my-cool-stack

✨  Deployment time: 12.54s

Stack ARN:
xxx

✨  Total time: 16.96s

This behaviour appears to have been the same since ECS hotswap was first implemented, so existing users of the feature may assume it is working as intended.

Expected Behavior

I expect the CDK hotswap deployment to monitor the state of the triggered deployment via the DescribeServices API and to confirm that it completes successfully (or report that it failed) before continuing.

Current Behavior

Currently the CDK pushes the ECS hotswap deployment and then immediately reports it as a success and continues.

The CDK does set up a custom waiter to await the successful deployment, but its success acceptor is configured as the expression:

length(services[].deployments[? status == 'PRIMARY' && runningCount < desiredCount][]) == `0`

This does not wait correctly because the new PRIMARY deployment is first created in an intermediate state with runningCount: 0 and desiredCount: 0. Only afterwards, as the scheduler gets to work, is it populated with the real desired and pending counts. In that initial zero state runningCount < desiredCount is false, so the filter selects no deployments, the length is 0, and the waiter matches on it as success and continues.
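To make the failure mode concrete, here is a minimal sketch that re-implements the current acceptor expression by hand as a TypeScript predicate and evaluates it against the first two deployment states listed under Additional Information/Context. The `Deployment` interface and the `currentAcceptorMatches` name are illustrative assumptions, not CDK code, and the sketch looks at a single service rather than flattening across several.

```typescript
// Subset of the DescribeServices deployment fields that the acceptor reads.
// (Illustrative type; not taken from the CDK or the AWS SDK.)
interface Deployment {
  status: string;
  desiredCount: number;
  runningCount: number;
}

// Hand-written equivalent of:
//   length(services[].deployments[? status == 'PRIMARY' && runningCount < desiredCount][]) == `0`
function currentAcceptorMatches(deployments: Deployment[]): boolean {
  const lagging = deployments.filter(
    (d) => d.status === "PRIMARY" && d.runningCount < d.desiredCount,
  );
  return lagging.length === 0;
}

// New PRIMARY deployment has just been created and is still at 0/0.
const zeroState: Deployment[] = [
  { status: "PRIMARY", desiredCount: 0, runningCount: 0 },
  { status: "ACTIVE", desiredCount: 1, runningCount: 1 },
];

// Counts have been populated, but the new task is still pending.
const populatedState: Deployment[] = [
  { status: "PRIMARY", desiredCount: 1, runningCount: 0 },
  { status: "ACTIVE", desiredCount: 1, runningCount: 1 },
];

console.log(currentAcceptorMatches(zeroState)); // true: success is reported too early
console.log(currentAcceptorMatches(populatedState)); // false: the waiter would keep polling
```

If a poll lands in the zero state, the acceptor is satisfied before any new task has started, which is consistent with the 12s deployment time shown above.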

Reproduction Steps

Perform any ECS hotswap deployment (for example, cdk deploy --hotswap on a stack that contains an ECS service).

Possible Solution

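The following waiter acceptor expression should check the DescribeServices state more accurately, because it waits for the PRIMARY deployment's rolloutState to become COMPLETED instead of comparing task counts. I can raise a PR if we agree this is an issue that needs to be fixed.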

length(services[].deployments[? status == 'PRIMARY' && rolloutState == 'COMPLETED'][]) == `1`
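As a rough check of the proposed expression, the sketch below re-implements it by hand in TypeScript and replays the six deployment snapshots listed under Additional Information/Context, reduced to the two fields the expression reads. The `classify` helper and the `AcceptorResult` type are hypothetical names, and the `FAILED` branch is an assumption added for illustration: the proposed expression only defines the success case.

```typescript
// Only the fields the proposed acceptor reads. (Illustrative type.)
interface Deployment {
  status: string;
  rolloutState: "IN_PROGRESS" | "COMPLETED" | "FAILED";
}

type AcceptorResult = "success" | "failure" | "retry";

// Hand-written equivalent of:
//   length(services[].deployments[? status == 'PRIMARY' && rolloutState == 'COMPLETED'][]) == `1`
function classify(deployments: Deployment[]): AcceptorResult {
  const primary = deployments.filter((d) => d.status === "PRIMARY");
  // Assumption (not part of the proposed expression): stop early on FAILED.
  if (primary.some((d) => d.rolloutState === "FAILED")) return "failure";
  const completed = primary.filter((d) => d.rolloutState === "COMPLETED");
  return completed.length === 1 ? "success" : "retry";
}

// The six snapshots observed in this issue, in order.
const observed: Deployment[][] = [
  [{ status: "PRIMARY", rolloutState: "IN_PROGRESS" }, { status: "ACTIVE", rolloutState: "COMPLETED" }],
  [{ status: "PRIMARY", rolloutState: "IN_PROGRESS" }, { status: "ACTIVE", rolloutState: "COMPLETED" }],
  [{ status: "PRIMARY", rolloutState: "IN_PROGRESS" }, { status: "ACTIVE", rolloutState: "COMPLETED" }],
  [{ status: "PRIMARY", rolloutState: "IN_PROGRESS" }, { status: "DRAINING", rolloutState: "COMPLETED" }],
  [{ status: "PRIMARY", rolloutState: "IN_PROGRESS" }],
  [{ status: "PRIMARY", rolloutState: "COMPLETED" }],
];

const results: AcceptorResult[] = observed.map(classify);
const firstSuccess = results.indexOf("success");

console.log(results); // five "retry" entries followed by "success"
console.log(firstSuccess); // 5, i.e. only the final "New deployment completed" snapshot
```

Under these assumptions the acceptor keeps retrying through the zero state and the draining of the previous deployment, and only succeeds once the new PRIMARY deployment reports COMPLETED. A real waiter would still need a polling interval and a timeout, which this sketch leaves out.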

Additional Information/Context

Running this command I observed the following deployment state changes:

watch -n 1 aws ecs describe-services --cluster $cluster --services $service --query 'services[].deployments'

New deployment created in "zero" state

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 0,
            "pendingCount": 0,
            "runningCount": 0,
            ...
            "rolloutState": "IN_PROGRESS",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 in progress."
        },
        {
            "status": "ACTIVE",
            ....
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ....
            "rolloutState": "COMPLETED",
            "rolloutStateReason": "ECS deployment ecs-svc/5831249761506821993 completed."
        }
    ]
]

Deployment gets correct counts

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 1,
            "pendingCount": 1,
            "runningCount": 0,
            ...
            "rolloutState": "IN_PROGRESS",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 in progress."
        },
        {
            "status": "ACTIVE",
            ....
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ....
            "rolloutState": "COMPLETED",
            "rolloutStateReason": "ECS deployment ecs-svc/5831249761506821993 completed."
        }
    ]
]

Deployment launches new task successfully, previous deployment scaled down

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ...
            "rolloutState": "IN_PROGRESS",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 in progress."
        },
        {
            "status": "ACTIVE",
            ....
            "desiredCount": 0,
            "pendingCount": 0,
            "runningCount": 1,
            ....
            "rolloutState": "COMPLETED",
            "rolloutStateReason": "ECS deployment ecs-svc/5831249761506821993 completed."
        }
    ]
]

Previous deployment scaled down, moves into DRAINING state

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ...
            "rolloutState": "IN_PROGRESS",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 in progress."
        },
        {
            "status": "DRAINING",
            ....
            "desiredCount": 0,
            "pendingCount": 0,
            "runningCount": 0,
            ....
            "rolloutState": "COMPLETED",
            "rolloutStateReason": "ECS deployment ecs-svc/5831249761506821993 completed."
        }
    ]
]

Previous deployment removed

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ...
            "rolloutState": "IN_PROGRESS",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 in progress."
        }
    ]
]

New deployment completed

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ...
            "rolloutState": "COMPLETED",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 completed."
        }
    ]
]

CDK CLI Version

2.103.0 (build d0d7547)

Framework Version

No response

Node.js Version

18.16.0

OS

MacOS

Language

TypeScript

Language Version

4.9.5

Other information

No response

Labels

bug: This issue is a bug.
effort/medium: Medium work item, several days of effort
p1
package/tools: Related to AWS CDK Tools or CLI
