Skip to content

(cdk-assets): osx cannot run identical docker login commands in parallel #20116

@misterjoshua

Description

@misterjoshua

Describe the bug

Per @brianfrantz in #19367 (comment):

It appears that this change has exposed a race condition when publishing multiple ECS images from a Mac. It is causing our deployments to fail with:

[83%] fail: docker login --username AWS --password-stdin https://285872988623.dkr.ecr.us-east-1.amazonaws.com exited with error code 1: Error saving credentials: error storing credentials - err: exit status 1, out: The specified item already exists in the keychain.

It appears that this issue is caused by a combination of the CDK performing a “docker login” on each published asset, along with specific logic in the docker project for storing credentials in the OSX keychain.

The source code for docker-credentials-osxkeychain doesn’t update existing OSX keychain entries if present. Instead it always does a SecKeychainAddInternetPassword call, which fails if the entry is already in the keychain. That error isn’t caught and handled. Instead the code attempts to avoid this issue by deleting the entry in the keychain before each add attempt. This logic works fine for the normal use case of doing a single login and then making multiple authenticated docker calls (single-threaded or in parallel). In fact, the logic has been in place for five-plus years.

Unfortunately, the CDK calls “docker login” before each “docker push” call instead of once for the batch. Two publish calls started at the same time will (most of the time) result in a credentials sequence of “delete, delete, add, add” such that the second add will fail because the first add created the entry.

The best method for fixing this issue in the CDK would involve refactoring the publishing code to perform a single “docker login” for the batch of container assets. Unfortunately this would be a significant redesign as the logic is shared between lambda assets and container assets.

Another option might be to catch the failed login and try again after a small delay, with or without detecting the specific error message. This doesn’t feel super clean, but is less effort and is less likely to cause side effects. That said, while it should work fine for small numbers of container assets, it may become problematic when there are many publish calls all logging in and retrying all at once.

Expected Behavior

docker login should be run once per ecr repository any docker image asset is being pushed to.

Current Behavior

It runs docker login for every push, concurrently, and this causes the login command to fail.

Reproduction Steps

app.ts

import { App, Stack } from 'aws-cdk-lib';
import { ContainerImage, FargateTaskDefinition } from 'aws-cdk-lib/aws-ecs';

const devEnv = {
  account: process.env.CDK_DEFAULT_ACCOUNT,
  region: process.env.CDK_DEFAULT_REGION,
};

const app = new App();

const stack = new Stack(app, 'cdk-concurrency-bug', { env: devEnv });

for (let i = 0; i < 30; i++) {
  const task = new FargateTaskDefinition(stack, `Task${i}`, {});
  task.addContainer('asset', {
    image: ContainerImage.fromAsset(__dirname, {
      buildArgs: {
        VAR: `var-${i}`,
      },
    }),
  });
}

app.synth();

Dockerfile

FROM debian:11-slim
ARG VAR=foo

WORKDIR /app
RUN echo $VAR >var.txt

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.21.1

Framework Version

2.21.1

Node.js Version

14.19.1

OS

OSX

Language

Typescript

Language Version

No response

Other information

No response

Metadata

Metadata

Assignees

Labels

@aws-cdk/assetsRelated to the @aws-cdk/assets packagebugThis issue is a bug.needs-triageThis issue or PR still needs to be triaged.p1

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions