-
Notifications
You must be signed in to change notification settings - Fork 4.5k
(cdk-assets): osx cannot run identical docker login commands in parallel #20116
Description
Describe the bug
Per @brianfrantz in #19367 (comment):
It appears that this change has exposed a race condition when publishing multiple ECS images from a Mac. It is causing our deployments to fail with:
[83%] fail: docker login --username AWS --password-stdin https://285872988623.dkr.ecr.us-east-1.amazonaws.com exited with error code 1: Error saving credentials: error storing credentials - err: exit status 1, out: The specified item already exists in the keychain.It appears that this issue is caused by a combination of the CDK performing a “docker login” on each published asset, along with specific logic in the docker project for storing credentials in the OSX keychain.
The source code for docker-credentials-osxkeychain doesn’t update existing OSX keychain entries if present. Instead it always does a SecKeychainAddInternetPassword call, which fails if the entry is already in the keychain. That error isn’t caught and handled. Instead the code attempts to avoid this issue by deleting the entry in the keychain before each add attempt. This logic works fine for the normal use case of doing a single login and then making multiple authenticated docker calls (single-threaded or in parallel). In fact, the logic has been in place for five-plus years.
Unfortunately, the CDK calls “docker login” before each “docker push” call instead of once for the batch. Two publish calls started at the same time will (most of the time) result in a credentials sequence of “delete, delete, add, add” such that the second add will fail because the first add created the entry.
The best method for fixing this issue in the CDK would involve refactoring the publishing code to perform a single “docker login” for the batch of container assets. Unfortunately this would be a significant redesign as the logic is shared between lambda assets and container assets.
Another option might be to catch the failed login and try again after a small delay, with or without detecting the specific error message. This doesn’t feel super clean, but is less effort and is less likely to cause side effects. That said, while it should work fine for small numbers of container assets, it may become problematic when there are many publish calls all logging in and retrying all at once.
Expected Behavior
docker login should be run once per ecr repository any docker image asset is being pushed to.
Current Behavior
It runs docker login for every push, concurrently, and this causes the login command to fail.
Reproduction Steps
app.ts
import { App, Stack } from 'aws-cdk-lib';
import { ContainerImage, FargateTaskDefinition } from 'aws-cdk-lib/aws-ecs';
const devEnv = {
account: process.env.CDK_DEFAULT_ACCOUNT,
region: process.env.CDK_DEFAULT_REGION,
};
const app = new App();
const stack = new Stack(app, 'cdk-concurrency-bug', { env: devEnv });
for (let i = 0; i < 30; i++) {
const task = new FargateTaskDefinition(stack, `Task${i}`, {});
task.addContainer('asset', {
image: ContainerImage.fromAsset(__dirname, {
buildArgs: {
VAR: `var-${i}`,
},
}),
});
}
app.synth();Dockerfile
FROM debian:11-slim
ARG VAR=foo
WORKDIR /app
RUN echo $VAR >var.txt
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.21.1
Framework Version
2.21.1
Node.js Version
14.19.1
OS
OSX
Language
Typescript
Language Version
No response
Other information
No response