
[CI] Permission denied while trying to connect to the docker API #172427

@kwen2501

The Docker issue below started on Jan 13, 2026.

It impacts Symmetric Memory tests on H100 instances.

For example:
linux-jammy-cuda12.8-py3.10-gcc11-sm90-symm / test (h100-symm-mem, 1, 1, linux.aws.h100.4)

Login Succeeded
++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-bfafaf0e518ed45e732eaef14740ce84d523e20a
++ jq '[.layers[].size, .config.size] | add / 1024 / 1024'
+ IMAGE_SIZE=15249.531907081604
+ echo 'Compressed size of image in MB: 15249.531907081604'
+ set -e
+ docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-bfafaf0e518ed45e732eaef14740ce84d523e20a
Compressed size of image in MB: 15249.531907081604
+ retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-bfafaf0e518ed45e732eaef14740ce84d523e20a
+ docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-bfafaf0e518ed45e732eaef14740ce84d523e20a
permission denied while trying to connect to the docker API at unix:///var/run/docker.sock
+ sleep 1
+ docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-bfafaf0e518ed45e732eaef14740ce84d523e20a
permission denied while trying to connect to the docker API at unix:///var/run/docker.sock
+ sleep 2
+ docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-bfafaf0e518ed45e732eaef14740ce84d523e20a
permission denied while trying to connect to the docker API at unix:///var/run/docker.sock
Error: Process completed with exit code 1.
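
The log above only shows that the runner user cannot open `/var/run/docker.sock`; it does not show why. A minimal diagnostic sketch that could be run before the `docker pull` step to narrow this down is given below. The helper name `check_docker_sock` and its return codes are hypothetical and not part of the PyTorch CI scripts, and the `stat -c` call assumes GNU coreutils on the runner.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper; not part of the PyTorch CI scripts.
# Reports whether the current user can reach a Docker socket path.
check_docker_sock() {
  local sock="${1:-/var/run/docker.sock}"

  # The daemon may not be running, or the socket may live elsewhere.
  if [ ! -e "$sock" ]; then
    echo "missing: $sock"
    return 2
  fi

  # Something exists at the path, but it is not a Unix socket.
  if [ ! -S "$sock" ]; then
    echo "not a socket: $sock"
    return 3
  fi

  # Talking to the Docker API needs read and write access to the socket.
  if [ -r "$sock" ] && [ -w "$sock" ]; then
    echo "accessible: $sock"
    return 0
  fi

  # Print owner, group, and mode next to the caller's identity so that a
  # group mismatch is visible in the job log (assumes GNU stat).
  echo "permission denied: $sock ($(stat -c '%U:%G %a' "$sock")), running as $(id -un) in groups: $(id -Gn)"
  return 1
}
```

Calling `check_docker_sock` with no arguments checks the default socket path from the log. If it reports a permission problem, the printed owner, group, and mode can be compared against the runner user's groups to see whether the user is missing from the group that owns the socket.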

cc @seemethere @malfet @pytorch/pytorch-dev-infra @atalman @fduwjj @dzmitry-huba @fegin

Labels

module: ci (Related to continuous integration)
module: infra (Relates to CI infrastructure)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
