Skip to content

[RELAND] [RELAND] .circleci: Improve docker image build workflow#38335

Closed
seemethere wants to merge 2 commits intopytorch:masterfrom
seemethere:reland_reland_rework_docker_ci_build_flow
Closed

[RELAND] [RELAND] .circleci: Improve docker image build workflow#38335
seemethere wants to merge 2 commits intopytorch:masterfrom
seemethere:reland_reland_rework_docker_ci_build_flow

Conversation

@seemethere
Copy link
Copy Markdown
Member

@seemethere seemethere commented May 12, 2020

This reverts commit 6e66e85.

Two things learned from the previous reland:

  • cirlceci-agent step halt doesn't actually halt the step in place, you must explicitly exit the step after the step halt is called
  • Even though circleci uses git to checkout repositories inside of docker images, that does not mean git is available after the fact.
Changes from previous reland
commit cc99a12c9029472bd73325876bc0e9dbb1746b05
Author: Eli Uriegas <eliuriegas@fb.com>
Date:   Tue May 12 10:58:18 2020 -0700

    .cirlceci: Install git for gc, exit step explicitly

    Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

diff --git a/.circleci/config.yml b/.circleci/config.yml
index 481d7889da..856a0fb10a 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -2018,13 +2018,15 @@ jobs:
               export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
               eval $(aws ecr get-login --no-include-email --region us-east-1)
               set -x
+              PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
               # Check if image already exists, if it does then skip building it
               if docker manifest inspect "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${IMAGE_NAME}:${DOCKER_TAG}"; then
                 circleci-agent step halt
+                # circleci-agent step halt doesn't actually halt the step so we need to
+                # explicitly exit the step here ourselves before it causes too much trouble
+                exit 0
               fi
-              PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
               # If no image exists but the hash is the same as the previous hash then we should error out here
-              # no stampeding herd effect plz.
               if [[ ${PREVIOUS_DOCKER_TAG} = ${DOCKER_TAG} ]]; then
                 echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
                 echo "       contact the PyTorch team to restore the original images"
diff --git a/.circleci/ecr_gc_docker/Dockerfile b/.circleci/ecr_gc_docker/Dockerfile
index d0198acb86..36347d5e6d 100644
--- a/.circleci/ecr_gc_docker/Dockerfile
+++ b/.circleci/ecr_gc_docker/Dockerfile
@@ -1,6 +1,6 @@
 FROM ubuntu:16.04

-RUN apt-get update && apt-get install -y python-pip && rm -rf /var/lib/apt/lists/* /var/log/dpkg.log
+RUN apt-get update && apt-get install -y git python-pip && rm -rf /var/lib/apt/lists/* /var/log/dpkg.log

 ADD requirements.txt /requirements.txt

diff --git a/.circleci/verbatim-sources/docker_jobs.yml b/.circleci/verbatim-sources/docker_jobs.yml
index e04d11c5cd..3918cc04ae 100644
--- a/.circleci/verbatim-sources/docker_jobs.yml
+++ b/.circleci/verbatim-sources/docker_jobs.yml
@@ -35,13 +35,15 @@
               export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
               eval $(aws ecr get-login --no-include-email --region us-east-1)
               set -x
+              PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
               # Check if image already exists, if it does then skip building it
               if docker manifest inspect "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${IMAGE_NAME}:${DOCKER_TAG}"; then
                 circleci-agent step halt
+                # circleci-agent step halt doesn't actually halt the step so we need to
+                # explicitly exit the step here ourselves before it causes too much trouble
+                exit 0
               fi
-              PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
               # If no image exists but the hash is the same as the previous hash then we should error out here
-              # no stampeding herd effect plz.
               if [[ ${PREVIOUS_DOCKER_TAG} = ${DOCKER_TAG} ]]; then
                 echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
                 echo "       contact the PyTorch team to restore the original images"

…ocker image build workflow"

This reverts commit 6e66e85.
@seemethere seemethere requested review from ezyang and malfet May 12, 2020 17:59
@dr-ci
Copy link
Copy Markdown

dr-ci Bot commented May 12, 2020

💊 CI failures summary and remediations

As of commit edbdc85 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

See how this bot performed.

This comment has been revised 10 times.

@seemethere seemethere force-pushed the reland_reland_rework_docker_ci_build_flow branch from 6c50de3 to cc99a12 Compare May 12, 2020 18:00
@malfet
Copy link
Copy Markdown
Contributor

malfet commented May 12, 2020

Can you please pull this one into pytorch/ci-all/docker_ci_build_flow branch to test it against all jobs that we have?

@seemethere
Copy link
Copy Markdown
Member Author

seemethere commented May 12, 2020

Running ci-all here: #38339, no need to look at the results there, they should just show up here since they're the same commit

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
@seemethere seemethere force-pushed the reland_reland_rework_docker_ci_build_flow branch from cc99a12 to edbdc85 Compare May 12, 2020 20:30
Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seemethere is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@seemethere merged this pull request in cdf4d42.

facebook-github-bot pushed a commit that referenced this pull request May 14, 2020
Summary:
Previous attempts to get this right:
* #38335
* #38279
* #37976

This tag kept getting deleted before the docker image ci workflow could
be merged causing it to have upstream breakages.

It'd be best to make sure the garbage collector just doesnt garbage
collect it.

This is a pre-step to merge #38484

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: #38483

Differential Revision: D21577359

Pulled By: seemethere

fbshipit-source-id: c4e0709bd8fff8f24a988b60eaa9f8c01576ef2f
facebook-github-bot pushed a commit that referenced this pull request May 15, 2020
Summary:
closes #37855

Relies on #38483

Previous attempts to get this right:
* #38335
* #38279
* #37976

This reverts commit 8063960.

Improves the docker image build workflow from many steps to basically
transparent from a user's perspective.

To update docker images now all one has to do is edit the
.circleci/docker folder and it will update automatically and also
dynamically add the tags to the list of tags to keep from the garbage
collector.

Adding a new image will currently stay the same but we can explore doing
that dynamically as well.

How the build workflow works:
  - Docker tags are determined by the hash defined from git for the
    .circleci/docker sub-directory (extracted using git rev-parse)
  - Images are only built if the computed hash is not found in ecr and
    the hash is different than the previously computed hash. The
    previously computed hash is found using the same process as before
    but subbing out HEAD for the merge base between HEAD and the base
    git revision
  - That tag is then passed through the jobs using a shared workspace
    which is added to downstream jobs using the circleci ${BASH_ENV}

How the new garbage collection works:
  - Tags to keep are generated by stepping through all of the commits in
    in the .circleci/docker subdirectory

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: #38484

Differential Revision: D21585458

Pulled By: seemethere

fbshipit-source-id: 37792a1e0f5e5531438c4ae61507639c133aa76d
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
…orch#38335)

Summary:
This reverts commit 401ffb0.

Two things learned from the previous reland:
* `cirlceci-agent step halt` doesn't actually halt the step in place, you must explicitly exit the step after the `step halt` is called
* Even though `circleci` uses `git` to checkout repositories inside of docker images, that does not mean `git` is available after the fact.

<details>
<summary> Changes from previous reland </summary>

```patch
commit cc99a12
Author: Eli Uriegas <eliuriegas@fb.com>
Date:   Tue May 12 10:58:18 2020 -0700

    .cirlceci: Install git for gc, exit step explicitly

    Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

 diff --git a/.circleci/config.yml b/.circleci/config.yml
index 481d788..856a0fb 100644
 --- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -2018,13 +2018,15 @@ jobs:
               export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
               eval $(aws ecr get-login --no-include-email --region us-east-1)
               set -x
+              PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
               # Check if image already exists, if it does then skip building it
               if docker manifest inspect "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${IMAGE_NAME}:${DOCKER_TAG}"; then
                 circleci-agent step halt
+                # circleci-agent step halt doesn't actually halt the step so we need to
+                # explicitly exit the step here ourselves before it causes too much trouble
+                exit 0
               fi
-              PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
               # If no image exists but the hash is the same as the previous hash then we should error out here
-              # no stampeding herd effect plz.
               if [[ ${PREVIOUS_DOCKER_TAG} = ${DOCKER_TAG} ]]; then
                 echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
                 echo "       contact the PyTorch team to restore the original images"
 diff --git a/.circleci/ecr_gc_docker/Dockerfile b/.circleci/ecr_gc_docker/Dockerfile
index d0198ac..36347d5 100644
 --- a/.circleci/ecr_gc_docker/Dockerfile
+++ b/.circleci/ecr_gc_docker/Dockerfile
@@ -1,6 +1,6 @@
 FROM ubuntu:16.04

-RUN apt-get update && apt-get install -y python-pip && rm -rf /var/lib/apt/lists/* /var/log/dpkg.log
+RUN apt-get update && apt-get install -y git python-pip && rm -rf /var/lib/apt/lists/* /var/log/dpkg.log

 ADD requirements.txt /requirements.txt

 diff --git a/.circleci/verbatim-sources/docker_jobs.yml b/.circleci/verbatim-sources/docker_jobs.yml
index e04d11c..3918cc0 100644
 --- a/.circleci/verbatim-sources/docker_jobs.yml
+++ b/.circleci/verbatim-sources/docker_jobs.yml
@@ -35,13 +35,15 @@
               export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
               eval $(aws ecr get-login --no-include-email --region us-east-1)
               set -x
+              PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
               # Check if image already exists, if it does then skip building it
               if docker manifest inspect "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${IMAGE_NAME}:${DOCKER_TAG}"; then
                 circleci-agent step halt
+                # circleci-agent step halt doesn't actually halt the step so we need to
+                # explicitly exit the step here ourselves before it causes too much trouble
+                exit 0
               fi
-              PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
               # If no image exists but the hash is the same as the previous hash then we should error out here
-              # no stampeding herd effect plz.
               if [[ ${PREVIOUS_DOCKER_TAG} = ${DOCKER_TAG} ]]; then
                 echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
                 echo "       contact the PyTorch team to restore the original images"

```

</details>
Pull Request resolved: pytorch#38335

Differential Revision: D21536269

Pulled By: seemethere

fbshipit-source-id: 5577f84fa49dd6e1e88fce461646fd68be3d417d
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Previous attempts to get this right:
* pytorch#38335
* pytorch#38279
* pytorch#37976

This tag kept getting deleted before the docker image ci workflow could
be merged causing it to have upstream breakages.

It'd be best to make sure the garbage collector just doesnt garbage
collect it.

This is a pre-step to merge pytorch#38484

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: pytorch#38483

Differential Revision: D21577359

Pulled By: seemethere

fbshipit-source-id: c4e0709bd8fff8f24a988b60eaa9f8c01576ef2f
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
closes pytorch#37855

Relies on pytorch#38483

Previous attempts to get this right:
* pytorch#38335
* pytorch#38279
* pytorch#37976

This reverts commit 46bc26c.

Improves the docker image build workflow from many steps to basically
transparent from a user's perspective.

To update docker images now all one has to do is edit the
.circleci/docker folder and it will update automatically and also
dynamically add the tags to the list of tags to keep from the garbage
collector.

Adding a new image will currently stay the same but we can explore doing
that dynamically as well.

How the build workflow works:
  - Docker tags are determined by the hash defined from git for the
    .circleci/docker sub-directory (extracted using git rev-parse)
  - Images are only built if the computed hash is not found in ecr and
    the hash is different than the previously computed hash. The
    previously computed hash is found using the same process as before
    but subbing out HEAD for the merge base between HEAD and the base
    git revision
  - That tag is then passed through the jobs using a shared workspace
    which is added to downstream jobs using the circleci ${BASH_ENV}

How the new garbage collection works:
  - Tags to keep are generated by stepping through all of the commits in
    in the .circleci/docker subdirectory

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: pytorch#38484

Differential Revision: D21585458

Pulled By: seemethere

fbshipit-source-id: 37792a1e0f5e5531438c4ae61507639c133aa76d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants