Skip to content

Update CI Dockerfiles#525

Merged
12 commits merged intoros-navigation:masterfrom
ruffsl:nightly
Jan 31, 2019
Merged

Update CI Dockerfiles#525
12 commits merged intoros-navigation:masterfrom
ruffsl:nightly

Conversation

@ruffsl
Copy link
Copy Markdown
Member

@ruffsl ruffsl commented Jan 18, 2019

Updates dockerfiles to use new Official Library docker images for ROS2 and/or nightly builds from osrf/ros2:nightly. This PR avoids changing users, shells, and repeating layers.

Context:

Use upstream images
Reorganize build stages
use env to export ROS_DOMAIN_ID
remove mapserver to be managed by rosdep
don't ignore errors during rosdep install
workspace chaining done durring initializion should suffice
@ruffsl ruffsl changed the title Update CI Dockerfiles WIP | Update CI Dockerfiles Jan 18, 2019
@SteveMacenski
Copy link
Copy Markdown
Member

Love these changes, thanks!

@mkhansenbot mkhansenbot requested review from a user, SteveMacenski and mkhansenbot January 18, 2019 16:34
Copy link
Copy Markdown

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. The travis failure seems to be due to a flaky system test, so isn't a big deal. I've restarted the build to see if it will pass this time, but it's not a blocking issue

# setup keys
# check if proxy is set and get keys, using proxy if it is set
RUN if [ "$http_proxy" == "" ]; \
RUN if [ "$http_proxy" != "" ]; \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You got rid of setting this key when there is no proxy variable set. Seems we shouldn't need it when there is a proxy either. Any reason you left it here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added back a build arg in 5b460ff

@SteveMacenski
Copy link
Copy Markdown
Member

The system test level failure is killing master branch too

Dockerfile Outdated
/ros_entrypoint.sh

CMD ["bash"]
ENV ROS_DOMAIN_ID 22
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this buy us?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't get the system test to pass without this, suggesting that somehow different systems were interfering with each other. WIthout this variable set, all ROS 2 systems on the network can see each other. This variable makes it so only the ROS 2 systems with ID 22 can see each other.

I'm assuming that somewhere at Travis, at any given moment there must be some ROS 2 node running causing interference.

Anyhow, it's an ugly hack. Clearly the system test needs work. Maybe I'll just disable it again for the time being.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you need to enable the host network driver? I.e: docker run ... --net=host. I feel this is unnecessary when only using one container.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well ... I tried it with and without setting that flag, and it only worked with it set. My assumption was that the network restrictions in docker were interfering with the ROS 2 infrastructure, but I never investigated why exactly.

As I recall, AMCL couldn't receive a map from the map server without that set. This is the first communication between elements of the navigation stack, so it seemed even the most basic message passing was inoperable without --net=host

However, this could stand more investigation.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we test the latest commit of this PR? I'd like to see exact errors.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the system test is running again, I retried without --net=host and without setting the ROS_DOMAIN_ID and the system test seems to work fine.

I'm not sure why they seemed necessary before. All I can think of is that, since the test was flaky, I just got unlucky and got random passes with those set and got random failures when they weren't.

@ghost
Copy link
Copy Markdown

ghost commented Jan 22, 2019

Fixes #497

- matthew.k.hansen@intel.com

env:
- CMAKE_BUILD_TYPE=Release
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could possibly expand this into a build matrix for release, debug, etc...
https://docs.travis-ci.com/user/environment-variables/#global-variables

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

then echo "Travis CI is supported only in ros-planning/navigation2" && exit 1;
fi
- docker build --tag navigation2:latest
--build-arg PULLREQ=$TRAVIS_PULL_REQUEST
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this multiline to better breakout future arguments

- docker exec nav2_bash bash -c "colcon test-result --verbose"
- docker exec nav2_bash bash -c ". install/setup.bash && cd build/nav2_system_tests; ./ctest_retry.bash 3"
- docker run --rm --detach --name nav2_bash navigation2:latest
sleep infinity
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'bash -l' wasn't working when non-interactively detached, so a swapped to sleep infinity instead.

- docker exec nav2_bash bash -c ". install/setup.bash && cd build/nav2_system_tests; ./ctest_retry.bash 3"
- docker run --rm --detach --name nav2_bash navigation2:latest
sleep infinity
- docker exec --interactive --tty nav2_bash /ros_entrypoint.sh
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using /ros_entrypoint.sh with a given command is the same as sourcing, given how the dockerfile was prepared.

- if [ "$TRAVIS_BRANCH" == "master" ] && [ "$TRAVIS_PULL_REQUEST" == "false" ]; then
echo "Successfully built! Deploying container..."
docker login -u $DOCKER_USERNAME -p $DOCKER_PASSWORD ;
docker tag navigation2:latest stevemacenski/navigation2:latest
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more official org for ros-planning docker hub repos, or is stevemacenski the indefinite de facto. Might be nice to play upon the CMAKE_BUILD_TYPE env/matrix to push debug and release tags if that would be helpful.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't currently have a more official docker hub. I'm not sure how to create one. @SteveMacenski have you looked at that?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do end up creating a org on DockerHub, I'd recommend something along the lines described in theAction Items section of #534

&& rm -rf /var/lib/apt/lists/*

RUN echo "export ROS_DOMAIN_ID=22" >> /root/.bashrc
COPY tools/ctest_retry.bash /ros2_ws/navigation2_ws/build/nav2_system_tests
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw in #527 that the retry ci was removed travis. If you add anymore helper CI scripts, I might suggest you mount them as volumes at the docker run command rather than baking then into the build image.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't merge #527 because we found a real issue that is being addressed in #538, so we DO want the system tests to stay enabled. If you have a better way to do this from the Travis file, please add it to the PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exact command do you want to run?
We can just add it to the current order of scripts in the .travis.yaml

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Travis CI is calling the ctest_retry, which runs the system tests and retries up to 3 times, if they pass once, it returns true. This is because we have some system issues that cause about a 20% flaky test rate. We're working on fixing those by changing to use lifecycle nodes in the future, but this allows us to get the flaky fail rate down to ~1% and enable the test in CI to validate PRs. We don't want to remove this without another equivalent method.

--ignore-src \
&& rm -rf /var/lib/apt/lists/*

RUN echo "export ROS_DOMAIN_ID=22" >> /root/.bashrc
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For changing up the environment at test or runtime, I'd suggest passing a env.list file to docker run/exec in the travis file, rather than baking them into the build image.
https://docs.docker.com/engine/reference/commandline/run/#set-environment-variables--e---env---env-file

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we have to set this for every docker run command right? Is there no way to set it once?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we have to set this for every docker run command right?

Each time you you invoke docker run you start a separate container, so correct, yes.

Is there no way to set it once?

Not without baking it into the image or reusing the same container. Yet adding a single --env-file ./env.list arg to be able to dynamically set all the runtime environment variables your CI may need seems far simpler, given Docker still has no way of Reset properties inherited from parent image: moby/moby#3465

However I think setting the ROS_DOMAIN_ID is completely unnecessary given we probably shouldn't have been using --net=host network driver to begin with. That'd just be a timebomb until someone else's on travis happens to use the same ID, or we trigger two CI jobs scheduled in parallel and they start cross talking during tests due to bypassing the network isolation.

@ruffsl ruffsl changed the title WIP | Update CI Dockerfiles Update CI Dockerfiles Jan 22, 2019
@ruffsl
Copy link
Copy Markdown
Member Author

ruffsl commented Jan 22, 2019

Ping @mkhansen-intel , @SteveMacenski and @crdelsey for review.

Dockerfile Outdated
RUN mkdir -p $NAV2_WS/src
WORKDIR $NAV2_WS/src
ARG GIT_REPO_URL=https://github.com/ros-planning/navigation2.git
RUN git clone $GIT_REPO_URL navigation2
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sorta feel like we should be COPYing in the source code from the current build context, rather than going through the trouble of cloning from a remote URL, given that much of Travis and CIs sort of deal with merging and staging the repo (build context) before running our scripts. But I guess that depends on what the expected use case of this Dockerfile is. Does one clone this repo to acquire said dockerfile, or expect to build it with the one Dockerfile standalone.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might also make the build more traceable, as there would be so much of a chance for race conditions when building while other commits are merged into the tracked BRANCH.

Copy link
Copy Markdown

@ghost ghost Jan 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a discussion about this in #389. @SteveMacenski @mkhansen-intel Are you still opposed to this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will a copy in docker build actually copy from the host machine? I'm not sure it will, the container wouldnt have permissions

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will a copy in docker build actually copy from the host machine?

Yes, building the dockerfile locally on the host works just the same as if its was build using CIs like Travis, CircleCI, or an automated repo on Docker Hub. One would simply checkout the remote PR they want to test locally, and then invoke docker build in desired branch.

I'm not sure it will, the container wouldnt have permissions

I'm not sure what you mean here. When you build a Dockerfile, the entire directory that the Dockerfile resides in used as the build context for the Docker engine. By default, COPY does not preserve the permissions of the content copied from the build context unless you explicitly configure it.

https://docs.docker.com/engine/reference/builder/#copy

Here the default user is left unchanged as root, given this Dockerfile is just used for building, not hosting a web service/database or anything.

rather than cloning/checking branch URL
@ruffsl ruffsl mentioned this pull request Jan 24, 2019
12 tasks
Copy link
Copy Markdown
Collaborator

@mkhansenbot mkhansenbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few comments and questions below, and I'd like Steve's input on a couple of them.

--ignore-src \
&& rm -rf /var/lib/apt/lists/*

RUN echo "export ROS_DOMAIN_ID=22" >> /root/.bashrc
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we have to set this for every docker run command right? Is there no way to set it once?

- if [ "$TRAVIS_BRANCH" == "master" ] && [ "$TRAVIS_PULL_REQUEST" == "false" ]; then
echo "Successfully built! Deploying container..."
docker login -u $DOCKER_USERNAME -p $DOCKER_PASSWORD ;
docker tag navigation2:latest stevemacenski/navigation2:latest
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't currently have a more official docker hub. I'm not sure how to create one. @SteveMacenski have you looked at that?

libasio-dev \
libtinyxml2-dev
# copy ros package repo
ENV NAV2_WS /opt/nav2_ws
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move the nav2 workspace under /opt ? Is there a reason for that?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much other than to keep image file heigheracy tidy. This might be helpful to organize the workspace that the entrypoint is sourcing, rather than polluting the root directory. This mainly just me trying to establish a template other ROS2 projects could replicate, or build upon. Say, perhaps if this navigation2 source build image is somehow reused or built FROM by another CI in ros-planning.

ENV NAV2_WS /opt/nav2_ws
RUN mkdir -p $NAV2_WS/src
WORKDIR $NAV2_WS/src
COPY ./ navigation2/
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make it so the dockerfile can't be run as a stand-alone outside of the navigation2 folder. I believe @SteveMacenski had a use case which needed that capability

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most Docker enabled pipelines (DockerHub/Travis/CircleCI/GitLab) work on the premise of downloading the target Dockerfile and its build context before building, so deviating this pattern would make things harder to robustly integrate. For one, it would make it hard to ensure a one-to-one commit-to-ci relationship when the Dockerfile build is compiling the head of PR branch, and not the exact commit that triggered the CI 30-min ago while waiting in the Travis job/slot que. Building an image to test a branch using a Dockerfile not from that branch would be dangerous; like risking reproducibility by building from both the wrong blueprint and with outdated measurements.

Citing "Best practices for getting code into a container (git clone vs. copy vs. data container)":

We never use git clone, beacuse using it you are not able to pull specific commit with easy way in CI and you are not interested in latest commit, cause during build on CI new commit can appear.

https://forums.docker.com/t/best-practices-for-getting-code-into-a-container-git-clone-vs-copy-vs-data-container/4077/3

Also, the previous Dockerfile from before was copying artifacts from the local build context, so checkout-and-copy was already necessary to get the context (if not the Dockerfile itself). Plus, it's not much of an issue if we want to clone a PR branch on github if we want to build the Dockerfile locally, and still be confident about the exact commit we want to test:
https://help.github.com/articles/checking-out-pull-requests-locally

WORKDIR /ros2_ws/navstack_dependencies_ws/src
RUN vcs import . < /ros2_ws/navigation2_ws/src/navigation2/tools/ros2_dependencies.repos
# install dependency package dependencies
RUN . /opt/ros/$ROS_DISTRO/setup.sh && \
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sets the $ROS_DISTRO env var?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an environment variable set upstream in all ros related images from osrf, and enables downstream images to be more postable across distro releases, e.g. preventing folks from having to hardcode a dockerfile to the distro installed from the current base image.

https://github.com/osrf/docker_images/blob/ea0fdabf250f816203d058dca1df12802a9c44cc/ros2/nightly/nightly/Dockerfile#L45

&& rm -rf /var/lib/apt/lists/*

RUN echo "export ROS_DOMAIN_ID=22" >> /root/.bashrc
COPY tools/ctest_retry.bash /ros2_ws/navigation2_ws/build/nav2_system_tests
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't merge #527 because we found a real issue that is being addressed in #538, so we DO want the system tests to stay enabled. If you have a better way to do this from the Travis file, please add it to the PR.

Co-Authored-By: ruffsl <roxfoxpox@gmail.com>
@ghost
Copy link
Copy Markdown

ghost commented Jan 31, 2019

Looks like it is just failing CI due to the merge conflict in travis.yml with #526.

Copy link
Copy Markdown
Collaborator

@mkhansenbot mkhansenbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes I requested.

@ghost ghost merged commit 626b160 into ros-navigation:master Jan 31, 2019
@ruffsl ruffsl deleted the nightly branch July 30, 2019 05:42
mini-1235 pushed a commit to mini-1235/navigation2 that referenced this pull request Feb 5, 2025
* Section 1: docker

* docker images

* done

* words

* fromatting

* formatting

* rename conclusions

* formatting

* formatting

* fixing

* nice detail

* Update tutorials/docs/docker_dev.rst

Co-authored-by: Ryan <ryanfriedman5410+github@gmail.com>

* Update tutorials/docs/docker_dev.rst

Co-authored-by: Ryan <ryanfriedman5410+github@gmail.com>

* Update tutorials/docs/docker_dev.rst

Co-authored-by: Ryan <ryanfriedman5410+github@gmail.com>

* Update tutorials/docs/docker_dev.rst

Co-authored-by: Ryan <ryanfriedman5410+github@gmail.com>

* Update tutorials/docs/docker_dev.rst

Co-authored-by: Ryan <ryanfriedman5410+github@gmail.com>

* Update tutorials/docs/docker_dev.rst

Co-authored-by: Ryan <ryanfriedman5410+github@gmail.com>

* adding ryan suggestions

* fix container

* links

* line

* prefix

* prefix

---------

Co-authored-by: Ryan <ryanfriedman5410+github@gmail.com>
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants