Conversation
Use upstream images
Reorganize build stages
use env to export ROS_DOMAIN_ID
remove mapserver to be managed by rosdep
don't ignore errors during rosdep install
workspace chaining done during initialization should suffice
|
Love these changes, thanks! |
ghost
left a comment
There was a problem hiding this comment.
This looks good to me. The Travis failure seems to be due to a flaky system test, so it isn't a big deal. I've restarted the build to see if it will pass this time, but it's not a blocking issue.
| # setup keys | ||
| # check if proxy is set and get keys, using proxy if it is set | ||
| RUN if [ "$http_proxy" == "" ]; \ | ||
| RUN if [ "$http_proxy" != "" ]; \ |
There was a problem hiding this comment.
You got rid of setting this key when there is no proxy variable set. Seems we shouldn't need it when there is a proxy either. Any reason you left it here?
|
The system test failure is killing the master branch too |
Dockerfile
Outdated
| /ros_entrypoint.sh | ||
|
|
||
| CMD ["bash"] | ||
| ENV ROS_DOMAIN_ID 22 |
There was a problem hiding this comment.
I couldn't get the system test to pass without this, suggesting that somehow different systems were interfering with each other. Without this variable set, all ROS 2 systems on the network can see each other. This variable makes it so only the ROS 2 systems with ID 22 can see each other.
I'm assuming that somewhere at Travis, at any given moment there must be some ROS 2 node running causing interference.
Anyhow, it's an ugly hack. Clearly the system test needs work. Maybe I'll just disable it again for the time being.
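A minimal sketch of the isolation described above: DDS discovery is partitioned by `ROS_DOMAIN_ID`, so only processes sharing the same ID discover each other (the value 22 is the one used in this PR).

```shell
# Only ROS 2 processes sharing this domain ID will discover each other;
# exporting it scopes every node launched from this shell.
export ROS_DOMAIN_ID=22
echo "ROS 2 nodes started from this shell join domain $ROS_DOMAIN_ID"
```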
There was a problem hiding this comment.
Is there a reason you need to enable the host network driver? I.e: docker run ... --net=host. I feel this is unnecessary when only using one container.
There was a problem hiding this comment.
Well ... I tried it with and without setting that flag, and it only worked with it set. My assumption was that the network restrictions in docker were interfering with the ROS 2 infrastructure, but I never investigated why exactly.
As I recall, AMCL couldn't receive a map from the map server without that set. This is the first communication between elements of the navigation stack, so it seemed even the most basic message passing was inoperable without --net=host
However, this could stand more investigation.
There was a problem hiding this comment.
Could we test the latest commit of this PR? I'd like to see exact errors.
There was a problem hiding this comment.
Now that the system test is running again, I retried without --net=host and without setting the ROS_DOMAIN_ID and the system test seems to work fine.
I'm not sure why they seemed necessary before. All I can think of is that, since the test was flaky, I just got unlucky and got random passes with those set and got random failures when they weren't.
|
Fixes #497 |
| - matthew.k.hansen@intel.com | ||
|
|
||
| env: | ||
| - CMAKE_BUILD_TYPE=Release |
There was a problem hiding this comment.
Could possibly expand this into a build matrix for release, debug, etc.
https://docs.travis-ci.com/user/environment-variables/#global-variables
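For example, the single entry could become a matrix (a sketch only; Travis expands each `env` list item into its own job):

```yaml
env:
  - CMAKE_BUILD_TYPE=Release
  - CMAKE_BUILD_TYPE=Debug
```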
| then echo "Travis CI is supported only in ros-planning/navigation2" && exit 1; | ||
| fi | ||
| - docker build --tag navigation2:latest | ||
| --build-arg PULLREQ=$TRAVIS_PULL_REQUEST |
There was a problem hiding this comment.
I made this multiline to better break out future arguments
| - docker exec nav2_bash bash -c "colcon test-result --verbose" | ||
| - docker exec nav2_bash bash -c ". install/setup.bash && cd build/nav2_system_tests; ./ctest_retry.bash 3" | ||
| - docker run --rm --detach --name nav2_bash navigation2:latest | ||
| sleep infinity |
There was a problem hiding this comment.
'bash -l' wasn't working when non-interactively detached, so I swapped to sleep infinity instead.
| - docker exec nav2_bash bash -c ". install/setup.bash && cd build/nav2_system_tests; ./ctest_retry.bash 3" | ||
| - docker run --rm --detach --name nav2_bash navigation2:latest | ||
| sleep infinity | ||
| - docker exec --interactive --tty nav2_bash /ros_entrypoint.sh |
There was a problem hiding this comment.
Using /ros_entrypoint.sh with a given command is the same as sourcing, given how the dockerfile was prepared.
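A sketch of the entrypoint pattern being referred to (the script name and paths are stand-ins): the entrypoint sources a setup file, then `exec`s the requested command inside that environment, which is why `docker exec ... /ros_entrypoint.sh <cmd>` behaves the same as sourcing first.

```shell
# Hypothetical demo of the ros_entrypoint.sh pattern; SETUP_FILE stands in
# for the hardcoded install/setup.bash path a real image would use.
cat > ros_entrypoint_demo.sh <<'EOF'
#!/bin/bash
set -e
# source the workspace environment, then run the given command within it
source "${SETUP_FILE:?path to a setup.bash}"
exec "$@"
EOF
chmod +x ros_entrypoint_demo.sh

# stand-in setup file instead of a real install/setup.bash
echo 'export DEMO_VAR=sourced' > setup_demo.bash
SETUP_FILE=./setup_demo.bash ./ros_entrypoint_demo.sh bash -c 'echo "DEMO_VAR=$DEMO_VAR"'
# prints: DEMO_VAR=sourced
```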
| - if [ "$TRAVIS_BRANCH" == "master" ] && [ "$TRAVIS_PULL_REQUEST" == "false" ]; then | ||
| echo "Successfully built! Deploying container..." | ||
| docker login -u $DOCKER_USERNAME -p $DOCKER_PASSWORD ; | ||
| docker tag navigation2:latest stevemacenski/navigation2:latest |
There was a problem hiding this comment.
Is there a more official org for ros-planning Docker Hub repos, or is stevemacenski the indefinite de facto one? It might be nice to build on the CMAKE_BUILD_TYPE env/matrix to push debug and release tags, if that would be helpful.
There was a problem hiding this comment.
We don't currently have a more official docker hub. I'm not sure how to create one. @SteveMacenski have you looked at that?
There was a problem hiding this comment.
If we do end up creating an org on DockerHub, I'd recommend something along the lines described in the Action Items section of #534
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| RUN echo "export ROS_DOMAIN_ID=22" >> /root/.bashrc | ||
| COPY tools/ctest_retry.bash /ros2_ws/navigation2_ws/build/nav2_system_tests |
There was a problem hiding this comment.
I saw in #527 that the retry CI script was removed from Travis. If you add any more helper CI scripts, I might suggest you mount them as volumes at the docker run command rather than baking them into the built image.
There was a problem hiding this comment.
What exact command do you want to run?
We can just add it to the current order of scripts in the .travis.yaml
There was a problem hiding this comment.
The Travis CI calls ctest_retry, which runs the system tests and retries up to 3 times; if they pass once, it returns true. This is because we have some system issues that cause about a 20% flaky test rate. We're working on fixing those by changing to lifecycle nodes in the future, but this allows us to get the flaky fail rate down to ~1% and enable the test in CI to validate PRs. We don't want to remove this without another equivalent method.
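The retry logic described here can be sketched as a small wrapper (a generic sketch, not the actual ctest_retry.bash; `retry` and its argument layout are assumed names):

```shell
# retry N CMD...: run CMD up to N times, succeeding as soon as one run passes.
retry() {
  local attempts=$1; shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      echo "passed on attempt $i"
      return 0
    fi
  done
  echo "failed after $attempts attempts"
  return 1
}

# usage sketch (not run here): retry 3 ctest --output-on-failure
retry 3 true
# prints: passed on attempt 1
```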
| --ignore-src \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| RUN echo "export ROS_DOMAIN_ID=22" >> /root/.bashrc |
There was a problem hiding this comment.
For changing up the environment at test or runtime, I'd suggest passing a env.list file to docker run/exec in the travis file, rather than baking them into the build image.
https://docs.docker.com/engine/reference/commandline/run/#set-environment-variables--e---env---env-file
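The suggestion above might look like this (a sketch; the variable names are examples taken from this PR, not a prescribed list):

```shell
# Collect runtime variables in one env.list file, written once in the CI setup.
cat > env.list <<'EOF'
ROS_DOMAIN_ID=22
CMAKE_BUILD_TYPE=Release
EOF
echo "env.list written"

# usage sketch (not run here): pass the file to each container start
# docker run --rm --env-file ./env.list navigation2:latest
```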
There was a problem hiding this comment.
In that case we have to set this for every docker run command right? Is there no way to set it once?
There was a problem hiding this comment.
In that case we have to set this for every docker run command right?
Each time you invoke docker run you start a separate container, so correct, yes.
Is there no way to set it once?
Not without baking it into the image or reusing the same container. Yet adding a single --env-file ./env.list arg to dynamically set all the runtime environment variables your CI may need seems far simpler, given Docker still has no way to reset properties inherited from a parent image: moby/moby#3465
However, I think setting the ROS_DOMAIN_ID is completely unnecessary, given we probably shouldn't have been using the --net=host network driver to begin with. That'd just be a timebomb until someone else on Travis happens to use the same ID, or until two of our CI jobs scheduled in parallel start cross-talking during tests because they bypass the network isolation.
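If a fixed ID really were needed, one hedged alternative (a sketch; not something this PR implements) would be deriving a per-job domain ID from Travis's build number so parallel jobs land in different DDS domains:

```shell
# TRAVIS_BUILD_NUMBER is set by Travis in CI; the fallback and modulus keep
# the result inside the commonly usable 0-101 domain ID range.
ROS_DOMAIN_ID=$(( ${TRAVIS_BUILD_NUMBER:-0} % 101 ))
export ROS_DOMAIN_ID
echo "this job uses ROS_DOMAIN_ID=$ROS_DOMAIN_ID"
```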
|
Ping @mkhansen-intel , @SteveMacenski and @crdelsey for review. |
Dockerfile
Outdated
| RUN mkdir -p $NAV2_WS/src | ||
| WORKDIR $NAV2_WS/src | ||
| ARG GIT_REPO_URL=https://github.com/ros-planning/navigation2.git | ||
| RUN git clone $GIT_REPO_URL navigation2 |
There was a problem hiding this comment.
I sorta feel like we should be COPYing in the source code from the current build context, rather than going through the trouble of cloning from a remote URL, given that much of Travis and CIs sort of deal with merging and staging the repo (build context) before running our scripts. But I guess that depends on what the expected use case of this Dockerfile is. Does one clone this repo to acquire said dockerfile, or expect to build it with the one Dockerfile standalone.
There was a problem hiding this comment.
This might also make the build more traceable, as there wouldn't be as much of a chance for race conditions when building while other commits are merged into the tracked branch.
There was a problem hiding this comment.
There was a discussion about this in #389. @SteveMacenski @mkhansen-intel Are you still opposed to this?
There was a problem hiding this comment.
Will a COPY in docker build actually copy from the host machine? I'm not sure it will; the container wouldn't have permissions
There was a problem hiding this comment.
Will a copy in docker build actually copy from the host machine?
Yes, building the Dockerfile locally on the host works just the same as if it were built using CIs like Travis, CircleCI, or an automated repo on Docker Hub. One would simply check out the remote PR they want to test locally, and then invoke docker build on the desired branch.
I'm not sure it will, the container wouldnt have permissions
I'm not sure what you mean here. When you build a Dockerfile, the entire directory the Dockerfile resides in is used as the build context for the Docker engine. By default, COPY creates the copied files with root ownership (UID/GID 0) unless you explicitly configure it with --chown.
https://docs.docker.com/engine/reference/builder/#copy
Here the default user is left unchanged as root, given this Dockerfile is just used for building, not hosting a web service/database or anything.
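A minimal sketch of the COPY approach being discussed (the base image and paths are assumed from the diff, not prescribed):

```dockerfile
FROM osrf/ros2:nightly
ENV NAV2_WS /opt/nav2_ws
RUN mkdir -p $NAV2_WS/src
WORKDIR $NAV2_WS/src
# the whole checked-out repo (the build context) becomes the source to build,
# so the image always matches the exact commit the CI checked out
COPY ./ navigation2/
```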
rather than cloning/checking branch URL
mkhansenbot
left a comment
There was a problem hiding this comment.
I have a few comments and questions below, and I'd like Steve's input on a couple of them.
| libasio-dev \ | ||
| libtinyxml2-dev | ||
| # copy ros package repo | ||
| ENV NAV2_WS /opt/nav2_ws |
There was a problem hiding this comment.
Why move the nav2 workspace under /opt ? Is there a reason for that?
There was a problem hiding this comment.
Not much other than keeping the image's file hierarchy tidy. It might be helpful to organize the workspace that the entrypoint is sourcing, rather than polluting the root directory. This is mainly just me trying to establish a template other ROS 2 projects could replicate or build upon; say, perhaps if this navigation2 source build image is somehow reused or built FROM by another CI in ros-planning.
| ENV NAV2_WS /opt/nav2_ws | ||
| RUN mkdir -p $NAV2_WS/src | ||
| WORKDIR $NAV2_WS/src | ||
| COPY ./ navigation2/ |
There was a problem hiding this comment.
This will make it so the dockerfile can't be run as a stand-alone outside of the navigation2 folder. I believe @SteveMacenski had a use case which needed that capability
There was a problem hiding this comment.
Most Docker enabled pipelines (DockerHub/Travis/CircleCI/GitLab) work on the premise of downloading the target Dockerfile and its build context before building, so deviating from this pattern would make things harder to robustly integrate. For one, it would make it hard to ensure a one-to-one commit-to-CI relationship when the Dockerfile build is compiling the head of the PR branch, and not the exact commit that triggered the CI 30 minutes ago while waiting in the Travis job/slot queue. Building an image to test a branch using a Dockerfile not from that branch would be dangerous; like risking reproducibility by building from both the wrong blueprint and with outdated measurements.
Citing "Best practices for getting code into a container (git clone vs. copy vs. data container)":
We never use git clone, because with it you are not able to easily pull a specific commit in CI, and you are not interested in the latest commit, since a new commit can appear during the CI build.
Also, the previous Dockerfile was already copying artifacts from the local build context, so checkout-and-copy was already necessary to get the context (if not the Dockerfile itself). Plus, it's not much of an issue to clone a PR branch from GitHub if we want to build the Dockerfile locally, and still be confident about the exact commit we want to test:
https://help.github.com/articles/checking-out-pull-requests-locally
| WORKDIR /ros2_ws/navstack_dependencies_ws/src | ||
| RUN vcs import . < /ros2_ws/navigation2_ws/src/navigation2/tools/ros2_dependencies.repos | ||
| # install dependency package dependencies | ||
| RUN . /opt/ros/$ROS_DISTRO/setup.sh && \ |
There was a problem hiding this comment.
What sets the $ROS_DISTRO env var?
There was a problem hiding this comment.
This is an environment variable set upstream in all ROS-related images from osrf, and it enables downstream images to be more portable across distro releases, e.g. preventing folks from having to hardcode a Dockerfile to the distro installed in the current base image.
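The portability point can be sketched as follows (the `crystal` default is only a placeholder for running this outside such a base image, where `ROS_DISTRO` would already be exported):

```shell
# Inside the osrf base images ROS_DISTRO is already set; the fallback here is
# an assumed placeholder so the snippet runs standalone.
export ROS_DISTRO="${ROS_DISTRO:-crystal}"
echo "would source /opt/ros/$ROS_DISTRO/setup.sh"
```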
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| RUN echo "export ROS_DOMAIN_ID=22" >> /root/.bashrc | ||
| COPY tools/ctest_retry.bash /ros2_ws/navigation2_ws/build/nav2_system_tests |
Co-Authored-By: ruffsl <roxfoxpox@gmail.com>
|
Looks like it is just failing CI due to the merge conflict in travis.yml with #526. |
mkhansenbot
left a comment
There was a problem hiding this comment.
Thanks for making the changes I requested.
Updates the Dockerfiles to use the new Official Library docker images for ROS 2 and/or nightly builds from osrf/ros2:nightly. This PR avoids changing users, shells, and repeating layers.
Context: