[Docker] Make non-root image volume mount points writable by the isaaclab user#6082
Conversation
The base and cuRobo images run as the non-root isaaclab user (uid/gid 1000). A fresh Docker named volume inherits ownership from the image directory at its mount path on first mount; if that directory is missing or root-owned, the volume comes up root-owned and the isaaclab user cannot write it. This broke training in the Docker images: skrl train.py aborted with PermissionError creating logs/, and Isaac Sim emitted omni.datastore lock failures under kit/cache (failing the CI log blacklist). Pre-create every named-volume mount point that lives outside the runtime home and chown it to isaaclab before the USER switch, so fresh volumes inherit isaaclab ownership. Add a static test cross-checking the docker-compose volume targets against the Dockerfile mkdir/chown coverage so a newly added volume cannot regress this.
There was a problem hiding this comment.
🤖 IsaacLab Review Bot — PR #6082
[Docker] Make non-root image volume mount points writable by the isaaclab user
Summary
Well-structured fix for a real regression: fresh Docker named volumes were inheriting root:root ownership at mount points that either didn't exist or were root-owned in the image, breaking training for the non-root isaaclab user. The approach (pre-create + chown before USER switch) is the correct Docker idiom, and the static test suite adds valuable regression protection.
Findings
🔵 Suggestion — Layer consolidation opportunity
The new RUN mkdir -p ... && chown -R ... adds an image layer. Since it immediately follows the COPY --chown directive, and the earlier RUN chown -R isaaclab:isaaclab ${ISAACLAB_PATH} ${DOCKER_USER_HOME} is a separate layer, there's no opportunity to merge without disrupting the logical ordering. The current approach is correct — just noting that the layer is small (metadata-only directories) so build cache impact is negligible.
🔵 Suggestion — chown -R scope for kit/logs
The chown -R targets ${ISAACSIM_ROOT_PATH}/kit/logs (the parent) rather than the specific kit/logs/Kit/Isaac-Sim mount point. This is intentional and correct — it covers both the created path and any intermediate directories — but it does chown content that Isaac Sim may have placed under kit/logs/ at build time. Given that the Isaac Sim install is root-owned and this only affects the logs subtree, the risk is minimal. No action needed.
🔵 Suggestion — Dockerfile.curobo duplication (resolved)
The 20-line block is identical between Both Dockerfiles now use the same dynamic approach via Dockerfile.base and Dockerfile.curobo.volume_mounts.py, so the duplication concern is reduced to a small, identical invocation block rather than a hardcoded path list.
🟡 Warning — Test relies on YAML indentation parsing (resolved)
Fixed in baa7bb8 — now uses _compose_volume_targets() parses docker-compose.yaml with regex line-by-line rather than using a YAML parser.yaml.safe_load via the dedicated volume_mounts.py module.
✅ Architecture: Correct use of mkdir -p + chown -R before USER switch. All non-home volume targets from docker-compose.yaml are covered dynamically.
✅ Completeness: Cross-checked against docker-compose.yaml — all type: volume targets that live outside ${DOCKER_USER_HOME} are addressed.
✅ Test coverage: The parser module is tested directly; Dockerfile wiring is validated via string assertions; PyYAML absence is handled gracefully.
CI Status
- ✅ pre-commit, Build Wheel, Detect Changes — passed
- ⏳ Build Base Docker Image, Build cuRobo Docker Image — pending (expected for Dockerfile changes)
Verdict
Looks good. Clean, well-motivated fix with thorough regression tests. No blocking concerns.
Update (baa7bb8): The author addressed the previous review's 🟡 warning about fragile regex-based YAML parsing by introducing a proper parser module (docker/utils/volume_mounts.py) that uses yaml.safe_load. This is a significant improvement:
Changes in this push:
-
New
docker/utils/volume_mounts.py— A standalone module that parsesdocker-compose.yamlvia PyYAML, extracts alltype: volumetargets from thex-default-isaac-lab-volumesextension field, and resolves${VAR}references from the environment. Includes amain()entrypoint with validation (fails on empty results or unresolved variables). -
Dockerfiles refactored — Both
Dockerfile.baseandDockerfile.curobonow dynamically derive mount points by running the parser at build time:RUN set -o pipefail \ && export DOCKER_ISAACSIM_ROOT_PATH="${ISAACSIM_ROOT_PATH}" ... \ && dirs="$(... volume_mounts.py | grep '^/')" \ && test -n "${dirs}" \ && mkdir -p ${dirs} \ && chown -R isaaclab:isaaclab ${dirs}
The
set -o pipefail+test -nguards ensure the build aborts if parsing fails — no silent degradation. -
Test suite simplified — Removed ~100 lines of regex-based parsing/env-resolution logic. Tests now validate the parser module directly (
test_compose_volume_targets_parse,test_resolved_targets_are_absolute_paths) and verify each Dockerfile wires in the parser with the expected guards.
Assessment:
✅ Addresses 🟡 warning — YAML parsing is now done properly via PyYAML rather than regex.
✅ Single source of truth — docker-compose.yaml is parsed once by one module; no duplicated path lists.
✅ Defensive build — pipefail + grep '^/' + test -n is a solid guard chain.
✅ Test coverage — Both the parser and the Dockerfile wiring are tested; PyYAML absence gracefully skips parser unit tests via pytest.importorskip.
🔵 Minor note — The grep '^/' filter in the Dockerfile silently drops any target that doesn't resolve to an absolute path. Combined with volume_mounts.py's own unresolved-variable check on stderr, this is safe — but a build log won't show which paths were dropped if stderr isn't visible. Very minor; no action needed.
Verdict: Still looks good. This push meaningfully improves the architecture by eliminating path duplication and the fragile regex parsing. No new concerns. Recommend merge after Docker image build CI passes.
🤖 automated review by isaaclab-review-bot | commits: 87bdb54 → baa7bb8
Update (b17f0e3): Minor refinement to the build-time guard logic:
Changes in this push:
-
Removed
test -n "${dirs}"guard from bothDockerfile.baseandDockerfile.curobo— This is now redundant becausevolume_mounts.pyitself exits non-zero on an empty/unresolved list, andgrepexits non-zero if no lines match. Underset -o pipefail, either failure already aborts theRUNstep. -
Updated Dockerfile comments — Clarified that the safety net is
set -o pipefail+ the non-zero exits fromvolume_mounts.pyandgrep, rather than the explicittest -nguard. -
Test assertion updated —
test_dockerfile_prepares_volume_mounts_from_composenow assertsset -o pipefailpresence instead oftest -n "${dirs}", matching the simplified Dockerfile logic. -
Minor test hardening — Added
assert spec is not None and spec.loader is not Nonein_load_modulehelper to satisfy type checkers and make failures more explicit.
Assessment:
✅ Correct simplification — The test -n was belt-and-suspenders; removing it reduces noise without sacrificing safety since both upstream commands already fail on empty output under pipefail.
✅ Tests updated consistently — Assertions match the new guard mechanism.
✅ No new concerns.
Verdict: Still LGTM. Clean follow-up that removes redundancy. Ready for merge once Docker CI passes.
🤖 automated review by isaaclab-review-bot | commits: 87bdb54 → b17f0e3
Replace the hardcoded per-Dockerfile list of mount-point directories with a parse of docker-compose.yaml, the single source of truth for the named volumes. docker/utils/volume_mounts.py reads the volume targets; the base and cuRobo image builds invoke it to pre-create and chown each mount point to the isaaclab user before the USER switch. This removes the second list that had to be kept in sync with compose: a volume added to compose is now prepared automatically, with no Dockerfile edit. pipefail plus a non-empty guard abort the build if the parse yields nothing, so a parse failure can never silently ship an unprepared mount point. The regression test now unit-tests the parser (covering the directories that triggered the original failures) and asserts each non-root Dockerfile wires the parser in with the guard, instead of policing a duplicated list.
Greptile SummaryThis PR fixes a permission regression in the non-root Docker images where freshly mounted named volumes (seeded root-owned by the Docker daemon) were inaccessible to the
Confidence Score: 4/5Safe to merge; the change only affects image-build steps that run before USER isaaclab, with no impact on runtime behaviour or other images. The core mechanism is straightforward and well-tested statically. Both Dockerfiles already declare SHELL so pipefail is valid. The only observations are that _load_module does not guard against a None spec/loader, and the test -n guard in the Dockerfile is unreachable because pipefail causes the assignment to fail first. docker/test/test_dockerfile_nonroot.py — the _load_module helper could produce a confusing AttributeError rather than a clear message if the module path is incorrect. Important Files Changed
Sequence DiagramsequenceDiagram
participant DF as Dockerfile (build step)
participant SH as isaaclab.sh -p
participant PY as volume_mounts.py
participant YML as docker-compose.yaml
participant OS as os.path.expandvars
DF->>SH: "invoke with DOCKER_* env vars exported"
SH->>PY: python volume_mounts.py (via CLI -p flag)
PY->>YML: yaml.safe_load() parse x-default-isaac-lab-volumes
YML-->>PY: list of type:volume mount entries
PY->>OS: expandvars each target path
OS-->>PY: /isaac-sim/kit/cache, /workspace/isaaclab/logs
PY-->>SH: print one resolved path per line (exit 1 if unresolved)
SH-->>DF: stdout piped through grep
DF->>DF: mkdir -p dirs
DF->>DF: chown -R isaaclab:isaaclab dirs
DF->>DF: USER isaaclab
Reviews (1): Last reviewed commit: "Derive volume mount points from compose ..." | Re-trigger Greptile |
| spec = importlib.util.spec_from_file_location(name, path) | ||
| module = importlib.util.module_from_spec(spec) | ||
| spec.loader.exec_module(module) |
There was a problem hiding this comment.
spec and spec.loader are not guarded against None
importlib.util.spec_from_file_location returns None when the path cannot be found or resolved, and module_from_spec(None) raises AttributeError rather than a descriptive error. If the path to volume_mounts.py is ever wrong in a test environment, the failure message will be opaque. A defensive check would improve debuggability.
| spec = importlib.util.spec_from_file_location(name, path) | |
| module = importlib.util.module_from_spec(spec) | |
| spec.loader.exec_module(module) | |
| spec = importlib.util.spec_from_file_location(name, path) | |
| assert spec is not None, f"Could not find module at {path}" | |
| assert spec.loader is not None, f"No loader for module at {path}" | |
| module = importlib.util.module_from_spec(spec) | |
| spec.loader.exec_module(module) |
There was a problem hiding this comment.
Done in b17f0e3 — added the spec/spec.loader None asserts so a bad path fails with a clear message.
There was a problem hiding this comment.
Looks great, thanks! The None guards for spec and spec.loader make the failure mode much clearer. 👍
| RUN set -o pipefail \ | ||
| && export DOCKER_ISAACSIM_ROOT_PATH="${ISAACSIM_ROOT_PATH}" \ | ||
| DOCKER_ISAACLAB_PATH="${ISAACLAB_PATH}" \ | ||
| DOCKER_USER_HOME="${DOCKER_USER_HOME}" \ | ||
| && dirs="$(${ISAACLAB_PATH}/isaaclab.sh -p ${ISAACLAB_PATH}/docker/utils/volume_mounts.py | grep '^/')" \ | ||
| && test -n "${dirs}" \ | ||
| && mkdir -p ${dirs} \ | ||
| && chown -R isaaclab:isaaclab ${dirs} |
There was a problem hiding this comment.
test -n "${dirs}" guard is unreachable under set -o pipefail
With pipefail active, grep '^/' exits 1 whenever it sees no matching lines (including when its stdin is empty because main() failed). The assignment dirs="$(... | grep '^/')" therefore exits non-zero and the && chain short-circuits before test -n "${dirs}" is ever evaluated. The guard is dead code in every failure scenario — pipefail already does the job. This is harmless but the inline comment describing it as "the non-empty guard" overstates its actual role.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Good catch — removed in b17f0e3. Under set -o pipefail the dirs=$(... | grep "^/") assignment already exits non-zero (grep returns 1 on no matches) and short-circuits the chain, so test -n was unreachable. Reworded the comment to credit pipefail.
There was a problem hiding this comment.
Perfect, thanks for the clean fix! The pipefail short-circuit logic makes sense — much clearer with the updated comment.
- Guard the test's module loader against a None spec/loader so a wrong
path gives a clear message instead of an opaque AttributeError.
- Drop the redundant `test -n "${dirs}"` from the Dockerfile prep step:
under `set -o pipefail` the `dirs=$(... | grep '^/')` assignment already
fails (grep exits non-zero on no matches) and short-circuits the chain,
so the guard was unreachable. Reword the comment to credit pipefail and
update the wiring test accordingly.
…clab user (cherry-pick #6082 → 3.0.0-beta2) (#6095) ## Summary - Cherry-pick of #6082 onto `release/3.0.0-beta2`. Net diff is identical to the develop-side PR; applies cleanly with no conflicts. - Fixes training regressions in the non-root Docker images (base + cuRobo): `skrl` training aborts with `PermissionError` creating `logs/`, and Isaac Sim emits `omni.datastore` lock failures under `kit/cache` (tripping the CI training-log blacklist). - Root cause: a fresh Docker named volume inherits ownership from the image directory at its mount path; the non-root `isaaclab` user (uid/gid 1000) cannot write mount points the image left missing or root-owned. - Pre-creates and `chown`s every named-volume mount point before the `USER` switch, driven by a single source of truth — `docker-compose.yaml`, parsed by `docker/utils/volume_mounts.py` — so the list is never duplicated in the Dockerfiles. ## 1. Background #5618 switched the base / ROS 2 / cuRobo images to `USER isaaclab` (uid/gid 1000) so bind-mounted workspaces stay writable for the host/runner user. That change covered the `${DOCKER_USER_HOME}` mount points (swept by the recursive `chown`), but not the named volumes that mount outside the runtime home. ## 2. Mechanism When an empty named volume is mounted for the first time, Docker (the root daemon) seeds the volume's ownership from the image directory at that path. If that directory is missing or root-owned in the image, the volume comes up `root:root` and the uid-1000 runtime user is denied. The home-based volumes already inherited the right owner; the ones under `/isaac-sim` and `/workspace/isaaclab` did not (`kit/cache` was created root-owned by a build `mkdir`; `logs` / `data_storage` / `docs/_build` did not exist in the image at all). ## 3. Fix 3.1 `docker/utils/volume_mounts.py` — parses the `type: volume` targets out of `docker-compose.yaml` (the single source of truth) and resolves the `${VAR}` paths. Reuses PyYAML, already an IsaacLab dependency. 3.2 `Dockerfile.base` / `Dockerfile.curobo` — before `USER isaaclab`, invoke the parser and `mkdir -p` + `chown -R isaaclab:isaaclab` every mount point, so fresh volumes inherit `isaaclab` ownership. `set -o pipefail` plus a non-empty guard abort the build if the parse yields nothing, rather than silently shipping an unprepared mount point. The ROS 2 image builds `FROM` base and inherits the fix. 3.3 Adding a volume to `docker-compose.yaml` is now prepared automatically — no second list to keep in sync. ## 4. Test `docker/test/test_dockerfile_nonroot.py` unit-tests the parser (asserting it returns the mount points that triggered the regression) and asserts each non-root Dockerfile wires the parser in with the guard. Static, no image build required. ## 5. Validation - Static tests pass (`13 passed`); the parser test fails if compose drops a tracked volume. - Verified on a freshly built base image: all named-volume mount points come up `uid 1000`, and the exact QA training (`skrl` Anymal-C) runs to completion with no `PermissionError` and zero `omni.datastore` lock failures. ## 6. Notes - No changelog fragment: `docker/` is not under `source/<pkg>` (matches #5618). - cuRobo has no compose file today, but it creates `kit/cache` as root and runs Isaac Sim non-root, so its cache is unwritable even without a volume mount; the same step fixes it.
## Summary
Adds a named volume for `${DOCKER_ISAACSIM_ROOT_PATH}/kit/data` so the
non-root `isaaclab` user can write the Kit kernel config there.
Fixes the residual `omni.kit.app.plugin` error reported in nvbug
6288406:
```
[Error] [omni.kit.app.plugin] failed to open file '/isaac-sim/kit/data/Kit/IsaacLab/3.0/user.config.json' - unable to save the dictionary
```
## Root cause
PR #6082 made every docker-compose named-volume mount point writable by
the non-root user (driven by `docker/utils/volume_mounts.py` + the
build-time `chown`). But `kit/data` was **never a declared volume** —
pre–non-root migration it was writable simply because the container ran
as `root`. Under the non-root image it stays `root:root`, so the Kit
kernel cannot write `kit/data/.../user.config.json`. This is **universal
to every docker user** (the image bakes `USER isaaclab`; `kit/data` is
neither mounted nor chowned), independent of the launch flow.
(The sibling `omni.datastore` lock errors on `kit/cache` in the original
report were a QA workflow artifact of a raw `docker run` that bypasses
the compose volumes; the documented `container.py`/compose flow already
handles those via #6082.)
## Fix
Declare `kit/data` as a named volume. `docker-compose.yaml` is the
single source of truth, so this routes it through the existing
`volume_mounts.py` + `chown` machinery — the build pre-creates and
chowns `/isaac-sim/kit/data` to `isaaclab`. Consistent with
`kit/cache`/`kit/logs`, and it benignly persists Kit config across runs.
## Validation
Built the image (`nvcr.io/nvidian/isaac-sim:latest-release-6-0` base)
and verified as the non-root runtime user (uid 1000):
| | `/isaac-sim/kit/data` | non-root write of `user.config.json` |
|---|---|---|
| Pre-fix (base image, inherited) | not writable | **Permission denied**
|
| Post-fix (this PR) | owned by uid 1000 | **OK** |
`volume_mounts.py` now emits `/isaac-sim/kit/data`, and the running
container no longer logs the `user.config.json` / `omni.datastore` /
`DerivedDataCache` errors.
…6122) (#6123) Cherry-pick of #6122 to `release/3.0.0-beta2` for nvbug 6288406. Makes `/isaac-sim/kit/data` writable by the non-root `isaaclab` user by declaring it as a named volume, so the Kit kernel can write `user.config.json` (the residual that #6082/#6095 did not cover — `kit/data` was never a declared volume). Routes through the existing `volume_mounts.py` + build-time `chown`; `docker-compose.yaml` is the single source of truth. Clean cherry-pick (release already has the #6095 volume-prep mechanism). Validated on develop in #6122: on a clean rebuild, `/isaac-sim/kit/data` is owned by uid 1000 and `user.config.json` writes succeed (pre-fix: `Permission denied`).
Summary
skrltraining aborts withPermissionErrorcreatinglogs/, and Isaac Sim emitsomni.datastorelock failures underkit/cache(tripping the CI training-log blacklist).isaaclabuser (uid/gid 1000) cannot write mount points the image left missing or root-owned.chowns every named-volume mount point before theUSERswitch, driven by a single source of truth —docker-compose.yaml, parsed bydocker/utils/volume_mounts.py— so the list is never duplicated in the Dockerfiles.1. Background
#5618 switched the base / ROS 2 / cuRobo images to
USER isaaclab(uid/gid 1000) so bind-mounted workspaces stay writable for the host/runner user. That change covered the${DOCKER_USER_HOME}mount points (swept by the recursivechown), but not the named volumes that mount outside the runtime home.2. Mechanism
When an empty named volume is mounted for the first time, Docker (the root daemon) seeds the volume's ownership from the image directory at that path. If that directory is missing or root-owned in the image, the volume comes up
root:rootand the uid-1000 runtime user is denied. The home-based volumes already inherited the right owner; the ones under/isaac-simand/workspace/isaaclabdid not (kit/cachewas created root-owned by a buildmkdir;logs/data_storage/docs/_builddid not exist in the image at all).3. Fix
3.1
docker/utils/volume_mounts.py— parses thetype: volumetargets out ofdocker-compose.yaml(the single source of truth) and resolves the${VAR}paths. Reuses PyYAML, already an IsaacLab dependency.3.2
Dockerfile.base/Dockerfile.curobo— beforeUSER isaaclab, invoke the parser andmkdir -p+chown -R isaaclab:isaaclabevery mount point, so fresh volumes inheritisaaclabownership.set -o pipefailplus a non-empty guard abort the build if the parse yields nothing, rather than silently shipping an unprepared mount point. The ROS 2 image buildsFROMbase and inherits the fix.3.3 Adding a volume to
docker-compose.yamlis now prepared automatically — no second list to keep in sync.4. Test
docker/test/test_dockerfile_nonroot.pyunit-tests the parser (asserting it returns the mount points that triggered the regression) and asserts each non-root Dockerfile wires the parser in with the guard. Static, no image build required.5. Validation
13 passed); the parser test fails if compose drops a tracked volume.uid 1000, and the exact QA training (skrlAnymal-C) runs to completion with noPermissionErrorand zeroomni.datastorelock failures.6. Notes
docker/is not undersource/<pkg>(matches Make Isaac Lab Docker images run as non-root #5618).kit/cacheas root and runs Isaac Sim non-root, so its cache is unwritable even without a volume mount; the same step fixes it.