Skip to content

MJCF parser does not parse <include> tags #1389#1468

Merged
vreutskyy merged 5 commits into
newton-physics:mainfrom
vreutskyy:1389-mjcf-parser-does-not-parse-include-tags
Jan 28, 2026
Merged

MJCF parser does not parse <include> tags #1389#1468
vreutskyy merged 5 commits into
newton-physics:mainfrom
vreutskyy:1389-mjcf-parser-does-not-parse-include-tags

Conversation

@vreutskyy

@vreutskyy vreutskyy commented Jan 27, 2026

Copy link
Copy Markdown
Member

Description

Closes #1389

Adds support for <include> tags in the MJCF parser, enabling modular MJCF model definitions. This is particularly useful for the MuJoCo Menagerie pattern where scene.xml files include robot definitions along with additional options, ground planes, and collision filtering.

Changes

Core Implementation:

  • Added _load_and_expand_mjcf() function that recursively loads MJCF files and expands all <include> elements before parsing
  • Added _default_path_resolver() function for resolving file paths (supports both relative and absolute paths)
  • Added optional path_resolver callback parameter to parse_mjcf() and ModelBuilder.add_mjcf() for custom path resolution
  • Added is_xml_content() utility function to distinguish XML content from file paths

Key Features:

  • Recursive include expansion with cycle detection
  • Asset paths (mesh, texture, hfield) in included files are automatically resolved relative to the included file's directory, not the main file
  • Custom path_resolver callback allows virtual file systems or special path handling

Usage

# Basic usage - includes are resolved automatically
builder = newton.ModelBuilder()
builder.add_mjcf("scene.xml")  # Will process any <include file="..."/> tags

# Custom resolver for virtual file systems or special path handling
def custom_resolver(base_dir, file_path):
    # For includes: can return file path or XML content directly
    # For assets: must return absolute file path
    return my_vfs.resolve(base_dir, file_path)

builder.add_mjcf(mjcf_source, path_resolver=custom_resolver)

Tests Added

  • Basic includes (same directory, subdirectory, absolute paths)
  • Multi-section includes (asset, default, worldbody)
  • Nested includes
  • Circular include detection (including self-includes)
  • Missing include files
  • Include without file attribute
  • Invalid source handling
  • Relative includes without base directory
  • Custom path_resolver callbacks
  • Asset path resolution with mesh vertex verification

Newton Migration Guide

  • The migration guide in docs/migration.rst is up-to date

No migration needed - this is a new feature with backward-compatible defaults.

Before your PR is "Ready for review"

  • Necessary tests have been added and new examples are tested (see newton/tests/test_examples.py)
  • Documentation is up-to-date
  • Code passes formatting and linting checks with pre-commit run -a

Summary by CodeRabbit

  • New Features

    • Optional custom include resolver for MJCF imports; includes are expanded before parsing with cycle detection and resolved base locations used for asset/mesh paths.
  • Utilities

    • Added a small helper to detect XML-string inputs.
  • Tests

    • Extensive test coverage for include resolution: nesting, multiple files, cycles, missing files, callback-based resolvers, and XML-string inputs.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Viktor Reutskyy <vreutskyy@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jan 27, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Adds MJCF expansion with a pluggable path_resolver wired through ModelBuilder.add_mjcf, an XML-input detector, and tests; includes are expanded (recursively) before parsing and resolved base_dir is used for asset/mesh resolution.

Changes

Cohort / File(s) Summary
MJCF include expansion & parsing
newton/_src/utils/import_mjcf.py
Added _default_path_resolver and _load_and_expand_mjcf to recursively expand <include> elements with cycle detection. parse_mjcf(...) gains a path_resolver parameter and now uses expanded XML and returned base_dir for asset/mesh resolution.
Builder API change
newton/_src/sim/builder.py
ModelBuilder.add_mjcf(...) signature extended with `path_resolver: Callable[[str
Import utilities
newton/_src/utils/import_utils.py
Added is_xml_content(source: str) -> bool to detect XML-string inputs versus file paths.
Tests for include behavior
newton/tests/test_import_mjcf.py
Added extensive tests covering same-dir/subdir/absolute includes, multiple and nested includes, circular include detection, missing includes, custom resolver callbacks (including XML strings), and integration with assets/meshes/worldbody.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Builder as ModelBuilder
    participant Parser as import_mjcf.parse_mjcf
    participant Loader as _load_and_expand_mjcf
    participant Resolver as path_resolver
    participant FS as FileSystem

    User->>Builder: add_mjcf(source, path_resolver)
    Builder->>Parser: parse_mjcf(source, path_resolver)
    Parser->>Loader: _load_and_expand_mjcf(source, path_resolver)
    Loader->>Loader: detect XML-string vs filepath
    alt filepath
        Loader->>FS: read source file
        FS-->>Loader: file content
    end
    loop for each <include>
        Loader->>Resolver: path_resolver(base_dir, include_path)
        Resolver-->>Loader: resolved path or XML content
        alt resolved path
            Loader->>FS: read resolved file
            FS-->>Loader: include content
        end
        Loader->>Loader: insert expanded content, check circular includes
    end
    Loader-->>Parser: expanded XML root, base_dir
    Parser->>Parser: parse expanded MJCF using base_dir for assets/meshes
    Parser-->>Builder: parsed model data
    Builder-->>User: constructed model
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • adenzler-nvidia
  • eric-heiden
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding support for MJCF tags, which is the primary objective addressed in the PR.
Linked Issues check ✅ Passed The PR fully addresses issue #1389 by implementing tag support with recursive expansion, cycle detection, and custom resolver callbacks for flexible path resolution.
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing tag support: new resolver utilities, include expansion logic, API parameter additions, and comprehensive tests. No unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 84.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
newton/_src/utils/import_mjcf.py (1)

191-259: Asset paths in included files resolve against the top-level directory

After expansion, mjcf_dirname is derived only from the top-level file. Any meshes/textures defined inside an included file (especially from a subdirectory) will resolve relative to the main file, not the included file’s directory. That breaks common layouts (e.g., scene.xml includes robot/robot.xml where meshes are robot/meshes/...).

Consider tracking each included file’s base directory and rewriting relative asset paths to absolute during expansion, or carrying per-subtree base_dir information into asset resolution.

🤖 Fix all issues with AI agents
In `@newton/_src/utils/import_mjcf.py`:
- Around line 58-107: The cycle detection in _load_and_expand_mjcf currently
uses a global included_files set and falsely flags legitimate repeated includes
as cycles; change this to use a stack-based "active include path" for cycle
detection by replacing or augmenting the included_files param with an
active_stack (e.g., list[str]) that you push the resolved file path onto before
the recursive call and pop it after returning; check membership in this
active_stack to detect true cycles, and do not treat files already processed
elsewhere on the tree (but not on the current stack) as cycles—optionally keep a
separate processed_files set to skip re-parsing but do not use it for cycle
detection. Ensure this logic is applied only for resolved file paths (when not
is_xml_content(resolved)) and reference the function name _load_and_expand_mjcf,
the variables included_files/resolved/active_stack, and the recursive call to
_load_and_expand_mjcf.

In `@newton/_src/utils/import_utils.py`:
- Around line 169-182: The is_xml_content function is too permissive (returns
True for strings containing ? or *), so update is_xml_content to only detect
XML-like content by checking for a leading '<' after stripping whitespace or for
common XML prefixes (e.g. '<?xml') and/or simple tag patterns (e.g. '<tag'
followed later by '>') instead of scanning for characters like '?' or '*' that
appear in valid POSIX/URL paths; modify the logic in is_xml_content to trim the
input and then test for these stricter indicators (use the function name
is_xml_content and its docstring to guide the change).

In `@newton/tests/test_import_mjcf.py`:
- Around line 3690-3744: Tests define local resolver functions (custom_resolver,
tracking_resolver, tracking_resolver in
test_xml_string_input_with_custom_resolver) that accept parameters they don't
use, triggering Ruff ARG001; update those function signatures to silence the
lint by prefixing unused parameters with an underscore (e.g., base_dir ->
_base_dir, include_file -> _include_file) or add a local noqa comment (# noqa:
ARG001) to the function line so the resolver behavior remains unchanged while
removing the lint warning.

Comment thread newton/_src/utils/import_mjcf.py
Comment thread newton/_src/utils/import_utils.py Outdated
Comment thread newton/tests/test_import_mjcf.py Outdated
Signed-off-by: Viktor Reutskyy <vreutskyy@nvidia.com>
@codecov

codecov Bot commented Jan 27, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@newton/_src/utils/import_mjcf.py`:
- Around line 81-87: The XML parsing in import_mjcf.py (and similarly in
import_urdf.py) is vulnerable because the code calls ET.fromstring(...) and
ET.parse(...) on untrusted input (see use in the is_xml_content branch and the
file-path branch); replace these uses with a hardened parser from defusedxml
(import defusedxml.ElementTree at module level, e.g., "from defusedxml import
ElementTree as defused_ET") and call
defused_ET.fromstring(sanitize_xml_content(source)) and
defused_ET.parse(source).getroot() instead of ET.fromstring/ET.parse, update the
module-level imports in both import_mjcf.py and import_urdf.py, and add
defusedxml to the project dependencies.
🧹 Nitpick comments (1)
newton/_src/sim/builder.py (1)

1657-1658: Clarify resolve_include behavior when parsing XML strings.

The new parameter is good, but the docstring doesn’t mention that base_dir can be None when source is XML content, which affects relative include resolution. Consider making that explicit to reduce surprises for callers.

📝 Suggested docstring tweak
-            resolve_include (Callable): Callback to resolve <include> file paths. Takes (base_dir, include_file) and returns either a file path or XML content directly. The default resolver concatenates paths and returns file paths.
+            resolve_include (Callable | None): Callback to resolve <include> file paths. Takes (base_dir, include_file) and returns either a file path or XML content directly. When parsing from an XML string, base_dir is None, so relative includes require a custom resolver. The default resolver concatenates paths and returns file paths.

Comment thread newton/_src/utils/import_mjcf.py
@adenzler-nvidia

Copy link
Copy Markdown
Member

How did you come up with the requirement/solution for the resolver, is this also done in MuJoCo or did you experience issues somewhere? I would expect the default resolver to cover many use-cases already.

I'm not arguing against it, I'm just wondering where this comes from because myself I would never have though about this.

@vreutskyy

Copy link
Copy Markdown
Member Author

Yes, default path resolver handles loading from files, but if user wants to load XML string directly, not from a file, importer doesn't know the directory path where to look for includes and assets (meshes, textures, etc.). With a custom path resolver user can: (1) for includes, provide the absolute path from where to load it or just return the XML content of that include, (2) for assets, provide the absolute path to load the asset. MuJoCo uses a similar approach, it allows user to provide a Virtual File System object importer can use to load includes and assets. I think my approach is simpler (I borrowed it from DirectX's shader loader system - at least how I remember it:) but the idea is the same.

@adenzler-nvidia

Copy link
Copy Markdown
Member

Thanks for the info, yes it's very clever. Nice job.

@vreutskyy vreutskyy added this pull request to the merge queue Jan 28, 2026
Merged via the queue into newton-physics:main with commit cd75ad1 Jan 28, 2026
22 checks passed
mmacklin pushed a commit to mmacklin/newton that referenced this pull request Apr 7, 2026
…-physics#1468)

Signed-off-by: Viktor Reutskyy <vreutskyy@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MJCF parser does not parse <include> tags

3 participants