MJCF parser does not parse <include> tags #1389#1468
Conversation
Signed-off-by: Viktor Reutskyy <vreutskyy@nvidia.com>
📝 WalkthroughWalkthroughAdds MJCF expansion with a pluggable path_resolver wired through ModelBuilder.add_mjcf, an XML-input detector, and tests; includes are expanded (recursively) before parsing and resolved base_dir is used for asset/mesh resolution. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Builder as ModelBuilder
participant Parser as import_mjcf.parse_mjcf
participant Loader as _load_and_expand_mjcf
participant Resolver as path_resolver
participant FS as FileSystem
User->>Builder: add_mjcf(source, path_resolver)
Builder->>Parser: parse_mjcf(source, path_resolver)
Parser->>Loader: _load_and_expand_mjcf(source, path_resolver)
Loader->>Loader: detect XML-string vs filepath
alt filepath
Loader->>FS: read source file
FS-->>Loader: file content
end
loop for each <include>
Loader->>Resolver: path_resolver(base_dir, include_path)
Resolver-->>Loader: resolved path or XML content
alt resolved path
Loader->>FS: read resolved file
FS-->>Loader: include content
end
Loader->>Loader: insert expanded content, check circular includes
end
Loader-->>Parser: expanded XML root, base_dir
Parser->>Parser: parse expanded MJCF using base_dir for assets/meshes
Parser-->>Builder: parsed model data
Builder-->>User: constructed model
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
newton/_src/utils/import_mjcf.py (1)
191-259: Asset paths in included files resolve against the top-level directoryAfter expansion,
mjcf_dirnameis derived only from the top-level file. Any meshes/textures defined inside an included file (especially from a subdirectory) will resolve relative to the main file, not the included file’s directory. That breaks common layouts (e.g.,scene.xmlincludesrobot/robot.xmlwhere meshes arerobot/meshes/...).Consider tracking each included file’s base directory and rewriting relative asset paths to absolute during expansion, or carrying per-subtree base_dir information into asset resolution.
🤖 Fix all issues with AI agents
In `@newton/_src/utils/import_mjcf.py`:
- Around line 58-107: The cycle detection in _load_and_expand_mjcf currently
uses a global included_files set and falsely flags legitimate repeated includes
as cycles; change this to use a stack-based "active include path" for cycle
detection by replacing or augmenting the included_files param with an
active_stack (e.g., list[str]) that you push the resolved file path onto before
the recursive call and pop it after returning; check membership in this
active_stack to detect true cycles, and do not treat files already processed
elsewhere on the tree (but not on the current stack) as cycles—optionally keep a
separate processed_files set to skip re-parsing but do not use it for cycle
detection. Ensure this logic is applied only for resolved file paths (when not
is_xml_content(resolved)) and reference the function name _load_and_expand_mjcf,
the variables included_files/resolved/active_stack, and the recursive call to
_load_and_expand_mjcf.
In `@newton/_src/utils/import_utils.py`:
- Around line 169-182: The is_xml_content function is too permissive (returns
True for strings containing ? or *), so update is_xml_content to only detect
XML-like content by checking for a leading '<' after stripping whitespace or for
common XML prefixes (e.g. '<?xml') and/or simple tag patterns (e.g. '<tag'
followed later by '>') instead of scanning for characters like '?' or '*' that
appear in valid POSIX/URL paths; modify the logic in is_xml_content to trim the
input and then test for these stricter indicators (use the function name
is_xml_content and its docstring to guide the change).
In `@newton/tests/test_import_mjcf.py`:
- Around line 3690-3744: Tests define local resolver functions (custom_resolver,
tracking_resolver, tracking_resolver in
test_xml_string_input_with_custom_resolver) that accept parameters they don't
use, triggering Ruff ARG001; update those function signatures to silence the
lint by prefixing unused parameters with an underscore (e.g., base_dir ->
_base_dir, include_file -> _include_file) or add a local noqa comment (# noqa:
ARG001) to the function line so the resolver behavior remains unchanged while
removing the lint warning.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@newton/_src/utils/import_mjcf.py`:
- Around line 81-87: The XML parsing in import_mjcf.py (and similarly in
import_urdf.py) is vulnerable because the code calls ET.fromstring(...) and
ET.parse(...) on untrusted input (see use in the is_xml_content branch and the
file-path branch); replace these uses with a hardened parser from defusedxml
(import defusedxml.ElementTree at module level, e.g., "from defusedxml import
ElementTree as defused_ET") and call
defused_ET.fromstring(sanitize_xml_content(source)) and
defused_ET.parse(source).getroot() instead of ET.fromstring/ET.parse, update the
module-level imports in both import_mjcf.py and import_urdf.py, and add
defusedxml to the project dependencies.
🧹 Nitpick comments (1)
newton/_src/sim/builder.py (1)
1657-1658: Clarifyresolve_includebehavior when parsing XML strings.The new parameter is good, but the docstring doesn’t mention that
base_dircan beNonewhensourceis XML content, which affects relative include resolution. Consider making that explicit to reduce surprises for callers.📝 Suggested docstring tweak
- resolve_include (Callable): Callback to resolve <include> file paths. Takes (base_dir, include_file) and returns either a file path or XML content directly. The default resolver concatenates paths and returns file paths. + resolve_include (Callable | None): Callback to resolve <include> file paths. Takes (base_dir, include_file) and returns either a file path or XML content directly. When parsing from an XML string, base_dir is None, so relative includes require a custom resolver. The default resolver concatenates paths and returns file paths.
Signed-off-by: Viktor Reutskyy <vreutskyy@nvidia.com>
|
How did you come up with the requirement/solution for the resolver, is this also done in MuJoCo or did you experience issues somewhere? I would expect the default resolver to cover many use-cases already. I'm not arguing against it, I'm just wondering where this comes from because myself I would never have though about this. |
|
Yes, default path resolver handles loading from files, but if user wants to load XML string directly, not from a file, importer doesn't know the directory path where to look for includes and assets (meshes, textures, etc.). With a custom path resolver user can: (1) for includes, provide the absolute path from where to load it or just return the XML content of that include, (2) for assets, provide the absolute path to load the asset. MuJoCo uses a similar approach, it allows user to provide a Virtual File System object importer can use to load includes and assets. I think my approach is simpler (I borrowed it from DirectX's shader loader system - at least how I remember it:) but the idea is the same. |
|
Thanks for the info, yes it's very clever. Nice job. |
…-physics#1468) Signed-off-by: Viktor Reutskyy <vreutskyy@nvidia.com>
Description
Closes #1389
Adds support for
<include>tags in the MJCF parser, enabling modular MJCF model definitions. This is particularly useful for the MuJoCo Menagerie pattern wherescene.xmlfiles include robot definitions along with additional options, ground planes, and collision filtering.Changes
Core Implementation:
_load_and_expand_mjcf()function that recursively loads MJCF files and expands all<include>elements before parsing_default_path_resolver()function for resolving file paths (supports both relative and absolute paths)path_resolvercallback parameter toparse_mjcf()andModelBuilder.add_mjcf()for custom path resolutionis_xml_content()utility function to distinguish XML content from file pathsKey Features:
path_resolvercallback allows virtual file systems or special path handlingUsage
Tests Added
path_resolvercallbacksNewton Migration Guide
docs/migration.rstis up-to dateNo migration needed - this is a new feature with backward-compatible defaults.
Before your PR is "Ready for review"
newton/tests/test_examples.py)pre-commit run -aSummary by CodeRabbit
New Features
Utilities
Tests
✏️ Tip: You can customize this high-level summary in your review settings.