mjwarp-render#1113

Merged: erikfrey merged 28 commits into main from render on Feb 6, 2026
@StafaH

@StafaH StafaH commented Feb 3, 2026

Collaborator

Summary

This PR introduces a GPU-accelerated ray tracing renderer for MuJoCo Warp, enabling parallel rendering of RGB
and depth images across thousands of simulation worlds.

Key Features

Mesh Rendering with Textures

The renderer uses BVH-accelerated mesh rendering with full texture support. Texture mapping uses CUDA texture support, provided through warp==1.12, with bilinear filtering.

[Image: render preview, Franka Panda Visual, cam0]
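
For readers unfamiliar with bilinear filtering, here is a minimal CPU sketch in plain Python of what the filter computes for a single-channel texture. It is an illustration only: the renderer samples textures on the GPU through Warp's CUDA texture support. The function name `bilinear_sample`, the texel-center convention, and the clamp-to-edge addressing are assumptions made for this sketch, not details taken from the PR.

```python
import math


def bilinear_sample(texture, u, v):
    """Sample a 2D single-channel texture (list of rows) at normalized (u, v) in [0, 1].

    Assumes texel centers sit at half-integer coordinates and that
    out-of-range lookups clamp to the nearest edge texel.
    """
    height, width = len(texture), len(texture[0])
    # Map normalized coordinates to continuous texel space.
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional position between neighboring texels

    def texel(ix, iy):
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return texture[iy][ix]

    # Blend horizontally on the two rows, then blend the rows vertically.
    top = (1 - fx) * texel(x0, y0) + fx * texel(x0 + 1, y0)
    bottom = (1 - fx) * texel(x0, y0 + 1) + fx * texel(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


checker = [[0.0, 1.0],
           [1.0, 0.0]]
center = bilinear_sample(checker, 0.5, 0.5)    # equal blend of all four texels
corner = bilinear_sample(checker, 0.25, 0.25)  # exactly on the first texel center
```

Sampling between texel centers returns a weighted average of the four nearest texels, which is what smooths textures when a surface is viewed up close.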

Heightfield Rendering

Heightfield rendering from MuJoCo heightfield data is supported. Rendering is optimized for high throughput through mesh optimizations during initialization.

[Image: render preview, Apptronik Heightfield, cam0]

Flex Rendering

The new renderer includes an initial prototype for flex rendering. Currently only 2D and 3D flex objects are supported. Flex rendering is still a WIP and performance and features will continue to improve.

Lighting and Shadows

Rendering supports dynamic lighting with configurable shadows, all of which can be domain randomized from the Model fields. Here is an example with two lights.

[Image: debug render]

Additional features include:

  • Multi-camera support with per-camera resolution, FOV, and intrinsics
  • Batched rendering across parallel worlds with Domain Randomization for sim2real robotics.
  • BVH accelerated ray/rays API for custom raycast sensors

Public API

To work with rendering, a few additions have been made to the public API.

All BVH-accelerated rendering or raycasting requires a RenderContext. The RenderContext holds the BVH structures, the rendering-specific model fields, and the output buffers.

rc = mjw.create_render_context(
    mjm, m, d,
    cam_res=(256, 256),           # Override camera resolution (or per-camera list)
    render_rgb=True,              # Enable RGB output (or per-camera list)
    render_depth=True,            # Enable depth output (or per-camera list)
    use_textures=True,            # Apply material textures
    use_shadows=False,            # Enable shadow casting (slower)
    enabled_geom_groups=[0, 1],   # Only render geoms in groups 0 and 1
    cam_active=[True, False],     # Selectively enable/disable cameras
    flex_render_smooth=True,      # Smooth shading for soft bodies
)

In the render context, you can customize what you want each camera to do. Each setting can be applied globally or per-camera. The render context also has the ability to read from the new MuJoCo spec that allows for camera customization:

<camera name="front_camera" pos="3 0 2" xyaxes="0 1 0 -0.6 0 0.8" resolution="64 64" output="rgb depth"/>

To render all cameras, users must first call refit_bvh to update the BVH trees, and then call render, which renders every active camera into the output buffers.

mjw.refit_bvh(m, d, rc)
mjw.render(m, d, rc)

The results can be accessed through the output buffers. Each output buffer is linear, with shape (nworld, pixels). We provide rgb_adr and depth_adr so that users can locate each camera's data within the buffer. RGB data is packed into a uint32 and needs to be unpacked for downstream use. Here is an example:

import numpy as np
from PIL import Image

world, cam = 0, 0
width, height = rc.cam_res.numpy()[cam]
rgb_adr = rc.rgb_adr.numpy()[cam]
depth_adr = rc.depth_adr.numpy()[cam]

# Extract RGB (packed as uint32: 0xAARRGGBB)
rgb_packed = rc.rgb_data.numpy()[world, rgb_adr : rgb_adr + width * height]
rgb_packed = rgb_packed.reshape(height, width).astype(np.uint32)
rgb_image = np.dstack([
    ((rgb_packed >> 16) & 0xFF).astype(np.uint8),  # R
    ((rgb_packed >> 8) & 0xFF).astype(np.uint8),   # G
    (rgb_packed & 0xFF).astype(np.uint8),          # B
])

# Extract depth
depth = rc.depth_data.numpy()[world, depth_adr : depth_adr + width * height]
depth_image = depth.reshape(height, width)

# Display or save
Image.fromarray(rgb_image).save("rgb.png")
Image.fromarray((np.clip(depth_image / 5.0, 0, 1) * 255).astype(np.uint8)).save("depth.png")
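
To make the linear buffer layout concrete, here is a small pure-Python sketch that builds per-camera start offsets for cameras with different resolutions and round-trips one packed pixel. It assumes cameras are stored back to back in camera order with row-major pixels, and it uses the 0xAARRGGBB layout stated in the comment above. The helpers `camera_offsets`, `pack_rgb`, and `unpack_rgb` are hypothetical names for illustration, not part of the mujoco_warp API.

```python
def camera_offsets(resolutions):
    """Return each camera's start offset in a flat per-world pixel buffer.

    Assumption: cameras are laid out back to back in camera order.
    """
    offsets, total = [], 0
    for width, height in resolutions:
        offsets.append(total)
        total += width * height
    return offsets, total


def pack_rgb(r, g, b, a=255):
    """Pack 8-bit channels into one integer laid out as 0xAARRGGBB."""
    return (a << 24) | (r << 16) | (g << 8) | b


def unpack_rgb(packed):
    """Recover (r, g, b) from a 0xAARRGGBB integer."""
    return (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF


resolutions = [(4, 2), (3, 3)]  # (width, height) per camera
offsets, total_pixels = camera_offsets(resolutions)
buffer = [0] * total_pixels     # stand-in for one world's row of rgb_data

# Write one pixel of camera 1 at (row=2, col=1), then read it back.
cam, row, col = 1, 2, 1
width, height = resolutions[cam]
index = offsets[cam] + row * width + col
buffer[index] = pack_rgb(10, 20, 30)
pixel = unpack_rgb(buffer[index])
```

With heterogeneous camera resolutions, a single flat buffer plus an offset table avoids padding every camera to the largest resolution.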

Benchmarks

Rendering can be benchmarked from the CLI using the mujoco-warp testspeed tool. Here is an example:

mjwarp-testspeed benchmarks/primitives.xml --function=render

Below are some benchmarks across a variety of scenes.

Cartpole

[Figure: render benchmark, total FPS, Cartpole]

Franka (Visual + Primitive)

[Figure: render benchmark, total FPS, Franka]

Primitives (100+ geometry)

[Figure: render benchmark, total FPS, Primitives Scene]

Height Field + Apptronik

[Figure: render benchmark, total FPS, Apptronik Heightfield]

Limitations

1. Visual Meshes vs Primitives

Where possible, users should use primitives over visual meshes. Mesh rendering is costly, and the cost scales with mesh complexity. For vision-based learning, especially for non-sim2real usage, it is recommended to use primitives of roughly the same shape as the original mesh. Often the best practice is to have both in your model and label them in the MuJoCo XML (MJCF) as group=3 or group=4 (collision geoms are often already marked as group=3). Rendering can then be toggled between the visual meshes and the collision or primitive geoms by changing the enabled_geom_groups argument of the RenderContext.
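
As an illustration of the group convention, the sketch below parses a small MJCF fragment with the standard library and lists which geoms a given enabled_geom_groups value would select. It only mimics the selection rule and is not how the renderer filters geoms internally. The choice of group=4 for the visual mesh and group=3 for the primitive is an assumption, as are the geom names and the helper `geoms_in_groups`. The fragment omits the asset section because it is only parsed as XML, never loaded by MuJoCo.

```python
import xml.etree.ElementTree as ET

# Hypothetical model fragment: one body with a visual mesh and a primitive stand-in.
MJCF = """
<mujoco>
  <worldbody>
    <body name="link">
      <geom name="link_visual" type="mesh" mesh="link" group="4" contype="0" conaffinity="0"/>
      <geom name="link_collision" type="capsule" size="0.05 0.2" group="3"/>
    </body>
    <geom name="floor" type="plane" size="5 5 0.1"/>
  </worldbody>
</mujoco>
"""


def geoms_in_groups(mjcf, enabled_groups):
    """Return names of geoms whose group is enabled (MuJoCo's default geom group is 0)."""
    root = ET.fromstring(mjcf)
    return [
        geom.get("name")
        for geom in root.iter("geom")
        if int(geom.get("group", "0")) in enabled_groups
    ]


fast = geoms_in_groups(MJCF, [0, 3])    # primitives only: cheaper to render
pretty = geoms_in_groups(MJCF, [0, 4])  # visual meshes: costlier, more faithful
```

Under this labeling, passing enabled_geom_groups=[0, 3] to create_render_context would render the primitive stand-ins, and [0, 4] would render the visual meshes.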

2. Flex Rendering

Flex rendering is currently WIP and will continue to evolve and improve.

3. Higher Resolution / More Cameras

Render cost currently scales linearly with resolution and number of cameras, because it scales with the total number of rays that need to be cast. Higher resolutions or more cameras will lead to lower throughput. Improving high-resolution rendering performance is WIP and on the roadmap.
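
A back-of-the-envelope sketch of that scaling, counting one primary ray per pixel per world. Any extra rays cast for shadows are ignored here, and the helper name `rays_per_render` is hypothetical.

```python
def rays_per_render(nworld, resolutions):
    """Primary rays for one batched render call: one per pixel, per camera, per world."""
    return nworld * sum(width * height for width, height in resolutions)


base = rays_per_render(1024, [(64, 64)])
double_res = rays_per_render(1024, [(128, 128)])       # doubling width and height
two_cams = rays_per_render(1024, [(64, 64), (64, 64)])  # adding a second camera
```

Doubling both image dimensions quadruples the ray count, while adding a second identical camera doubles it, so resolution is usually the first setting to reduce when throughput matters.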

Happy Rendering!

The renderer will continue to improve as new features and performance optimizations are added. Please post a GitHub issue if there are any bugs or concerns or specific features that you need.

StafaH and others added 17 commits December 13, 2025 19:42
* Integrate madrona_warp into mjwarp

* Fix bug for <8 views

* Move mesh construction to SAH, tile pixels

* Change to registry to pass jax io test

* Remove dist from bvh_query_ray

* Add render context

* Update testspeed

* Add render context registry wrapper

* Fix circular import

* fix ray box and ambient lighting

* Cleanup rc options in kernel

* Small fix to static nlight

* ray capsule fix

* Fix testspeed merge artifact

* Cleanup before PR into mjwarp

* Remove global fovy, use cam 0 values

* Cleanup bvh/render/ray

* Fix capsule calc comment

* Cleanup docstrings

* Cleanup testspeed for render

* Fix bug when no texid is set

* hetero camera support

* Add heightfield support

* Add flex mesh building

* Add correct flex 2d mesh building

* Fixes for downstream use

* Rename output buffers to match mujoco style

* Add flex support, improve render script

* Add selective camera rendering support via cam_active parameter

* Fix render output reading, fix bg color

* Add more optimized hfield mesh building

* Add visual franka benchmark, fix merge artifacts

* Add 3d flex rendering support

* Add ellipsoid and cylinder primitives

* Cleanup flex 2d and 3d rendering

* Cleanup of bvh code

* Fix 2d flex side triangle update

* More cleanup for PR

* More cleanup of naming, parentheses, casts

* Add correct frustum calculation for rays

* Ruff check

* Fix naming, add todo

* Update kernel arguments using analyzer

* Remove comment

* Remove visual assets

* Add back camera

* Remove stray 1, dim

---------

Co-authored-by: Kevin Zakka <kevinarmandzakka@gmail.com>
* Add ray with normal (#940)

* Add ray with normal

* Ruff

* Propagate ray normal changes

* ruff

* Fix tuple type check

* Update ray to return -1 for non-hits

* Ray normal fixes (#960)

* Add ray with normal

* Ruff

* Propagate ray normal changes

* ruff

* Fix tuple type check

* Update ray to return -1 for non-hits

* Small fixes

* Ruff format

* Simplify assert for vec3

* Add comment and adjust docstring

* Fix bugs in ray hfield impl (#967)

* clamp pair_friction with minmu (#969)

* add --kernel_cache_dir pytest flag for warp kernel cache (#968)

* fix _NCONMAX info (#965)

* reuse xnorm (#959)

* Expose rays through the public API. (#975)

---------

Co-authored-by: Taylor Howell <taylorhowell@google.com>
Co-authored-by: Kevin Zakka <kevinzakka@users.noreply.github.com>
* Update to new cam proj field

* Fix frustum aspect ratio
* ruff

* Move render script to contrib

* Add render tests

* Move create render context into io.py

* Cleanup bvh test, ruff

* cleanup render_context_test

* More cleanup
* Update context to read from new mujoco fields

* Add anyhit ray mesh query for fast shadows

* Cleanup tests, msgs, comments
* Fix naming, move mesh bvh build to bvh module

* Add batched cam intrinsic for randomization

* Fix indexing in render context

* ruff
* Fix failing tests with new batched cam fields

* Move hfield bvh build to bvh module

* Fix docstring
* Fix flex rendering for multiple flex

* Add flex bvh test
* Add cuda texture sampling

* Fix textures and mesh sample, cleanup

* ruff

* Clean up texture processing kernel

* Fix flipped texture uv

* Remove nested texture kernel
* Add bvh accelerated ray

* Move ray_bvh into ray

* Remove ray_bvh_test and merge with ray_test

* Use absolute import

* Cleanup public API for render/bvh
* Move RenderContext to types

* Cleanup flex fields and refit kernel

* Use array in render context type

* Fix all render related tests after refactor

* Move packing to render util

* Cleanup render file

* Cleanup based on pr comments

* Fix error with rc init
* Fix error for downstream mjlab

* Fix ray none check

* Change nested kernel to wp.kernel

* Remove TODO
@StafaH StafaH requested review from erikfrey and thowell February 3, 2026 20:54
Comment thread uv.lock
@erikfrey

erikfrey commented Feb 3, 2026

Collaborator

@StafaH amazing!

Can you resolve the conflicts on the branch? Then I'll run tests and we'll see where we are.

Comment thread mujoco_warp/_src/render.py


@event_scope
def render(m: Model, d: Data, rc: RenderContext):

Collaborator

@erikfrey should we update the kernel analyzer (probably in a separate pr) to check for something like # RenderContext in: and # RenderContext out:?

@thowell

thowell commented Feb 4, 2026

Collaborator

great work @StafaH!

left a few comments in the code. additionally, in order to merge this pr we need compatibility with the latest warp release (1.11.0). since the texture features depend on pre-release warp (1.12.0), we will need to add guards in the code for the warp version.

we should follow the pattern in io.py for mujoco, which checks the version and introduces a global constant BLEEDING_EDGE_MUJOCO. for warp we should check the version and introduce something like BLEEDING_EDGE_WARP to use for the guards in the code.

Comment thread uv.lock Outdated
wheels = [
{ url = "https://files.pythonhosted.org/packages/2e/54/647ade08bf0db230bfea292f893923872fd20be6ac6f53b2b936ba839d75/zipp-3.23.0-py3-none-any.whl", hash = "sha256:071652d6115ed432f5ce1d34c336c0adfd6a884660d1e9712a256d3d3bd4b14e", size = 10276, upload-time = "2025-06-08T17:06:38.034Z" },
]
] No newline at end of file

Collaborator

can we fix this? thanks!

@StafaH StafaH requested a review from thowell February 6, 2026 18:04

@erikfrey erikfrey left a comment

Collaborator

Nice clean code!

@erikfrey erikfrey merged commit d8bf71f into main Feb 6, 2026
9 checks passed
@forrestthewoods


@thowell this is very neat! I have several questions.

  1. can you help me understand these benchmark numbers. what GPUs were they performed on?
  2. how exactly do I run benchmarks other than primitives? how do I save out debug screenshots to see what I rendered? how do I run a render-only benchmark to test just rendering?
  3. actually, how do I get render framerate for even primitives?

Running the command uv run mjwarp-testspeed benchmarks/render/primitives.xml --function=render --nworld=512 on an RTX 4090 I got:

Summary for 512 parallel rollouts

Total JIT time: 0.02 s
Total simulation time: 0.98 s
Total steps per second: 522,101
Total realtime factor: 1,044.20 x
Total time per step: 1915.34 ns
Total converged worlds: 210 / 512

I don't really know how to interpret this. I don't know how to recreate the render FPS benchmark results you shared in this PR.

Thanks!
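
A rough reading of the summary above, offered as arithmetic on the reported figures rather than as an answer from the maintainers. It assumes that "steps" counts one call of the benchmarked function per world, and the timestep is inferred from the reported numbers rather than read from the model file.

```python
# Figures copied from the testspeed summary above.
steps_per_second = 522_101
realtime_factor = 1_044.20
nworld = 512

# "Total time per step" is the reciprocal of "Total steps per second".
time_per_step_ns = 1e9 / steps_per_second

# Realtime factor = simulated seconds per wall-clock second, so dividing by
# steps per second gives the simulation timestep implied by the summary.
implied_timestep = realtime_factor / steps_per_second

# Under the one-step-per-world assumption, this is how many batched calls
# over all 512 worlds complete per second.
batched_calls_per_second = steps_per_second / nworld
```

The first two values reproduce the reported 1915.34 ns and imply a timestep of about 0.002 s. Whether each counted step corresponds to exactly one rendered frame per world when --function=render is set is not stated in this thread.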

5 participants