Skip to content

New inflation layer with optional OpenMP acceleration#5933

Merged
SteveMacenski merged 31 commits intoros-navigation:mainfrom
botsandus:openmp-inflation-layer
Mar 6, 2026
Merged

New inflation layer with optional OpenMP acceleration#5933
SteveMacenski merged 31 commits intoros-navigation:mainfrom
botsandus:openmp-inflation-layer

Conversation

@tonynajjar
Copy link
Contributor

@tonynajjar tonynajjar commented Feb 4, 2026


Basic Info

Info Please fill out this column
Ticket(s) this addresses (add tickets here #1)
Primary OS tested on Ubuntu
Robotic platform tested on Dexory robot and dev machine
Does this PR contain AI generated software? Yes
Was this PR description generated by AI software? Out of respect for maintainers, AI for human-to-human communications are banned

Description of contribution in a few bullet points

  • Refactored inflation layer for better performance with optional OpenMP parallel programming

Description of documentation updates required from your changes

ros-navigation/docs.nav2.org#861

Description of how this change was tested

  • Correctness test
  • Speed benchmarking script (see comment for results)

For Maintainers:

  • Check that any new parameters added are updated in docs.nav2.org
  • Check that any significant change is added to the migration guide
  • Check that any new features OR changes to existing behaviors are reflected in the tuning guide
  • Check that any new functions have Doxygen added
  • Check that any new features have test coverage
  • Check that any new plugins is added to the plugins page
  • If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists
  • Should this be backported to current distributions? If so, tag with backport-*.

@tonynajjar tonynajjar changed the title Add inflation benchmark script and enhance inflation tests OpenMP-based inflation layer Feb 4, 2026
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
@tonynajjar tonynajjar closed this Feb 5, 2026
@tonynajjar tonynajjar force-pushed the openmp-inflation-layer branch from 01201af to 66f46c6 Compare February 5, 2026 13:23
@tonynajjar tonynajjar reopened this Feb 5, 2026
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
@tonynajjar
Copy link
Contributor Author

tonynajjar commented Feb 5, 2026

Note: This is an AI summary from the benchmarking files

Benchmark Comparison Summary

Test Environments

Dev Machine (ubuntu@dexory)

  • CPU: 16 cores × 5400 MHz
  • OS: Ubuntu 24.04.2 LTS
  • L1 Data Cache: 48 KiB × 16 = 768 KiB
  • L1 Instruction Cache: 64 KiB × 16 = 1024 KiB
  • L2 Cache: 3072 KiB × 16 = 48 MB (unified)
  • L3 Cache: 24576 KiB = 24 MB (shared)
  • Total Cache: ~74 MB

Robot (arri-74)

  • CPU: 16 cores × 5000 MHz
  • L1 Data Cache: 48 KiB × 8 = 384 KiB
  • L1 Instruction Cache: 32 KiB × 8 = 256 KiB
  • L2 Cache: 1280 KiB × 8 = 10 MB (unified)
  • L3 Cache: 18432 KiB = 18 MB (shared)
  • Total Cache: ~28.6 MB

Performance Comparison (Key Benchmarks)

1000×1000 Grid (1M cells, 50% occupancy, 2m inflation radius)

Configuration Dev Time Dev Throughput Robot Time Robot Throughput vs Old Dev vs Old Robot
Old Implementation 24.1 ms 41.4 M cells/s 28.7 ms 34.9 M cells/s baseline baseline
New (OpenMP disabled) 6.89 ms 145.2 M cells/s 11.0 ms 91.1 M cells/s 3.5× faster 2.6× faster
New (OpenMP enabled) 2.50 ms 707.3 M cells/s 2.35 ms 482.5 M cells/s 9.6× faster 12.2× faster

2000×2000 Grid (4M cells, 50% occupancy, 2m inflation radius)

Configuration Dev Time Dev Throughput Robot Time Robot Throughput vs Old Dev vs Old Robot
Old Implementation 91.5 ms 43.7 M cells/s 105 ms 38.1 M cells/s baseline baseline
New (OpenMP disabled) 30.6 ms 130.6 M cells/s 48.9 ms 81.8 M cells/s 3.0× faster 2.1× faster
New (OpenMP enabled) 6.64 ms 893.5 M cells/s 9.11 ms 468.9 M cells/s 13.8× faster 11.5× faster

3333×3333 Grid (11.1M cells, 50% occupancy, 2m inflation radius)

Configuration Dev Time Dev Throughput Robot Time Robot Throughput vs Old Dev vs Old Robot
Old Implementation 311 ms 35.7 M cells/s 357 ms 31.2 M cells/s baseline baseline
New (OpenMP disabled) 115 ms 96.8 M cells/s 182 ms 61.2 M cells/s 2.7× faster 2.0× faster
New (OpenMP enabled) 20.3 ms 697.6 M cells/s 29.9 ms 395.1 M cells/s 15.3× faster 11.9× faster

4000×4000 Grid (16M cells, 30% occupancy, 1m inflation radius)

Configuration Dev Time Dev Throughput Robot Time Robot Throughput vs Old Dev vs Old Robot
Old Implementation 261 ms 61.2 M cells/s 302 ms 53.1 M cells/s baseline baseline
New (OpenMP disabled) 176 ms 90.9 M cells/s 268 ms 59.7 M cells/s 1.5× faster 1.1× faster
New (OpenMP enabled) 28.4 ms 660.7 M cells/s 46.3 ms 364.1 M cells/s 9.2× faster 6.5× faster

Key Findings

1. New Implementation Impact (OpenMP disabled)

  • Dev machine: 2.7-3.5× faster than old implementation
  • Robot: 1.1-2.6× faster than old implementation
  • Performance scales better on more powerful dev machine

2. OpenMP Parallelization Impact

  • Dev machine: 2.5-4.8× additional speedup over single-threaded new implementation
  • Robot: 1.8-5.6× additional speedup over single-threaded new implementation
  • Combined with new implementation: 6.5-15.3× faster than old code

3. Grid Size Scaling

  • Old implementation shows poor scaling with grid size (35-43 M cells/s)
  • New implementation (OpenMP disabled) maintains 90-145 M cells/s
  • New implementation (OpenMP enabled) maintains 395-893 M cells/s on dev, 364-483 M cells/s on robot

4. Occupancy Impact (1500×1500 tests)

  • All implementations show relatively consistent performance across 10%, 30%, 50%, 80% occupancy
  • New implementation handles varying occupancy much more efficiently

5. Inflation Radius Impact

  • Old implementation: significant slowdown with larger radii (41→35→31 M cells/s)
  • New implementation: minimal impact from radius variation

Detailed Results by Parameter

Varying Occupancy (1500×1500 grid, 2m inflation)

Occupancy Old Dev New OpenMP Off Dev New OpenMP On Dev Old Robot New OpenMP Off Robot New OpenMP On Robot
10% 8.16 ms (275.8 M/s) 14.7 ms (152.8 M/s) 4.57 ms (816.7 M/s) 11.9 ms (189.7 M/s) 24.0 ms (93.9 M/s) 5.50 ms (445.2 M/s)
30% 28.5 ms (79.1 M/s) 15.3 ms (147.3 M/s) 4.65 ms (763.5 M/s) 36.1 ms (62.3 M/s) 24.2 ms (93.0 M/s) 5.05 ms (483.0 M/s)
50% 67.5 ms (33.4 M/s) 16.1 ms (139.5 M/s) 5.07 ms (763.0 M/s) 80.7 ms (27.9 M/s) 24.8 ms (90.9 M/s) 5.76 ms (466.2 M/s)
80% 75.6 ms (29.8 M/s) 15.6 ms (144.7 M/s) 4.96 ms (784.4 M/s) 89.1 ms (25.3 M/s) 23.4 ms (96.1 M/s) 5.48 ms (458.2 M/s)

Key Observation: Old implementation degrades significantly with higher occupancy (8→75 ms on dev), while new implementation remains stable (14-16 ms without OpenMP, 4-5 ms with OpenMP).

Varying Inflation Radius (1000×1000 grid, 50% occupancy)

Radius Old Dev New OpenMP Off Dev New OpenMP On Dev Old Robot New OpenMP Off Robot New OpenMP On Robot
0.5m 11.2 ms (89.5 M/s) 6.78 ms (147.5 M/s) 2.45 ms (716.4 M/s) 14.8 ms (67.6 M/s) 11.1 ms (90.5 M/s) 2.55 ms (484.8 M/s)
1.0m 12.7 ms (78.9 M/s) 6.88 ms (145.5 M/s) 2.44 ms (717.3 M/s) 17.0 ms (58.9 M/s) 10.9 ms (91.6 M/s) 2.23 ms (518.3 M/s)
2.0m 14.6 ms (68.7 M/s) 6.84 ms (146.2 M/s) 2.56 ms (677.9 M/s) 20.1 ms (49.8 M/s) 11.1 ms (90.4 M/s) 2.21 ms (508.3 M/s)
3.0m 15.5 ms (64.6 M/s) 6.92 ms (144.5 M/s) 2.46 ms (710.3 M/s) 21.6 ms (46.4 M/s) 11.6 ms (85.9 M/s) 2.47 ms (474.5 M/s)
5.0m 16.9 ms (60.0 M/s) 6.96 ms (143.7 M/s) 2.67 ms (666.2 M/s) 23.3 ms (43.0 M/s) 11.2 ms (89.2 M/s) 2.49 ms (458.6 M/s)
10.0m 17.6 ms (56.7 M/s) 6.95 ms (143.8 M/s) 2.65 ms (657.8 M/s) 25.5 ms (39.3 M/s) 11.4 ms (88.1 M/s) 2.37 ms (472.6 M/s)

Key Observation: Old implementation shows 36% slowdown from smallest to largest radius (11.2→17.6 ms on dev). New implementation shows minimal variation (<3% difference).

Varying Cost Scale (1000×1000 grid, 50% occupancy, 2m radius)

Cost Scale Old Dev New OpenMP Off Dev New OpenMP On Dev Old Robot New OpenMP Off Robot New OpenMP On Robot
1.0 14.3 ms (70.1 M/s) 6.89 ms (145.1 M/s) 2.77 ms (651.6 M/s) 20.1 ms (49.8 M/s) 11.1 ms (89.9 M/s) 2.19 ms (503.2 M/s)
3.0 14.4 ms (69.5 M/s) 6.90 ms (144.9 M/s) 2.63 ms (686.5 M/s) 20.0 ms (50.1 M/s) 11.1 ms (90.1 M/s) 2.30 ms (492.1 M/s)
5.0 14.3 ms (70.2 M/s) 6.89 ms (145.2 M/s) 2.82 ms (655.3 M/s) 20.0 ms (50.1 M/s) 11.0 ms (90.8 M/s) 2.24 ms (496.9 M/s)
10.0 14.4 ms (69.6 M/s) 6.91 ms (144.7 M/s) 2.65 ms (660.0 M/s) 19.9 ms (50.3 M/s) 11.0 ms (90.6 M/s) 2.23 ms (523.8 M/s)

Key Observation: Cost scale factor has negligible impact on performance across all implementations.

Recommendations

Use new implementation with OpenMP enabled - Provides 6.5-15.3× speedup

✅ Even without OpenMP, new implementation is 1.1-3.5× faster

✅ Performance is more predictable and scales better with grid size

✅ Robot shows excellent speedup despite lower CPU frequency

✅ New implementation handles varying occupancy and inflation radii efficiently

Performance Summary Chart

Speedup Factor (vs Old Implementation)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Dev Machine (1000×1000):
Old:     ████ 1.0×
New-Off: █████████████ 3.5×
New-On:  ██████████████████████████████████████ 9.6×

Dev Machine (3333×3333):
Old:     ████ 1.0×
New-Off: ███████████ 2.7×
New-On:  ███████████████████████████████████████████████████████████ 15.3×

Robot (1000×1000):
Old:     ████ 1.0×
New-Off: ██████████ 2.6×
New-On:  ████████████████████████████████████████████████ 12.2×

Robot (3333×3333):
Old:     ████ 1.0×
New-Off: ████████ 2.0×
New-On:  ███████████████████████████████████████████████ 11.9×

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
@codecov
Copy link

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 87.65823% with 39 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
nav2_costmap_2d/plugins/legacy_inflation_layer.cpp 86.89% 27 Missing ⚠️
...include/nav2_costmap_2d/legacy_inflation_layer.hpp 64.00% 9 Missing ⚠️
nav2_costmap_2d/plugins/inflation_layer.cpp 93.47% 3 Missing ⚠️
Files with missing lines Coverage Δ
..._2d/include/nav2_costmap_2d/distance_transform.hpp 100.00% <100.00%> (ø)
...map_2d/include/nav2_costmap_2d/inflation_layer.hpp 100.00% <100.00%> (ø)
...lude/nav2_costmap_2d/inflation_layer_interface.hpp 100.00% <100.00%> (ø)
nav2_mppi_controller/src/critics/cost_critic.cpp 84.41% <ø> (+0.20%) ⬆️
...2_mppi_controller/src/critics/obstacles_critic.cpp 81.39% <ø> (ø)
...2_smac_planner/include/nav2_smac_planner/utils.hpp 96.20% <100.00%> (+0.04%) ⬆️
nav2_costmap_2d/plugins/inflation_layer.cpp 97.77% <93.47%> (-1.77%) ⬇️
...include/nav2_costmap_2d/legacy_inflation_layer.hpp 64.00% <64.00%> (ø)
nav2_costmap_2d/plugins/legacy_inflation_layer.cpp 86.89% <86.89%> (ø)

... and 47 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tonynajjar tonynajjar changed the title OpenMP-based inflation layer New inflation layer with optional OpenMP acceleration Feb 5, 2026
@tonynajjar tonynajjar marked this pull request as ready for review February 5, 2026 18:01
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
@SteveMacenski
Copy link
Member

SteveMacenski commented Feb 11, 2026

Yes, there is a problem due to a security PR. The security reports on github doesn't run CI (!!!!!!!!!) so I will be rejecting any patch made that way and requiring them to be public to run CI and properly reviewed. I've asked the contributor to make that fix but if I don't hear back in a couple of days I will do it myself. Ignore the message validation test failures for now, it is not your fault.

Copy link
Member

@SteveMacenski SteveMacenski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last tangible comment.

Otherwise:

  • Some ~1 hour of experimentation with using Eigen operations to improve performance further would be nice
  • With the naming as it stands, I don't have any other issues. However, I wonder if we should make this the PotentialFieldInterfaceLayer and be more generic than specifically inflation. With that, getInflationRadius -> getPotentialFieldMaxDistance and getCostScalingFactor would need to return more generic weights or a lambda that represents the distance -> cost calculation for other methods to use.

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
@tonynajjar
Copy link
Contributor Author

However, I wonder if we should make this the PotentialFieldInterfaceLayer and be more generic than specifically inflation.

Not a bad idea but we can also do the renaming once we have other layers that don't fit the description anymore

@SteveMacenski
Copy link
Member

OK - check DCO and otherwise LGTM after whatever optimization playgrounding / Dexory testing is required to make you comfortable with me merging this + the docs PR

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Feb 12, 2026

check DCO

I have auto-signoff in vscode, it just doens't work when I revert 😭

after whatever optimization playgrounding / Dexory testing

Will come back in a couple of weeks! If you want to shoutout to potential testers that also need a faster inflation layer, go ahead

btw did you play around with it or the benchmarks yourself?

@SteveMacenski
Copy link
Member

I did not - I trust you not to make up numbers 😉

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
@SteveMacenski
Copy link
Member

LMK when you're ready for merge + link me back to your docs PR to merge at the same time :-)

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Mar 4, 2026

I tested (rather quickly) implementing some SIMD optimization with the help of Claude but it failed miserably (unreadable and not much performance improvement). If you have a certain direction in mind you can share. Otherwise, I'm ready to merge. I just fixed a merge conflict in ros-navigation/docs.nav2.org#861

cc @doisyg

@SteveMacenski
Copy link
Member

Codecov didn't upload, I want to see those before merging. I retriggered

@tonynajjar tonynajjar requested a review from SteveMacenski March 4, 2026 17:52
@SteveMacenski
Copy link
Member

@tonynajjar please update one of the nav2 system test configs / launch files to use the legacy inflation layer so it continues to be exercised. This removes all testing of it so its now completely untested 🙃

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
@tonynajjar
Copy link
Contributor Author

@SteveMacenski re-added the legacy inflation tests. By the way, I also fixed a build error introduced by the rebase that was not detected by CI because the flag _OPENMP is not enabled. Would be nice to enable CI to build packages with all the possible cmake flag variations.

@SteveMacenski
Copy link
Member

SteveMacenski commented Mar 4, 2026

not detected by CI because the flag _OPENMP is not enabled

Want to add that https://github.com/ros-navigation/navigation2/blob/main/.circleci/config.yml#L218-L221 or https://github.com/ros-navigation/navigation2/blob/main/.circleci/defaults.yaml?

Can we also have one of the system tests run the legacy version too?

@tonynajjar
Copy link
Contributor Author

Can we also have one of the system tests run the legacy version too?

Ah yeah

Want to add that https://github.com/ros-navigation/navigation2/blob/main/.circleci/config.yml#L218-L221 or https://github.com/ros-navigation/navigation2/blob/main/.circleci/defaults.yaml?

You mean have it build with OPENMP on by default? In this case there is barely any uncovered code for OPENMP disabled so we could go for it but in general I think we should be building with all possible values of flags.
For now I'm fine with turning OPENMP on.

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Mar 5, 2026

Now we get these warnings because ENABLE_OPENMP is set for all packages

Starting >>> dwb_plugins
--- stderr: dwb_plugins                                                    
CMake Warning:
  Manually-specified variables were not used by the project:

    ENABLE_OPENMP

Probably cleanest to use .meta files
https://colcon.readthedocs.io/en/released/user/configuration.html#package-specific-configuration
But since that infrastructure is not setup, I would ticket that for another time

Regarding the system test using legacy inflation layer, it works -> test 19 doesn't show the log OpenMP: 4 cores available, using 2 threads (auto) which means it's using the legacy inflation layer

@tonynajjar
Copy link
Contributor Author

@SteveMacenski again ready from my side

@SteveMacenski
Copy link
Member

SteveMacenski commented Mar 6, 2026

@tonynajjar disable openmp then for now; its not worth the trouble. We can deal with it later :-)

Then I can merge once the docs PR is un-conflicted 😉

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
@tonynajjar
Copy link
Contributor Author

@tonynajjar disable openmp then for now; its not worth the trouble. We can deal with it later :-)

done

Then I can merge once the docs PR is un-conflicted 😉

What do you mean, it's already merged: ros-navigation/docs.nav2.org#861

@SteveMacenski SteveMacenski merged commit 46172d7 into ros-navigation:main Mar 6, 2026
15 of 16 checks passed
@tonynajjar tonynajjar deleted the openmp-inflation-layer branch March 10, 2026 14:32
jplapp pushed a commit to logivations/navigation2 that referenced this pull request Mar 13, 2026
…#5933)

* OpenMP inflation layer

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* restart tests

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* disable openmp

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* enable flag

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* limit thread number

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* enable openmp for testing

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* check docker cores

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* set OMP_NUM_THREADS to 4

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* dynamic num_thread

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* remove openmp from packages.xml

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* turn on by default

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* hardcode COST_LUT_PRECISION

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* test out big test

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* add back legacy layer

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* set to 100

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* fix legacy inflation layer

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* Revert "fix legacy inflation layer"

This reverts commit 21fa149.

* off by default

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* base class approach

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* simplified interface

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* add test coverage

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* remove cpp file

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* Fix broken build

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* Add tests for legacy inflation layer

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* Enable OpenMP support and add legacy inflation layer test configuration

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

* lint

* lint

* remove  -DENABLE_OPENMP=ON

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

---------

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>
(cherry picked from commit 46172d7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants