
Improve robot state update #3548

Closed
rhaschke wants to merge 2 commits into moveit:master from ubi-agni:improve-robot-state-update

Conversation

@rhaschke
Contributor

Inspired by the results of moveit/moveit2#2628, I changed the matrix multiplication in RobotState updates from affine*matrix to matrix*matrix.

Surprisingly, new benchmarks show that this is faster (it needs more operations, but perhaps vectorizes better?):

benchmark                        | old      | new      | speed up
---------------------------------|----------|----------|----------
RobotStateBenchmark/update/10    | 0.007 ms | 0.006 ms | +16.6 %
RobotStateBenchmark/update/100   | 0.068 ms | 0.066 ms | +3.0 %
RobotStateBenchmark/update/1000  | 0.710 ms | 0.668 ms | +6.2 %
RobotStateBenchmark/update/10000 | 10.4 ms  | 9.28 ms  | +12.0 %

@codecov

codecov bot commented Dec 21, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (286f6b3) 62.17% compared to head (e47547a) 62.14%.

Files                                            | Patch % | Lines
-------------------------------------------------|---------|--------------
moveit_core/robot_state/src/robot_state.cpp      | 80.00%  | 2 Missing ⚠️
moveit_core/utils/src/robot_model_test_utils.cpp | 66.67%  | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3548      +/-   ##
==========================================
- Coverage   62.17%   62.14%   -0.03%     
==========================================
  Files         385      385              
  Lines       34138    34139       +1     
==========================================
- Hits        21222    21211      -11     
- Misses      12916    12928      +12     


New benchmarks show that this is faster surprisingly
(needs more operations, but can be vectorized better?)

benchmark                               | old      | new
----------------------------------------|----------|----------
RobotStateBenchmark/update/10           | 0.007 ms | 0.006 ms
RobotStateBenchmark/update/100          | 0.068 ms | 0.066 ms
RobotStateBenchmark/update/1000         | 0.710 ms | 0.668 ms
RobotStateBenchmark/update/10000        | 10.4 ms  | 9.28 ms
Contributor

@marioprats left a comment


Good to know!
We should apply this to the ros2 branch as well!

@v4hn
Contributor

v4hn commented Jan 19, 2024

Currently I cannot reproduce these improvements consistently on my i5 laptop. If they are there, they drown in noise.

  • I built 838ec18 and 0d11ec9 respectively in release workspaces
  • ran both benchmarks via
$ ./devel/lib/moveit_core/robot_state_benchmark --benchmark_out=/tmp/robot_state_benchmark/matrix_{matrix|affine}.json --benchmark_filter=.*/update/.* --benchmark_repetitions=25

and see these analysis results on two different comparison runs (leaving out tests with pvalue > 0.05):

$ /usr/share/benchmark/compare.py benchmarks matrix_affine.json matrix_matrix.json | grep '_median\|pvalue'

Comparing matrix_affine.json to matrix_matrix.json
Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------
RobotStateBenchmark/update/10_pvalue                    0.0000          0.0000      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/10_median                   +0.0690         +0.0690             0             0             0             0
RobotStateBenchmark/update/100_pvalue                   0.0000          0.0000      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/100_median                  +0.0707         +0.0702             0             0             0             0
RobotStateBenchmark/update/1000_pvalue                  0.0000          0.0000      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/1000_median                 +0.0491         +0.0491             1             1             1             1
RobotStateBenchmark/update/10000_pvalue                 0.0008          0.0008      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/10000_median                -0.0125         -0.0125             9             9             9             9
RobotStateBenchmark/update/10_pvalue                     0.0000          0.0000      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/10_median                    +0.0708         +0.0707             0             0             0             0
-
RobotStateBenchmark/update/1000_pvalue                   0.0003          0.0003      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/1000_median                  -0.0091         -0.0092             1             1             1             1
-
RobotStateBenchmark/update/100000_pvalue                 0.0066          0.0070      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/100000_median                -0.0041         -0.0042           124           124           124           124

Any comment on what I might do to get your results @rhaschke ?

I very much like the google benchmark transition, but I don't think littering the code base with .matrix() is reasonable when the gain is so tiny and more strongly affected by other dynamics.

@rhaschke
Contributor Author

I cannot find those commit hashes, so I'm not sure what you compared. Can you please clarify?
I agree that we should adapt the code only if there is a significant performance improvement.

@rhaschke
Contributor Author

littering the code base with .matrix()

Note that I essentially replaced affine() with matrix(). So, no extra littering 😉

@v4hn
Contributor

v4hn commented Jan 24, 2024

I built 838ec18 and 0d11ec9 respectively in release workspaces

I cannot find those commit hashes, so I'm not sure what you compared. Can you please clarify?
I agree that we should adapt the code only if there is a significant performance improvement.

Obviously you couldn't find the hashes. I tried to be complete but forgot I rebased your branch locally... 🤦
I refer to the last two commits here: https://github.com/v4hn/moveit/commits/improve-robot-state-update

@rhaschke
Contributor Author

rhaschke commented Feb 6, 2024

Ok. I did the very same as you did: i) running the benchmarks on both commits and saving the results as .json, ii) comparing the results. On my i7 I get significant improvements throughout:

$ /usr/share/benchmark/compare.py benchmarks matrix_affine.json matrix_matrix.json | grep '_median\|pvalue'
RobotStateBenchmark/update/10_pvalue                    0.0000          0.0000      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/10_median                   -0.6024         -0.6021             1             0             1             0
RobotStateBenchmark/update/100_pvalue                   0.0000          0.0000      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/100_median                  -0.6030         -0.6028             9             3             9             3
RobotStateBenchmark/update/1000_pvalue                  0.0000          0.0000      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/1000_median                 -0.6019         -0.6017            87            35            87            35
RobotStateBenchmark/update/10000_pvalue                 0.0000          0.0000      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/10000_median                -0.6011         -0.6011           912           364           912           364

Maybe this is an i5 vs. i7 optimization thing? On your i5, the differences are not significant most of the time.
Maybe my i7 has better optimization for 4x4 matrix assignment compared to the 3x4-into-4x4 assignment that .affine() requires.

@rhaschke
Contributor Author

rhaschke commented Feb 6, 2024

I have to correct myself: initially I had built with RelWithDebInfo. Repeating everything with a Release build, I get completely different numbers (a 1-2 % slowdown):

$ /usr/share/benchmark/compare.py benchmarks matrix_affine.json matrix_matrix.json | grep '_median\|pvalue'
RobotStateBenchmark/update/10_pvalue                    0.0000          0.0000      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/10_median                   +0.0196         +0.0198             0             0             0             0
RobotStateBenchmark/update/100_pvalue                   0.0000          0.0000      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/100_median                  +0.0142         +0.0143             0             0             0             0
RobotStateBenchmark/update/1000_pvalue                  0.0000          0.0000      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/1000_median                 +0.0181         +0.0183             1             1             1             1
RobotStateBenchmark/update/10000_pvalue                 0.0006          0.0006      U Test, Repetitions: 25 vs 25
RobotStateBenchmark/update/10000_median                -0.0104         -0.0104            10            10            10            10

I have no idea where this strong difference comes from. Obviously, the compiler optimizes very differently in the two build types.

@rhaschke
Contributor Author

rhaschke commented Feb 6, 2024

Given the last result, I will close this PR.

@rhaschke rhaschke closed this Feb 6, 2024
@rhaschke rhaschke deleted the improve-robot-state-update branch February 6, 2024 18:15
@rhaschke rhaschke restored the improve-robot-state-update branch February 6, 2024 18:17