Skip to content

PSM: Fix handling of flag state_update_pending_#3682

Merged
v4hn merged 7 commits intomoveit:masterfrom
ubi-agni:execute_while_planning
Jan 10, 2025
Merged

PSM: Fix handling of flag state_update_pending_#3682
v4hn merged 7 commits intomoveit:masterfrom
ubi-agni:execute_while_planning

Conversation

@rhaschke
Copy link
Copy Markdown
Contributor

@rhaschke rhaschke commented Jan 7, 2025

Follow up on #3676, implementing @v4hn's suggestion to update state_update_pending_ and last_robot_state_update_wall_time_ only within updateSceneWithCurrentState() - where the update is actually performed.

This nicely simplifies all the code 🎉

@rhaschke rhaschke requested a review from v4hn January 7, 2025 15:40
@rhaschke
Copy link
Copy Markdown
Contributor Author

rhaschke commented Jan 7, 2025

@rr-mark: Could you please test and review as well?

@rhaschke rhaschke changed the title Execute while planning PSM: Fix handling of flag state_update_pending_ Jan 7, 2025
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jan 7, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 48.00000% with 13 lines in your changes missing coverage. Please review.

Project coverage is 47.46%. Comparing base (643b43f) to head (2aaa516).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...nning_scene_monitor/src/planning_scene_monitor.cpp 48.00% 13 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3682       +/-   ##
===========================================
+ Coverage    0.00%   47.46%   +47.46%     
===========================================
  Files         600      604        +4     
  Lines       59271    61629     +2358     
  Branches     7031     7034        +3     
===========================================
+ Hits            0    29247    +29247     
+ Misses      59271    31964    -27307     
- Partials        0      418      +418     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@v4hn v4hn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this for quite a while now and am still not entirely convinced it works as it should with this patch (nor did I analyze the previous code in detail).

Key insights:

  • onStateUpdate, all direct ROS callbacks the CSM and PSM use, and stateUpdateTimerCallback are all served from a single threaded AsyncSpinner. Thus they are effectively serialized and don't need a mutex at all.
  • the offenders which might be called from other threads are
    • setStateUpdateFrequency
      • it is not used anywhere to the best of my knowledge and is a bad concept from my perspective - as one could just require stopping the monitor to change the rate
      • but it is a public API
      • this requires protecting dt_state_update_
    • waitForCurrentState
      • which users likely regularly, but not excessively, call
      • this can call (the equally public API) updateSceneWithCurrentState from outside the spinner and can thus affect last_robot_state_update_wall_time_, requiring the mutex.

@riv-mjohnson
Copy link
Copy Markdown
Contributor

@rr-mark: Could you please test and review as well?

Can do. I'm busy this morning, but should get to moveit things this afternoon (UK time).

Should I wait for this PR to go through before porting my PR to moveit2, or do you think it makes more sense to port them separately?

@rhaschke
Copy link
Copy Markdown
Contributor Author

rhaschke commented Jan 8, 2025

Should I wait for this PR to go through before porting my PR to moveit2, or do you think it makes more sense to port them separately?

@rr-mark Yes, you should wait for final approval / merging of this PR before tackling the ROS2 port.

Copy link
Copy Markdown
Contributor

@riv-mjohnson riv-mjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, and seems to work on our system in the brief testing I've done. When it's merged I'll put it through our stress tests, which will hopefully throw up any problems in real usage.

My only suggestion (likely for future work rather than this PR) is it might be helpful to factor out all the checks to when updateSceneWithCurrentState() should be called (checking state_update_pending_ and/or now() - last_update_ >= dt_state_update_) into a single method, so it's obvious why which checks are happening when.

@rhaschke
Copy link
Copy Markdown
Contributor Author

rhaschke commented Jan 9, 2025

I will fix the formatting issue by upgrading to clang-format-12 or later after my current meeting.

@rhaschke rhaschke force-pushed the execute_while_planning branch from e7d6eab to d91eb77 Compare January 9, 2025 14:06
rhaschke and others added 7 commits January 9, 2025 15:27
... in onStateUpdate() - where it is easily possible.

Reading it in stateUpdateTimerCallback() is unprotected.
However, missing an update or performing an extra one, because the variable is corrupted, is not a big deal.
Co-authored-by: Michael Görner <me@v4hn.de>
While there are a few other locks except explicit user locks (getPlanningSceneServiceCallback(), collisionObjectCallback(), attachObjectCallback(), newPlanningSceneCallback(), and scenePublishingThread()), these occur rather seldom 
(scenePublishingThread() publishes at 2Hz).

Hence, we might indeed balance a non-blocking CSM vs. missed PS updates in favour of CSM.

Co-authored-by: Michael Görner <me@v4hn.de>
The timer callback and CSM's state update callbacks are served from the same callback queue, which would block CSM again.

Co-authored-by: Michael Görner <me@v4hn.de>
As @rhaschke suggested, reading dt_state_update_ and last_robot_state_update_wall_time_
does not lead to logic errors, but at most to a skipped or redundant update on corrupted data.
Alternatively we could be on the safe side and turn both variables into std::atomic, but that
would effectively mean locks on every read.

Instead, only set state_update_pending_ as an atomic, which is lockfree in this case.
@rhaschke rhaschke force-pushed the execute_while_planning branch from d91eb77 to 2aaa516 Compare January 9, 2025 14:28
@rhaschke
Copy link
Copy Markdown
Contributor Author

rhaschke commented Jan 9, 2025

@v4hn Can you give this a final review, please?

@v4hn v4hn merged commit b71e5f5 into moveit:master Jan 10, 2025
@riv-mjohnson
Copy link
Copy Markdown
Contributor

Thanks for these fixes.

I'll look into porting this PR and my PR to moveit2 next week.

@rhaschke rhaschke deleted the execute_while_planning branch January 12, 2025 20:10
@riv-mjohnson
Copy link
Copy Markdown
Contributor

Thanks for these fixes.

I'll look into porting this PR and my PR to moveit2 next week.

It's looking like this will happen next week now.

rhaschke added a commit to ubi-agni/moveit that referenced this pull request Feb 6, 2025
* Move update of state_update_pending_ to updateSceneWithCurrentState()

* Revert to try_lock

While there are a few other locks except explicit user locks (getPlanningSceneServiceCallback(), collisionObjectCallback(), attachObjectCallback(), newPlanningSceneCallback(), and scenePublishingThread()), these occur rather seldom 
(scenePublishingThread() publishes at 2Hz).

Hence, we might indeed balance a non-blocking CSM vs. missed PS updates in favour of CSM.

* Don't block for scene update from stateUpdateTimerCallback too

The timer callback and CSM's state update callbacks are served from the same callback queue, which would block CSM again.

* further locking adaptations

reading dt_state_update_ and last_robot_state_update_wall_time_
does not lead to logic errors, but at most to a skipped or redundant update on corrupted data.
Alternatively we could be on the safe side and turn both variables into std::atomic, but that
would effectively mean locks on every read.

Instead, only set state_update_pending_ as an atomic, which is lockfree in this case.

Co-authored-by: Michael Görner <me@v4hn.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants