Skip to content

[POC] Fix failing ros2param tests#632

Closed
hidmic wants to merge 3 commits intomasterfrom
hidmic/ros2param-fix-poc
Closed

[POC] Fix failing ros2param tests#632
hidmic wants to merge 3 commits intomasterfrom
hidmic/ros2param-fix-poc

Conversation

@hidmic
Copy link
Copy Markdown

@hidmic hidmic commented Apr 23, 2021

CI up to ros2param and ros2cli:

  • Linux Build Status
  • Linux repeated Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

hidmic added 3 commits April 22, 2021 18:41
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@hidmic
Copy link
Copy Markdown
Author

hidmic commented Apr 23, 2021

Well, it's finally passing. I have yet to explain why this FIN_WAIT_2 state timeout made all the difference though. FYI @ivanpauno.

@hidmic
Copy link
Copy Markdown
Author

hidmic commented Apr 23, 2021

Alright, Ivan's still able to reproduce the failure. It appears it just made it less likely, so this won't do either.

@clalancette @cottsay what shall we do for Galactic? We have:

  • the original approach with SO_REUSEADDR, which caused flakes in ros2cli; or
  • Ensure only one daemon can run at a time #622 approach with SO_LINGER, which causes flakes in ros2param; or
  • an hypothetical third approach using file locks, which we haven't tried yet.

Do we keep trying? Do we stick to a known issue and defer resolution?

@cottsay
Copy link
Copy Markdown
Member

cottsay commented Apr 23, 2021

So the behavior that ros2cli will exhibit in Galactic is a flaky test, correct? How likely is it that an end-user would experience the bug "in the wild"?

@hidmic
Copy link
Copy Markdown
Author

hidmic commented Apr 23, 2021

So the behavior that ros2cli will exhibit in Galactic is a flaky test, correct? How likely is it that an end-user would experience the bug "in the wild"?

That depends on the implementation. With SO_REUSEADDR, you'd need two ros2 daemon start racing with each other. With SO_LINGER, I suspect you need to run into a simultaneous connection close situation (though that's mostly a guess, we don't really know why ros2param flakes happen).

@clalancette
Copy link
Copy Markdown
Contributor

Do we keep trying? Do we stick to a known issue and defer resolution?

I would say that we should still try to debug this for Galactic, but as a relatively low priority. So far it seems unlikely that a user will hit this in real-life, so I think it is worthwhile to debug the problem and figure out what is really going on before we make further changes.

@hidmic
Copy link
Copy Markdown
Author

hidmic commented Jul 12, 2021

Closing in favor of #652.

@hidmic hidmic closed this Jul 12, 2021
@hidmic hidmic deleted the hidmic/ros2param-fix-poc branch July 12, 2021 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants