Skip to content

Fix datarace on WriterProxy stop while TimedEvent being triggered [16341]#3097

Merged
MiguelCompany merged 6 commits intomasterfrom
datarace/perform-initial-acknack
Dec 13, 2022
Merged

Fix datarace on WriterProxy stop while TimedEvent being triggered [16341]#3097
MiguelCompany merged 6 commits intomasterfrom
datarace/perform-initial-acknack

Conversation

@juanlofer-eprosima
Copy link
Copy Markdown
Contributor

Description

This PR is a revision of #3046, in which the data race between WriterProxy::stop and perform_initial_acknack was addressed by forcing the thread invoking the former to wait for the completion of the latter. However, a deadlock may occur as the reader waiting for the event to end does so with Endpoint::mp_mutex locked, which is also locked when sending an acknack at perform_initial_acknack.

Two different solutions are suggested:

  • Introducing a new mutex in StatefulReader which is taken before reader's, and released after modifying the writer proxy's state and before waiting for the executing event to finish.
  • Releasing reader's mutex before stopping a WriterProxy, and locking it again after doing so. This solution implies that StatefulReader::matched_writer_add and StatefulReader::matched_writer_remove may no longer be executed "atomically", a relevant change that should be carefully analyzed.

In addition, this PR also addresses the potential (rarely or never seen in our tests) datarace that may occur between WriterProxy::stop and perform_heartbeat_response, as StatefulReader::send_acknack reads attributes from WriterProxy which are modified when being stopped.

Contributor Checklist

  • Commit messages follow the project guidelines.
  • The code follows the style guidelines of this project.
  • Tests that thoroughly check the new feature have been added/Regression tests checking the bug and its fix have been added.
  • Any new/modified methods have been properly documented using Doxygen.
  • Fast DDS test suite has been run locally.
  • Changes are ABI compatible.
  • Changes are API compatible.
  • Documentation builds and tests pass locally.
  • New feature has been added to the versions.md file (if applicable).
  • New feature has been documented/Current behavior is correctly described in the documentation.

Reviewer Checklist

  • Check contributor checklist is correct.
  • Check CI results: changes do not issue any warning.
  • Check CI results: failing tests are unrelated with the changes.

@juanlofer-eprosima juanlofer-eprosima changed the title Fix datarace on WriterProxy stop while TimedEvent being triggered Fix datarace on WriterProxy stop while TimedEvent being triggered [16341] Nov 28, 2022
@EduPonz EduPonz added this to the v2.9.0 milestone Dec 7, 2022
This reverts commit a374872.
Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>
Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>
This reverts commit b18a5272dd8f7d2bc1f01eda0134b8d572ea86a1.
Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>
@jparisu jparisu force-pushed the datarace/perform-initial-acknack branch from d3a4b3a to 8f7bc26 Compare December 7, 2022 08:50
@jsan-rt
Copy link
Copy Markdown
Contributor

jsan-rt commented Dec 7, 2022

@richiprosima please test this

@jsan-rt jsan-rt added the ci-pending PR which CI is running label Dec 7, 2022
jsan-rt
jsan-rt previously approved these changes Dec 9, 2022
@EduPonz
Copy link
Copy Markdown

EduPonz commented Dec 10, 2022

@richiprosima please test this

@MiguelCompany
Copy link
Copy Markdown
Member

MiguelCompany commented Dec 12, 2022

CI for 2.9.0-rc:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status
  • Discovery server Build Status

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
@MiguelCompany MiguelCompany merged commit edf8ef7 into master Dec 13, 2022
@MiguelCompany MiguelCompany deleted the datarace/perform-initial-acknack branch December 13, 2022 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-pending PR which CI is running

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants