Skip to content

Fix datarace on WriterProxy stop while perform_initial_ack_nack [16041]#3046

Merged
MiguelCompany merged 5 commits intomasterfrom
datarace/clear-writer-proxy
Nov 8, 2022
Merged

Fix datarace on WriterProxy stop while perform_initial_ack_nack [16041]#3046
MiguelCompany merged 5 commits intomasterfrom
datarace/clear-writer-proxy

Conversation

@juanlofer-eprosima
Copy link
Copy Markdown
Contributor

@juanlofer-eprosima juanlofer-eprosima commented Oct 26, 2022

Signed-off-by: Juan López Fernández juanlopez@eprosima.com

Description

A WriterProxy can be stopped (and cleared) from the reception thread (PDP::remove_remote_participant -> WLP::removeRemoteEndpoints -> StatefulReader::matched_writer_remove) while this same WriterProxy is executing perform_initial_ack_nack callback on the events thread. Both methods (WriterProxy::clear and WriterProxy::perform_initial_ack_nack) may access the WriterProxy's attributes concurrently without protection.

Taking the reader's mutex from within the callback is one of the possible fixes, but it must be carefully checked whether this would introduce any deadlocks.

Alternative solutions are:

  • Refactor ResourceEvent so that it is not possible to trigger an event after having cancelled (TimedEvent::cancel_timer) the corresponding TimedEvent.
  • Atomize WriterProxy's attributes and refactor perform_initial_ack_nack.
  • Take reader's mutex within perform_initial_ack_nack but with more granularity (unlocking before executing deadlock-prone instructions).
  • Add a new mutex to WriterProxy to prevent this particular data race.

Contributor Checklist

  • Commit messages follow the project guidelines.
  • The code follows the style guidelines of this project.
  • Tests that thoroughly check the new feature have been added/Regression tests checking the bug and its fix have been added.
  • Any new/modified methods have been properly documented using Doxygen.
  • Fast DDS test suite has been run locally.
  • Changes are ABI compatible.
  • Changes are API compatible.
  • Documentation builds and tests pass locally.
  • New feature has been added to the versions.md file (if applicable).
  • New feature has been documented/Current behavior is correctly described in the documentation.

Reviewer Checklist

  • Check contributor checklist is correct.
  • Check CI results: changes do not issue any warning.
  • Check CI results: failing tests are unrelated with the changes.

@MiguelCompany
Copy link
Copy Markdown
Member

I think this introduces a deadlock. I would suggest adding a new API to TimedEvent that does the following:

void TimedEvent::recreate_timer()
{
    service_.unregister_timer(impl_);
    impl_->go_cancel();
    service_.register_timer(impl_);
}

And call that one on WriterProxy::stop()

@juanlofer-eprosima juanlofer-eprosima changed the title Fix datarace on WriterProxy stop while perform_initial_ack_nack Fix datarace on WriterProxy stop while perform_initial_ack_nack [16041] Oct 27, 2022
@juanlofer-eprosima juanlofer-eprosima force-pushed the datarace/clear-writer-proxy branch 2 times, most recently from 35f0823 to 28401fd Compare October 27, 2022 07:46
Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>
@juanlofer-eprosima juanlofer-eprosima force-pushed the datarace/clear-writer-proxy branch from 046c1be to 11d4a3a Compare October 27, 2022 14:38
Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>
Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>
@MiguelCompany
Copy link
Copy Markdown
Member

@richiprosima Please test this

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>
@juanlofer-eprosima
Copy link
Copy Markdown
Contributor Author

@richiprosima Please test this

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>
@MiguelCompany
Copy link
Copy Markdown
Member

@richiprosima Please test this

Copy link
Copy Markdown
Member

@MiguelCompany MiguelCompany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with green CI

@MiguelCompany MiguelCompany added this to the v2.8.1 milestone Nov 8, 2022
@MiguelCompany MiguelCompany added the ci-pending PR which CI is running label Nov 8, 2022
@MiguelCompany MiguelCompany merged commit 3e0d1d6 into master Nov 8, 2022
@MiguelCompany MiguelCompany deleted the datarace/clear-writer-proxy branch November 8, 2022 10:07
MiguelCompany added a commit that referenced this pull request Nov 10, 2022
…ck (#3046)"

This reverts commit 3e0d1d6.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
MiguelCompany added a commit that referenced this pull request Nov 11, 2022
This reverts commit 3e0d1d6.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
jparisu pushed a commit that referenced this pull request Dec 7, 2022
This reverts commit a374872.
EduPonz pushed a commit that referenced this pull request Dec 12, 2022
This reverts commit a374872.
MiguelCompany added a commit that referenced this pull request Dec 13, 2022
)

* Revert "Revert #3046 (#3083)"

This reverts commit a374872.

* Add regression test

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>

* New writer proxies manipulation mutex

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>

* Revert "New writer proxies manipulation mutex"

This reverts commit b18a5272dd8f7d2bc1f01eda0134b8d572ea86a1.

* Unlock reader's mutex before stopping WriterProxy

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>

* Refs #16341. Fix link error on unit test

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Miguel Company <miguelcompany@eprosima.com>
roscan-tech pushed a commit to roscan-tech/Fast-DDS that referenced this pull request Sep 6, 2024
…sima#3046)

* Fix datarace on WriterProxy stop while perform_initial_ack_nack

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>

* Add test

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>

* Uncrustify

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>

* Recreate timer only if being triggered

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>

* Apply suggestions

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>

Signed-off-by: Juan López Fernández <juanlopez@eprosima.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-pending PR which CI is running

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants