Skip to content

[19256] Remove mutex from TimedEventImpl#3745

Merged
MiguelCompany merged 2 commits intoeProsima:masterfrom
cvilas:vilas/lockless_timedevent
Jul 27, 2023
Merged

[19256] Remove mutex from TimedEventImpl#3745
MiguelCompany merged 2 commits intoeProsima:masterfrom
cvilas:vilas/lockless_timedevent

Conversation

@cvilas
Copy link
Copy Markdown
Contributor

@cvilas cvilas commented Jul 24, 2023

Eliminates the overhead of mutex locking in a method in TimeEventImpl that is frequently called.

Description

While investigating scalability of discovery service on a large distributed system, hotspot analysis using Intel vtune showed that most time is spent with mutexes. Further analysis identifies a particular culprit whose removal increases the performance by 2x.

The following is from a machine with Haswell CPU running a SERVER participant. Evidently, 50% of the compute is spent locking and unlocking a mutex. The CPU consumption measured by top is of the order of 37%.

hotspot

The performance analysis with vtune allows the construction of a so-called flamegraph that makes it very easy to understand the caller-callee relationship. The graph makes it very clear that the vast majority of the mutex locks and unlocks can be attributed to the function eProsima::fastrtps::rtps::event_compare.

flame_graph

Looking at the source code we can see that the mutex is locked and unlocked during a sort operation:

// src/cpp/rtps/resources/TimedEventImpl.h
std::chrono::steady_clock::time_point next_trigger_time()
{   
    std::unique_lock<std::mutex> lock(mutex_);
    return next_trigger_time_;
}

// src/cpp/rtps/resources/ResourceEvent.cpp
static bool event_compare(
        TimedEventImpl* lhs,
        TimedEventImpl* rhs)
{
    return lhs->next_trigger_time() < rhs->next_trigger_time();
}

void ResourceEvent::sort_timers()
{
    std::sort(active_timers_.begin(), active_timers_.end(), event_compare);
}

The mutex protects a couple of data members that can be wrapped in std::atomic instead. After eliminating the mutex, the flame graph becomes

flame_graph_after

At least in our use case, we are able to see an improvement in how well the discovery service scales without the lock.

Contributor Checklist

  • Commit messages follow the project guidelines.
  • The code follows the style guidelines of this project.
  • N/A Tests that thoroughly check the new feature have been added/Regression tests checking the bug and its fix have been added; the added tests pass locally
  • Any new/modified methods have been properly documented using Doxygen.
  • Changes are ABI compatible.
  • Changes are API compatible.
  • N/A New feature has been added to the versions.md file (if applicable).
  • N/A New feature has been documented/Current behavior is correctly described in the documentation.
  • Applicable backports have been included in the description.

Reviewer Checklist

  • The PR has a milestone assigned.
  • Check contributor checklist is correct.
  • Check CI results: changes do not issue any warning.
  • Check CI results: failing tests are unrelated with the changes.

@Mergifyio backport 2.11.x 2.10.x 2.6.x

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>
@MiguelCompany
Copy link
Copy Markdown
Member

@richiprosima Please test mac

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Copy link
Copy Markdown
Member

@MiguelCompany MiguelCompany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cvilas for such a well-crafted description.

I fully agree that using atomics here is better than a mutex, especially since next_trigger_time_ is always updated from the same thread.

I added a commit to please the linter, but other than that, the code changes look good.

@MiguelCompany MiguelCompany added this to the v2.12.0 milestone Jul 25, 2023
@MiguelCompany MiguelCompany added the ci-pending PR which CI is running label Jul 25, 2023
@MiguelCompany MiguelCompany changed the title Removes mutex from TimedEventImpl [19256] Remove mutex from TimedEventImpl Jul 25, 2023
@cvilas
Copy link
Copy Markdown
Contributor Author

cvilas commented Jul 25, 2023

Thank you @cvilas for such a well-crafted description.

Thanks @MiguelCompany. Credit goes to https://github.com/iamholger for identifying the issue and proposing a potential fix. I am just the messenger, as he is away from work at the moment.

@MiguelCompany
Copy link
Copy Markdown
Member

@Mergifyio backport 2.11.x 2.10.x 2.6.x

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Jul 25, 2023

backport 2.11.x 2.10.x 2.6.x

✅ Backports have been created

Details

@MiguelCompany MiguelCompany added ready-to-merge Ready to be merged. CI and changes have been reviewed and approved. and removed ci-pending PR which CI is running labels Jul 25, 2023
@MiguelCompany
Copy link
Copy Markdown
Member

MiguelCompany commented Jul 27, 2023

Thanks to @iamholger, then! Adding him as co-author on the merge commit!

@MiguelCompany MiguelCompany merged commit a7697c7 into eProsima:master Jul 27, 2023
mergify bot pushed a commit that referenced this pull request Jul 27, 2023
* Removes mutex from TimedEventImpl

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>

* Please linters.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

---------

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Holger Schulz <holger.schulz@durham.ac.uk>
(cherry picked from commit a7697c7)
mergify bot pushed a commit that referenced this pull request Jul 27, 2023
* Removes mutex from TimedEventImpl

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>

* Please linters.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

---------

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Holger Schulz <holger.schulz@durham.ac.uk>
(cherry picked from commit a7697c7)
mergify bot pushed a commit that referenced this pull request Jul 27, 2023
* Removes mutex from TimedEventImpl

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>

* Please linters.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

---------

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Holger Schulz <holger.schulz@durham.ac.uk>
(cherry picked from commit a7697c7)
MiguelCompany pushed a commit that referenced this pull request Aug 1, 2023
* [19256] Remove mutex from TimedEventImpl (#3745)

* Removes mutex from TimedEventImpl

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>

* Please linters.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

---------

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Holger Schulz <holger.schulz@durham.ac.uk>
(cherry picked from commit a7697c7)

* Added atomic time_point initialization (#3760)

Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>

---------

Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>
Co-authored-by: Vilas Kumar Chitrakaran <cvilas@gmail.com>
Co-authored-by: jsantiago-eProsima <90755661+jsantiago-eProsima@users.noreply.github.com>
MiguelCompany pushed a commit that referenced this pull request Aug 7, 2023
* [19256] Remove mutex from TimedEventImpl (#3745)

* Removes mutex from TimedEventImpl

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>

* Please linters.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

---------

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Holger Schulz <holger.schulz@durham.ac.uk>
(cherry picked from commit a7697c7)

* Added atomic time_point initialization (#3760)

Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>

---------

Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>
Co-authored-by: Vilas Kumar Chitrakaran <cvilas@gmail.com>
Co-authored-by: jsantiago-eProsima <90755661+jsantiago-eProsima@users.noreply.github.com>
EduPonz pushed a commit that referenced this pull request Aug 10, 2023
* [19256] Remove mutex from TimedEventImpl (#3745)

* Removes mutex from TimedEventImpl

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>

* Please linters.

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

---------

Signed-off-by: Vilas Chitrakaran <cvilas@gmail.com>
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Holger Schulz <holger.schulz@durham.ac.uk>
(cherry picked from commit a7697c7)

* Added atomic time_point initialization (#3760)

Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>

---------

Signed-off-by: Javier Santiago <javiersantiago@eprosima.com>
Co-authored-by: Vilas Kumar Chitrakaran <cvilas@gmail.com>
Co-authored-by: jsantiago-eProsima <90755661+jsantiago-eProsima@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-merge Ready to be merged. CI and changes have been reviewed and approved.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants