Skip to content

[DNM] Performance changes to reduce false sharing - for discussion.#9431

Closed
joemario wants to merge 1 commit intoceph:hammerfrom
joemario:jpm_false_sharing
Closed

[DNM] Performance changes to reduce false sharing - for discussion.#9431
joemario wants to merge 1 commit intoceph:hammerfrom
joemario:jpm_false_sharing

Conversation

@joemario
Copy link
Copy Markdown

@joemario joemario commented Jun 1, 2016

This commit demonstrates the benefit of reducing cacheline contention, in two ways:

  1. Align classes and contended hot data members.
  2. Demonstrate how expensive OpTracker::_mark_event can be.

This commit is not to be checked in as-is.
Although I am not a Ceph developer, I have been able to realize worthwhile performance gains from these changes. Ben England and I would like to share these findings at a upcoming weekly Ceph performance meeting.

1) First is aligning classes and contended hot data members.
2) Demonstrate the expense OpTracker::_mark_event can be,
   and discuss opportunities.
@tchaikov tchaikov changed the title Performance changes to reduce false sharing - for discussion. [DNM] Performance changes to reduce false sharing - for discussion. Jun 2, 2016
@liewegas
Copy link
Copy Markdown
Member

@joemario I'd would love to resurrect this and get the simple changes (e.g., cacheline alignements) into mergeable patches...

@bengland2
Copy link
Copy Markdown
Contributor

I went back and found the presentation (because Mark indexed all the recordings, thank you Mark).

https://bluejeans.com/playback/s/ha97eh8zOC5OPKy63VBWM1XSIDCAs6UVV8dWWhu04RyvrRjOuEG13JZ6dEpOiczK
(use Chrome to play)

slides at:
http://people.redhat.com/jmario/ceph/false-sharing_findings.odp

Joe's primary finding was that the C++ iostream class usage in Ceph's logging code was the primary cause of cacheline contention, or "false sharing". Joe's PR was there so he could present an analysis of exactly where the problem was in the Ceph code, and was admittedly not ready to be pulled in as is. But a huge part of the work of performance optimization is analysis and Joe did just about all that was needed there IMHO. Is there a Ceph developer who could help? This probably needs to be broken up into commit(s) to remove the iostream problem and commit(s) to add the cacheline alignment to Ceph data structures. IMHO optimizations like this are extremely relevant to help Ceph keep up with faster storage and networks.

@joemario
Copy link
Copy Markdown
Author

joemario commented Mar 31, 2017 via email

@bengland2
Copy link
Copy Markdown
Contributor

{
if (!tracking_enabled)
return;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joemario Please keep PR title < 50 characters for future PRs, also prepend component name before PR in PR title.
https://github.com/ceph/ceph/blob/master/SubmittingPatches.rst#title-of-pull-requests-and-title-of-commits

@liewegas
Copy link
Copy Markdown
Member

@aclamk do you mind looking at this and seeing if we can/should apply it?

@aclamk
Copy link
Copy Markdown
Contributor

aclamk commented Sep 14, 2017

@liewegas
Idea is sound. Having a cache entry changed by one CPU and read by another is known to be slower.
Strategies like the one proposed here are likely to slightly improve performance.
Downside: it is difficult to measure precisely.

Implementation could use some polishing:

  1. hot variables that are used together should go into one cache raw, for example
    pthread_mutex_t m_queue_mutex attribute((aligned (64)));
    pthread_t m_queue_mutex_holder attribute((aligned (64)));
    should become a struct
  2. I guess alignas should replace attribute((aligned))
  3. we can't skip _mark_event method, can we?

Should I make adaptation of this for master?

@joemario
Copy link
Copy Markdown
Author

joemario commented Sep 14, 2017 via email

@stale
Copy link
Copy Markdown

stale bot commented Oct 18, 2018

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If you are a maintainer or core committer, please follow-up on this issue to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Oct 18, 2018
@batrick batrick closed this Nov 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants