Skip to content

common/log: Speed improvement for log. #17938

Closed
aclamk wants to merge 3 commits intoceph:masterfrom
aclamk:wip-log-reuse-streambuf
Closed

common/log: Speed improvement for log. #17938
aclamk wants to merge 3 commits intoceph:masterfrom
aclamk:wip-log-reuse-streambuf

Conversation

@aclamk
Copy link
Copy Markdown
Contributor

@aclamk aclamk commented Sep 25, 2017

streambuf and ostream are reused as much as possible.

Signed-off-by: Adam Kupczyk akupczyk@redhat.com

@jcsp
Copy link
Copy Markdown
Contributor

jcsp commented Sep 25, 2017

Are there any benchmarks that show the speedup?

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
@aclamk aclamk force-pushed the wip-log-reuse-streambuf branch from d3ae910 to a059e46 Compare September 26, 2017 11:27
@aclamk
Copy link
Copy Markdown
Contributor Author

aclamk commented Sep 26, 2017

@jcsp
I added test.
Before:
[ RUN ] Log.Speed_gather
[ OK ] Log.Speed_gather (1657 ms)
[ RUN ] Log.Speed_nogather
[ OK ] Log.Speed_nogather (59 ms)
After:
[ RUN ] Log.Speed_gather
[ OK ] Log.Speed_gather (1306 ms)
[ RUN ] Log.Speed_nogather
[ OK ] Log.Speed_nogather (54 ms)

@jcsp
Copy link
Copy Markdown
Contributor

jcsp commented Sep 28, 2017

Seems to me like a reasonable use of thread local data to cut down on allocations.

Any other opinions? @liewegas ?

@tchaikov
Copy link
Copy Markdown
Contributor

tchaikov commented Oct 2, 2017

@aclamk
Copy link
Copy Markdown
Contributor Author

aclamk commented Oct 3, 2017

@tchaikov
Reading from http://pulpito.ceph.com/kchai-2017-10-01_17:38:10-rados-wip-kefu-testing-2017-10-01-2202-distro-basic-mira/ .

1692886 : Failed on: "cbt - Final Pool Health Check."
1692795: Failed to run "osd-scrub-repair.sh"
1692959: "RuntimeError: Scrubbing terminated -- not all pgs were active and clean."

And additional 5 jobs were running happily for 12h before being terminated.

Should I proceed to clean this problems up ?

@tchaikov
Copy link
Copy Markdown
Contributor

tchaikov commented Oct 3, 2017

@aclamk the reason why i didn't merge your PR is that i wanted to review it before doing so. but neither did i want to to be a blocker of it. so i leave the link to the rados qa tests. because none of the failures is relevant to your PR, so i didn't add a negative review.

@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Oct 4, 2017

hi @aclamk, i'm having trouble figuring out what exactly was slow about current logging, and how this change set addresses it. can you explain a bit more about what you learned from your investigation?

PrebufferedStreambuf is such a simple and useful class on its own, and i worry that these changes are so specific to logging that it would be less useful for other things

: public std::basic_streambuf<char, std::basic_string<char>::traits_type>
{
prebuffered_data* m_data;
std::ostream os;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems backwards for the streambuf to contain an ostream, instead of the other way around. why can't this stay on the stack in the dout_impl macro?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really seems wierd that streambuf specialization has and ostream.
I did it just to avoid specialization of ostream, just to have streambuf in it.

Reusing ostream and streambuf between logs is the gist of this change.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. so the ostream and streambuf will both be thread_local and have the same lifetime 👍


PrebufferedStreambuf(): m_data(nullptr), os(this) {};
~PrebufferedStreambuf();
static PrebufferedStreambuf* get_streambuf(prebuffered_data* data);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this factory function is the only way to construct a valid PrebufferedStreambuf, the constructor and destructor should be made private

this unfortunately means that you can't ever put a PrebufferedStreambuf on the stack. so if you ever want more than one in a thread, you're stuck with dynamic allocation

Copy link
Copy Markdown
Contributor Author

@aclamk aclamk Oct 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Those should be private.
The dynamic allocation is performed once per thread, and in seldom cases when we log from inside log.

@aclamk
Copy link
Copy Markdown
Contributor Author

aclamk commented Oct 5, 2017

@cbodley
This pull request stemmed from joemarios work on false sharing. ( #9431 )
Initial analysis showed that there is a lot of false sharing on elements of locale. (http://en.cppreference.com/w/cpp/locale/locale)
This was impossible to get rid of, because first element of locale is atomic reference counter, that was modified on creation of every streambuf and ostream.
This false sharing turned out to have little impact on performance, however to prove this, I made this modification to logging, which seems to provide speed up.
The rationale here is to skip creation and destruction of streambuf and ostream, as they are reused in each log.
The additional logic is responsible for creating extra streambuf and ostream, if we happen to call function that produces logs during construction of log.

@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Oct 5, 2017

ok thanks @aclamk! i ran a profiler on the unit test and saw that, without your changes, the std::locale ctor/dtor were each using ~3-5% of the cpu time. so your strategy to reuse them makes sense

would you object to making this a separate class though, like ThreadLocalPrebufferedStreambuf or something? then PrebufferedStreambuf in its current form could be reused for simple things without interfering with logging performance

@liewegas
Copy link
Copy Markdown
Member

liewegas commented Nov 2, 2017

ping

@aclamk
Copy link
Copy Markdown
Contributor Author

aclamk commented Nov 2, 2017

@cbodley
I have not seen PrebufferedStreambuf used anywhere else, but I agree that making it backward-compatilble would be nicer.
I will do.
@liewegas
Thanks for the ping.

@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Nov 3, 2017

thanks @aclamk, there are a lot of stringstreams in rgw that could take advantage of it someday :)

…usage

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
@tchaikov
Copy link
Copy Markdown
Contributor

@aclamk i updated your PR, and posted the updated version at #19100. could you please take a look?

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Copy link
Copy Markdown
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aclamk please see the comment at #19100 (comment). we can clean this up a little bit before merging it.

: m_buf(buf), m_buf_len(buf_len), m_pptr(nullptr) {}

/// return a string copy (inefficiently)
std::string get_str() const;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see https://github.com/ceph/ceph/pull/19100/files#diff-8dcb74089b1227130d768a47852508bdR48, get_str(), size() and snprintf() methods are not needed here.

@aclamk
Copy link
Copy Markdown
Contributor Author

aclamk commented Nov 30, 2017

Closing, because #19100 does it better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants