log: avoid heap allocations for most log entries#23721
Conversation
2c0c441 to
1f91448
Compare
|
LGTM |
caa1a32 to
3e359d3
Compare
tchaikov
left a comment
There was a problem hiding this comment.
in this change, in addition to replacing the cached streambuf on heap with non-cached streambuf on stack, we trade the overhead of allocating Entry on heap for memcpy() the entries on stack to m_new on heap.
i don't think this adds up to a win in performance.
before this change
[ RUN ] Log.Speed_gather
[ OK ] Log.Speed_gather (1883 ms)
[ RUN ] Log.Speed_nogather
[ OK ] Log.Speed_nogather (137 ms)
after this change
[ RUN ] Log.Speed_gather
[ OK ] Log.Speed_gather (2809 ms)
[ RUN ] Log.Speed_nogather
[ OK ] Log.Speed_nogather (140 ms)
| #include <string_view> | ||
| #include <vector> | ||
|
|
||
| template<std::size_t SIZE> |
There was a problem hiding this comment.
could just s/SIZE/stack_size/, and remove the member variable of stack_size.
There was a problem hiding this comment.
My understanding was exposing the template parameters in the class is generally good practice?
There was a problem hiding this comment.
i am not sure though. if size is not part of the interface, there is no need to expose it, IMHO.
src/common/StackStringStream.h
Outdated
| #include <vector> | ||
|
|
||
| template<std::size_t SIZE> | ||
| class StackStringBuf : public std::streambuf |
There was a problem hiding this comment.
could mark this class final if it is not supposed to be inherited.
src/common/StackStringStream.h
Outdated
| StackStringBuf& operator=(const StackStringBuf&) = delete; | ||
| StackStringBuf(StackStringBuf&& o) | ||
| : | ||
| vec(std::move(o.vec)), |
There was a problem hiding this comment.
might want to move this line to the previous one.
There was a problem hiding this comment.
like:
StackStringBuf(StackStringBuf&& o)
: vec(std::move(o.vec)),
src/common/StackStringStream.h
Outdated
| ~StackStringBuf() override {} | ||
| StackStringBuf(const StackStringBuf&) = delete; | ||
| StackStringBuf& operator=(const StackStringBuf&) = delete; | ||
| StackStringBuf(StackStringBuf&& o) |
There was a problem hiding this comment.
the default implementation would suffice, as it performs member-wise move/copy of its members. so we can just put
StackStringBuf(StackStringBuf&&) = default;
StackStringBuf& operator=(StackStringBuf&&) = default;There was a problem hiding this comment.
As @cbodley noted, I'm only moving the part of the array used.
There was a problem hiding this comment.
I guess small_vector could have a nice side effect here: ... = default;-ability. :-)
There was a problem hiding this comment.
Yup, I'm looking at that too. I was trying to find something like that in boost but missed it! Thanks for pointing it out!
There was a problem hiding this comment.
I'm only moving the part of the array used.
i knew, just wanted to remind that default move constructor is good enough in our case.
src/common/StackStringStream.h
Outdated
|
|
||
| void push(std::string_view sv) | ||
| { | ||
| if (vec.size()) { |
There was a problem hiding this comment.
might want to use empty() if we don't care about the "size".
src/common/StackStringStream.h
Outdated
| vec.insert(vec.end(), sv.begin(), sv.end()); | ||
| } else { | ||
| if (sv.size()+arr_used < stack_size) { | ||
| std::memcpy(arr.data()+arr_used, sv.data(), sv.size()); |
There was a problem hiding this comment.
please add spaces around +.
src/common/StackStringStream.h
Outdated
| } else { | ||
| vec.reserve(arr_used+sv.size()); | ||
| std::string_view stack_sv(arr.data(), arr_used); | ||
| vec.insert(vec.end(), stack_sv.begin(), stack_sv.end()); |
There was a problem hiding this comment.
could just put
vec.insert(vec.end(), arr.begin(), arr.begin() + arr_used);so we can avoid creating a temporary object.
src/log/Entry.h
Outdated
| delete(this); | ||
| } | ||
| private: | ||
| StackStringStream os; |
There was a problem hiding this comment.
before this change, the streambuf used by log entries is cached on a per-thread basis, and each thread has its own streambuf, so log entries created on a given thread do not need to create and destroy streambuf for stringifying the message to print.
after this change, the streambuf is created and destroyed on the stack for every single log Entry.
basic_string::_M_mutate is also another allocation via PrebufferedStreambuf.
i think what you meant is CachedPrebufferedStreambuf. PrebufferedStreambuf is not used anymore. as i explained above, the creation and destructure of CachedPrebufferedStreambuf is a one-shot operation on each thread. so i think the overhead introduced by it is fine. and it will be offset by the reuse of the cached streambuf.
for more details on the cached streambuf, see #17938
|
i think the main benefit from @aclamk's work in #17938 was to avoid the cost of re-constructing std::locale (which requires synchronization on global locale state), both in streambuf and ostream it would be great if we could preserve those improvements, while also avoiding the allocations - maybe we could find a way to apply these changes to |
src/common/StackStringStream.h
Outdated
| vec(std::move(o.vec)), | ||
| arr_used(o.arr_used) | ||
| { | ||
| std::memcpy(arr.data(), o.arr.data(), o.arr_used); |
There was a problem hiding this comment.
I guess restricting the memcpy to o.arr_used bytes is the reason for having non-default move- constructor and assignment operator. Might be worth a comment.
src/common/StackStringStream.h
Outdated
| if (sv.size()+arr_used < stack_size) { | ||
| std::memcpy(arr.data()+arr_used, sv.data(), sv.size()); | ||
| arr_used += sv.size(); | ||
| } else { |
There was a problem hiding this comment.
What I see: if the embedded buffer isn't big enough to hold sv, then switch entirely to dynamically allocated vector. Shuffle everything that was inside as strv is supposed to provide continuous memory.
Is there any reason for not using Boost's small_vector?
There was a problem hiding this comment.
I'll give that a try, thanks for pointing it out!
src/common/StackStringStream.h
Outdated
| StackStringStream(StackStringStream&& o) : std::ostream(&ssb), ssb(std::move(o.ssb)) {} | ||
| StackStringStream& operator=(StackStringStream&& o) | ||
| { | ||
| ssb = std::move(o.ssb); |
There was a problem hiding this comment.
What about the base class subobject?
There was a problem hiding this comment.
Yes, I shamefully missed that. I'll add it in the next version of this patchset...
| m_queue_mutex_holder = pthread_self(); | ||
|
|
||
| if (m_inject_segv) | ||
| if (unlikely(m_inject_segv)) |
There was a problem hiding this comment.
:-) I bet a CPU had a lot of fun while encountering the assignment due to misprediction.
There was a problem hiding this comment.
c7a9a4a to
394e716
Compare
|
New version: Need to do some cleanup but this is mostly done I think. |
394e716 to
43de8e1
Compare
src/common/StackStringStream.h
Outdated
|
|
||
| void push(std::string_view sv) | ||
| { | ||
| vec.reserve(sv.size()); |
There was a problem hiding this comment.
vec.size() + sv.size() right?
src/log/Entry.h
Outdated
| auto strv = e.strv(); | ||
| str.clear(); | ||
| str.reserve(strv.size()); | ||
| str.insert(str.end(), strv.begin(), strv.end()); |
There was a problem hiding this comment.
can use str.assign(strv.begin(), strv.end()); instead of clear() + insert()
43de8e1 to
0ecb393
Compare
|
So this looks nearly done except there is still a mysterious allocation being done: compare to allocations before this patch: |
|
Uploaded svg of the allocations. (Remove .txt extension and view locally.) |
71d0710 to
7af4e61
Compare
| } | ||
|
|
||
| void Log::submit_entry(Entry *e) | ||
| void Log::submit_entry(Entry&& e) |
There was a problem hiding this comment.
can we templatize this method, if we know the actual type of Entry at compile time? so we don't need to resolve the address of virtual method at run-time.
src/log/Log.cc
Outdated
|
|
||
| /// | ||
| void Log::set_coarse_timestamps(bool coarse) { | ||
| Locker lock(m_flush_mutex); |
There was a problem hiding this comment.
i think std::scoped_lock is a better fit in this case. as it more light-weighted, and we are not using the extra features provided by std::unique_lock.
There was a problem hiding this comment.
You mean at this location or everywhere? We are using the unlock/lock methods of Locker in other locations. I don't really see a need to optimize for these methods which are called only once at startup?
There was a problem hiding this comment.
Also, the condition variables only take a unique_lock.
There was a problem hiding this comment.
You mean at this location or everywhere?
i mean the wherever std::scoped_lock applies.
We are using the unlock/lock methods of Locker in other locations.
yes, i knew. but we are not required to use the same lock type in Log, are we?
Also, the condition variables only take a unique_lock.
yeah, condition_variables takes unique_lock. in addition to an optimisation, i'd take this as an improvement in readability. when one reads a function accepts std::string instead of const std::string& or std::string_view, he/she would wonder why. the same applies to unique_lock. when i was reading the code, i asked myself, why we wanted to use unique_lock here?
src/log/Log.cc
Outdated
| m_flush.swap(m_new); | ||
| m_cond_loggers.notify_all(); | ||
| m_queue_mutex_holder = 0; | ||
| lock2.unlock(); |
There was a problem hiding this comment.
nit, if we want to unlock a lock first, why not just use the RAII style locking? like
std::scoped_lock flush_lock{m_flush_mutex};
{
std::scoped_lock queue_lock{m_queue_mutex};
// update m_flush* and m_queue*
}
_flush(m_flush, true, false);
m_flush_mutex_holder = 0;
src/log/Log.cc
Outdated
|
|
||
| line_used += e->snprintf(line + line_used, line_size - line_used - 1); | ||
| ceph_assert(line_used < line_size - 1); | ||
| used += (std::size_t)append_time(stamp, pos+used, allocated-used); |
There was a problem hiding this comment.
nit, space before and after +.
891c845 to
93a7082
Compare
|
|
||
| Entry() = delete; | ||
| Entry(short pr, short sub) : | ||
| m_stamp(clock().now()), |
There was a problem hiding this comment.
now() is a static function, so log_clock::now() would work without a helper function
There was a problem hiding this comment.
I tried that but it doesn't work:
In file included from /home/pdonnell/ceph/src/log/Log.h:19:0,
from /home/pdonnell/ceph/src/common/dout.h:26,
from /home/pdonnell/ceph/src/common/debug.h:18,
from /home/pdonnell/ceph/src/erasure-code/shec/ErasureCodePluginShec.cc:22:
/home/pdonnell/ceph/src/log/Entry.h: In constructor ‘ceph::logging::Entry::Entry(short int, short int)’:
/home/pdonnell/ceph/src/log/Entry.h:26:28: error: cannot call member function ‘ceph::logging::log_clock::time_point ceph::logging::log_clock::now()’ without object
m_stamp(log_clock::now()),
^
The method is a trick to have a static log_clock without requiring a definition in a corresponding Entry.cc file. Do you think it's not worth doing?
There was a problem hiding this comment.
sneaky - works for me :) i had assumed it was just one of the clocks from ceph_time.h
| virtual ~Entry() = default; | ||
|
|
||
| virtual std::string_view strv() const = 0; | ||
| virtual std::size_t size() const = 0; |
There was a problem hiding this comment.
as @tchaikov mentioned in #23721 (comment), there is still some overhead from these virtual functions. i think we could remove its vtable entirely if Log::submit_entry() took MutableEntry&& directly, and ConcreteEntry's converting constructor took const MutableEntry& instead of const Entry&. think that's worth doing?
There was a problem hiding this comment.
I just templatized submit_entry. Does it look okay now?
There was a problem hiding this comment.
it looks like m_new.emplace_back(std::move(e)) still calls ConcreteEntry(const Entry&), which relies on the virtual Entry::strv() to perform the copy. if it was ConcreteEntry(const MutableEntry&), we wouldn't need any virtuals in Entry
There was a problem hiding this comment.
Jeeze, is this all really worth it? The cost to code readability seems really high.
There was a problem hiding this comment.
think that's worth doing?
apparently not, and that's fine - i just wanted to clarify that the change to submit_entry didn't address @tchaikov's point about virtual function overhead
There was a problem hiding this comment.
Right, thanks for pointing it out! As you saw, I undid the templated submit_entry method. I don't think it's worth the trouble.
|
do you see what's causing these allocations? Zheng and I are at a loss for the cause. Looking at the dot graph didn't help shed any light on the cause. |
|
@batrick I created additional test to evaluate logger in multithread environment. #23982. Unfortunatelly, I did not run your improvements on actual test environment. |
kchai-2018-09-07_07:49:32-rados-wip-kefu-testing-2018-09-07-1352-distro-basic-mira/2988359 kchai-2018-09-07_07:49:32-rados-wip-kefu-testing-2018-09-07-1352-distro-basic-mira/2988364 |
|
Seems like a repeatable issue: From: /ceph/teuthology-archive/kchai-2018-09-07_07:49:32-rados-wip-kefu-testing-2018-09-07-1352-distro-basic-mira/2988565/remote/mira010/coredump/1536416245.27817.core and from: /ceph/teuthology-archive/kchai-2018-09-07_07:49:32-rados-wip-kefu-testing-2018-09-07-1352-distro-basic-mira/2988364/remote/mira012/coredump/1536310598.11264.core Nothing pops out at me as being obviously wrong to cause this. |
|
@aclamk does this look like the same issue you encountered in common/CachedPrebufferedStreambuf.cc? // std::unique cannot be used here, as deletion will not clear value,
// but thread variable will still exist, causing free-memory-read
struct cached_os_t |
8fa4b5a to
5c64f92
Compare
5c64f92 to
0a06dbc
Compare
|
|
I don't see the issue from the stacktrace: Looks almost like the small_vector was destructed but I don't see how. I'll keep thinking about this although hopefully someone sees the issue. |
Each log Entry now exists on the stack and uses a large (4k) buffer for its log
stream. This Entry is std::move'd to the queues (std::vector and
boost::circular_buffer) in the Log, involving only memory copies in the general
case. There are two memory copies (std::move) for any given Entry, once in
Log::submit_entry and again in Log::_flush
In practice, this eliminates 100% of allocations outside of startup
allocations
I've run a simple experiment with the MDS that copies /usr/bin to CephFS. I got
measurements for the number of allocations from the heap profiler and the
profile of CPU usage in the MDS.
** Before this patch **
== Heap profile: ==
$ google-pprof --alloc_objects --text bin/ceph-mds out/mds.a.profile.0001.heap
Total: 1105048 objects
433329 39.2% 39.2% 433329 39.2% ceph::logging::Log::create_entry
209311 18.9% 58.2% 209311 18.9% __gnu_cxx::__aligned_membuf::_M_addr (inline)
192963 17.5% 75.6% 192963 17.5% __gnu_cxx::new_allocator::allocate (inline)
61774 5.6% 81.2% 61774 5.6% std::__cxx11::basic_string::_M_mutate
37689 3.4% 84.6% 37689 3.4% ceph::buffer::raw_combined::create (inline)
22773 2.1% 86.7% 22773 2.1% mempool::pool_allocator::allocate (inline)
17761 1.6% 88.3% 20523 1.9% std::pair::pair (inline)
15795 1.4% 89.7% 15797 1.4% std::swap (inline)
11011 1.0% 90.7% 130061 11.8% std::__cxx11::list::_M_insert (inline)
10822 1.0% 91.7% 10822 1.0% std::__cxx11::basic_string::reserve
9108 0.8% 92.5% 32721 3.0% __gnu_cxx::new_allocator::construct (inline)
8608 0.8% 93.3% 8610 0.8% std::_Deque_base::_M_initialize_map (inline)
7694 0.7% 94.0% 7694 0.7% std::__cxx11::basic_string::_M_capacity (inline)
6160 0.6% 94.5% 6160 0.6% Journaler::wrap_finisher
6084 0.6% 95.1% 70892 6.4% std::map::operator[] (inline)
5347 0.5% 95.6% 5347 0.5% MutationImpl::add_projected_fnode
4381 0.4% 96.0% 7706 0.7% mempool::pool_allocator::construct (inline)
3588 0.3% 96.3% 182966 16.6% Locker::_do_cap_update
3049 0.3% 96.6% 5280 0.5% std::__shared_count::__shared_count (inline)
3043 0.3% 96.9% 3043 0.3% MDSLogContextBase::MDSLogContextBase (inline)
3038 0.3% 97.1% 14763 1.3% std::__uninitialized_copy::__uninit_copy (inline)
So approximately 430k heap allocations for Entry were created! The
basic_string::_M_mutate is also another allocation via PrebufferedStreambuf
== Profile data ==
Selecting interesting functions
Samples: 798K of event 'cycles:pp', Event count (approx
Children Self Com Shared Object Symbol
+ 1.04% 1.04% log libceph-common.so.0 [.] ceph::logging::Log::_flush
+ 0.05% 0.05% log libceph-common.so.0 [.] ceph::logging::Log::flush
+ 0.00% 0.00% log libceph-common.so.0 [.] ceph::logging::Log::_log_safe_write
+ 0.00% 0.00% log libceph-common.so.0 [.] ceph::logging::Log::entry
+ 0.00% 0.00% log libceph-common.so.0 [.] ceph::logging::Log::_flush_logbuf
...
Children Self Command Shared Object Symbol
+ 3.69% 0.00% safe_timer libceph-common.so.0 [.] CachedPrebufferedStreambuf::~CachedPrebufferedStreambuf
+ 0.53% 0.00% ms_dispatch libceph-common.so.0 [.] CachedPrebufferedStreambuf::~CachedPrebufferedStreambuf
+ 0.13% 0.00% fn_anonymous libceph-common.so.0 [.] CachedPrebufferedStreambuf::~CachedPrebufferedStreambuf
+ 0.00% 0.00% ms_dispatch libceph-common.so.0 [.] CachedPrebufferedStreambuf::create
+ 0.00% 0.00% fn_anonymous libceph-common.so.0 [.] CachedPrebufferedStreambuf::create
Children Self Command Shared Object Symbol
+ 0.07% 0.07% fn_anonymous libceph-common.so.0 [.] ceph::logging::Log::create_entry
+ 0.00% 0.00% ms_dispatch libceph-common.so.0 [.] ceph::logging::Log::create_entry
+ 0.00% 0.00% ms_dispatch ceph-mds [.] _ZN4ceph7logging3Log12create_entryEiiPm@plt
+ 0.00% 0.00% md_submit libceph-common.so.0 [.] ceph::logging::Log::create_entry
+ 0.00% 0.00% fn_anonymous ceph-mds [.] _ZN4ceph7logging3Log12create_entryEiiPm@plt
+ 0.00% 0.00% safe_timer libceph-common.so.0 [.] ceph::logging::Log::create_entry
+ 0.00% 0.00% mds_rank_progr libceph-common.so.0 [.] ceph::logging::Log::create_entry
+ 0.00% 0.00% mds_rank_progr ceph-mds [.] _ZN4ceph7logging3Log12create_entryEiiPm@plt
+ 0.00% 0.00% msgr-worker-0 libceph-common.so.0 [.] ceph::logging::Log::create_entry
+ 0.00% 0.00% msgr-worker-2 libceph-common.so.0 [.] ceph::logging::Log::create_entry
+ 0.00% 0.00% md_submit ceph-mds [.] _ZN4ceph7logging3Log12create_entryEiiPm@plt
+ 0.00% 0.00% msgr-worker-1 libceph-common.so.0 [.] ceph::logging::Log::create_entry
Children Self Command Shared Object Symbol
+ 8.29% 0.00% ms_dispatch libstdc++.so.6.0.24 [.] virtual thunk to std::basic_ostream<char, std::char_traits<char> >::~basic_ostream
+ 7.55% 1.46% ms_dispatch libstdc++.so.6.0.24 [.] std::ostream::_M_insert<long>
+ 3.87% 0.00% fn_anonymous libstdc++.so.6.0.24 [.] std::basic_ostream<char, std::char_traits<char> >::~basic_ostream
+ 2.92% 0.00% md_submit libstdc++.so.6.0.24 [.] virtual thunk to std::basic_ostream<char, std::char_traits<char> >::~basic_ostream
+ 2.38% 0.00% fn_anonymous libstdc++.so.6.0.24 [.] virtual thunk to std::basic_ostream<char, std::char_traits<char> >::~basic_ostream
+ 2.22% 2.22% fn_anonymous libstdc++.so.6.0.24 [.] std::ostream::sentry::sentry
+ 1.89% 0.13% fn_anonymous libstdc++.so.6.0.24 [.] std::__ostream_insert<char, std::char_traits<char> >
+ 0.71% 0.00% ms_dispatch libstdc++.so.6.0.24 [.] std::basic_ostream<char, std::char_traits<char> >::~basic_ostream
+ 0.39% 0.06% fn_anonymous libstdc++.so.6.0.24 [.] std::ostream::_M_insert<long>
+ 0.29% 0.21% ms_dispatch libstdc++.so.6.0.24 [.] std::__ostream_insert<char, std::char_traits<char> >
+ 0.27% 0.27% ms_dispatch libstdc++.so.6.0.24 [.] std::ostream::sentry::~sentry
+ 0.27% 0.27% fn_anonymous libstdc++.so.6.0.24 [.] std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<long>
+ 0.22% 0.22% ms_dispatch libstdc++.so.6.0.24 [.] std::basic_streambuf<char, std::char_traits<char> >::xsputn
+ 0.20% 0.20% ms_dispatch libstdc++.so.6.0.24 [.] std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<long>
+ 0.15% 0.15% fn_anonymous libstdc++.so.6.0.24 [.] std::ostream::sentry::~sentry
+ 0.14% 0.14% ms_dispatch libstdc++.so.6.0.24 [.] std::ostream::sentry::sentry
+ 0.13% 0.00% ms_dispatch libstdc++.so.6.0.24 [.] std::ostream::_M_insert<unsigned long>
+ 0.13% 0.13% fn_anonymous libstdc++.so.6.0.24 [.] std::basic_streambuf<char, std::char_traits<char> >::xsputn
+ 0.00% 0.00% fn_anonymous libstdc++.so.6.0.24 [.] std::ostream::_M_insert<unsigned long>
+ 0.00% 0.00% ms_dispatch libstdc++.so.6.0.24 [.] std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>
And the unittest_log time:
$ bin/unittest_log
[==========] Running 15 tests from 1 test case
[----------] Global test environment set-up
[----------] 15 tests from Log
[ RUN ] Log.Simple
[ OK ] Log.Simple (0 ms)
[ RUN ] Log.ReuseBad
[ OK ] Log.ReuseBad (1 ms)
[ RUN ] Log.ManyNoGather
[ OK ] Log.ManyNoGather (0 ms)
[ RUN ] Log.ManyGatherLog
[ OK ] Log.ManyGatherLog (12 ms)
[ RUN ] Log.ManyGatherLogStringAssign
[ OK ] Log.ManyGatherLogStringAssign (27 ms)
[ RUN ] Log.ManyGatherLogStringAssignWithReserve
[ OK ] Log.ManyGatherLogStringAssignWithReserve (27 ms)
[ RUN ] Log.ManyGatherLogPrebuf
[ OK ] Log.ManyGatherLogPrebuf (15 ms)
[ RUN ] Log.ManyGatherLogPrebufOverflow
[ OK ] Log.ManyGatherLogPrebufOverflow (15 ms)
[ RUN ] Log.ManyGather
[ OK ] Log.ManyGather (8 ms)
[ RUN ] Log.InternalSegv
[WARNING] /home/pdonnell/cephfs-shell/src/googletest/googletest/src/gtest-death-test.cc:836:: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 3 threads
[ OK ] Log.InternalSegv (8 ms)
[ RUN ] Log.LargeLog
[ OK ] Log.LargeLog (43 ms)
[ RUN ] Log.TimeSwitch
[ OK ] Log.TimeSwitch (1 ms)
[ RUN ] Log.TimeFormat
[ OK ] Log.TimeFormat (0 ms)
[ RUN ] Log.Speed_gather
[ OK ] Log.Speed_gather (1779 ms)
[ RUN ] Log.Speed_nogather
[ OK ] Log.Speed_nogather (64 ms)
[----------] 15 tests from Log (2000 ms total)
[----------] Global test environment tear-down
[==========] 15 tests from 1 test case ran. (2000 ms total)
[ PASSED ] 15 tests
** After Patch **
The StackStreamBuf uses 4k for its default buffer. This appears to be more than
reasonable for preventing allocations for logging
== Heap profile: ==
$ google-pprof --alloc_objects --text bin/ceph-mds out/mds.a.profile.0001.heap
Total: 1052127 objects
384957 36.6% 36.6% 384957 36.6% __gnu_cxx::new_allocator::allocate (inline)
274720 26.1% 62.7% 274720 26.1% __gnu_cxx::__aligned_membuf::_M_addr (inline)
68496 6.5% 69.2% 68496 6.5% std::__cxx11::basic_string::_M_mutate
44140 4.2% 73.4% 51828 4.9% std::pair::pair (inline)
43091 4.1% 77.5% 43091 4.1% ceph::buffer::raw_combined::create (inline)
27706 2.6% 80.1% 236407 22.5% std::__cxx11::list::_M_insert (inline)
25699 2.4% 82.6% 25699 2.4% std::__cxx11::basic_string::reserve
23183 2.2% 84.8% 23183 2.2% mempool::pool_allocator::allocate (inline)
19466 1.9% 86.6% 81716 7.8% __gnu_cxx::new_allocator::construct (inline)
17606 1.7% 88.3% 17606 1.7% std::__cxx11::basic_string::_M_capacity (inline)
16879 1.6% 89.9% 16881 1.6% std::swap (inline)
8572 0.8% 90.7% 8574 0.8% std::_Deque_base::_M_initialize_map (inline)
8477 0.8% 91.5% 11808 1.1% mempool::pool_allocator::construct (inline)
6166 0.6% 92.1% 6166 0.6% Journaler::wrap_finisher
6080 0.6% 92.7% 134975 12.8% std::map::operator[] (inline)
6079 0.6% 93.3% 6079 0.6% MutationImpl::add_projected_fnode
== Profile data ==
Samples: 62K of event 'cycles:u', Event count (approx.)
Overhead Command Shared Object Symbol
+ 5.91% log libc-2.23.so [.] vfprintf
+ 5.75% ms_dispatch libstdc++.so.6.0.24 [.] std::__ostream_insert<char, std::char_traits<char> >
+ 4.59% ms_dispatch ceph-mds [.] StackStringBuf<4096ul>::xsputn
+ 4.26% ms_dispatch libc-2.23.so [.] __memmove_ssse3_back
+ 4.07% log libceph-common.so.0 [.] ceph::logging::Log::_flush
+ 2.48% ms_dispatch libstdc++.so.6.0.24 [.] std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<l
+ 2.13% fn_anonymous libstdc++.so.6.0.24 [.] std::__ostream_insert<char, std::char_traits<char> >
+ 2.09% ms_dispatch ceph-mds [.] CDir::check_rstats
+ 2.06% ms_dispatch libstdc++.so.6.0.24 [.] std::ostream::sentry::sentry
+ 1.98% ms_dispatch libstdc++.so.6.0.24 [.] std::ostream::sentry::~sentry
+ 1.87% log libc-2.23.so [.] __strcpy_sse2_unaligned
+ 1.60% fn_anonymous ceph-mds [.] StackStringBuf<4096ul>::xsputn
+ 1.46% log libc-2.23.so [.] _IO_default_xsputn
+ 1.45% log libc-2.23.so [.] _itoa_word
+ 1.43% fn_anonymous libc-2.23.so [.] __memmove_ssse3_back
+ 1.40% ms_dispatch libstdc++.so.6.0.24 [.] std::ostream::_M_insert<long>
+ 0.98% fn_anonymous libstdc++.so.6.0.24 [.] std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<l
+ 0.89% ms_dispatch libstdc++.so.6.0.24 [.] 0x
+ 0.88% ms_dispatch libstdc++.so.6.0.24 [.] std::_Rb_tree_increment
And the unittest_log time:
$ bin/unittest_log
[==========] Running 13 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 13 tests from Log
[ RUN ] Log.Simple
[ OK ] Log.Simple (1 ms)
[ RUN ] Log.ReuseBad
[ OK ] Log.ReuseBad (0 ms)
[ RUN ] Log.ManyNoGather
[ OK ] Log.ManyNoGather (0 ms)
[ RUN ] Log.ManyGatherLog
[ OK ] Log.ManyGatherLog (83 ms)
[ RUN ] Log.ManyGatherLogStringAssign
[ OK ] Log.ManyGatherLogStringAssign (79 ms)
[ RUN ] Log.ManyGatherLogStackSpillover
[ OK ] Log.ManyGatherLogStackSpillover (81 ms)
[ RUN ] Log.ManyGather
[ OK ] Log.ManyGather (80 ms)
[ RUN ] Log.InternalSegv
[WARNING] /home/pdonnell/ceph/src/googletest/googletest/src/gtest-death-test.cc:836:: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 3 threads.
[ OK ] Log.InternalSegv (7 ms)
[ RUN ] Log.LargeLog
[ OK ] Log.LargeLog (55 ms)
[ RUN ] Log.TimeSwitch
[ OK ] Log.TimeSwitch (4 ms)
[ RUN ] Log.TimeFormat
[ OK ] Log.TimeFormat (1 ms)
[ RUN ] Log.Speed_gather
[ OK ] Log.Speed_gather (1441 ms)
[ RUN ] Log.Speed_nogather
[ OK ] Log.Speed_nogather (63 ms)
[----------] 13 tests from Log (1895 ms total)
[----------] Global test environment tear-down
[==========] 13 tests from 1 test case ran. (1895 ms total)
[ PASSED ] 13 tests.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
In favor of the simpler StackStreamBuffer. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This brings down the static size of the memory used by the logging infrastructure:
If we used 1024, we'd have 1088*10000 = 10880000 = 10MB in use by the ring
buffer and 2*1088*100 = 2*108800 = 2*106KB for the m_new and m_flush
vectors.
In my testing, 1024 covers most log entries.
Note, I've kept 4096 for the StackStringStream via MutableEntry as these are
already allocated on the heap and cached in a thread local vector. Generally
there should only be about a dozen of these allocated so it's worth keeping a
larger buffer.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
0a06dbc to
9c10509
Compare
|
valgrind to the rescue, I think I found the issue (an errant |
|
I believe I've fixed the issue. It passes tests for an fs run: http://pulpito.ceph.com/pdonnell-2018-09-15_04:37:26-fs-wip-pdonnell-testing-20180915.024633-testing-basic-smithi/ The valgrind failure et al. are unrelated. |
Please see commit message for more information and trivial experimental results.
This is my first time using this flame graph tool. It's results are not always reliably repeatable. I think it's reasonable to suggest this new version is about 50% faster in addition to obviating nearly 100% of heap allocations for logging.