[RFC] test/objectstore/store_test: First pass at an OmapBench test#39976
Conversation
Additional thoughts: create multiple collections? Create/remove lots of kv pairs between repeated iteration tests?
Quick wallclock profile of thread 1 in both bluefs_buffered_io cases, repeatedly running the lower_bound iteration test: bluefs_buffered_io=false vs. bluefs_buffered_io=true
Going to retry the profile using the detach/attach method in gdbpmp, which is much slower but may provide better sample quality.
@mattbenjamin I probably should run with: which will probably help (and I think/hope is how we are still running OSDs in production). This is why I keep harping on trying to recycle memory (or objects!) for short-lived stuff, though. What a pain. That said, the thing I'm especially noticing is that we are indeed following different prefetch paths in rocksdb depending on whether buffered IO is used. With bluefs_buffered_io=false, rocksdb does the prefetch in the FilePrefetchBuffer via the BlockFetcher, but with bluefs_buffered_io=true it does it directly in NewDataBlockIterator. This is what we thought was happening based on the analysis in #38044 (comment), so it's nice to see we were right. I'm also noticing that in the direct case we are spending a ton of time in ShardedCache::Lookup, but never in ShardedCache::Insert like in the buffered case! I'm not sure what that means exactly yet, but clearly we're seeing very different behavior in rocksdb when buffered IO is enabled.
Ok, final update for tonight. I manually disabled Prefetch in bluerocksenv and saw no change at all in performance, though I haven't profiled yet. What I have noticed is that in the direct IO case, the first lower_bound iteration test is slow, then subsequent ones are fast for a little while before becoming slow again for good. This may be key to understanding what's going on.
I think the drop in performance is likely due to hitting the 4GB memory limit defined by osd_memory_target. Going to rewrite the benchmark to submit data in batches.
(force-pushed 41e44a0 to 7bf8ebe)
Ok, the slowdown in the previous results was due to in-memory data structures lying around, causing the priority cache manager to shrink the in-memory caches since it wants to keep process memory below osd_memory_target. Now we can optionally do omap_setkeys in batches to stop that from happening. The benchmark can also spread OMAP across multiple objects, which likewise reduces the submission size. Current test results for 100K objects with 100 keys each:
These results look pretty odd, with some suspiciously good numbers for filestore (and for kstore on setkeys). The speed difference for bluestore with buffered IO seems pretty apparent in the iteration tests, though. Edit: I should add that filestore has access to a lot (384GB!) of pagecache on this node. I probably need to figure out if I can use a cgroup to limit pagecache or something.
Ok, I ran some tests with different cgroup memory limits. I'm still wary of these filestore numbers. We should also keep in mind that this doesn't exactly represent how an OSD would behave in real situations (much less parallelism!), and remember that filestore completely ignores osd_memory_target and uses rocksdb much less aggressively (relying on the filesystem for more). In any event, one thing I notice in the iteration tests is that bluestore can do very well with direct IO when there is enough memory to seemingly keep all omap blocks present in the block cache. I'm back to strongly suspecting that the block cache is being polluted by something else when iteration gets slow. Since this is master it should no longer be onodes, but we still might be thrashing the cache with other stuff (allocation data seems like a strong contender). Need to investigate more. The tests cover: omap_setkeys, omap_get, seek_to_first iteration, lower_bound iteration, remove.
I'm trusting the filestore results a little more now. We do see some slowdown (but it's still surprisingly good) even at much lower cgroup memory limits. Filestore cgroup memory.limit_in_bytes tests:
Filestore really does appear to be faster in these tests, likely due to significantly lighter rocksdb usage and thus a greater ability to keep things cached with less memory (though I should say that bluestore is still far faster in many non-omap tests!)
aclamk
left a comment
Nice, simple and very functional bench test.
(force-pushed 7bf8ebe to 701a943)
Latest test results are available here: https://docs.google.com/spreadsheets/d/1l2gJ3xsmh4AdpOUFmv5oLyEofoJ9FEMbkBm9DhfqVn0/edit?usp=sharing
aclamk
left a comment
Good OMAP test.
Only cosmetic changes required.
Signed-off-by: Mark Nelson <mnelson@redhat.com>
(force-pushed 701a943 to 6ed95b8)
Before we merge this, we probably want to figure out whether users should be able to pass in their own options, and what the defaults should be. This test can take an extremely long time (1 hour+) to run in some configurations (low memory, direct IO, unsorted).
At least: controlling the number of objects (to enable short verification tests)
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
@markhpc ping
Probably the next step for this is to figure out how to turn it into an independent benchmark. It talks to the objectstore directly, so it might need some rethinking if we want to run it (or something like it) against existing clusters. It wouldn't be impossible, though, to rip out the gtest part and replace it with some kind of CLI wrapper.
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
we still want this!
As Neha already said:
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!
This is a quick and dirty first pass at a new gtest microbenchmark for omap that heavily borrows from the generic OmapSimple test but could be extended further. It creates a configurable number of random kv pairs with configurable (if you modify the test) key and value lengths. Arguably we may want something a little more sophisticated for kv pairs with common prefixes or other cases that this isn't covering.
For now, here's a quick set of tests from Officinalis using key/value sizes roughly based on what we saw when investigating for the trocksdb folks:
https://docs.google.com/spreadsheets/d/1fNFI8U-JRkU5uaRJzgg5rNxqhgRJFlDB4TsTAVsuYkk/edit?usp=sharing
1M 64-byte keys with 256-byte values (buffered off / buffered on):
Edit: A couple of quick passes make these results look relatively repeatable, though I literally just populated the OmapSimple tests with more random data and timed them; I haven't thought about the specifics much yet.
If anyone would like to give it a try, you can run something like:
Signed-off-by: Mark Nelson <mnelson@redhat.com>