[RFC][WIP] os/bluestore: framework for more intelligent DB space usage #28960
ifed01 wants to merge 3 commits into ceph:master
Conversation
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
ifed01 force-pushed from ae3585d to 02d8e91, then from 02d8e91 to 7f3e46e.
Commit message excerpt: "…r BlueFS. It allows excessive space usage for higher DB levels." Signed-off-by: Igor Fedotov <ifedotov@suse.com>
One more example for L4 using DB space, with a spillover this time.

@aclamk - thanks a lot for pointing this function out, I wasn't aware of it.

Simplified version available at #29687

Closed in favour of #29687
The idea is to force RocksDB to "hint" the corresponding DB level when opening a file.
This is implemented by passing level-size-aligned folders when opening a DB. RocksDB opens files using these folders, hence denoting which DB level a file belongs to. The accuracy of such hints looks pretty good.
As a result one is able to build a volume usage matrix (DEVICE x DB_LEVEL), which allows more intelligent decisions about where to allocate BlueFS extents for a specific file.
Currently this patch is mainly about infrastructure rather than taking such decisions, except for an improvement for levels 4+ which allows partial DB space usage even when the whole L4 doesn't fit into the DB device.
One more improvement we can consider is using the WAL device for L0/L1 when WAL is underused...
Here is an example of DB levels and BlueFS volume usage statistics collected by both the new framework and the existing methods. In fact the new framework keeps two DEVICE x LEVEL matrices:
a) current values
b) maximum observed values
In the reports below one can check the match between the REAL column in the current values matrix and the RocksDB per-level stats,
or the TOTALS row values vs. the bluefs-bdev-sizes output.
Case 1: Original allocation policy:
2019-07-10T15:40:13.791+0300 7f32bb255700 1 RocksDBBlueFSVolumeSelector: wal_total:5368709120, db_total:96636764160, slow_total:107374182400, db_cut_level:4, policy:0 usage matrix:
**** current values matrix starts here ****
LEVEL, WAL, DB, SLOW, ****, **, REAL
L0-1 0,0,0,0,0,0
L2 0,1009778688,0,0,0,1002696637
L3 0,15006171136,0,0,0,14928509229
L4+ 0,0,78671511552,0,0,78313236820
WAL 530579456,1048576,0,0,0,345309959
UNSORTED 0,6291456,0,0,0,1285822
TOTALS 530579456,16023289856,78671511552,0,0,0
^^^^^ current values matrix ends here, maximums matrix follow
MAXIMUMS:
0,3279945728,0,0,0,3264581164
0,10412359680,0,0,0,10369342915
0,48643440640,0,0,0,48425182335
0,0,93072654336,0,0,92586709652
538968064,1048576,0,0,0,528943749
0,7340032,0,0,0,1285822
538968064,55071211520,93072654336,0,0,0
2019-07-10T15:40:13.791+0300 7f32bb255700 1 bluestore(/home/if/ceph/build/dev/osd0) bluefs bdev sizes:
0 : device size 0x140000000 : own 0x[1000~13ffff000] = 0x13ffff000 : using 0x1faff000 (507 MiB)
1 : device size 0x1680000000 : own 0x[2000~167fffe000] = 0x167fffe000 : using 0x3bb1fe000 (15 GiB)
2 : device size 0x1900000000 : own 0x[3600000~300000,3a00000~52ac00000,52ea00000~600d00000,b33d00000~a60d00000,1595000000~3dc00000] = 0x15ca500000 : using 0x1251300000 (73 GiB)
db_statistics {
"rocksdb_compaction_statistics": "",
"": "",
"": " Compaction Stats [default] **",
"": "Level Files Size
"": "---------------------
"": " L0 0/0 0.00 KB ...
"": " L1 0/0 0.00 KB ...
"": " L2 19/1 956.25 MB ...
"": " L3 234/27 13.00 GB ...
"": " L4 1161/0 72.93 GB ...
"": " Sum 1414/28 86.87 GB ...
Case 2: Use some extra space for L4+ policy:
2019-07-10T16:21:45.827+0300 7f5f6f6d7700 1 RocksDBBlueFSVolumeSelector: wal_total:5368709120, db_total:96636764160, slow_total:107374182400, db_cut_level:4, policy:1 usage matrix:
LEVEL, WAL, DB, SLOW, ****, **, REAL
L0-1 0,419430400,0,0,0,417663585
L2 0,2645557248,0,0,0,2634051067
L3 0,26923237376,0,0,0,26823171800
L4+ 0,21681405952,0,0,0,21550869858
WAL 530579456,1048576,0,0,0,524066537
UNSORTED 0,5242880,0,0,0,511651
TOTALS 530579456,51675922432,0,0,0,0
MAXIMUMS:
0,6149898240,0,0,0,6121377338
0,19858980864,0,0,0,19782445877
0,46491762688,0,0,0,46268971009
0,21681405952,0,0,0,21550869858
538968064,1048576,0,0,0,531368421
0,7340032,0,0,0,511651
538968064,53198454784,0,0,0,0
2019-07-10T16:21:45.831+0300 7f5f6f6d7700 1 bluestore(/home/if/ceph/build/dev/osd0) bluefs bdev sizes:
0 : device size 0x140000000 : own 0x[1000~13ffff000] = 0x13ffff000 : using 0x1faff000 (507 MiB)
1 : device size 0x1680000000 : own 0x[2000~167fffe000] = 0x167fffe000 : using 0xc0c3fe000 (48 GiB)
2 : device size 0x1900000000 : own 0x[c00000000~100000000] = 0x100000000 : using 0x0 (0 B)
db_statistics {
"rocksdb_compaction_statistics": "",
"": "",
"": " Compaction Stats [default] **",
"": "Level Files Size
"": "-----------------------",
"": " L0 1/0 204.04 MB ...
"": " L1 3/0 194.28 MB ...
"": " L2 41/0 2.45 GB ...
"": " L3 396/0 24.98 GB ...
"": " L4 317/0 20.07 GB ...
Relates to: http://tracker.ceph.com/issues/38745
Signed-off-by: Igor Fedotov <ifedotov@suse.com>