os/bluestore: Fast WAL for RocksDB#62224
Conversation
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
31e4731 to
c114aea
Compare
c114aea to
ab34d64
Compare
|
jenkins test make check |
1 similar comment
|
jenkins test make check |
ab34d64 to
41e8d24
Compare
|
jenkins test api |
6780770 to
10842c6
Compare
src/common/options/global.yaml.in
Outdated
| type: bool | ||
| level: advanced | ||
| desc: Enables a faster backend in BlueFS for WAL writes. | ||
| long_desc: Enabling this feature will reduce ~50% the amount of fdatasync syscalls issued by WAL writes. This happens because we embed metadata |
There was a problem hiding this comment.
fdatasync double backticks
src/common/options/global.yaml.in
Outdated
| desc: Enables a faster backend in BlueFS for WAL writes. | ||
| long_desc: Enabling this feature will reduce ~50% the amount of fdatasync syscalls issued by WAL writes. This happens because we embed metadata | ||
| with the data itself. Downgrading from a version that uses v2 to v1 will require running `ceph-bluestore-tool --command downgrade-wal-to-v1` | ||
| to move wal files to previous format. |
There was a problem hiding this comment.
to migrate WAL entries to the previous format.
src/include/encoding.h
Outdated
| using ::ceph::encode; \ | ||
| do { | ||
|
|
||
| #define ENCODE_START_FILLER(v, compat, filler_in) \ |
There was a problem hiding this comment.
To be removed, not used any more
src/include/encoding.h
Outdated
| * @param bl bufferlist we were encoding to | ||
| * @param new_struct_compat struct-compat value to use | ||
| */ | ||
| #define ENCODE_FINISH_FILLER() \ |
There was a problem hiding this comment.
To be removed, not used any more
src/os/bluestore/BlueFS.cc
Outdated
| h->file->fnode.wal_size += content_length; | ||
| ceph_le64 flush_length_le(content_length); | ||
| h->wal_header_filler.copy_in(sizeof(flush_length_le), (char*)&flush_length_le); | ||
| uint64_t length = sizeof(File::wal_flush_t::wal_length_t) + content_length + sizeof(h->file->wal_marker); |
There was a problem hiding this comment.
Wouldn't it be better to use head_size()/tail_size() helpers instead of sizeof?
There was a problem hiding this comment.
Replaced all sizeofs with header_size() / tail_size() calls.
| hashed_ino ^= hashed_ino << 11; | ||
| hashed_ino ^= hashed_ino << 23; | ||
| // use hashed ino in a endiness-agnostic way | ||
| for (int i = 0; i < 8; i++) { |
There was a problem hiding this comment.
8 to be replaced with sizeof(wal_marker) or something
There was a problem hiding this comment.
I'm not sure with "8", but I added comment to explain the code.
There was a problem hiding this comment.
AFAIU 8 comes from marker's length. Shouldn't it be sizeof(marker) then?
src/os/bluestore/BlueFS.cc
Outdated
| } | ||
| if (const unsigned tail = bl.length() & ~super.block_mask(); tail) { | ||
| const auto padding_len = super.block_size - tail; | ||
| unsigned padding_len = 0; |
There was a problem hiding this comment.
Apaprently no need to declare outside 'if' statement
1032eea to
c2fdd21
Compare
src/common/options/global.yaml.in
Outdated
| to move wal files to previous format. | ||
| long_desc: In envelope mode BlueFS files do not need to update metadata. When applied to RocksDB WAL files, | ||
| it reduces by ~50% the amount of fdatasync syscalls. | ||
| Downgrading from an envelope mode to legacy mode requires `ceph-bluestore-tool --command downgrade-wal-to-v1`. |
There was a problem hiding this comment.
'wal-to-v1' is kinda weird now...
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
WAL file mode allows to cut by 50% of fdatasync(). For regular files, we independently sync file data and file metadata. In WAL mode we are able to recover most metadata from file bytestream. Hence - we only sync file data. Signed-off-by: Pere Diaz Bou <pere-altea@hotmail.com> (cherry picked from commit 38519ff) Amended-by: Adam Kupczyk
Signed-off-by: Pere Diaz Bou <pere-altea@hotmail.com> (cherry picked from commit 4961dca)
This is useful to have ability to create a contigous_filler variable. Otherwise one needs std::optional<>. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Main changes:
- remove bluefs_wal_header_t; we no longer use denc directly,
instead we read length and marker directly
- get rid of wal_limit; this is replaced with splitting
WAL_V2=>{WAL_V2,WAL_V2_FIN}. WAL_V2 means there could be more wal data,
WAL_V2_FIN means file was orderly closed = no more data.
- modified wal_marker_t (previously WALMarker) to be calculated in
endiness-agnostic way.
- replaced _wal_update_size with _wal_index_file; it does a similar
function, but is less convoluted now
- simplified _wal_read
- now BlueFS::close_writer() flushes data
- BlueRocksEnv's BlueRocksWritableFile::truncate() now does the action
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Macros are not longer used. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
It is just not used. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
If we happen to shutdown with dirty.pending_releases, after mount we will most likely want to free them. And its an error - they will already be free. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
The fact that test was broken was hidden by the error in BlueFS::_init_alloc, that created the allocator with invalid alloc unit size. After that was fixed, also test needed to be fixed. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Pere Diaz Bou <pere-altea@hotmail.com> (cherry picked from commit 041edad)
Signed-off-by: Pere Diaz Bou <pere-altea@hotmail.com> (cherry picked from commit a65118d)
Dropped checks around free space. Now downgrades one file at a time. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Changed logic for selecting bluefs_super_t wal_version field. Now it is more a result of the data that is in bluefs log and less directly controlled. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
The check involves timing check. Jenkins is too unpredictable with performance. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
This refactor turns WAL_v2 into globally available ENVELOPE_MODE. The RocksDB WAL files are now just using existing feature. Now adding new file encoding schemes will be easier. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
…pe_mode No logic change, just conf name and constants adaptation. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
This is the variant of DENC_START that manually controls encoding version. As being incompatible with static compile-time checking of StructVChecker, new (legacy) mode had to be introduced. Also, Removed struct_v and struct_compat from denc_finish. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
|
@markhpc I removed your "need-qa" label. This thing passed preliminary teuthology, and this version is not final. Note: critical bluestore PRs we are weekly testing / reviewing with aclamk-testing-* labels. |
13ed631 to
11f9a67
Compare
+ relocated config bluefs_wal_envelope_mode close to other bluefs configs Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
|
jenkins test make check arm64 |
|
Approved by https://tracker.ceph.com/issues/70787. |
The work here is based on #56927.
It implements the design principles showed there, but with a bit different implementation details.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job Definition