Enable a multi-level db to smoothly migrate to FIFO via DB::Open #10348
yhchiang-sol wants to merge 1 commit into facebook:main
Conversation
Hello, @ajkr. This PR extends FIFO compaction to work with a multi-level DB. Initially, I was thinking about extending the ldb tool to support converting between level and FIFO in place, but I later found it's easier and less error-prone to simply extend FIFO compaction to allow it to work with a multi-level DB. If the idea makes sense to you, I am more than happy to add additional test cases. cc @siying
Added a new test that covers the migration from level to FIFO where the DB has multiple levels.
The intra-level0 compaction logic is untouched; it will still only compact files within level 0.
Hello, @ajkr. I need your input on this PR. Do you mind taking a look when you're more available? Thank you!
Hello, @ajkr and @siying. This PR extends FIFO compaction so it can open a multi-level DB. By default, FIFO only produces SST files in L0 (this part is unchanged). When FIFO opens a multi-level DB, it will pick the SST file to delete starting from Lmax and moving toward L0. Our use case needs this PR to enable a smooth migration from level to FIFO; otherwise, a full compaction is required to achieve the migration. Can you help review or assign someone to review the PR? Thank you!
siying left a comment:
Thanks for your proposal. It is indeed a good thing if we can make FIFO structured as multi-level. It doesn't appear to be trivial, though.
Trying to understand the logic here. With the change, do we still compact all files to L0? In which case do we place files in non-L0?
So FIFO still only generates SST files in L0 as usual. The difference here is that this PR makes FIFO able to "open" or "process" a DB with multiple levels. The main change is to generalize FIFO's compaction picker so that it picks the last file in the linked list of the bottommost non-empty level. In other words, files are deleted from Lmax toward L0.
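The picker generalization described above can be illustrated with a small stand-alone sketch. This is not RocksDB's actual picker code; `Level`, `BottommostNonEmptyLevel`, and `EvictOldest` are invented for illustration (the real picker walks `VersionStorageInfo` file lists). The idea is simply: find the bottommost non-empty level, then evict that level's oldest file.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a level's file list. Files are ordered with
// the oldest file last, matching the "last file in the linked list"
// wording from the discussion above.
using Level = std::vector<std::string>;

// Return the index of the bottommost non-empty level, or -1 if the DB
// holds no SST files at all.
int BottommostNonEmptyLevel(const std::vector<Level>& levels) {
  for (int l = static_cast<int>(levels.size()) - 1; l >= 0; --l) {
    if (!levels[l].empty()) return l;
  }
  return -1;
}

// FIFO eviction sketch: delete (pop) the oldest file of the bottommost
// non-empty level and return its name ("" if nothing to evict).
std::string EvictOldest(std::vector<Level>& levels) {
  int l = BottommostNonEmptyLevel(levels);
  if (l < 0) return "";
  std::string victim = levels[l].back();
  levels[l].pop_back();
  return victim;
}
```

With levels `{L0: {f3}, L1: {}, L2: {f1, f2 (oldest)}}`, repeated eviction removes f2, then f1, and only then the L0 file f3, so non-L0 files drain away first, which is the migration behavior the PR describes.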
Rebased and resolved initial comments.
Created #10726 on top of this PR; it further improves the CompactionPicker logic for FIFO on non-L0 levels so that it uses the file creation time to pick the file to evict (compact).
Hello @siying, @ajkr. I've updated the PR and created #10726, which improves multi-level FIFO further. You can view this one as the base support and #10726 as the completion of multi-level FIFO. I know PRs like this require a deeper look, but we also hope these PRs will be included in the next RocksDB release. To speed up the discussion, I am happy to VC with you sometime next week to go over the PRs and answer any questions you might have. Does that sound like a good plan?
@yhchiang-sol has updated the pull request. You must reimport the pull request before landing.
@siying: I am not able to reproduce the error locally. Does the following error message indicate a hang or a timeout? Do you happen to have any additional logs available internally? And there's another type of error where I don't understand what the error is:
@siying: "Rerun Job with SSH" is disabled for me. Are you able to grant me permission?
@yhchiang-sol here is the stack trace of the hanging test: and the latest LOG file looks like this:
@yhchiang-sol has updated the pull request. You must reimport the pull request before landing.
@yhchiang-sol has updated the pull request. You must reimport the pull request before landing.
Thanks for the stack trace, @siying! Originally I was wondering whether the test was waiting for a compaction that had already finished, so the test ends up waiting forever, and the stack trace confirms this is exactly the case! I've fixed the PR, and all tests pass!
@siying has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@siying The Linter says there's one warning internally at Meta. Do you happen to know what it might be?
@yhchiang-sol has updated the pull request. You must reimport the pull request before landing.
@siying has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@yhchiang-sol has updated the pull request. You must reimport the pull request before landing.
@siying has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
…s_size (#10955)

Summary:

**Context**
#10348 allows multi-level FIFO but accidentally changed the file-deletion logic in `FIFOCompactionPicker::PickSizeCompaction`. With [this](https://github.com/facebook/rocksdb/pull/10348/files#diff-d8fb3d50749aa69b378de447e3d9cf2f48abe0281437f010b5d61365a7b813fdR156) and [this](https://github.com/facebook/rocksdb/pull/10348/files#diff-d8fb3d50749aa69b378de447e3d9cf2f48abe0281437f010b5d61365a7b813fdR235) together, it deletes one file in non-L0 even when `total_size <= mutable_cf_options.compaction_options_fifo.max_table_files_size`, which is incorrect. As a consequence, FIFO exercises more file deletion in our crash testing, which is not able to verify correctly on deleted keys in files deleted by compaction. This results in errors such as `error : inconsistent values for key 000000000000239F000000000000012B000000000000028B: expected state has the key, Get() returns NotFound. Verification failed :(` or `Expected state has key 00000000000023A90000000000000003787878, iterator is at key 00000000000023A9000000000000004178 Column family: default, op_logs: S 00000000000023A90000000000000003787878`

**Summary**:
- Delete a file in non-L0 only if `total_size > mutable_cf_options.compaction_options_fifo.max_table_files_size`
- Add some helpful logging to the LOG file

Pull Request resolved: #10955

Test Plan:
- The errors, reproduced by
```
./db_stress --preserve_unverified_changes=1 --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=10 --bottommost_compression_type=none --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_style=2 --compaction_ttl=0 --compression_max_dict_buffer_bytes=8589934591 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=xpress --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=1048576 --delpercent=0 --delrangepercent=0 --destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=xxh64 --flush_one_in=1000000 --format_version=4 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=10 --index_type=2 --ingest_external_file_one_in=1000000 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=False --log2_keys_per_lock=10 --long_running_snapshots=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=0 --num_file_reads_for_auto_readahead=2 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=40000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=7 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=1000 --readahead_size=0 --readpercent=65 --recycle_log_file_num=1 --reopen=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=0 --subcompactions=2 --sync=0 --sync_fault_injection=0 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=1 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=1 --use_full_merge_v1=1 --use_merge=0 --use_multiget=0 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_iterator_with_expected_state_one_in=0 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=20
```
are gone after this fix
- CI

Reviewed By: ajkr
Differential Revision: D41319441
Pulled By: hx235
fbshipit-source-id: 6939753767007f7449ea7055b1420aabd03d7709
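The corrected size trigger can be sketched with a simplified stand-alone model (not the actual `FIFOCompactionPicker` code; `CountEvictions` and its parameters are invented for illustration): a non-L0 file is only selected for deletion while the total SST size actually exceeds `max_table_files_size`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model of the fixed non-L0 deletion logic: evict the oldest
// non-L0 file only while total size exceeds the configured
// max_table_files_size. Before the fix, a file could be deleted even
// when the DB was already at or under the limit.
// `file_sizes` is a hypothetical oldest-first list of non-L0 file sizes.
int CountEvictions(const std::vector<uint64_t>& file_sizes,
                   uint64_t total_size, uint64_t max_table_files_size) {
  int evicted = 0;
  for (uint64_t sz : file_sizes) {
    if (total_size <= max_table_files_size) break;  // the fix: stop here
    total_size -= sz;
    ++evicted;
  }
  return evicted;
}
```

For example, with three 40-byte files and a total of 120 bytes, a 120-byte limit evicts nothing (at the limit is fine), while a 100-byte limit evicts exactly one file.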
Hi @yhchiang-sol :) I'd like to ask a follow-up question on how multi-level FIFO interacts with file ingestion. Can new files be ingested into non-L0 levels under multi-level FIFO?
Hello @hx235!
Not currently supported. The current design of multi-level FIFO is to allow the migration to run smoothly. During the migration phase, the non-L0 files will be removed one by one until only L0 files remain, and all new files will only live in L0. Do you happen to have a use case on your side?
FIFO compaction can theoretically open a DB with any compaction style.
However, the current code only allows FIFO compaction to open a DB with
a single level.

This PR relaxes the limitation of FIFO compaction and allows it to open a
DB with multiple levels. Below is the read / write / compaction behavior:

- Write: When the DB is opened with multiple levels, all new files will still be placed in level 0, and no files will be moved to a different level.
- Compaction: The compaction picker first finds the bottommost non-empty level. Then, it will delete the oldest file in that level.
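The migration this PR enables boils down to reopening an existing leveled DB with FIFO options. The following is a hedged usage sketch, not code from the PR; it assumes a RocksDB build that includes this change, and the path and size cap are placeholders.

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
  rocksdb::Options options;
  // Switch the compaction style to FIFO on an existing leveled DB.
  options.compaction_style = rocksdb::kCompactionStyleFIFO;
  // Placeholder cap: oldest files (from the bottommost non-empty level
  // first) are deleted once total SST size exceeds this.
  options.compaction_options_fifo.max_table_files_size = 1024 * 1024 * 1024;

  rocksdb::DB* db = nullptr;
  // Before this PR, Open would fail here if the DB had files on L1+.
  rocksdb::Status s = rocksdb::DB::Open(options, "/path/to/leveled_db", &db);
  if (!s.ok()) return 1;
  delete db;
  return 0;
}
```

No full compaction is needed before the switch, which is the point of the PR: the non-L0 files are simply drained by FIFO's picker over time.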
Test Plan
- Added a new test to verify the migration from level to FIFO where the DB has multiple levels.
- Extended existing test cases in db_test and db_basic_test to also verify all entries of a key after reopening the DB with FIFO compaction.