avoid IterKey::UpdateInternalKey() in BlockIter#6843
avoid IterKey::UpdateInternalKey() in BlockIter#6843ajkr wants to merge 1 commit intofacebook:masterfrom
IterKey::UpdateInternalKey() in BlockIter#6843Conversation
|
There are a few next steps that were too hard to do all at once.
But this PR should be enough to unblock #6646... |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
2a280ed to
50f8d5f
Compare
|
@ajkr has updated the pull request. Re-import the pull request |
50f8d5f to
8f0dd79
Compare
|
@ajkr has updated the pull request. Re-import the pull request |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
8f0dd79 to
9c3e78e
Compare
|
@ajkr has updated the pull request. Re-import the pull request |
9c3e78e to
cdd9c22
Compare
|
@ajkr has updated the pull request. Re-import the pull request |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
pdillinger
left a comment
There was a problem hiding this comment.
Still digging deeper in my understanding, but sharing this partial feedback
cdd9c22 to
5cdb42d
Compare
|
@ajkr has updated the pull request. Re-import the pull request |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
pdillinger
left a comment
There was a problem hiding this comment.
Maybe also run a seekrandom benchmark (without cache_index_and_filter_blocks) to see what kind of worst-case performance impact this has.
ajkr
left a comment
There was a problem hiding this comment.
Thanks for the review!
Maybe also run a seekrandom benchmark (without cache_index_and_filter_blocks) to see what kind of worst-case performance impact this has.
Sure, will do.
5cdb42d to
82b2f26
Compare
|
@ajkr has updated the pull request. Re-import the pull request |
Added to the description. The results aren't that stable, but I do consistently see slowdown in both the "ingestion_db" and "normal_db" case. That is different from what I expected as I thought the main drawback of this PR was introducing a copy into |
pdillinger
left a comment
There was a problem hiding this comment.
I think we can accept minor perf regression to help bring some sanity to this part of the code.
|
(I believe the CI failures are all unrelated. Do another rebase if you want.) |
`IterKey::UpdateInternalKey()` is an error-prone API as it's
incompatible with `IterKey::TrimAppend()`, which is used for
decoding delta-encoded internal keys. This PR stops using it in
`BlockIter`. Instead, it assigns global seqno in a separate `IterKey`'s
buffer when needed. The logic for safely getting a Slice with global
seqno properly assigned is encapsulated in `GlobalSeqnoAppliedKey`.
`BinarySeek()` is also migrated to use this API (previously it ignored
global seqno entirely).
Test Plan:
benchmark setup -- single file DBs, in-memory, no compression. "normal_db"
created by regular flush; "ingestion_db" created by ingesting a file. Both
DBs have same contents.
```
$ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000
$ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst | awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}')
$ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/
```
benchmark run command:
```
TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=10 -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=1048576000 -threads=1 -reads=40000000
```
results:
DB code throughput
normal_db master 267.9
normal_db PR6843 254.2 (-5.1%)
ingestion_db master 259.6
ingestion_db PR6843 250.5 (-3.5%)
82b2f26 to
089f675
Compare
|
@ajkr has updated the pull request. Re-import the pull request |
|
Thanks for the review!
The "normal_db" regression can be avoided if we change The "ingestion_db" regression is due to the extra copy into |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
1 similar comment
…175) Summary: `IterKey::UpdateInternalKey()` is an error-prone API as it's incompatible with `IterKey::TrimAppend()`, which is used for decoding delta-encoded internal keys. This PR stops using it in `BlockIter`. Instead, it assigns global seqno in a separate `IterKey`'s buffer when needed. The logic for safely getting a Slice with global seqno properly assigned is encapsulated in `GlobalSeqnoAppliedKey`. `BinarySeek()` is also migrated to use this API (previously it ignored global seqno entirely). Pull Request resolved: facebook#6843 Test Plan: benchmark setup -- single file DBs, in-memory, no compression. "normal_db" created by regular flush; "ingestion_db" created by ingesting a file. Both DBs have same contents. ``` $ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000 $ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst | awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}') $ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ ``` benchmark run command: ``` TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=10 -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=1048576000 -threads=1 -reads=40000000 ``` results: | DB | code | throughput | |---|---|---| | normal_db | master | 267.9 | | normal_db | PR6843 | 254.2 (-5.1%) | | ingestion_db | master | 259.6 | | ingestion_db | PR6843 | 250.5 (-3.5%) | Reviewed By: pdillinger Differential Revision: D21562604 Pulled By: ajkr fbshipit-source-id: 937596f836930515da8084d11755e1f247dcb264 Signed-off-by: Yi Wu <yiwu@pingcap.com> Co-authored-by: Andrew Kryczka <andrewkr@fb.com>
…174) Summary: `IterKey::UpdateInternalKey()` is an error-prone API as it's incompatible with `IterKey::TrimAppend()`, which is used for decoding delta-encoded internal keys. This PR stops using it in `BlockIter`. Instead, it assigns global seqno in a separate `IterKey`'s buffer when needed. The logic for safely getting a Slice with global seqno properly assigned is encapsulated in `GlobalSeqnoAppliedKey`. `BinarySeek()` is also migrated to use this API (previously it ignored global seqno entirely). Pull Request resolved: facebook#6843 Test Plan: benchmark setup -- single file DBs, in-memory, no compression. "normal_db" created by regular flush; "ingestion_db" created by ingesting a file. Both DBs have same contents. ``` $ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000 $ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst | awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}') $ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ ``` benchmark run command: ``` TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=10 -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=1048576000 -threads=1 -reads=40000000 ``` results: | DB | code | throughput | |---|---|---| | normal_db | master | 267.9 | | normal_db | PR6843 | 254.2 (-5.1%) | | ingestion_db | master | 259.6 | | ingestion_db | PR6843 | 250.5 (-3.5%) | Reviewed By: pdillinger Differential Revision: D21562604 Pulled By: ajkr fbshipit-source-id: 937596f836930515da8084d11755e1f247dcb264 Signed-off-by: Yi Wu <yiwu@pingcap.com> Co-authored-by: Andrew Kryczka <andrewkr@fb.com>
…173) Summary: `IterKey::UpdateInternalKey()` is an error-prone API as it's incompatible with `IterKey::TrimAppend()`, which is used for decoding delta-encoded internal keys. This PR stops using it in `BlockIter`. Instead, it assigns global seqno in a separate `IterKey`'s buffer when needed. The logic for safely getting a Slice with global seqno properly assigned is encapsulated in `GlobalSeqnoAppliedKey`. `BinarySeek()` is also migrated to use this API (previously it ignored global seqno entirely). Pull Request resolved: facebook#6843 Test Plan: benchmark setup -- single file DBs, in-memory, no compression. "normal_db" created by regular flush; "ingestion_db" created by ingesting a file. Both DBs have same contents. ``` $ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000 $ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst | awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}') $ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ ``` benchmark run command: ``` TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=10 -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=1048576000 -threads=1 -reads=40000000 ``` results: | DB | code | throughput | |---|---|---| | normal_db | master | 267.9 | | normal_db | PR6843 | 254.2 (-5.1%) | | ingestion_db | master | 259.6 | | ingestion_db | PR6843 | 250.5 (-3.5%) | Reviewed By: pdillinger Differential Revision: D21562604 Pulled By: ajkr fbshipit-source-id: 937596f836930515da8084d11755e1f247dcb264 Signed-off-by: Yi Wu <yiwu@pingcap.com> Co-authored-by: Andrew Kryczka <andrewkr@fb.com>
IterKey::UpdateInternalKey()is an error-prone API as it'sincompatible with
IterKey::TrimAppend(), which is used fordecoding delta-encoded internal keys. This PR stops using it in
BlockIter. Instead, it assigns global seqno in a separateIterKey'sbuffer when needed. The logic for safely getting a Slice with global
seqno properly assigned is encapsulated in
GlobalSeqnoAppliedKey.BinarySeek()is also migrated to use this API (previously it ignoredglobal seqno entirely).
Test Plan:
benchmark setup -- single file DBs, in-memory, no compression. "normal_db"
created by regular flush; "ingestion_db" created by ingesting a file. Both
DBs have same contents.
benchmark run command:
results: