Confirm badgerMove entry required before rewrite#1302
Confirm badgerMove entry required before rewrite#1302jarifibrahim merged 4 commits intodgraph-io:masterfrom
badgerMove entry required before rewrite#1302Conversation
jarifibrahim
left a comment
There was a problem hiding this comment.
Thanks for the PR @ou05020 . I'll get this PR reviewed by @manishrjain
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @jarifibrahim, @manishrjain, and @ou05020)
manishrjain
left a comment
There was a problem hiding this comment.
This might be expensive. Instead, we should check why the current mechanisms of movekey removal don't work.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @manishrjain, and @ou05020)
|
@manishrjain However, the discard calculation includes movekey entry size in the total number of bytes (not the discard bytes) in the discard ratio calculation even if that movekey is for a key that has been deleted. If your discard ratio is 50% and half of a file is used by movekey entries for deleted keys, it will never satisfy the discard ratio allowing it to be rewritten and subsequent movekey deletion. There are only two approaches I can think of to solve the problem. Always check if underlying key for each movekey has been deleted to determine if an entry should be discarded. Or, automatically write a delete I added db_test.go If you update |
|
The Line 685 in 62b7a10 because the current implementation of iterators does not return duplicates (keys with same version). After rewriting value log file The badger/table/merge_iterator.go Lines 153 to 159 in 62b7a10 The iterator will see only the vlog;y key because it is at the top.
Even if the iterator were to not skip this key, we will end up losing data if Compaction would drop all the keys except the one with the delete marker and a |
manishrjain
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami and @ou05020)
|
Thanks for fixing this @ou05020 🎉 |
Value log badgerMove entries for deleted keys are always rewritten. This leads to exponential DB size growth in heavy write/delete use cases with value pointer entries. To prevent this, a check has been added to confirm the value is still needed for badgerMove entries. (cherry picked from commit 3e1cdd9)
) (#1502) * Confirm `badgerMove` entry required before rewrite (#1302) Value log badgerMove entries for deleted keys are always rewritten. This leads to exponential DB size growth in heavy write/delete use cases with value pointer entries. To prevent this, a check has been added to confirm the value is still needed for badgerMove entries. (cherry picked from commit 3e1cdd9) * remove KeepL0InMemory in db_test Co-authored-by: Julian Hegler <julian.hegler@tanium.com>
Value log badgerMove entries for deleted keys are always rewritten. This leads to exponential DB size growth in heavy write/delete use cases with value pointer entries. To prevent this, a check has been added to confirm the value is still needed for badgerMove entries.
…raph-io#1302) (dgraph-io#1502) * Confirm `badgerMove` entry required before rewrite (dgraph-io#1302) Value log badgerMove entries for deleted keys are always rewritten. This leads to exponential DB size growth in heavy write/delete use cases with value pointer entries. To prevent this, a check has been added to confirm the value is still needed for badgerMove entries. (cherry picked from commit 3e1cdd9) * remove KeepL0InMemory in db_test Co-authored-by: Julian Hegler <julian.hegler@tanium.com>
Value log
badgerMoveentries for deleted keys are always rewritten.This leads to exponential DB size growth in heavy write/delete use cases with value pointer entries.
To prevent this, a check has been added to confirm the value is still needed for
badgerMoveentries.This change is