bps: add support for logarithmic offset#9478
Merged
locker merged 5 commits intotarantool:masterfrom Apr 27, 2024
Merged
Conversation
5ff3202 to
b2140e1
Compare
4e0d494 to
9fde5bd
Compare
e170b46 to
2d1986f
Compare
ochaplashkin
approved these changes
Jan 15, 2024
4dc0e9b to
4344d72
Compare
4344d72 to
63eeab1
Compare
e007fcd to
8d2a0e6
Compare
8d2a0e6 to
0957a8d
Compare
49eae17 to
119a124
Compare
30f5a92 to
4887d27
Compare
locker
reviewed
Apr 22, 2024
The checkpatch does not permit to modify several parts of inner debug check functions complaining about too big indentation. The modification will be required further to implement the LogN offset in the BPS tree, so this patch refactors the functions and introduces a helper function for this: bps_tree_debug_insert_and_move_next. The refactored functions are: - bps_tree_debug_check_insert_and_move_to_right_inner - bps_tree_debug_check_insert_and_move_to_left_inner - bps_tree_debug_check_insert_and_move_to_right_leaf - bps_tree_debug_check_insert_and_move_to_left_leaf NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
alyapunov
approved these changes
Apr 26, 2024
locker
approved these changes
Apr 26, 2024
New BPS tree flavors are to be introduced and tested with the existing test suite. There're a bunch of problems though: 1. The white box test uses magic constants to performs its checks, it is better to use constants defined by the bps_tree.h instead. 2. The bps_tree.cc test itself is not TAP-compatible, fix this by introducing more assertions. 3. The bps_tree_iteartor.c test is not TAP-compatible too, is uses the result file to check some cases. Let's remove the manual printing tests and modify the automated ones to cover the removed cases. By the way performed minor bps_tree.cc test refactoring. NO_DOC=test update NO_CHANGELOG=test update
The current tree does not allow to find offset of an element or create an iterator to an element based on its offset. This patch is meant to fix this by expanding the data structure with additional information and introducing methods using it: subtree cardinalities. A subtree cardinality is the amount of elements in it. For example, cardinality of a leaf block is count of elements in it (effectively it equals to leaf.header.size), cardinality of an inner block is the sum of cardinalities of its chlidren. The patch includes two chosable ways to store this information: `BPS_INNER_CARD` and `BPS_INNER_CHILD_CARDS`. The first implementation sores block cardinality in each inner block. This implementation has minimal memory overhead (it just introduces a new 64-bit field in `struct bps_inner`), but calculation of offsets is not that fast, since in order to find an offset of a particular child of an inner node we have to look into each of its children prior to the looking one. The second one sores an array of children cardinalities in inner blocks. The memory overhead of this implementation is visible since it significantly decreases the children capacity of inner blocks. The max count in inner block is decreased from 42 to 25 for tree of 8-byte elements with 512-byte blocks and from 25 to 18 for tree of 16-byte elements with 512-byte blocks. Offset calcluations are faster though. It's possible (though impractical) to enable both solutions, the tree will use the best ways to perform offset-based tasks, but will have to maintain both children cardinality array and inner own cardinalities. Along with the theoretical support this patch introduces a bunch of functions using it: - `iterator_at(t, offset)`: gives an iterator to an element of a tree or tree view by its offset; - `find_get_offset(t, key, offset_ptr)`: the same as `find` but also provides to the user the offset of the found element in the output parameter; - `[lower|upper]_bound[_elem]_get_offset(t, key, exact, offset_ptr)`: the same as upper/lower bound functions but provide to the user the offset to the found position (end of the tree included). - `insert_get_offset(t, new_elem, replaced, offset_ptr)`: the same as `insert`, but also provides the offset to the inserted element. - `delete_get_offset(t, elem, offset_ptr)`: same as `delete`, but also returns offset to the deleted element prior to the deletion in the output parameter. Another new function introduced is bps_tree_view_debug_check(t). This function is similar to the bps_tree_debug_check(t), but is applicable to tree views. It's used to adopt default tree view tests to the new tree variations. Each new implementation is tested by old tree tests (these now support several tree variations selected with a C definition, the definitions are specified in the test/unit/CMakeLists.txt). New offset API-related test introduced (this one tests both of two tree variations - BPS_INNER_CARD and BPS_INNER_CHILD_CARDS). Part of tarantool#8204 NO_DOC=internal NO_CHANGELOG=internal
These add three new configs to be tested in the benchmarks: tree with child cardinalities enabled, with inner cardinality enabled and with both of these. By the way simplified the performance analisys by reducing the memory allocation overhead (it's not required to be zero-initialized) and by moving the test tree build into a separated function. NO_DOC=perf test NO_TEST=perf test NO_CHANGELOG=perf test
Since the performance benchmarks for three additional flavors of the BPS tree had been introduced, the amount of test in this suite has increased to 228. Given some tests work with datasets of 10M entries, the amount of time required to run these increased significantly. Mitigate this by reducing the test datasets. NO_DOC=perf test NO_TEST=perf test NO_CHANGELOG=perf test
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
bps: add 2-way support for logarithmic offsets
The current tree does not allow to find offset of an element or create
an iterator to an element based on its offset. This patch is meant to
fix this by expanding the data structure with additional information
and introducing methods using it: subtree cardinalities.
A subtree cardinality is the amount of elements in it. For example,
cardinality of a leaf block is count of elements in it (effectively
it equals to leaf.header.size), cardinality of an inner block is the
sum of cardinalities of its chlidren.
The patch includes two chosable ways to store this information:
BPS_INNER_CARDandBPS_INNER_CHILD_CARDS.The first implementation sores block cardinality in each inner block.
This implementation has minimal memory overhead (it just introduces
a new 64-bit field in
struct bps_inner), but calculation of offsetsis not that fast, since in order to find an offset of a particular
child of an inner node we have to look into each of its children
prior to the looking one.
The second one sores an array of children cardinalities in inner
blocks. The memory overhead of this implementation is visible since
it significantly decreases the children capacity of inner blocks. The
max count in inner block is decreased from 42 to 25 for tree of 8-byte
elements with 512-byte blocks and from 25 to 18 for tree of 16-byte
elements with 512-byte blocks. Offset calcluations are faster though.
It's possible (though impractical) to enable both solutions, the tree
will use the best ways to perform offset-based tasks, but will have to
maintain both children cardinality array and inner own cardinalities.
Along with the theoretical support this patch introduces a bunch of
functions using it:
iterator_at(t, offset): gives an iterator to an element of a treeor tree view by its offset;
find_get_offset(t, key, offset_ptr): the same asfindbut alsoprovides to the user the offset of the found element in the output
parameter;
[lower|upper]_bound[_elem]_get_offset(t, key, exact, offset_ptr):the same as upper/lower bound functions but provide to the user the
offset to the found position (end of the tree included).
insert_get_offset(t, new_elem, replaced, offset_ptr): the same asinsert, but also provides the offset to the inserted element.delete_get_offset(t, elem, offset_ptr): same asdelete, but alsoreturns offset to the deleted element prior to the deletion in the
output parameter.
Another new function introduced is bps_tree_view_debug_check(t). This
function is similar to the bps_tree_debug_check(t), but is applicable
to tree views. It's used to adopt default tree view tests to the new
tree variations.
Each new implementation is tested by old tree tests (these now support
several tree variations selected with a C definition, the definitions
are specified in the test/unit/CMakeLists.txt).
Since creation of new tests with .result files is prohibited by the
checkpatch, the new flavor tests from the test/unit/bps_tree.cc are
made TAP-compatible. Given their output is not checked, the printing
test from the suite had been disabled for them.
New offset API-related test introduced (this one tests both of two
tree variations - BPS_INNER_CARD and BPS_INNER_CHILD_CARDS).
Part of #8204
NO_DOC=internal
NO_CHANGELOG=internal