Skip to content

bps: add support for logarithmic offset#9478

Merged
locker merged 5 commits intotarantool:masterfrom
mkostoevr:m.kostoev/gh-xxxx-log-offset
Apr 27, 2024
Merged

bps: add support for logarithmic offset#9478
locker merged 5 commits intotarantool:masterfrom
mkostoevr:m.kostoev/gh-xxxx-log-offset

Conversation

@mkostoevr
Copy link
Contributor

@mkostoevr mkostoevr commented Dec 13, 2023

bps: add 2-way support for logarithmic offsets

The current tree does not allow to find offset of an element or create
an iterator to an element based on its offset. This patch is meant to
fix this by expanding the data structure with additional information
and introducing methods using it: subtree cardinalities.

A subtree cardinality is the amount of elements in it. For example,
cardinality of a leaf block is count of elements in it (effectively
it equals to leaf.header.size), cardinality of an inner block is the
sum of cardinalities of its chlidren.

The patch includes two chosable ways to store this information:
BPS_INNER_CARD and BPS_INNER_CHILD_CARDS.

The first implementation sores block cardinality in each inner block.
This implementation has minimal memory overhead (it just introduces
a new 64-bit field in struct bps_inner), but calculation of offsets
is not that fast, since in order to find an offset of a particular
child of an inner node we have to look into each of its children
prior to the looking one.

The second one sores an array of children cardinalities in inner
blocks. The memory overhead of this implementation is visible since
it significantly decreases the children capacity of inner blocks. The
max count in inner block is decreased from 42 to 25 for tree of 8-byte
elements with 512-byte blocks and from 25 to 18 for tree of 16-byte
elements with 512-byte blocks. Offset calcluations are faster though.

It's possible (though impractical) to enable both solutions, the tree
will use the best ways to perform offset-based tasks, but will have to
maintain both children cardinality array and inner own cardinalities.

Along with the theoretical support this patch introduces a bunch of
functions using it:

  • iterator_at(t, offset): gives an iterator to an element of a tree
    or tree view by its offset;
  • find_get_offset(t, key, offset_ptr): the same as find but also
    provides to the user the offset of the found element in the output
    parameter;
  • [lower|upper]_bound[_elem]_get_offset(t, key, exact, offset_ptr):
    the same as upper/lower bound functions but provide to the user the
    offset to the found position (end of the tree included).
  • insert_get_offset(t, new_elem, replaced, offset_ptr): the same as
    insert, but also provides the offset to the inserted element.
  • delete_get_offset(t, elem, offset_ptr): same as delete, but also
    returns offset to the deleted element prior to the deletion in the
    output parameter.

Another new function introduced is bps_tree_view_debug_check(t). This
function is similar to the bps_tree_debug_check(t), but is applicable
to tree views. It's used to adopt default tree view tests to the new
tree variations.

Each new implementation is tested by old tree tests (these now support
several tree variations selected with a C definition, the definitions
are specified in the test/unit/CMakeLists.txt).

Since creation of new tests with .result files is prohibited by the
checkpatch, the new flavor tests from the test/unit/bps_tree.cc are
made TAP-compatible. Given their output is not checked, the printing
test from the suite had been disabled for them.

New offset API-related test introduced (this one tests both of two
tree variations - BPS_INNER_CARD and BPS_INNER_CHILD_CARDS).

Part of #8204

NO_DOC=internal
NO_CHANGELOG=internal

@coveralls
Copy link

coveralls commented Dec 13, 2023

Coverage Status

coverage: 87.074% (-0.02%) from 87.091%
when pulling b48048b on mkostoevr:m.kostoev/gh-xxxx-log-offset
into 735b0dc
on tarantool:master
.

@mkostoevr mkostoevr force-pushed the m.kostoev/gh-xxxx-log-offset branch from 5ff3202 to b2140e1 Compare December 20, 2023 09:26
@mkostoevr mkostoevr changed the title M.kostoev/gh xxxx log offset bps: add optional child powers to inner blocks Dec 20, 2023
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-xxxx-log-offset branch 5 times, most recently from 4e0d494 to 9fde5bd Compare December 20, 2023 19:20
@mkostoevr mkostoevr requested a review from alyapunov December 21, 2023 08:37
@mkostoevr mkostoevr marked this pull request as ready for review December 21, 2023 08:38
@mkostoevr mkostoevr requested a review from a team as a code owner December 21, 2023 08:38
@mkostoevr mkostoevr assigned mkostoevr and unassigned alyapunov Jan 10, 2024
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-xxxx-log-offset branch 6 times, most recently from e170b46 to 2d1986f Compare January 15, 2024 09:00
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-xxxx-log-offset branch 8 times, most recently from 4dc0e9b to 4344d72 Compare January 22, 2024 11:25
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-xxxx-log-offset branch from 4344d72 to 63eeab1 Compare January 23, 2024 13:08
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-xxxx-log-offset branch 2 times, most recently from e007fcd to 8d2a0e6 Compare March 6, 2024 19:46
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-xxxx-log-offset branch from 8d2a0e6 to 0957a8d Compare March 21, 2024 10:04
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-xxxx-log-offset branch 8 times, most recently from 49eae17 to 119a124 Compare April 5, 2024 08:42
@mkostoevr mkostoevr added full-ci Enables all tests for a pull request and removed full-ci Enables all tests for a pull request labels Apr 5, 2024
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-xxxx-log-offset branch 4 times, most recently from 30f5a92 to 4887d27 Compare April 5, 2024 11:34
@mkostoevr mkostoevr changed the title bps: add optional child cardinalities to inner blocks bps: bps: add support for logarithmic offsets Apr 5, 2024
@mkostoevr mkostoevr changed the title bps: bps: add support for logarithmic offsets bps: add support for logarithmic offsets Apr 5, 2024
@mkostoevr mkostoevr changed the title bps: add support for logarithmic offsets bps: add support for logarithmic offset Apr 5, 2024
The checkpatch does not permit to modify several parts of inner debug
check functions complaining about too big indentation. The modification
will be required further to implement the LogN offset in the BPS tree,
so this patch refactors the functions and introduces a helper function
for this: bps_tree_debug_insert_and_move_next.

The refactored functions are:
- bps_tree_debug_check_insert_and_move_to_right_inner
- bps_tree_debug_check_insert_and_move_to_left_inner
- bps_tree_debug_check_insert_and_move_to_right_leaf
- bps_tree_debug_check_insert_and_move_to_left_leaf

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring
New BPS tree flavors are to be introduced and tested with the existing
test suite. There're a bunch of problems though:
1. The white box test uses magic constants to performs its checks, it
   is better to use constants defined by the bps_tree.h instead.
2. The bps_tree.cc test itself is not TAP-compatible, fix this by
   introducing more assertions.
3. The bps_tree_iteartor.c test is not TAP-compatible too, is uses
   the result file to check some cases. Let's remove the manual
   printing tests and modify the automated ones to cover the removed
   cases.

By the way performed minor bps_tree.cc test refactoring.

NO_DOC=test update
NO_CHANGELOG=test update
The current tree does not allow to find offset of an element or create
an iterator to an element based on its offset. This patch is meant to
fix this by expanding the data structure with additional information
and introducing methods using it: subtree cardinalities.

A subtree cardinality is the amount of elements in it. For example,
cardinality of a leaf block is count of elements in it (effectively
it equals to leaf.header.size), cardinality of an inner block is the
sum of cardinalities of its chlidren.

The patch includes two chosable ways to store this information:
`BPS_INNER_CARD` and `BPS_INNER_CHILD_CARDS`.

The first implementation sores block cardinality in each inner block.
This implementation has minimal memory overhead (it just introduces
a new 64-bit field in `struct bps_inner`), but calculation of offsets
is not that fast, since in order to find an offset of a particular
child of an inner node we have to look into each of its children
prior to the looking one.

The second one sores an array of children cardinalities in inner
blocks. The memory overhead of this implementation is visible since
it significantly decreases the children capacity of inner blocks. The
max count in inner block is decreased from 42 to 25 for tree of 8-byte
elements with 512-byte blocks and from 25 to 18 for tree of 16-byte
elements with 512-byte blocks. Offset calcluations are faster though.

It's possible (though impractical) to enable both solutions, the tree
will use the best ways to perform offset-based tasks, but will have to
maintain both children cardinality array and inner own cardinalities.

Along with the theoretical support this patch introduces a bunch of
functions using it:
- `iterator_at(t, offset)`: gives an iterator to an element of a tree
  or tree view by its offset;
- `find_get_offset(t, key, offset_ptr)`: the same as `find` but also
  provides to the user the offset of the found element in the output
  parameter;
- `[lower|upper]_bound[_elem]_get_offset(t, key, exact, offset_ptr)`:
  the same as upper/lower bound functions but provide to the user the
  offset to the found position (end of the tree included).
- `insert_get_offset(t, new_elem, replaced, offset_ptr)`: the same as
  `insert`, but also provides the offset to the inserted element.
- `delete_get_offset(t, elem, offset_ptr)`: same as `delete`, but also
  returns offset to the deleted element prior to the deletion in the
  output parameter.

Another new function introduced is bps_tree_view_debug_check(t). This
function is similar to the bps_tree_debug_check(t), but is applicable
to tree views. It's used to adopt default tree view tests to the new
tree variations.

Each new implementation is tested by old tree tests (these now support
several tree variations selected with a C definition, the definitions
are specified in the test/unit/CMakeLists.txt).

New offset API-related test introduced (this one tests both of two
tree variations - BPS_INNER_CARD and BPS_INNER_CHILD_CARDS).

Part of tarantool#8204

NO_DOC=internal
NO_CHANGELOG=internal
These add three new configs to be tested in the benchmarks: tree with
child cardinalities enabled, with inner cardinality enabled and with
both of these.

By the way simplified the performance analisys by reducing the memory
allocation overhead (it's not required to be zero-initialized) and by
moving the test tree build into a separated function.

NO_DOC=perf test
NO_TEST=perf test
NO_CHANGELOG=perf test
Since the performance benchmarks for three additional flavors of the
BPS tree had been introduced, the amount of test in this suite has
increased to 228. Given some tests work with datasets of 10M entries,
the amount of time required to run these increased significantly.

Mitigate this by reducing the test datasets.

NO_DOC=perf test
NO_TEST=perf test
NO_CHANGELOG=perf test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full-ci Enables all tests for a pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants