Skip to content

Finish the CE part of the logarithmic offset RFC#10317

Merged
locker merged 4 commits intotarantool:masterfrom
mkostoevr:m.kostoev/gh-8204-select
Sep 13, 2024
Merged

Finish the CE part of the logarithmic offset RFC#10317
locker merged 4 commits intotarantool:masterfrom
mkostoevr:m.kostoev/gh-8204-select

Conversation

@mkostoevr
Copy link
Contributor

@mkostoevr mkostoevr commented Jul 26, 2024

This patchset completes the CE part of the logarithmic offset RFC:

  1. accelerate the select with offset in memtx;
  2. introduce the offset parameter of index:pairs method;
  3. introduce the index:offset_of method.

Closes #8204

@mkostoevr mkostoevr added the do not merge Not ready to be merged label Jul 26, 2024
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-8204-select branch from f976931 to 5b982c5 Compare July 26, 2024 15:45
@coveralls
Copy link

coveralls commented Jul 26, 2024

Coverage Status

coverage: 87.298% (-0.005%) from 87.303%
when pulling d7ef8f5 on mkostoevr:m.kostoev/gh-8204-select
into be709c6
on tarantool:master
.

@mkostoevr mkostoevr self-assigned this Jul 26, 2024
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-8204-select branch 9 times, most recently from b879ab6 to 2541bbd Compare August 1, 2024 11:46
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-8204-select branch 6 times, most recently from 11f1ff5 to 009b15a Compare August 2, 2024 09:43
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-8204-select branch from 009b15a to 68c7c76 Compare August 10, 2024 19:52
@mkostoevr mkostoevr changed the title Accelerate the select with offset in memtx Finish the CE part of the logarithmic offset RFC Aug 23, 2024
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-8204-select branch 2 times, most recently from 1ea421e to 8e12b82 Compare August 23, 2024 15:44
@mkostoevr mkostoevr marked this pull request as ready for review August 26, 2024 11:16
@mkostoevr mkostoevr requested a review from a team as a code owner August 26, 2024 11:16
@mkostoevr mkostoevr requested review from drewdzzz and locker August 26, 2024 11:35
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-8204-select branch from ca35e20 to b16dff6 Compare September 3, 2024 11:38
@drewdzzz drewdzzz removed their assignment Sep 3, 2024
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-8204-select branch 2 times, most recently from e1539da to 666f401 Compare September 4, 2024 08:44
@mkostoevr mkostoevr assigned locker and unassigned mkostoevr Sep 4, 2024
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-8204-select branch from 666f401 to c5a2009 Compare September 5, 2024 09:12
@mkostoevr mkostoevr assigned mkostoevr and unassigned locker Sep 5, 2024
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-8204-select branch from c5a2009 to 3a67264 Compare September 5, 2024 09:58
@mkostoevr mkostoevr assigned locker, mkostoevr and drewdzzz and unassigned mkostoevr and locker Sep 5, 2024
@drewdzzz drewdzzz assigned mkostoevr and unassigned drewdzzz Sep 5, 2024
@mkostoevr mkostoevr force-pushed the m.kostoev/gh-8204-select branch from 3a67264 to 8702ece Compare September 9, 2024 07:11
Copy link
Contributor

@drewdzzz drewdzzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch, LGTM!

@mkostoevr
Copy link
Contributor Author

@lenkis, please take a look at the new version of the changelogs/unreleased/gh-8204-select-offset-count-perf.md.

This method is used in order to perform the iterator creation with
the offset specified. The generic implementation performs the regular
iterator creation and skips the specified amount of tuples checking
the fiber slice in the way, but now it's possible to implement an
accelerated version of select with offset if an index supports that.

Part of tarantool#8204

NO_DOC=internal
NO_TEST=covered by existing tests
NO_CHANGELOG=internal
@mkostoevr
Copy link
Contributor Author

mkostoevr commented Sep 11, 2024

Rebased and dropped the GAP_COUNT destruction since the #10160 (which removes all the global gaps including the GAP_COUNT on DDL along with stories) is merged.

--- a/src/box/memtx_tx.c
+++ b/src/box/memtx_tx.c
@@ -340,12 +340,6 @@ memtx_tx_count_gap_item_new(struct txn *txn, enum iterator_type type,
 			    const char *key, uint32_t part_count,
 			    struct tuple *until, hint_t until_hint);
 
-/**
- * Destroy the count gap item.
- */
-static void
-memtx_tx_count_gap_item_destroy(struct gap_item_base *base);
-
 /**
  * Helper structure for searching for point_hole_item in the hash table,
  * @sa point_hole_item_pool.
@@ -2983,7 +2977,6 @@ memtx_tx_delete_gap(struct gap_item_base *item)
 		pool = &txm.nearby_gap_item_mempoool;
 		break;
 	case GAP_COUNT:
-		memtx_tx_count_gap_item_destroy(item);
 		pool = &txm.count_gap_item_mempool;
 		break;
 	case GAP_FULL_SCAN:
@@ -3372,7 +3365,11 @@ memtx_tx_nearby_gap_item_new(struct txn *txn, enum iterator_type type,
 }
 
 /**
- * Allocate and create count gap item.
+ * Allocate and create count gap item. The @a until tuple's story must have
+ * a gap item from the @a txn transaction or be tracked by it, so the story
+ * is not deleted by the garbage collector and the tuple is not deleted (if
+ * it's not NULL).
+ *
  * Note that in_read_gaps base member must be initialized later.
  */
 static struct count_gap_item *
@@ -3392,23 +3389,10 @@ memtx_tx_count_gap_item_new(struct txn *txn, enum iterator_type type,
 				      sizeof(item->short_key), &item->key_len);
 	item->until = until;
 	item->until_hint = until_hint;
-	if (item->until != NULL)
-		tuple_ref(item->until);
 
 	return item;
 }
 
-/**
- * Destroy the count gap item.
- */
-static void
-memtx_tx_count_gap_item_destroy(struct gap_item_base *base)
-{
-	struct count_gap_item *item = (struct count_gap_item *)base;
-	if (item->until != NULL)
-		tuple_unref(item->until);
-}
-
 /**
  * Allocate and create full scan gap item.
  * Note that in_read_gaps base member must be initialized later.
@@ -3465,6 +3449,8 @@ memtx_tx_track_gap_slow(struct txn *txn, struct space *space, struct index *inde
  * tuples are skipped by a transaction without reading.
  *
  * @return the amount of invisible tuples counted.
+ *
+ * @pre The @a until tuple (if not NULL) must be clarified by @a txn.
  */
 uint32_t
 memtx_tx_track_count_until_slow(struct txn *txn, struct space *space,
diff --git a/src/box/memtx_tx.h b/src/box/memtx_tx.h
index dae5daf593..82ef370504 100644
--- a/src/box/memtx_tx.h
+++ b/src/box/memtx_tx.h
@@ -356,6 +356,8 @@ memtx_tx_track_count(struct txn *txn, struct space *space,
  * NB: can trigger story garbage collection.
  *
  * @return the amount of invisible tuples counted.
+ *
+ * @pre The @a until tuple (if not NULL) must be clarified by @a txn.
  */
 static inline uint32_t
 memtx_tx_track_count_until(struct txn *txn, struct space *space,

This patch rewrites the memtx tree iterator creation path so it handles
the offset parameter taking advantage of offset-based functions of the
BPS tree with BPS_INNER_CARD config.

It also introduces the memtx_tx_index_invisible_count_matching_until
function which counts the amount of tuples matching with a key and
iterator type until a tuple that are invisible to a transaction.

Since the invisible count is required in order to update the tree
iterator, the garbage collection is not performed in it anymore, it
is performed in the `index_size` methods of memtx indexes instead.

Part of tarantool#8204

@TarantoolBot document
Title: select with offset and count are now logarithmic in memtx tree
Product: Tarantool
Since: 3.3.0

Root document: https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_index/count/#lua-function.index_object.count

> Iterate over an index, counting the number of tuples which match
> the key-value.

This is no longer true for the memtx tree index, where count has a
logarithmic complexity (`O(log(size))` where `size` is the amount of
tuples in the index) and it does not iterate over the index.

---

Root document: https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_index/select/#lua-function.index_object.select

> `offset` – the number of tuples to skip (use this parameter carefully
> when scanning large data sets).

The part about large datasets is not actual for the memtx tree index
now, since the complexity of select with offset there is logarithmic
too (it does not scan the space either).

> Use the offset option carefully when scanning large data sets as it
> linearly increases the number of scanned tuples and leads to a full
> space scan. Instead, you can use the after and fetch_pos options.

Ditto.

---

Root document: https://www.tarantool.io/en/doc/latest/platform/engines/memtx_vinyl_diff/

> count() function 	Takes a constant amount of time

Now it takes logarithmic time as mentioned above.
The purpose of the argument is identical to the one in the `select`:
it makes the iterator skip first N tuples prior to return.

The `box_index_iterator_after` is removed as unused, since it's only
been used by the Lua site for the `pairs` iterator creation.

Part of tarantool#8204

@TarantoolBot document
Title: A new `offset` parameter of the `index:pairs` method
Product: Tarantool
Since: 3.3.0

Now the `index:pairs` method supports the `offset` parameter which may
be used to skip the specified amount of tuples of the iterator. If the
skip caused the iterator to exhaust, it behaves as expected (no tuple
is returned).
The method is used to get the (potential) 1-based iterator-relative
position in the index of a tuple matching a key.

Closes tarantool#8204

@TarantoolBot document
Title: A new `index:offset_of` method introduced
Product: Tarantool
Since: 3.3.0

The method returns 0-based offset in the index of a first tuple
matching the provided key and iterator. The position is counted from
the beginning or end of the space depending on the iterator direction,
for example:

``` lua
-- index: {{1}, {3}}
index:offset_of({3}, {iterator = 'eq'})  -- returns 1: [1, <3>]
index:offset_of({3}, {iterator = 'req'}) -- returns 0: [<3>, 1]
```

In case there's no tuple matching the key and iterator in the index the
function returns the position a matching tuple would be placed at if
existed, for example:

``` lua
-- index: {{1}, {3}}
index:offset_of({2}, {iterator = 'eq'})  -- 1: [1, <2>, 3]
index:offset_of({4}, {iterator = 'req'}) -- 0: [<4>, 3, 1]
```

This works with any iterator:

``` lua
-- index: {{1}, {3}}
index:offset_of({0}, {iterator = 'ge'}) -- 0: [<1>, 3]
index:offset_of({1}, {iterator = 'lt'}) -- 2: [3, 1, <...>]

-- index: {{'b'}, {'bb'}, {'bc'}, {'c'}, {'cc'}}
index:offset_of({'b'},  {iterator = 'np'}) -- 3: [b, bb, bc, <c>, cc]
index:offset_of({'cc'}, {iterator = 'pp'}) -- 1: [cc, <c>, bc, bb, b]
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full-ci Enables all tests for a pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize select with offset option

7 participants