Skip to content

Save order of secondary indexes to snapshot #10847

@drewdzzz

Description

@drewdzzz

Currently, we save tuples of every space in the order of its primary index. On recovery, we sort tuples in order of each secondary key with parallel qsort because B-tree can be built blazingly fast from a sorted array. However, the sorting process still can take too much time.

Suggested solution

Let's write the order of secondary indexes to the snapshot. The algorithm is:

  1. Iterate over each space index and write tuple addresses in its order. After this step, such entry in snapshot will appear:
+---------------------+
| [0x01, 0x02, 0x03]  | - primary index
| [0x01, 0x03, 0x02]  | - secondary index
+---------------------+
  1. After recovering the primary index (or even during its recovery), we can use its serialized representation to build a mapping (hash-table) from the old tuple addresses to the new ones. Imagine that the new primary index has such tuple addresses in this order: [0xa3, 0xb2, 0xc1]. Then we know that the first tuple in primary index had address 0x01 and its new address is 0xa3, the second one had 0x02 and the new address is 0xb2 and so on. So we can easily build the mapping:
{0x01: 0xa3, 0x02: 0xb2, 0x03: 0xc1}
  1. Update all arrays of orders in secondary indexes using the tuple address mapping - and now we have an array of actual tuples sorted in a secondary index order, so we don't need to use parallel qsort and can simply build an index. In our example, the secondary index in snapshot was [0x01, 0x03, 0x02] and now it becomes [0xa3, 0xc1, 0xb2] - array of actual tuples in required order.

Advantages

  1. The solution can be implemented as a snapshot extension so that the new Tarantool versions will be able to read the old snapshot without serialized indexes. Moreover, the feature can be disabled if one wants to have a compact snapshot. And, going even further, one can serialize only chosen indexes, for example, with complex keys (such indexes take the longest time to be built on recovery).

Drawbacks and Concerns

  1. Currently, we sort all values after WAL recovery and then build all secondary indexes. Using this approach, we will need to build serialized indexes right after snapshot recovery and process inserts to the secondary indexes as well on WAL recovery. Won't it diminish the performance gain?
  2. What if a before_replace trigger updates a tuple on recovery? It seems we shouldn't use the optimization for spaces with before_replace triggers.
  3. Don't forget to handle force recovery - we shouldn't use the optimization for tuples that we failed to read from snapshot.
  4. Don't forget to take https://github.com/tarantool/tarantool-ee/issues/979 into account.

Metadata

Metadata

Assignees

Labels

featureA new functionality

Projects

Status

Q1 2025 – Jan-Mar

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions