Save order of secondary indexes to snapshot

Currently, we save tuples of every space in the order of its primary index. On recovery, we sort tuples in order of each secondary key with parallel qsort because B-tree can be built blazingly fast from a sorted array. However, the sorting process still can take too much time.

## Suggested solution

Let's write the order of secondary indexes to the snapshot. The algorithm is:
1. Iterate over each space index and write tuple addresses in its order. After this step, such entry in snapshot will appear:
```
+---------------------+
| [0x01, 0x02, 0x03]  | - primary index
| [0x01, 0x03, 0x02]  | - secondary index
+---------------------+
```

2. After recovering the primary index (or even during its recovery), we can use its serialized representation to build a mapping (hash-table) from the old tuple addresses to the new ones. Imagine that the new primary index has such tuple addresses in this order: `[0xa3, 0xb2, 0xc1]`. Then we know that the first tuple in primary index had address `0x01` and its new address is `0xa3`, the second one had `0x02` and the new address  is `0xb2` and so on. So we can easily build the mapping:
```
{0x01: 0xa3, 0x02: 0xb2, 0x03: 0xc1}
```

3. Update all arrays of orders in secondary indexes using the tuple address mapping - and now we have an array of actual tuples sorted in a secondary index order, so we don't need to use parallel qsort and can simply build an index. In our example, the secondary index in snapshot was `[0x01, 0x03, 0x02]` and now it becomes `[0xa3, 0xc1, 0xb2]` - array of actual tuples in required order.

### Advantages

1. The solution can be implemented as a snapshot extension so that the new Tarantool versions will be able to read the old snapshot without serialized indexes. Moreover, the feature can be disabled if one wants to have a compact snapshot. And, going even further, one can serialize only chosen indexes, for example, with complex keys (such indexes take the longest time to be built on recovery).

### Drawbacks and Concerns

1. Currently, we sort all values after WAL recovery and then build all secondary indexes. Using this approach, we will need to build serialized indexes right after snapshot recovery and process inserts to the secondary indexes as well on WAL recovery. Won't it diminish the performance gain?
2. What if a before_replace trigger updates a tuple on recovery? It seems we shouldn't use the optimization for spaces with before_replace triggers.
3. Don't forget to handle force recovery - we shouldn't use the optimization for tuples that we failed to read from snapshot.
4. Don't forget to take https://github.com/tarantool/tarantool-ee/issues/979 into account.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Save order of secondary indexes to snapshot #10847

Suggested solution

Advantages

Drawbacks and Concerns

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Save order of secondary indexes to snapshot #10847

Description

Suggested solution

Advantages

Drawbacks and Concerns

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions