-
Notifications
You must be signed in to change notification settings - Fork 403
Labels
featureA new functionalityA new functionality
Description
Currently, we save tuples of every space in the order of its primary index. On recovery, we sort tuples in order of each secondary key with parallel qsort because B-tree can be built blazingly fast from a sorted array. However, the sorting process still can take too much time.
Suggested solution
Let's write the order of secondary indexes to the snapshot. The algorithm is:
- Iterate over each space index and write tuple addresses in its order. After this step, such entry in snapshot will appear:
+---------------------+
| [0x01, 0x02, 0x03] | - primary index
| [0x01, 0x03, 0x02] | - secondary index
+---------------------+
- After recovering the primary index (or even during its recovery), we can use its serialized representation to build a mapping (hash-table) from the old tuple addresses to the new ones. Imagine that the new primary index has such tuple addresses in this order:
[0xa3, 0xb2, 0xc1]. Then we know that the first tuple in primary index had address0x01and its new address is0xa3, the second one had0x02and the new address is0xb2and so on. So we can easily build the mapping:
{0x01: 0xa3, 0x02: 0xb2, 0x03: 0xc1}
- Update all arrays of orders in secondary indexes using the tuple address mapping - and now we have an array of actual tuples sorted in a secondary index order, so we don't need to use parallel qsort and can simply build an index. In our example, the secondary index in snapshot was
[0x01, 0x03, 0x02]and now it becomes[0xa3, 0xc1, 0xb2]- array of actual tuples in required order.
Advantages
- The solution can be implemented as a snapshot extension so that the new Tarantool versions will be able to read the old snapshot without serialized indexes. Moreover, the feature can be disabled if one wants to have a compact snapshot. And, going even further, one can serialize only chosen indexes, for example, with complex keys (such indexes take the longest time to be built on recovery).
Drawbacks and Concerns
- Currently, we sort all values after WAL recovery and then build all secondary indexes. Using this approach, we will need to build serialized indexes right after snapshot recovery and process inserts to the secondary indexes as well on WAL recovery. Won't it diminish the performance gain?
- What if a before_replace trigger updates a tuple on recovery? It seems we shouldn't use the optimization for spaces with before_replace triggers.
- Don't forget to handle force recovery - we shouldn't use the optimization for tuples that we failed to read from snapshot.
- Don't forget to take https://github.com/tarantool/tarantool-ee/issues/979 into account.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
featureA new functionalityA new functionality
Type
Projects
Status
Q1 2025 – Jan-Mar