
Move sort & join operations to SQLite when possible #1317

@norberttech


This is an experimental feature that might not work at all.

The idea is to offload these heavy operations to a better-optimized engine whenever possible (or when configured).

We should start with sorting, since it's much easier. The following needs to happen:

  • create a SQLite db with columns that mirror the sort-by columns and their types
  • serialize the entire row and save it as a serialized string in a blob column
  • read the sorted results of the SQL query and yield each unserialized row, to make sure nothing changed
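
The three steps above can be sketched roughly as follows. This is only an illustration in Python using the stdlib `sqlite3` module, not the PHP implementation this issue asks for. The names `sqlite_sort`, `sort_buffer`, `payload` and `SQL_TYPES` are hypothetical, `pickle` stands in for whatever serializer ends up being used, and the column types are guessed from the first row only.

```python
import pickle
import sqlite3
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical mapping from Python value types to SQLite column types.
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


def sqlite_sort(rows: Iterable[Dict[str, Any]], sort_by: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield rows ordered by the sort_by columns, letting SQLite do the sorting.

    Assumes sort_by is non-empty and that column names contain no double quotes.
    """
    rows = iter(rows)
    first = next(rows, None)
    if first is None:
        return

    conn = sqlite3.connect(":memory:")
    try:
        # Step 1: one typed column per sort key, plus a BLOB for the whole row.
        cols = ", ".join(
            f'"{name}" {SQL_TYPES.get(type(first[name]), "BLOB")}' for name in sort_by
        )
        conn.execute(f"CREATE TABLE sort_buffer ({cols}, payload BLOB NOT NULL)")

        placeholders = ", ".join("?" for _ in range(len(sort_by) + 1))
        insert = f"INSERT INTO sort_buffer VALUES ({placeholders})"

        def to_params(row: Dict[str, Any]) -> list:
            # Step 2: sort keys go into their own columns, the serialized row into the blob.
            return [row[name] for name in sort_by] + [pickle.dumps(row)]

        conn.execute(insert, to_params(first))
        conn.executemany(insert, (to_params(row) for row in rows))

        # Step 3: read back in sorted order and yield the unserialized rows.
        order = ", ".join(f'"{name}"' for name in sort_by)
        for (payload,) in conn.execute(f"SELECT payload FROM sort_buffer ORDER BY {order}"):
            yield pickle.loads(payload)
    finally:
        conn.close()
```

Because only the sort keys are stored as real columns, the row itself round-trips through the blob untouched, which is what makes the "nothing changed" check possible.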

One important thing here is that the SQLite db should be created and configured in a way that allows for:

  • setting the maximum allowed memory consumption
  • passing a serializer that can encrypt the serialized row, to avoid any data leaks
  • removing the db once reading is done
  • defining where the temporary db is created (in memory should also be an option)
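
A minimal sketch of how the location, memory and cleanup requirements could fit together, again in Python with the stdlib `sqlite3` module purely for illustration. `temporary_sort_db` and its parameters are hypothetical names. Note the assumption baked in here: `PRAGMA cache_size` only bounds SQLite's page cache, so it approximates a memory limit for file-backed databases and does not cap an in-memory one. The encrypting serializer is not shown; it would be a pair of dump/load callables passed in by the caller.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def temporary_sort_db(
    directory: Optional[str] = None, cache_kib: int = 2048
) -> Iterator[sqlite3.Connection]:
    """Open a throwaway SQLite database and remove it when the block exits.

    directory=None keeps the database in memory; otherwise a temporary
    file is created inside that directory.
    """
    path = None
    if directory is None:
        target = ":memory:"
    else:
        fd, path = tempfile.mkstemp(suffix=".sqlite", dir=directory)
        os.close(fd)
        target = path

    conn = sqlite3.connect(target)
    try:
        # A negative cache_size is interpreted by SQLite as a size in KiB.
        conn.execute(f"PRAGMA cache_size = -{int(cache_kib)}")
        # Keep temporary tables and indices in files rather than in RAM.
        conn.execute("PRAGMA temp_store = FILE")
        yield conn
    finally:
        conn.close()
        # Once reading is done the database file must not be left behind.
        if path is not None and os.path.exists(path):
            os.remove(path)
```

Usage would look like `with temporary_sort_db(directory="/tmp") as conn: ...`, with the sort buffer created, filled and read inside the block.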

Once a proof of concept is created, we should measure its performance and compare it with the native sort. The main bottleneck might be writing to and reading from SQLite.
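
For the comparison, something along these lines could serve as a starting point. It is a Python sketch with hypothetical names (`native_sort`, `sqlite_sort`, `compare`) and synthetic rows; it times wall-clock duration only, so memory consumption would need to be measured separately, and the numbers say nothing about how the PHP implementation would behave.

```python
import pickle
import random
import sqlite3
import time


def native_sort(rows):
    # Baseline: sort in process memory.
    return sorted(rows, key=lambda row: row["id"])


def sqlite_sort(rows):
    # Candidate: write keys and serialized rows to SQLite, read them back ordered.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sort_buffer (id INTEGER, payload BLOB NOT NULL)")
        conn.executemany(
            "INSERT INTO sort_buffer VALUES (?, ?)",
            ((row["id"], pickle.dumps(row)) for row in rows),
        )
        cursor = conn.execute("SELECT payload FROM sort_buffer ORDER BY id")
        return [pickle.loads(payload) for (payload,) in cursor]
    finally:
        conn.close()


def compare(row_count, seed=0):
    """Return wall-clock seconds per strategy for the same synthetic input."""
    rng = random.Random(seed)
    rows = [
        {"id": rng.randrange(row_count * 10), "value": f"row-{i}"}
        for i in range(row_count)
    ]
    timings = {}
    results = {}
    for name, sorter in (("native", native_sort), ("sqlite", sqlite_sort)):
        start = time.perf_counter()
        results[name] = sorter(rows)
        timings[name] = time.perf_counter() - start
    # Both strategies must agree on the key order before the timings mean anything.
    assert [r["id"] for r in results["native"]] == [r["id"] for r in results["sqlite"]]
    return timings
```

Running `compare` at several row counts, and with a file-backed database as well as an in-memory one, should show how much of the time goes into writing and reading rather than sorting.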

My recommendation would be to start with the sqlite3 extension and, if it works, also provide a pdo_sqlite alternative.

I would make it part of core/etl and detect whether SQLite is even available.
