Transform uses a composite aggregation to page through the source. Benchmarks have shown that the ordinary searches it executes slow down ingest into the source: because of the transform, a refresh is triggered more often than usual, so the source has to do more work than it would without a transform running.
By using a point in time (pit) reader, transform can avoid much of this churn (benchmarks have shown a potential reduction of refreshes by 50%).
Requirements
- open a pit reader at the beginning of a new checkpoint
- use the pit for all searches in the checkpoint
- re-create the pit if necessary (e.g. timeout, start/stop)
- don't fail due to a broken pit
- `keepAlive` should be kept reasonably small
- explicitly delete the pit
  - after a checkpoint
  - on stop
- fall back to non-pit mode if a node older than 7.10 is part of the local or remote (CCS) cluster
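The requirements above can be read as a small lifecycle around each checkpoint. The following Java sketch is one possible way to express it, not the transform implementation: `PitClient`, `CheckpointSearcher`, and `FakePitClient` are hypothetical names invented for illustration and do not correspond to Elasticsearch classes, `IllegalStateException` stands in for whatever error a broken pit would actually produce, and the 7.10 version check is reduced to a single boolean.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical abstraction over the point-in-time calls; NOT an Elasticsearch API. */
interface PitClient {
    boolean allNodesSupportPit();                  // false if any local/remote node is older than 7.10
    String openPit(String index, String keepAlive); // returns a pit id
    List<String> search(String index, String pitId, int page); // pitId == null means ordinary search
    void closePit(String pitId);
}

/** Hypothetical helper that owns the pit for the duration of one checkpoint. */
final class CheckpointSearcher {
    private final PitClient client;
    private final String index;
    private final String keepAlive;
    private String pitId; // null while running in non-pit mode

    CheckpointSearcher(PitClient client, String index, String keepAlive) {
        this.client = client;
        this.index = index;
        this.keepAlive = keepAlive;
    }

    /** Open a pit at the beginning of a checkpoint, unless an old node forces non-pit mode. */
    void startCheckpoint() {
        pitId = client.allNodesSupportPit() ? tryOpen() : null;
    }

    /** Use the pit for every search; a broken pit is re-created once and never fails the search. */
    List<String> searchPage(int page) {
        if (pitId != null) {
            try {
                return client.search(index, pitId, page);
            } catch (IllegalStateException brokenPit) {
                pitId = tryOpen(); // re-create once; null if opening fails
            }
        }
        if (pitId != null) {
            try {
                return client.search(index, pitId, page);
            } catch (IllegalStateException stillBroken) {
                pitId = null; // give up on the pit for the rest of this checkpoint
            }
        }
        return client.search(index, null, page); // ordinary search as last resort
    }

    /** Explicitly delete the pit after a checkpoint and on stop. */
    void finishCheckpoint() { release(); }
    void stop() { release(); }

    private String tryOpen() {
        try {
            return client.openPit(index, keepAlive);
        } catch (RuntimeException e) {
            return null; // could not open a pit: continue without one
        }
    }

    private void release() {
        if (pitId != null) {
            try {
                client.closePit(pitId);
            } catch (RuntimeException e) {
                // best effort: a short keepAlive lets the pit expire on its own
            }
            pitId = null;
        }
    }
}

/** In-memory stand-in used only to exercise the sketch. */
final class FakePitClient implements PitClient {
    private final List<String> openPits = new ArrayList<>();
    private final boolean pitSupported;
    int opened = 0;
    int closed = 0;

    FakePitClient(boolean pitSupported) { this.pitSupported = pitSupported; }

    public boolean allNodesSupportPit() { return pitSupported; }

    public String openPit(String index, String keepAlive) {
        String id = "pit-" + (++opened);
        openPits.add(id);
        return id;
    }

    public List<String> search(String index, String pitId, int page) {
        if (pitId != null && !openPits.contains(pitId)) {
            throw new IllegalStateException("unknown pit " + pitId);
        }
        return Collections.singletonList(index + "/page-" + page + (pitId == null ? "" : "@" + pitId));
    }

    public void closePit(String pitId) { if (openPits.remove(pitId)) closed++; }

    void expireAll() { openPits.clear(); } // simulates a keepAlive timeout
}
```

With the fake client, a search issued after `expireAll()` re-opens a pit instead of failing, and `finishCheckpoint()` closes whichever pit is current. Constructing `new FakePitClient(false)` models a cluster containing a node older than 7.10: no pit is ever opened and all searches run as ordinary searches.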
Design considerations
Searches are executed by `ClientTransformIndexer`, which inherits from `TransformIndexer` and adds the search capabilities given a `client`. It needs to be enhanced to create and destroy the pit object.
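To illustrate where the pit handling could live in such a hierarchy, here is a minimal Java sketch. It is an assumption about the shape of the change, not the actual classes: `BaseIndexerSketch` and `ClientIndexerSketch` are hypothetical stand-ins for `TransformIndexer` and `ClientTransformIndexer`, the hook names are invented, and the pit is simulated with a string id rather than opened through a real client.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the base indexer: drives a checkpoint, knows nothing about clients. */
abstract class BaseIndexerSketch {
    final List<String> log = new ArrayList<>();

    /** Template method: one checkpoint consisting of a fixed number of paged searches. */
    final void runCheckpoint(int pages) {
        onCheckpointStart();
        try {
            for (int page = 0; page < pages; page++) {
                log.add(doSearch(page));
            }
        } finally {
            onCheckpointFinish(); // runs even if a search throws
        }
    }

    protected abstract String doSearch(int page);

    // Hooks are no-ops by default, so the base class stays unaware of pits.
    protected void onCheckpointStart() {}
    protected void onCheckpointFinish() {}
}

/** Hypothetical stand-in for the client-backed subclass: the only place that owns the pit. */
final class ClientIndexerSketch extends BaseIndexerSketch {
    private String pitId;
    private int pitCounter;

    @Override
    protected void onCheckpointStart() {
        // A real implementation would open a pit through the client here.
        pitId = "pit-" + (++pitCounter);
        log.add("open " + pitId);
    }

    @Override
    protected String doSearch(int page) {
        return "search page " + page + " with " + pitId;
    }

    @Override
    protected void onCheckpointFinish() {
        // A real implementation would explicitly delete the pit here.
        log.add("close " + pitId);
        pitId = null;
    }
}
```

Keeping the create/destroy logic in the subclass means the base indexer's paging loop does not change; calling `new ClientIndexerSketch().runCheckpoint(2)` records an open, two searches against the same pit, and a close.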
Future: Given checkpoints, it will be possible to prune the search query/indices and, for example, avoid calling out to frozen/cold indices.