Re-indexing a tsdb data stream is more challenging than re-indexing a regular data stream. This is because when a new tsdb data stream is created, the start and end time settings of its first backing index are blindly set to $now-2h and $now+2h. The backing indices of the existing tsdb data stream may overlap with this range only partially, or not at all. Directly re-indexing from the old tsdb data stream into the new one therefore only works for documents whose timestamp falls within $now-2h to $now+2h.
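To make the problem concrete, here is a minimal Python sketch that models only the time-range check, not Elasticsearch itself. It assumes the $now-2h to $now+2h default described above, and that the start time is inclusive while the end time is exclusive. The helper names (`default_window`, `accepted`, `split_by_window`) are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def default_window(now: datetime,
                   look_back: timedelta = timedelta(hours=2),
                   look_ahead: timedelta = timedelta(hours=2)) -> tuple[datetime, datetime]:
    """Time range accepted by the first backing index of a freshly created
    tsdb data stream, assuming the $now-2h..$now+2h default."""
    return now - look_back, now + look_ahead


def accepted(ts: datetime, window: tuple[datetime, datetime]) -> bool:
    # Assumption: start is inclusive, end is exclusive.
    start, end = window
    return start <= ts < end


def split_by_window(timestamps: list[datetime], now: datetime):
    """Split source timestamps into those a direct reindex could write
    and those that would fall outside the new backing index's range."""
    window = default_window(now)
    ok = [t for t in timestamps if accepted(t, window)]
    rejected = [t for t in timestamps if not accepted(t, window)]
    return ok, rejected


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    docs = [
        datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc),   # inside the window
        datetime(2023, 12, 31, 12, 0, tzinfo=timezone.utc),  # a day too old
    ]
    print(split_by_window(docs, now))
```

With a source data stream spanning weeks or months, almost every document ends up in the rejected list, which is why a direct reindex is not enough.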
Since reindexing a tsdb data stream is possible, we should document how to do it. Currently there is no documentation on reindexing a tsdb data stream.
The process looks something like this:
- Create an index template that applies only to the new data stream that will contain the re-indexed data; otherwise other data streams may be affected. This index template should contain the new mappings and index settings that should be applied.
- Update the template to set specific `index.time_series.start_time` and `index.time_series.end_time` index settings. These should be based on the lowest and highest `@timestamp` values in the data stream to be reindexed. This way the first backing index is fixed to cover all data contained in the data stream that is being reindexed.
- Update the template to set the `index.number_of_shards` index setting to the sum of the primary shards of all backing indices of the data stream to be reindexed.
- Update the template to set `index.number_of_replicas` to zero and unset the `index.lifecycle.name` index setting.
- Start the reindex operation.
- After reindexing has completed, remove the `index.time_series.start_time` and `index.time_series.end_time` index settings from the template and restore `index.number_of_replicas`, `index.number_of_shards` and `index.lifecycle.name` to their original values.
- Invoke the rollover API without any conditions. The data stream should now be ready to accept recent data.
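The steps above can be sketched as an ordered list of REST requests. This Python sketch only builds request paths and bodies; it does not talk to a cluster. It assumes you have already looked up the lowest and highest `@timestamp` of the source data stream (for example with the min/max aggregation body returned by `min_max_query`) and the primary shard count of each backing index (for example from `GET <index>/_settings`). The template name, data stream names, mapping and priority `500` are hypothetical; "unsetting" `index.lifecycle.name` is modelled by leaving it out of the template. It also assumes `start_time` is inclusive and `end_time` exclusive.

```python
import json
from datetime import datetime, timedelta, timezone

# Settings that pin the first backing index to the reindexed time range.
TS_KEYS = ("index.time_series.start_time", "index.time_series.end_time")


def fmt(ts: datetime) -> str:
    """Format a timezone-aware timestamp as UTC, truncated to whole seconds."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def min_max_query() -> dict:
    """Search body returning the lowest and highest @timestamp of the source."""
    return {
        "size": 0,
        "aggs": {
            "min_ts": {"min": {"field": "@timestamp"}},
            "max_ts": {"max": {"field": "@timestamp"}},
        },
    }


def reindex_settings(original: dict, min_ts: datetime, max_ts: datetime,
                     primary_shards: list) -> dict:
    """Template settings used while reindexing (lifecycle policy left out)."""
    settings = {k: v for k, v in original.items() if k != "index.lifecycle.name"}
    settings.update({
        # Inclusive start: truncating min_ts keeps it inside the range.
        "index.time_series.start_time": fmt(min_ts),
        # Exclusive end: truncating (max_ts + 1s) lands strictly after max_ts.
        "index.time_series.end_time": fmt(max_ts + timedelta(seconds=1)),
        # Sum of primary shards across all source backing indices.
        "index.number_of_shards": sum(primary_shards),
        "index.number_of_replicas": 0,
    })
    return settings


def restore_settings(original: dict) -> dict:
    """Original settings again, without the pinned start/end time."""
    return {k: v for k, v in original.items() if k not in TS_KEYS}


def template_body(dest: str, settings: dict, mappings: dict, priority: int) -> dict:
    # The pattern matches only the new data stream, so other streams are unaffected.
    return {
        "index_patterns": [dest],
        "data_stream": {},
        "priority": priority,
        "template": {"settings": settings, "mappings": mappings},
    }


def plan(template: str, source: str, dest: str, original: dict, mappings: dict,
         min_ts: datetime, max_ts: datetime, primary_shards: list,
         priority: int = 500) -> list:
    """Ordered (method, path, body) requests for the procedure."""
    pinned = reindex_settings(original, min_ts, max_ts, primary_shards)
    return [
        ("PUT", f"/_index_template/{template}",
         template_body(dest, pinned, mappings, priority)),
        # Reindexing into a data stream requires op_type "create".
        ("POST", "/_reindex",
         {"source": {"index": source}, "dest": {"index": dest, "op_type": "create"}}),
        ("PUT", f"/_index_template/{template}",
         template_body(dest, restore_settings(original), mappings, priority)),
        # Rollover without any conditions.
        ("POST", f"/{dest}/_rollover", None),
    ]


if __name__ == "__main__":
    steps = plan(
        "metrics-new-template", "metrics-old", "metrics-new",
        {"index.mode": "time_series", "index.routing_path": ["host.name"],
         "index.number_of_shards": 1, "index.number_of_replicas": 1,
         "index.lifecycle.name": "metrics-policy"},
        {"properties": {"host.name": {"type": "keyword", "time_series_dimension": True}}},
        datetime(2024, 1, 1, 8, 0, 0, 500000, tzinfo=timezone.utc),
        datetime(2024, 1, 3, 17, 45, 30, 250000, tzinfo=timezone.utc),
        [2, 2, 1],
    )
    for method, path, body in steps:
        print(method, path, json.dumps(body) if body is not None else "")
```

The same template name is reused for the final update, so restoring the original settings is just a second `PUT` of the template without the pinned time range.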