Elasticsearch Version
8.10, 8.11
Installed Plugins
No response
Java Version
bundled
OS Version
Darwin
Problem Description
Each TSDS backing index is configured with a START_TIME and an END_TIME that bound the timestamps of the data it will host. The END_TIME in particular is derived from the index.look_ahead_time setting.
All writes against a TSDS are routed by the document's @timestamp to the backing index whose START_TIME/END_TIME range contains it.
We can easily simulate a situation where we roll over the data stream and a document with the current (`now`) timestamp is no longer routed to the new write index but to the previous backing index (because the previous index's END_TIME has not lapsed yet). This would normally not be a problem, but if that index is read-only the write fails.
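The routing behaviour described above can be sketched as a small simulation. This is not Elasticsearch code: the `BackingIndex` class, the `route` function, the index names, the rollover instant, and the way the time bounds are derived are all simplifying assumptions made for illustration. Only the two timestamps and the 10m look-ahead come from the reproduction steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackingIndex:
    """Hypothetical model of a TSDS backing index and its time bounds."""
    name: str
    start_ms: int            # START_TIME, inclusive
    end_ms: int              # END_TIME, exclusive
    read_only: bool = False  # e.g. after downsampling


def route(indices: List[BackingIndex], ts_ms: int) -> Optional[BackingIndex]:
    """Return the index whose [start, end) range contains the timestamp."""
    for idx in indices:
        if idx.start_ms <= ts_ms < idx.end_ms:
            return idx
    return None


LOOK_AHEAD_MS = 10 * 60 * 1000      # index.look_ahead_time = 10m
LOOK_BACK_MS = 2 * 60 * 60 * 1000   # assumed look-back of 2h

first_write_ms = 1695131067275      # first `now` from the reproduction
second_write_ms = 1695131118582     # second `now` from the reproduction
rollover_ms = first_write_ms + 10_000  # assumed rollover instant

# Assumption: on rollover the old index keeps accepting timestamps up to
# rollover time + look-ahead, and the new index starts where the old one ends.
gen1 = BackingIndex("gen-000001", first_write_ms - LOOK_BACK_MS,
                    rollover_ms + LOOK_AHEAD_MS, read_only=True)
gen2 = BackingIndex("gen-000002", gen1.end_ms, gen1.end_ms + LOOK_AHEAD_MS)

target = route([gen1, gen2], second_write_ms)
print(target.name, "read_only =", target.read_only)
```

Under these assumptions the second write still lands in the first-generation index, which is the one that has already been made read-only.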
The index could be read-only because it was downsampled. For downsampling in particular, we should delay downsampling a backing index until its configured END_TIME has lapsed. Note that a similar situation could arise if, say, a searchable_snapshot action is used instead of downsampling; however, we should probably treat that separately.
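The proposed delay amounts to a simple guard on the index's END_TIME. The sketch below only illustrates the idea. `safe_to_downsample` is a hypothetical helper, not an existing ILM or data stream lifecycle function, and it assumes END_TIME is an exclusive upper bound expressed in epoch milliseconds.

```python
def safe_to_downsample(end_time_ms: int, now_ms: int) -> bool:
    """Hypothetical guard: allow downsampling only once END_TIME has lapsed.

    Assumes END_TIME is exclusive, so once now >= END_TIME a document
    stamped with the current time can no longer be routed to this index.
    """
    return now_ms >= end_time_ms


# Illustrative values: an END_TIME ten minutes after an assumed rollover.
rollover_ms = 1695131077275
end_time_ms = rollover_ms + 10 * 60 * 1000

print(safe_to_downsample(end_time_ms, rollover_ms + 60_000))
print(safe_to_downsample(end_time_ms, end_time_ms))
```

A lifecycle step built on such a check would keep waiting while it returns False and proceed to downsample once it returns True.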
This problem is present both in ILM and data stream lifecycle.
In ILM, the only current workaround is to increase the min_age of the phase where downsampling is configured so that the look_ahead_time of the backing indices has lapsed by the time the index enters that phase.
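As a sketch of that workaround, the snippet below builds a variant of the test-lifecycle policy in which downsampling moves out of the hot phase into a warm phase. The warm phase and its min_age of 30m are assumptions chosen to sit comfortably above the 10m look_ahead_time used in the reproduction. They are not values recommended by this issue, and a suitable margin depends on the cluster.

```python
import json

# Assumption: 30m leaves a margin above index.look_ahead_time (10m).
DOWNSAMPLE_MIN_AGE = "30m"

policy = {
    "policy": {
        "phases": {
            "hot": {
                "min_age": "0ms",
                "actions": {
                    # Rollover and priority are unchanged from the reproduction.
                    "rollover": {
                        "max_primary_shard_size": "25gb",
                        "max_age": "1h",
                        "max_docs": 2,
                    },
                    "set_priority": {"priority": 100},
                },
            },
            "warm": {
                # Downsampling now waits until the index reaches this age.
                "min_age": DOWNSAMPLE_MIN_AGE,
                "actions": {"downsample": {"fixed_interval": "5m"}},
            },
            "delete": {
                "min_age": "7d",
                "actions": {"delete": {"delete_searchable_snapshot": True}},
            },
        }
    }
}

# Print a body that can be pasted after `PUT _ilm/policy/test-lifecycle`.
print(json.dumps(policy, indent=2))
```

With this shape the index stays writable after rollover until it enters the warm phase, which gives the look-ahead window time to pass before the index becomes read-only.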
Steps to Reproduce
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "5s"
  }
}
PUT _component_template/test-mappings
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "format": "epoch_millis",
          "type": "date"
        },
        "metricKey": {
          "time_series_dimension": true,
          "type": "keyword"
        },
        "value": {
          "time_series_metric": "gauge",
          "type": "float"
        }
      }
    }
  }
}
PUT _component_template/test-settings
{
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "test-lifecycle"
        },
        "look_ahead_time": "10m"
      }
    }
  }
}
PUT _index_template/test1-template
{
  "priority": 500,
  "template": {
    "settings": {
      "index": {
        "mode": "time_series"
      }
    }
  },
  "index_patterns": [
    "test1*"
  ],
  "data_stream": {
    "hidden": false,
    "allow_custom_routing": false
  },
  "composed_of": [
    "test-settings",
    "test-mappings"
  ]
}
PUT _ilm/policy/test-lifecycle
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "downsample": {
            "fixed_interval": "5m"
          },
          "rollover": {
            "max_primary_shard_size": "25gb",
            "max_age": "1h",
            "max_docs": 2
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}
== Get `now` date
❯ date +"%s%3N"
1695131067275
POST _bulk
{"create": {"_index": "test1ds2"}}
{"metricKey": "thekey1", "@timestamp": "1695131067275", "value": 1, "instance": "server:9100", "job": "node", "env": "hml", "__name__": "up"}
== Roll over the data stream
POST test1ds2/_rollover
== Get `now` date
❯ date +"%s%3N"
1695131118582
POST _bulk
{"create": {"_index": "test1ds2"}}
{"metricKey": "thekey2", "@timestamp": "1695131118582", "value": 1, "instance": "server:9100", "job": "node", "env": "hml", "__name__": "up"}
The result is
{
  "errors": true,
  "took": 0,
  "items": [
    {
      "create": {
        "_index": "test1ds2",
        "_id": null,
        "status": 403,
        "error": {
          "type": "cluster_block_exception",
          "reason": "index [downsample-5m-.ds-test1ds2-2023.09.19-000001] blocked by: [FORBIDDEN/8/index write (api)];"
        }
      }
    }
  ]
}
Logs (if relevant)
No response