Elasticsearch Version
8.13 and above
Installed Plugins
No response
Java Version
bundled
OS Version
all
Problem Description
PR #97557 introduced DownsampleShardTaskParams, a data structure used by our persistent task framework to store task-specific data. In this case, it holds the data for the downsampling tasks started when a downsampling operation is carried out.
PR #98023 introduced a string array, dimensions, which stores the set of dimensions defined for the original index the downsampling task operates on. This is required because, with TSID hashing, we can no longer recover the dimensions by decoding the _tsid field, so we need to store them unencoded somewhere else to support resuming interrupted persistent tasks.
Adding the dimensions string array changes the format of the wire protocol we use when serialising and deserialising objects such as DownsampleShardTaskParams. Changes of this kind require code that handles backward compatibility with nodes running older versions of Elasticsearch, which "speak" a different version of the wire protocol. That check is missing (this is the bug!). As a result, newer versions of Elasticsearch unconditionally try to read a boolean and then, if the boolean is true, an array of strings (dimensions), ignoring the fact that the boolean and the string array may not be present. Older versions of Elasticsearch do not serialise the boolean or the string array, since neither existed when those versions were released. This is why newer versions of Elasticsearch need to check the wire protocol version and implement backward-compatible behaviour.
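To illustrate what the missing check looks like, here is a minimal sketch of version-gated serialisation under simplifying assumptions: it uses java.io DataInputStream/DataOutputStream instead of Elasticsearch's StreamInput/StreamOutput, and a plain int in place of TransportVersion. The class name VersionGatedParams and the DIMENSIONS_ADDED constant and its value are hypothetical. It shows the general pattern only, not the actual fix applied to DownsampleShardTaskParams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, simplified stand-in for DownsampleShardTaskParams serialisation.
// Assumption: a plain int models the negotiated wire (transport) version; real
// Elasticsearch code uses StreamInput/StreamOutput and TransportVersion instead.
public class VersionGatedParams {
    // Hypothetical wire version that introduced the optional dimensions array.
    static final int DIMENSIONS_ADDED = 8_13_00;

    final String index;
    final String[] dimensions; // null when the wire format carries no dimensions

    VersionGatedParams(String index, String[] dimensions) {
        this.index = index;
        this.dimensions = dimensions;
    }

    void writeTo(DataOutputStream out, int wireVersion) throws IOException {
        out.writeUTF(index);
        // Only emit the new fields if the stream's version knows about them.
        if (wireVersion >= DIMENSIONS_ADDED) {
            out.writeBoolean(dimensions != null);
            if (dimensions != null) {
                out.writeInt(dimensions.length);
                for (String d : dimensions) {
                    out.writeUTF(d);
                }
            }
        }
    }

    static VersionGatedParams readFrom(DataInputStream in, int wireVersion) throws IOException {
        String index = in.readUTF();
        String[] dimensions = null;
        // The version check: never read bytes that an older writer did not write.
        if (wireVersion >= DIMENSIONS_ADDED && in.readBoolean()) {
            dimensions = new String[in.readInt()];
            for (int i = 0; i < dimensions.length; i++) {
                dimensions[i] = in.readUTF();
            }
        }
        return new VersionGatedParams(index, dimensions);
    }

    // Writes and reads with the same wire version, as both ends of a stream would.
    static VersionGatedParams roundTrip(VersionGatedParams p, int wireVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            p.writeTo(new DataOutputStream(bytes), wireVersion);
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return readFrom(in, wireVersion);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        VersionGatedParams p = new VersionGatedParams("my-index", new String[] {"host", "pod"});
        System.out.println(roundTrip(p, 8_12_00).dimensions == null); // prints true
        System.out.println(roundTrip(p, DIMENSIONS_ADDED).dimensions.length); // prints 2
    }
}
```

With an older wire version nothing is written or read for dimensions, so the field stays null; code consuming the deserialised object then has to handle a missing dimensions value explicitly.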
Moreover, instances of DownsampleShardTaskParams are serialised as part of the cluster state, which is written and read by nodes in the cluster and must remain readable by nodes running a newer version of Elasticsearch after an upgrade. This is why the upgrade process is affected.
The issue happens because a node running a version of Elasticsearch older than 8.13 (8.10.x to 8.12.x) writes the cluster state with DownsampleShardTaskParams that do not include the dimensions string array. Then, once nodes start moving to the new version during an upgrade to 8.13, deserialising the cluster state fails on the node running 8.13 because the dimensions array is missing.
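The failure mode can be reproduced in isolation with a similarly simplified sketch, again using java.io streams rather than Elasticsearch's own stream classes; all names here are hypothetical. Bytes written in the old format contain no boolean, so a reader that calls readBoolean() without checking the wire version runs past the data the old writer produced.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical reproduction of the bug pattern: old-format bytes (no dimensions
// fields) read by a reader that expects the new fields unconditionally.
public class UnconditionalReadFailure {
    // Old-format writer (stands in for 8.10.x-8.12.x): only the index name is written.
    static byte[] writeOldFormat(String index) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DataOutputStream(bytes).writeUTF(index);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Buggy reader: reads the boolean and the array with no wire-version check.
    static String[] readUnconditionally(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        in.readUTF(); // index name, present in both formats
        if (in.readBoolean()) { // never written by the old format
            String[] dimensions = new String[in.readInt()];
            for (int i = 0; i < dimensions.length; i++) {
                dimensions[i] = in.readUTF();
            }
            return dimensions;
        }
        return null;
    }

    // Returns true if reading old-format bytes fails because the stream runs out.
    static boolean failsOnOldFormat(String index) {
        try {
            readUnconditionally(writeOldFormat(index));
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnOldFormat("my-index")); // prints true
    }
}
```

Here the mis-read surfaces as a clean EOFException only because the old object is the last thing in the buffer. When other data follows, as it may in a serialised cluster state, the reader would instead consume bytes belonging to whatever comes next, and the error may surface later and in a less obvious form.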
(NOTE: presumably, failing to deserialise the cluster state means the node running version 8.13 will never be able to join the cluster.)
Steps to Reproduce
This should be reproducible simply by having at least one downsampling task running and then upgrading to version 8.13 while the task is still in progress. Note also that the executor will not restart such tasks, because the failure is unrecoverable.
Logs (if relevant)
No response