Fix segment stats in tsdb#89754
Conversation
|
CC @weizijun |
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
|
Hi @nik9000, I've created a changelog YAML for you. |
The serialization for segment stats was broken for tsdb because we return a *slightly* different sort configuration. That caused `_segments` and `_cat/segments` to break when any shard of the tsdb index is hosted on another node. Closes elastic#89609
| - match: | ||
| $body: | | ||
| /^(tsdb \s+ 0 \s+ p \s+ \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} \s+ _\d (\s\d){3} \s+ | ||
| (\d+|\d+[.]\d+)(kb|b) \s+ \d+ (\s+ (false|true)){2} \s+ \d+\.\d+(\.\d+)? \s+ (false|true) \s? \n?)$/ |
There was a problem hiding this comment.
You can reproduce @weizijun's failure like this:
while g -p qa/smoke-test-multinode/ check -Dtests.method='*segments*tsdb*'; do echo ok; done
csoulios
left a comment
There was a problem hiding this comment.
I am not familiar with the Segment class and its inner workings.
I have left couple comments/questions, but I leave it up to the other reviewers to approve this PR.
| } else { | ||
| o.writeByte((byte) 5); | ||
| o.writeOptionalBoolean(field.getMissingValue() == null ? null : field.getMissingValue() == SortField.STRING_FIRST); | ||
| o.writeBoolean(field.getReverse()); |
There was a problem hiding this comment.
Not sure if I understand this part very well. Don't we need to write a boolean for the selector?
In all other cases (type = 0) I see one byte and thre booleans are written. Not sure why type = 5 requires only 2 booleans.
There was a problem hiding this comment.
If we have a selector we use the other case above. In this case we just don't have one.
| if (missingFirst != null) { | ||
| fields[i].setMissingValue(missingFirst ? SortedSetSortField.STRING_FIRST : SortedSetSortField.STRING_LAST); | ||
| } | ||
| } else if (type == 5) { |
There was a problem hiding this comment.
Where can I see what type = 5 is? A comment would help here, a constant would be even better, I think
There was a problem hiding this comment.
Yeah. I was following the old way and it didn't have them. I didn't want to rock the boat too much here, but there isn't any reason not to try and at least make it more readable. I'll add some constants.
jpountz
left a comment
There was a problem hiding this comment.
The change makes sense to me, _tsid is our first field that we index as a single-value field which can be used for index sorting.
@jpountz could you comment on why we aren't using Lucene's serialization for these? Or why we don't make our own object rather than this brittle thing? I don't particularly want to change now, but it seems like we wouldn't try to serialize with the |
|
I would guess that one reason is because segment stats added the sorting configuration to its output before Lucene exposed serialization for SortFields, which used to be a codec implementation before. @romseygeek might have more info about whether moving to Lucene serialization would be possible and a good idea. FWIW I see we have custom serialization logic for |
Oh no! It's totally different! |
* main: (34 commits) Make sure ivy repo directory exists before downloading artifacts Use 'file://' scheme for local repository URL Use DRA artifacts for release build CI jobs Log unsuccessful attempts to get credentials from web identity tokens (elastic#88241) Script: Write Field API path manipulation (elastic#89889) Fetch health info action (elastic#89820) Fix memory leak in TransportDeleteExpiredDataAction (elastic#89935) [ML] Performance improvements for categorization jobs (elastic#89824) [DOCS] Revert changes for ES_JAVA_OPTS (elastic#89931) Fix deadlock bug exposed by a test (elastic#89934) [Downsampling] Remove `FieldValueFetcher` validator (elastic#89497) Fix segment stats in tsdb (elastic#89754) Synthetic _source: support dense_vector (elastic#89840) REST tests fetching fields with synthetic _source (elastic#89888) Do not deserialize back BytesTransportRequest to clone a request in MockTransportService (elastic#89926) Add SDK request logging to debug failures of S3BlobStoreRepositoryTests#testRequestStats (elastic#89912) Fix SnapshotStatusApisIT.testGetSnapshotsWithSnapshotInProgress (elastic#89925) Document synthetic source for text and keyword (elastic#89893) Fix CloneSnapshotIT.testRemoveFailedCloneFromCSWithQueuedSnapshotInProgress (elastic#89914) Add missing index.mapping.total_fields.limit setting to the target index (elastic#89875) ...
The serialization for segment stats was broken for tsdb because we
return a slightly different sort configuration. That caused
_segmentsand_cat/segmentsto break when any shard of the tsdbindex is hosted on another node.
Closes #89609