Fixed wrong malformed value ordering in synthetic source tests #143187
Kubik42 merged 4 commits into elastic:main
Pinging @elastic/es-storage-engine (Team:StorageEngine)
I think #143207 will also be fixed by this PR.
If I'm understanding this correctly, the change to store values in binary doc values means that the order in which values are returned in the reconstructed source has also changed? Is this a BWC issue? Or do we not make any guarantees about malformed values in arrays?
I don't think this is a real bwc issue. There is a list of differences compared to stored _source. The guarantee of array ordering depends on |
This looks related too: #143203
In #142357, we changed the storage medium for malformed values from stored fields to binary doc values. Since binary doc values are sorted, the synthesized source returns malformed values in sorted order rather than in the order they were given in the document.
The synthesized source is still correct; only the order of the values has changed.
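As a rough illustration of why the order changes, the sketch below sorts a few values by the unsigned byte order of their encoded form, which is how Lucene's BytesRef.compareTo orders bytes. The class name, the method names, and the UTF-8 encoding are all assumptions made for this example; the real index uses its own encoding, and this is not the Elasticsearch implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not the Elasticsearch implementation.
final class MalformedValueOrderSketch {

    // Stand-in encoding (assumption): plain UTF-8 bytes.
    // The real index encodes values differently.
    static byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic byte comparison of the encoded values.
    static final Comparator<String> ENCODED_ORDER =
        (a, b) -> Arrays.compareUnsigned(encode(a), encode(b));

    // Returns the values in encoded-byte order instead of the order given.
    static List<String> asReadBack(List<String> given) {
        List<String> sorted = new ArrayList<>(given);
        sorted.sort(ENCODED_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> given = List.of("zebra", "12abc", "Apple");
        // Same values, different order: prints [12abc, Apple, zebra]
        System.out.println(asReadBack(given));
    }
}
```

A test that compares against the synthesized source could apply the same comparator to its expected values, so that both sides agree on order without changing which values are present.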
This ordering impacted some tests, resulting in test failures like:
Note, the CI was green in the original PR. In fact, I reran it multiple times and nothing ever failed.
To address the difference in sort order, I've introduced SyntheticSourceMalformedValueSorter, which uses XContentDataHelper.encodeToken() to get the same byte representation the index uses for a value. Then it uses BytesRef.compareTo to match the loader (which sorts values based on their encoded BytesRef).
Addresses:
To make sure I didn't miss anything, I ran each of the failing tests with -Dtests.iters=100:
I also ran all the synthetic source tests under org.elasticsearch.index.mapper and org.elasticsearch.index.mapper.extras: