Skip to content

Fixed wrong malformed value ordering in synthetic source tests#143187

Merged
Kubik42 merged 4 commits intoelastic:mainfrom
Kubik42:text-fix
Feb 27, 2026
Merged

Fixed wrong malformed value ordering in synthetic source tests#143187
Kubik42 merged 4 commits intoelastic:mainfrom
Kubik42:text-fix

Conversation

@Kubik42
Copy link
Copy Markdown
Contributor

@Kubik42 Kubik42 commented Feb 26, 2026

In #142357, we changed the storage medium for malformed values from stored fields to binary doc values. Since binary doc values are sorted, the synthesized source will return malformed values in a different order than initially given in the document.

The synthesized source is still correct, its just the order of values has changed.

This ordering impacted some tests, resulting in test failures like:

java.lang.AssertionError:
Expected: "{"field":["1970-01-26T16:26:11.484157088Z","qQiA","WGrXjZXV","uZ"]}"
but: was "{"field":["1970-01-26T16:26:11.484157088Z","WGrXjZXV","qQiA","uZ"]}"

Note, the CI was green in the original PR. In fact, I reran it multiple times and nothing ever failed.

To address the difference in sort order, I've introduced SyntheticSourceMalformedValueSorter, which uses XContentDataHelper.encodeToken() to get the same byte representation the index uses for a value. Then it uses BytesRef.compareTo to match the loader (which sorts values based on their encoded BytesRef).

Addresses:

To make sure I didn't miss anything, I ran each of the failing tests with -Dtests.iters=100:

./gradlew ":server:test" --tests "org.elasticsearch.index.mapper.IpFieldMapperTests.testSyntheticSourceWithTranslogSnapshot" -Druntime.java=25 -Dtests.iters=100
./gradlew ":modules:mapper-extras:test" --tests "org.elasticsearch.index.mapper.extras.ScaledFloatFieldMapperTests.testSyntheticSourceWithTranslogSnapshot" -Druntime.java=25 -Dtests.iters=100
./gradlew :server:test --tests 'org.elasticsearch.index.mapper.ByteFieldMapperTests.testSyntheticSourceMany' --tests 'org.elasticsearch.index.mapper.ShortFieldMapperTests.testSyntheticSourceMany' -Dtests.iters=100
./gradlew ":server:test" --tests "org.elasticsearch.index.mapper.GeoPointFieldMapperTests.testSyntheticSourceWithTranslogSnapshot" -Druntime.java=25 -Dtests.iters=100
./gradlew ":modules:mapper-extras:test" --tests "org.elasticsearch.index.mapper.extras.ScaledFloatFieldMapperTests.testSyntheticSourceInObject" -Druntime.java=25 -Dtests.iters=100
./gradlew ":server:test" --tests "org.elasticsearch.index.mapper.DateFieldMapperTests.testSyntheticSourceWithTranslogSnapshot" -Druntime.java=25 -Dtests.iters=100

I also ran all the synthetic source tests under org.elasticsearch.index.mapper and org.elasticsearch.index.mapper.extras:

./gradlew :server:test --tests 'org.elasticsearch.index.mapper.*.testSyntheticSource*' -Druntime.java=25 -Dtests.iters=10
./gradlew :modules:mapper-extras:test --tests 'org.elasticsearch.index.mapper.extras.*.testSyntheticSource*' -Druntime.java=25 -Dtests.iters=100

@Kubik42 Kubik42 marked this pull request as ready for review February 27, 2026 00:29
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@martijnvg
Copy link
Copy Markdown
Member

I think #143207 will also be fixed by this PR.

@romseygeek
Copy link
Copy Markdown
Contributor

If I'm understanding this correctly, the change to store values in binary doc values means that the order in which values are returned in the reconstructed source has also changed? It this a BWC issue? Or do we not make any guarantees about malformed values in arrays?

@martijnvg
Copy link
Copy Markdown
Member

It this a BWC issue? Or do we not make any guarantees about malformed values in arrays?

I don't think this is a real bwc issue. There is a list of differences compared to stored _source. The guarantee of array ordering depends on synthetic_source_keep option. In case that is set to array (which is the default for logsdb), we should keep ordering. However (even before #142357), with ignore above and ignore malformed we don't. And malformed and ignored values are appended after non ignored / malformed array elements. This is kind of undocumented behaviour. Now this undocumented behaviour changes, and the order in which the ignored and malformed values are appended change, but they still appear after the non ignored / malformed values.

Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ivancea
Copy link
Copy Markdown
Contributor

ivancea commented Feb 27, 2026

This looks related too: #143203

@Kubik42 Kubik42 merged commit 608c725 into elastic:main Feb 27, 2026
6 of 9 checks passed
@Kubik42 Kubik42 deleted the text-fix branch February 27, 2026 18:32
szybia added a commit to szybia/elasticsearch that referenced this pull request Feb 27, 2026
…cations

* upstream/main:
  Warn on API key version mismatch (elastic#143127)
  Fixed wrong malformed value ordering in synthetic source tests (elastic#143187)
  [ML] Fix: required_native_memory_bytes Calculated with Wrong Allocation Count (elastic#143077)
  Add configureBenchmarkLogging calls across the various benchmarks (elastic#143185)
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:k8s-timeseries-avg-over-time.Avg_over_time_aggregate_metric_double_implicit_casting} elastic#143292
  Give system role permission to invoke shard refresh (elastic#143190)
  Mute testSyntheticSourceWithTranslogSnapshot (elastic#143260)
  Adds ResumeInfo Tests (elastic#142769)
  Use a static method to configure benchmark logging (elastic#143056)
  add connectors release notes (elastic#142884)
  Add CI triage guidance for AI agents (elastic#142994)
  ESQL: Data sources: ZSTD, BZIP2 (elastic#143228)
  [ES|QL] Channels issue when an agg is called with the same field (elastic#142180) (elastic#142269)
  Add support for project routing in reindex requests (elastic#142240)
tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants