
Validation of the _source to reject contradicting and ambiguous requests#20742

Draft
urmichm wants to merge 15 commits into opensearch-project:main from urmichm:20612-source-validation

Conversation

@urmichm
Contributor

@urmichm urmichm commented Feb 27, 2026

Description

Validation of the _source object has been added to avoid confusion and to reject contradicting and ambiguous requests:

  • "_source": {} At least one of includes or excludes must be defined. 🚫
  • "_source": [] An explicitly defined empty array of includes is not allowed. 🚫
  • "_source": { "includes": "text", "excludes": ["title", "text"] } The text field appears in both includes and excludes, which is a contradiction. 🚫
  • "_source": { "includes": [], "excludes": ["title"] } or "_source": { "includes": ["title"], "excludes": [] } Explicitly defined empty arrays are not allowed, to avoid confusion. 🚫

To include the whole _source object or to exclude it completely, use the existing boolean forms instead:
"_source": true and "_source": false
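The contradiction rule above can be sketched with plain collections. This is an illustrative sketch only; the class and method names (SourceFilterCheck, validate) are invented for this example and are not the PR's actual FetchSourceContext code, though the error message mirrors the one in the PR.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: reject any field listed in both includes and excludes.
class SourceFilterCheck {
    static void validate(List<String> includes, List<String> excludes) {
        Set<String> includeSet = new HashSet<>(includes);
        for (String exclude : excludes) {
            if (includeSet.contains(exclude)) {
                throw new IllegalArgumentException(
                    "The same entry [" + exclude + "] cannot be both included and excluded in _source.");
            }
        }
    }

    public static void main(String[] args) {
        validate(List.of("title"), List.of("body")); // disjoint lists pass
        try {
            validate(List.of("text"), List.of("title", "text")); // contradiction
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The check is one-directional on purpose: building a set from includes and scanning excludes finds any overlap in linear time.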

Unit tests have been added to cover the behaviour of the parsing implementation, and existing unit tests have been extended to cover invalid cases.

Related Issues

Resolves #20612

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

github-actions bot commented Feb 27, 2026

PR Reviewer Guide 🔍

(Review updated until commit 706f30b)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Core _source validation logic and unit tests

Relevant files:

  • server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java
  • server/src/test/java/org/opensearch/search/fetch/subphase/FetchSourceContextTests.java

Sub-PR theme: gRPC and REST API integration tests for _source validation

Relevant files:

  • modules/transport-grpc/src/test/java/org/opensearch/transport/grpc/proto/request/common/FetchSourceContextProtoUtilsTests.java
  • rest-api-spec/src/main/resources/rest-api-spec/test/search/10_source_filtering.yml

⚡ Recommended focus areas for review

Breaking Change

The validateAmbiguousFields() call in the constructor and the new empty-array/empty-object restrictions in fromXContent and parseSourceFieldArray are breaking changes. Existing users who pass empty arrays or overlapping includes/excludes (even programmatically via the public constructor) will now get exceptions. This could break backward compatibility for existing integrations and stored queries.

public FetchSourceContext(boolean fetchSource, String[] includes, String[] excludes) {
    this.fetchSource = fetchSource;
    this.includes = includes == null ? Strings.EMPTY_ARRAY : includes;
    this.excludes = excludes == null ? Strings.EMPTY_ARRAY : excludes;
    validateAmbiguousFields();
}

public FetchSourceContext(boolean fetchSource) {
    this(fetchSource, Strings.EMPTY_ARRAY, Strings.EMPTY_ARRAY);
}

public FetchSourceContext(StreamInput in) throws IOException {
    fetchSource = in.readBoolean();
    includes = in.readStringArray();
    excludes = in.readStringArray();
}

/**
 * The same entry cannot be both included and excluded in _source.
 * Since the constructors are public, this validation is required to be called in the constructor.
 * */
private void validateAmbiguousFields() {
    Set<String> includeSet = new HashSet<>(Arrays.asList(this.includes));
    for (String exclude : this.excludes) {
        if (includeSet.contains(exclude)) {
            throw new OpenSearchException(AMBIGUOUS_FIELD_MESSAGE, exclude);
        }
    }
}
StreamInput Bypass

The FetchSourceContext(StreamInput in) constructor reads fields directly and does not call validateAmbiguousFields(). This means deserialized instances from the wire (e.g., from older nodes in a mixed-version cluster) bypass the new validation, creating an inconsistency where the same data can be valid when deserialized but invalid when constructed directly.

public FetchSourceContext(StreamInput in) throws IOException {
    fetchSource = in.readBoolean();
    includes = in.readStringArray();
    excludes = in.readStringArray();
}
Public API Exposure

parseSourceObject is now a public static method, which exposes an internal parsing detail as a public API. This seems unintentional and may cause issues if callers invoke it with a non-START_OBJECT token. It was previously private logic inside fromXContent.

public static FetchSourceContext parseSourceObject(XContentParser parser) throws IOException {
    XContentParser.Token token = parser.currentToken();
    Set<String> includes = Collections.emptySet();
    Set<String> excludes = Collections.emptySet();
    String currentFieldName = null;
    if (token != XContentParser.Token.START_OBJECT) {
        throw new ParsingException(
            parser.getTokenLocation(),
            "Expected a " + XContentParser.Token.START_OBJECT + " but got a " + token + " in [" + parser.currentName() + "]."
        );
    }
    while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
        if (token == XContentParser.Token.FIELD_NAME) {
            currentFieldName = parser.currentName();
            continue; // only field name is required in this iteration
        }
        if (currentFieldName == null) {
            throw new ParsingException(
                parser.getTokenLocation(),
                "Expected a field name but got a " + token + " in [" + parser.currentName() + "]."
            );
        }
        // process field value
        switch (token) {
            case XContentParser.Token.START_ARRAY -> {
                if (INCLUDES_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
                    includes = parseSourceFieldArray(parser, INCLUDES_FIELD, excludes);
                } else if (EXCLUDES_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
                    excludes = parseSourceFieldArray(parser, EXCLUDES_FIELD, includes);
                } else {
                    throw new ParsingException(
                        parser.getTokenLocation(),
                        "Unknown key for a " + token + " in [" + currentFieldName + "]."
                    );
                }
            }
            case XContentParser.Token.VALUE_STRING -> {
                if (INCLUDES_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
                    String includeEntry = parser.text();
                    if (excludes.contains(includeEntry)) {
                        throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, includeEntry);
                    }
                    includes = Collections.singleton(includeEntry);
                } else if (EXCLUDES_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
                    String excludeEntry = parser.text();
                    if (includes.contains(excludeEntry)) {
                        throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, excludeEntry);
                    }
                    excludes = Collections.singleton(excludeEntry);
                } else {
                    throw new ParsingException(
                        parser.getTokenLocation(),
                        "Unknown key for a " + token + " in [" + currentFieldName + "]."
                    );
                }
            }
            default -> {
                throw new ParsingException(parser.getTokenLocation(), "Unknown key for a " + token + " in [" + currentFieldName + "].");
            }
        }
    }
    if (includes.isEmpty() && excludes.isEmpty()) {
        // no valid field names -> empty or unrecognized fields; not allowed
        throw new ParsingException(
            parser.getTokenLocation(),
            "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
        );
    }
    return new FetchSourceContext(true, includes.toArray(new String[0]), excludes.toArray(new String[0]));
}
Duplicate Entry

The same changelog entry for _source validation is added twice — once under the first section (line 14) and once under a second section (line 32). One of these entries should be removed.

- Add validation of the `_source` object to reject contradicting and ambiguous requests. ([#20612](https://github.com/opensearch-project/OpenSearch/issues/20612))
Duplicate Deduplication

The test testFetchSourceArray adds "include2" twice and asserts result.includes().length == 2 (no duplicates). However, the old implementation using ArrayList would have kept duplicates. This test implicitly validates that LinkedHashSet deduplication is now applied to top-level arrays, but this behavior change is not documented or explicitly called out as intentional.

public void testFetchSourceArray() throws IOException {
    final XContentBuilder source = XContentFactory.jsonBuilder()
        .startObject()
        .field("_source")
        .startArray()
        .value("include1")
        .value("include2")
        .value("include2")
        .endArray()
        .endObject();
    final XContentParser parser = createSourceParser(source);

    FetchSourceContext result = FetchSourceContext.fromXContent(parser);
    assertTrue(result.fetchSource()); // fetch source
    // validate includes
    assertEquals(2, result.includes().length); // no duplicates
    assertTrue(Arrays.asList(result.includes()).containsAll(Arrays.asList("include1", "include2")));
    // validate no excludes
    assertEquals(0, result.excludes().length);
}
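The deduplication behaviour that testFetchSourceArray asserts can be reproduced in isolation. A minimal sketch, assuming nothing beyond the JDK (the class name DedupDemo is invented for illustration): a LinkedHashSet drops repeated entries while preserving the order in which they were first seen.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of LinkedHashSet deduplication: repeats are dropped,
// first-seen insertion order is kept.
class DedupDemo {
    static String[] dedupe(String... entries) {
        Set<String> seen = new LinkedHashSet<>(Arrays.asList(entries));
        return seen.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] result = dedupe("include1", "include2", "include2");
        System.out.println(Arrays.toString(result)); // [include1, include2]
    }
}
```

This matches the test's expectation of length 2, which is why the review flags the silent behaviour change relative to the old ArrayList-based parsing.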

@github-actions
Contributor

github-actions bot commented Feb 27, 2026

PR Code Suggestions ✨

Latest suggestions up to 706f30b

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Validate deserialized instances for ambiguous fields

The StreamInput constructor does not call validateAmbiguousFields(), meaning
deserialized instances can bypass the ambiguity validation. If a node running an
older version serializes a FetchSourceContext with conflicting includes/excludes and
sends it to a node running the new version, the new node will accept it silently.
Add the validation call here as well.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 6


Why: The StreamInput constructor bypasses validateAmbiguousFields(), which could allow invalid states when deserializing data from older nodes. Adding the validation call is a legitimate correctness concern, though the cross-version scenario is somewhat edge-case.

Low
General
Improve empty object validation accuracy

The validation only checks if both includes and excludes are empty, but it doesn't
account for the case where unrecognized field names were encountered (e.g.,
{"unknown_field": ["val"]}). In that scenario, the sets remain empty and the error
message is misleading. Consider tracking whether any recognized fields were parsed
to provide a more accurate error.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-264]

 if (includes.isEmpty() && excludes.isEmpty()) {
     // no valid field names -> empty or unrecognized fields; not allowed
     throw new ParsingException(
         parser.getTokenLocation(),
-        "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
+        "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "] but found none"
     );
 }
Suggestion importance[1-10]: 2


Why: The suggestion changes only the error message text by appending "but found none", which is a very minor improvement. The actual logic issue described (unrecognized fields) is not addressed by the improved_code, and unrecognized fields already throw a ParsingException in the default case of the switch statement before reaching this check.

Low

Previous suggestions

Suggestions up to commit 53120e7
Category | Suggestion | Impact
Possible issue
Validate deserialized data in stream constructor

The deserialization constructor FetchSourceContext(StreamInput in) does not call
validateAmbiguousFields(), unlike the other constructors. This means a
FetchSourceContext deserialized from a stream could bypass the new validation,
potentially allowing contradicting includes/excludes to be reconstructed from
persisted or transported data. Add a call to validateAmbiguousFields() at the end of
this constructor.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 6


Why: The StreamInput constructor bypasses validateAmbiguousFields(), which could allow invalid FetchSourceContext objects to be reconstructed from serialized data. Adding validation here closes a real consistency gap, though in practice serialized data would have been validated before serialization.

Low
General
Reject duplicate entries within the same array

The parseSourceFieldArray method does not detect duplicate entries within the same
array (e.g., includes: ["field1", "field1"]). While LinkedHashSet silently
deduplicates them, the caller and the FetchSourceContext constructor will receive a
deduplicated array without any warning or error. If the intent is to reject
ambiguous/contradicting requests, duplicate entries within the same array should
either be rejected or at least consistently handled. Consider throwing a
ParsingException when a duplicate entry is detected within the same array.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279]

 private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite)
     throws IOException {
     Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields
     while (parser.nextToken() != XContentParser.Token.END_ARRAY) {
         if (parser.currentToken() == XContentParser.Token.VALUE_STRING) {
             String entry = parser.text();
             if (opposite != null && opposite.contains(entry)) {
                 throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry);
             }
-            sourceArr.add(entry);
+            if (!sourceArr.add(entry)) {
+                throw new ParsingException(
+                    parser.getTokenLocation(),
+                    "Duplicate entry [" + entry + "] in [" + parseField.getPreferredName() + "] of _source."
+                );
+            }
Suggestion importance[1-10]: 4


Why: Silently deduplicating entries within the same array could be surprising to users. However, this is a behavioral change beyond the PR's stated goal of rejecting contradicting includes/excludes, and duplicate entries within a single array are not inherently contradictory.

Low
Improve empty object error message clarity

The check includes.isEmpty() && excludes.isEmpty() will also throw when the object
has unrecognized field names that were silently skipped, but more critically, it
won't throw when only one of the two sets is populated (which is valid). However,
the real issue is that this check fires even when the object contained only
unrecognized field names that were never assigned — the default branch already
throws for unknown tokens, but if a field name is recognized yet has an empty array,
the parseSourceFieldArray method throws first. The logic appears correct for the
intended cases, but the condition should also guard against the case where the
object was parsed but both sets remain as Collections.emptySet() due to unrecognized
field names not caught by the default branch. Consider verifying that the condition
correctly distinguishes between "no fields provided" and "only unrecognized fields
provided" to avoid misleading error messages.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-265]

 if (includes.isEmpty() && excludes.isEmpty()) {
     // no valid field names -> empty or unrecognized fields; not allowed
     throw new ParsingException(
         parser.getTokenLocation(),
-        "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
+        "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "] in [_source]"
     );
 }
Suggestion importance[1-10]: 2


Why: The suggestion only adds " in [_source]" to the error message, which is a very minor UX improvement. The improved_code is nearly identical to existing_code with only a cosmetic change, and the existing REST API tests already validate the current message format.

Low
Clarify error message for top-level empty array

When _source is a top-level array (list of include fields), parseSourceFieldArray is
called with null as the opposite parameter, which correctly skips the ambiguity
check. However, the parseSourceFieldArray method throws if the array is empty. This
means _source: [] will throw with the message "Expected at least one value for an
array of [includes]", which is consistent with the REST API test. This is fine, but
note that the INCLUDES_FIELD parse field is passed here even though this is not
inside an object context — the error message will reference "includes" which may be
slightly confusing to users who wrote _source: []. This is a minor UX issue but
worth noting.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [171-174]

+case XContentParser.Token.START_ARRAY -> {
+    String[] includes = parseSourceFieldArray(parser, INCLUDES_FIELD, null).toArray(new String[0]);
+    return new FetchSourceContext(true, includes, null);
+}
 
-
Suggestion importance[1-10]: 1


Why: The improved_code is identical to the existing_code, making this a zero-change suggestion. It only asks the user to verify/note a minor UX concern about the error message referencing "includes" for a top-level array context.

Low
Suggestions up to commit 7cb356d
Category | Suggestion | Impact
Possible issue
Validate deserialized objects for ambiguous fields

The StreamInput constructor does not call validateAmbiguousFields(), unlike the
other constructors. This means deserialized FetchSourceContext objects (e.g., from
inter-node communication) bypass the ambiguity validation, potentially allowing
invalid states to propagate through the cluster. The validation should be called
here as well.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 6


Why: The StreamInput constructor skips validateAmbiguousFields(), which could allow invalid states to propagate during deserialization. However, in practice, data written to StreamOutput would have already been validated at construction time, making this a low-risk but valid defensive improvement.

Low
General
Improve readability of null-coalescing boolean expression

The expression fetchSource == null || fetchSource is a subtle change from the
original fetchSource == null ? true : fetchSource. While logically equivalent, the
new form is less readable and could be confusing to future maintainers. The original
ternary was clearer in intent. Consider reverting to the explicit ternary or using a
named variable for clarity.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [156]

-return new FetchSourceContext(fetchSource == null || fetchSource, sourceIncludes, sourceExcludes);
+return new FetchSourceContext(fetchSource == null ? true : fetchSource, sourceIncludes, sourceExcludes);
Suggestion importance[1-10]: 3


Why: The expression fetchSource == null || fetchSource is logically equivalent to fetchSource == null ? true : fetchSource but slightly less readable. This is a minor style improvement with no functional impact.

Low
Clarify empty-object validation logic comment

The check includes.isEmpty() && excludes.isEmpty() only catches the case where both
sets are empty (i.e., an empty object {}). However, if the object contains
unrecognized field names, the sets will also be empty, but the error message says
"Expected at least one of [includes] or [excludes]", which is misleading. More
importantly, a valid object with only excludes and no includes (or vice versa) is
allowed and should not throw. The current logic is correct for the empty-object
case, but consider whether a valid single-field object (e.g., only excludes) should
be accepted — based on the test testFetchSourceAsObject, it should be, so this logic
appears correct. However, the condition should also guard against unrecognized field
names that silently produce empty sets.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-264]

 if (includes.isEmpty() && excludes.isEmpty()) {
-    // no valid field names -> empty or unrecognized fields; not allowed
+    // no valid field names -> empty object or only unrecognized fields; not allowed
     throw new ParsingException(
         parser.getTokenLocation(),
         "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
     );
 }
Suggestion importance[1-10]: 1


Why: The suggestion only changes a comment (adding "or only unrecognized fields" to the comment text), which is a minor documentation improvement with no functional impact. The improved_code is nearly identical to existing_code with only a comment change.

Low
Document silent deduplication behavior in array parsing

Duplicate entries within the same array are silently deduplicated by the
LinkedHashSet, but there is no warning or error for them. While deduplication is
acceptable, the test testFetchSourceArray explicitly asserts that duplicates are
silently removed. However, a duplicate entry that also appears in the opposite set
will throw an ambiguity error, which is correct. The issue is that if opposite is
null (as called from fromXContent for the top-level array case), duplicates are
silently dropped — this is fine. The logic is consistent, but consider documenting
this behavior explicitly.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279]

 private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite)
     throws IOException {
-    Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields
+    Set<String> sourceArr = new LinkedHashSet<>(); // LinkedHashSet preserves insertion order and deduplicates entries
     while (parser.nextToken() != XContentParser.Token.END_ARRAY) {
         if (parser.currentToken() == XContentParser.Token.VALUE_STRING) {
             String entry = parser.text();
             if (opposite != null && opposite.contains(entry)) {
                 throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry);
             }
-            sourceArr.add(entry);
+            sourceArr.add(entry); // duplicate entries within the same array are silently deduplicated
         } else {
Suggestion importance[1-10]: 1


Why: This suggestion only modifies comments to document existing behavior, with no functional code changes. Comment-only improvements have minimal impact on correctness or functionality.

Low
Suggestions up to commit 0b4c2c8
Category | Suggestion | Impact
Possible issue
Add validation to deserialization constructor

The StreamInput constructor does not call validateAmbiguousFields(), unlike the
other constructors. This means deserialized FetchSourceContext objects (e.g., from
inter-node communication) bypass the ambiguity validation. Call
validateAmbiguousFields() in this constructor as well to ensure consistency.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 7


Why: The StreamInput constructor bypasses validateAmbiguousFields(), creating an inconsistency where deserialized objects skip validation. This is a valid concern for correctness, though in practice the data would have been validated before serialization. Adding the validation call ensures consistency across all construction paths.

Medium
Replace immutable singleton sets with mutable sets

The parseSourceObject method throws an error when both includes and excludes are
empty, but this check uses Collections.emptySet() as the initial value. If a user
provides only unrecognized field names (not includes/excludes), the sets remain
empty and the error is thrown correctly. However, if a user provides only includes
or only excludes, the other set remains Collections.emptySet() — this is fine. But
the check should also guard against the case where both sets are non-empty but were
set to Collections.emptySet() due to logic flow. The current logic is actually
correct, but the condition should be includes.isEmpty() && excludes.isEmpty() which
is already the case. No change needed here — but note that Collections.emptySet() is
immutable, so calling .contains() on it is safe but calling .add() would throw.
Since parseSourceFieldArray returns a new LinkedHashSet, this is fine. The real
issue is that Collections.singleton() (used for VALUE_STRING case) is also immutable
— if the same field appears twice (e.g., two includes string values), the second
assignment would silently overwrite the first without error. Consider using a
mutable set instead of Collections.singleton().

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [234-252]

-if (includes.isEmpty() && excludes.isEmpty()) {
-    // no valid field names -> empty or unrecognized fields; not allowed
-    throw new ParsingException(
-        parser.getTokenLocation(),
-        "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
-    );
+case XContentParser.Token.VALUE_STRING -> {
+    if (INCLUDES_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
+        String includeEntry = parser.text();
+        if (excludes.contains(includeEntry)) {
+            throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, includeEntry);
+        }
+        Set<String> newIncludes = new LinkedHashSet<>(includes);
+        newIncludes.add(includeEntry);
+        includes = newIncludes;
+    } else if (EXCLUDES_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
+        String excludeEntry = parser.text();
+        if (includes.contains(excludeEntry)) {
+            throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, excludeEntry);
+        }
+        Set<String> newExcludes = new LinkedHashSet<>(excludes);
+        newExcludes.add(excludeEntry);
+        excludes = newExcludes;
+    } else {
+        throw new ParsingException(
+            parser.getTokenLocation(),
+            "Unknown key for a " + token + " in [" + currentFieldName + "]."
+        );
+    }
 }
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that Collections.singleton() is immutable and could cause issues if the same field name appears twice in the JSON (the second assignment silently overwrites the first). Using a mutable LinkedHashSet would be more robust. However, in practice, JSON parsers typically don't produce duplicate field names, so this is a minor edge case.

Low
General
Reject duplicate entries within the same array

The parseSourceFieldArray method silently ignores duplicate entries within the same
array (e.g., ["field1", "field1"]) by using a LinkedHashSet. While deduplication is
handled, there is no warning or error for duplicate entries within the same array.
This may hide user mistakes. Consider throwing a ParsingException when a duplicate
is detected within the same array to provide clearer feedback.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279]

-private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite)
-    throws IOException {
-    Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields
-    while (parser.nextToken() != XContentParser.Token.END_ARRAY) {
-        if (parser.currentToken() == XContentParser.Token.VALUE_STRING) {
-            String entry = parser.text();
-            if (opposite != null && opposite.contains(entry)) {
-                throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry);
-            }
-            sourceArr.add(entry);
+String entry = parser.text();
+if (opposite != null && opposite.contains(entry)) {
+    throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry);
+}
+if (!sourceArr.add(entry)) {
+    throw new ParsingException(
+        parser.getTokenLocation(),
+        "Duplicate entry [" + entry + "] in [" + parseField.getPreferredName() + "] of _source."
+    );
+}
Suggestion importance[1-10]: 3


Why: While throwing on duplicates would provide clearer feedback, the current behavior of silently deduplicating via LinkedHashSet is a reasonable and common approach. The test in FetchSourceContextTests.java at line 68-69 even explicitly tests that duplicates are deduplicated, suggesting this is intentional behavior.

Low
Suggestions up to commit 9d97f05
Category | Suggestion | Impact
Possible issue
Fix placeholder format mismatch in error messages

The AMBIGUOUS_FIELD_MESSAGE uses {} as a placeholder (SLF4J/log4j style), but
ParsingException uses String.format-style formatting with %s. This means the error
message thrown from parseSourceFieldArray and parseSourceObject will contain the
literal {} instead of the actual field name. The OpenSearchException used in
validateAmbiguousFields does support {} placeholders, so that path is correct. The
ParsingException calls should use String.format or inline the value directly.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279]

-private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite)
-    throws IOException {
-    Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields
-    while (parser.nextToken() != XContentParser.Token.END_ARRAY) {
-        if (parser.currentToken() == XContentParser.Token.VALUE_STRING) {
-            String entry = parser.text();
-            if (opposite != null && opposite.contains(entry)) {
-                throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry);
-            }
-            sourceArr.add(entry);
+private static final String AMBIGUOUS_FIELD_MESSAGE = "The same entry [%s] cannot be both included and excluded in _source.";
 
+// In validateAmbiguousFields, use OpenSearchException with its own formatting:
+throw new OpenSearchException("The same entry [{}] cannot be both included and excluded in _source.", exclude);
+
+// In parseSourceFieldArray and parseSourceObject, use:
+throw new ParsingException(parser.getTokenLocation(), String.format(AMBIGUOUS_FIELD_MESSAGE, entry));
+
Suggestion importance[1-10]: 8

__

Why: This is a real bug — ParsingException uses String.format-style %s placeholders, not {} style, so the error messages from parseSourceFieldArray and parseSourceObject would contain literal {} instead of the actual field name. The test at line 211 (assertEquals("The same entry [AAA] cannot be both included and excluded in _source.", result.getMessage())) would fail if this bug exists, confirming the issue is critical.

Medium
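The placeholder mismatch is easy to reproduce in isolation. The sketch below (a hypothetical demo class, not code from the PR) shows that `String.format` passes an SLF4J-style `{}` through verbatim while a `%s` specifier is substituted as expected:

```java
public class PlaceholderDemo {
    public static void main(String[] args) {
        // SLF4J/log4j-style template: String.format does not recognize "{}",
        // so the argument is silently ignored and the braces survive.
        String slf4jStyle = "The same entry [{}] cannot be both included and excluded in _source.";
        System.out.println(String.format(slf4jStyle, "title"));
        // %-style template: the argument is substituted as expected.
        String formatStyle = "The same entry [%s] cannot be both included and excluded in _source.";
        System.out.println(String.format(formatStyle, "title"));
    }
}
```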
Validate deserialized instances for ambiguous fields

The StreamInput constructor does not call validateAmbiguousFields(), unlike the
other constructors. This means deserialized instances from the network or storage
could bypass the new validation, potentially allowing contradicting
includes/excludes to be reconstructed without error. The validation should be called
here as well to ensure consistency.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 7

__

Why: The StreamInput constructor skips validateAmbiguousFields(), creating an inconsistency where deserialized objects bypass the new validation. This is a valid concern for correctness and consistency, though in practice serialized data would have been validated before serialization.

Medium
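For context, here is a runnable stand-in for the validation being discussed. `IllegalArgumentException` substitutes for `OpenSearchException` so the sketch compiles on its own, and the method name mirrors the PR's `validateAmbiguousFields()`; everything else is an assumption for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AmbiguityCheckDemo {
    // Stand-in for the PR's validateAmbiguousFields(): reject any entry
    // present in both the includes and excludes arrays.
    static void validateAmbiguousFields(String[] includes, String[] excludes) {
        Set<String> included = new HashSet<>(Arrays.asList(includes));
        for (String exclude : excludes) {
            if (included.contains(exclude)) {
                throw new IllegalArgumentException(
                    "The same entry [" + exclude + "] cannot be both included and excluded in _source.");
            }
        }
    }

    public static void main(String[] args) {
        // Disjoint lists pass; an overlapping entry is rejected.
        validateAmbiguousFields(new String[] { "title" }, new String[] { "text" });
        try {
            validateAmbiguousFields(new String[] { "title", "text" }, new String[] { "text" });
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Calling this from the `StreamInput` constructor as well, as the suggestion proposes, would make all construction paths consistent.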
General
Avoid unnecessarily exposing internal parsing method

The check includes.isEmpty() && excludes.isEmpty() correctly throws only when neither field was provided: a valid object with only excludes (no includes) leaves includes as Collections.emptySet() and is still accepted, so the && semantics are right and the error message "Expected at least one of [includes] or [excludes]" is accurate. No change is needed for that logic. The actual concern is that the parseSourceObject method is now public, which exposes internal parsing
logic unnecessarily. Consider making it package-private or keeping
it private.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [198]

-if (includes.isEmpty() && excludes.isEmpty()) {
-    // no valid field names -> empty or unrecognized fields; not allowed
-    throw new ParsingException(
-        parser.getTokenLocation(),
-        "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
-    );
-}
+static FetchSourceContext parseSourceObject(XContentParser parser) throws IOException {
Suggestion importance[1-10]: 3

__

Why: Making parseSourceObject package-private instead of public is a minor visibility concern. However, the test file FetchSourceContextTests.java directly calls FetchSourceContext.parseSourceObject(parser) at line 279, so changing visibility would break the test. The suggestion's improved_code doesn't address this dependency.

Low
Suggestions up to commit 68f6ccd
Category | Suggestion | Impact
Possible issue
Validate deserialized instances for ambiguous fields

The StreamInput constructor does not call validateAmbiguousFields(), meaning
deserialized instances bypass the new validation. If a node on an older version
serializes a FetchSourceContext with conflicting includes/excludes and sends it to a
newer node, the validation will be silently skipped. Add the validation call here as
well.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 6

__

Why: The StreamInput constructor bypasses validateAmbiguousFields(), which could allow invalid deserialized instances to skip the new validation. Adding the validation call here ensures consistency across all construction paths, though the practical risk is limited since conflicting fields would typically be caught at the source.

Low
Verify empty-object validation allows excludes-only input

The validation that rejects an empty object fires even when the object contained
unrecognized field names (which already threw a ParsingException). More critically,
a valid object with only excludes (no includes) will be incorrectly rejected. The
check should allow either includes or excludes to be non-empty independently, which
is already the case, but the error message implies both are required. Consider
verifying the logic is correct for the case where only excludes is provided.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-266]

 if (includes.isEmpty() && excludes.isEmpty()) {
     // no valid field names -> empty or unrecognized fields; not allowed
     throw new ParsingException(
         parser.getTokenLocation(),
         "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
     );
 }
+return new FetchSourceContext(true, includes.toArray(new String[0]), excludes.toArray(new String[0]));
Suggestion importance[1-10]: 2

__

Why: The suggestion asks to verify the logic is correct for excludes-only input, but the existing code already handles this correctly: the check includes.isEmpty() && excludes.isEmpty() only throws when both are empty. The improved_code just adds the return statement that already exists on line 266, making this suggestion inaccurate about the actual issue.

Low
General
Improve error message for empty top-level source array

When _source is specified as a top-level array (not inside an object),
parseSourceFieldArray is called with INCLUDES_FIELD and null as the opposite
argument. In this case, duplicate entries are silently deduplicated by the
LinkedHashSet. This is fine, but the parseField parameter is only used in the
empty-array error message, so passing INCLUDES_FIELD for a top-level array is
misleading. This is a minor concern, but the real issue is that the empty-array
check for a top-level _source: [] will produce the message "Expected at least one
value for an array of [includes]" which may confuse users since includes was not
explicitly used.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279]

-private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite)
-    throws IOException {
-    Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields
-    while (parser.nextToken() != XContentParser.Token.END_ARRAY) {
+case XContentParser.Token.START_ARRAY -> {
+    if (parser.nextToken() == XContentParser.Token.END_ARRAY) {
+        throw new ParsingException(
+            parser.getTokenLocation(),
+            "Expected at least one value for [_source] array"
+        );
+    }
+    Set<String> includes = new LinkedHashSet<>();
+    do {
         if (parser.currentToken() == XContentParser.Token.VALUE_STRING) {
-            String entry = parser.text();
-            if (opposite != null && opposite.contains(entry)) {
-                throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry);
-            }
-            sourceArr.add(entry);
+            includes.add(parser.text());
+        } else {
+            throw new ParsingException(
+                parser.getTokenLocation(),
+                "Unknown key for a " + parser.currentToken() + " in [" + parser.currentName() + "]."
+            );
+        }
+    } while (parser.nextToken() != XContentParser.Token.END_ARRAY);
+    return new FetchSourceContext(true, includes.toArray(new String[0]), null);
+}
Suggestion importance[1-10]: 3

__

Why: The suggestion raises a valid UX concern about the misleading error message "Expected at least one value for an array of [includes]" when _source: [] is used at the top level. However, the improved_code replaces the wrong section (it modifies fromXContent switch case instead of parseSourceFieldArray), making it inaccurate relative to the existing_code snippet.

Low

@github-actions
Contributor

❌ Gradle check result for c2718e8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

Persistent review updated to latest commit 418c9b7

@github-actions
Contributor

Persistent review updated to latest commit 8f222c4

@github-actions
Contributor

Persistent review updated to latest commit 49d359c

@gaobinlong
Contributor

Is the version in the skip block not an OpenSearch version, but rather the qa:mixed-cluster one?
Where can I find the latest version for the YAML tests?

We should use - 3.5.99 for now since the current latest version in main branch is 3.6.0 now.

@urmichm urmichm force-pushed the 20612-source-validation branch from fe8b28f to 7cb356d Compare April 1, 2026 15:33
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Persistent review updated to latest commit 7cb356d

@urmichm urmichm force-pushed the 20612-source-validation branch from 7cb356d to 53120e7 Compare April 1, 2026 15:39
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Persistent review updated to latest commit 53120e7

@urmichm
Contributor Author

urmichm commented Apr 1, 2026

thank you @gaobinlong !
tests updated, please have a look when you have a chance

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

❌ Gradle check result for 53120e7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@gaobinlong
Contributor

❌ Gradle check result for 53120e7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@urmichm, sorry for that. I think we need to change the skip version to 3.6.99, as the 3.6.0 release is code frozen and we won't catch that train; the current latest version has been changed to 3.7.0, which is why these YAML rest tests failed.

@gaobinlong
Contributor

gaobinlong commented Apr 2, 2026

And please merge the latest main branch into your branch, thanks!


package org.opensearch.search.fetch.subphase;

import org.opensearch.OpenSearchException;
Member


Could you keep this PR as a pure refactor. And raise another PR for the change? That would be very helpful to the reviewer. I'm sure pure refactor will be merged quickly :)

Regarding the change, I notice you already think we cannot introduce breaking behavior for 3.x here, which I agree!

Unfortunately we don't have a 4.x branch so we cannot get the breaking change in. But at least we can use the deprecation log to capture the effort being spent here. At the time of 4.0, we should do a scan of all the deprecation logs and change to exceptions.

cc: @andrross

Member


Example

} else if (INDICES_BOOST_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
deprecationLogger.deprecate(
"indices_boost_object_format",
"Object format in indices_boost is deprecated, please use array format instead"
);
while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {

} else if (INDICES_BOOST_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
indexBoosts.add(new IndexBoost(parser));
}

Contributor Author


I have found your suggestion of splitting changes into multiple PRs very useful :)

I have created an MR for refactoring only #21086
Please have a look :)

I will update this PR (or create a new one) once the refactoring is merged

cc: @andrross

@urmichm
Contributor Author

urmichm commented Apr 2, 2026

@gaobinlong @bowenlan-amzn Thank you for the feedback!
I convert the PR to Draft and update/refactor the changes!

@urmichm urmichm marked this pull request as draft April 2, 2026 11:09
urmichm and others added 15 commits April 2, 2026 13:39
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
extract array parsing as its own function

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
parseSourceObject: split key-value process into different code-blocks

Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>
include and exclude collections must not contain the same entries

Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>
The same entry MUST NOT be present in both includes AND excludes arrays.
This leads to ambiguity.

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Explicitly defined source object '_source={}' not allowed

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
empty object validation updated to cover all possible cases

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
toXContent to create valid object

Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
@urmichm urmichm force-pushed the 20612-source-validation branch from 53120e7 to 706f30b Compare April 2, 2026 11:39
@github-actions
Contributor

github-actions bot commented Apr 2, 2026

Persistent review updated to latest commit 706f30b

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

❌ Gradle check result for 706f30b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


Labels

enhancement (Enhancement or improvement to existing feature or request), good first issue (Good for newcomers), Search:Query Insights

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] [BUG] _source include / exclude validation

5 participants