Fix copy_to functionality for geo_point fields with object/array values#20542
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR enables the copy_to mechanism to work with multi-token field values (e.g., geo_point with object format containing separate lat/lon tokens). It introduces internal routing that serializes complex field structures to bytes, re-parses them, and applies copy_to semantics. A new switchParser method in ParseContext allows temporary parser replacement during this process. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@server/src/main/java/org/opensearch/index/mapper/DocumentParser.java`:
- Around line 1572-1600: The method parseFieldWithCopyTo currently always
serializes START_OBJECT/START_ARRAY via parseChildToBytes even when there are no
copy_to targets; change it to first inspect fieldMapper.copyTo().copyToFields()
and if that collection is empty, avoid buffering and just call
fieldMapper.parse(context) (and parseCopyFields(context, ... ) without the
childBytes path); only call parseChildToBytes, create an innerParser, switch the
context and pass childBytes into parseCopyFields when copyTo targets are
present. Update references: parseFieldWithCopyTo, parseChildToBytes,
parseCopyFields, and the call to fieldMapper.copyTo().copyToFields() so
buffering happens only when needed.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
server/src/main/java/org/opensearch/index/mapper/DocumentParser.javaserver/src/main/java/org/opensearch/index/mapper/ParseContext.javaserver/src/test/java/org/opensearch/index/mapper/DocumentParserTests.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
server/src/main/java/org/opensearch/index/mapper/ParseContext.javaserver/src/main/java/org/opensearch/index/mapper/DocumentParser.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (6)
server/src/main/java/org/opensearch/index/mapper/ParseContext.java (1)
703-711: LGTM — scoped parser override is clean and consistent with other context wrappers.server/src/main/java/org/opensearch/index/mapper/DocumentParser.java (4)
768-773: LGTM: field parsing is consistently routed through the new copy_to helper.
848-856: LGTM: disable_objects path now honors copy_to parsing.
1343-1377: LGTM: byte-backed reparse keeps copy_to isolated and deterministic.
1381-1402: LGTM: copy_to context switching remains intact for non-byte paths.server/src/test/java/org/opensearch/index/mapper/DocumentParserTests.java (1)
3509-3558: Great coverage for geo_point copy_to across object and array forms.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
❌ Gradle check result for 2055b1b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 9616e61: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 14a03b4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 348bb0a: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Hi @krocky-cooky |
|
Thanks for the comment. It looks like the issue was recently triaged. |
andrross
left a comment
There was a problem hiding this comment.
Looks like you have a spotless formatting error. Can you fix that and rebase from main?
This change enables copy_to functionality for geo_point fields when they are represented as objects or arrays. Previously, copy_to would fail for geo_point fields because the parser consumed tokens during the initial parse, leaving nothing for the copy_to operation. The fix serializes complex field values (objects/arrays) to bytes before parsing, allowing the same data to be parsed multiple times for both the original field and copy_to targets. Changes: - Add parseFieldWithCopyTo() method to handle field parsing with copy_to support for both simple and complex value types - Add parseChildToBytes() helper to serialize current parser state - Extend parseCopyFields() to accept byte array for reparsing - Add test case for geo_point with copy_to attribute Signed-off-by: krocky-cooky <kurotaku9679.sub@gmail.com>
Signed-off-by: krocky-cooky <kurotaku9679.sub@gmail.com>
Signed-off-by: krocky-cooky <kurotaku9679.sub@gmail.com>
Signed-off-by: krocky-cooky <kurotaku9679.sub@gmail.com>
Signed-off-by: krocky-cooky <kurotaku9679.sub@gmail.com>
Signed-off-by: Takuo Kuroki <kurotaku9679.sub@gmail.com>
Signed-off-by: Takuo Kuroki <kurotaku9679.sub@gmail.com>
Extract common copy-to field resolution logic from two parseCopyFields overloads into a shared parseCopyFieldsInternal method that accepts a CheckedBiConsumer for the copy action. Also fix test method name typo: testGetPointArrayWithMultipleCopyTo -> testGeoPointArrayWithMultipleCopyTo. Signed-off-by: Takuo Kuroki <kurotaku9679.sub@gmail.com>
Signed-off-by: Takuo Kuroki <kurotaku9679.sub@gmail.com>
clarify parser lifecycle and avoid reusing context with a closed parser. Signed-off-by: Takuo Kuroki <kurotaku9679.sub@gmail.com>
- use try-with-resource with XcContentBuilder - fix brace style - use isEmpty() method Signed-off-by: Takuo Kuroki <kurotaku9679.sub@gmail.com>
Signed-off-by: Takuo Kuroki <kurotaku9679.sub@gmail.com>
Signed-off-by: Takuo Kuroki <kurotaku9679.sub@gmail.com>
fb72ba1 to
63c3ea4
Compare
|
Persistent review updated to latest commit 63c3ea4 |
|
❌ Gradle check result for 63c3ea4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 63c3ea4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 63c3ea4: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #20542 +/- ##
============================================
- Coverage 73.30% 73.20% -0.11%
+ Complexity 72252 72152 -100
============================================
Files 5795 5795
Lines 330056 330088 +32
Branches 47643 47645 +2
============================================
- Hits 241947 241632 -315
- Misses 68695 69063 +368
+ Partials 19414 19393 -21 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…es (opensearch-project#20542) This change enables copy_to functionality for geo_point fields when they are represented as objects or arrays. Previously, copy_to would fail for geo_point fields because the parser consumed tokens during the initial parse, leaving nothing for the copy_to operation. The fix serializes complex field values (objects/arrays) to bytes before parsing, allowing the same data to be parsed multiple times for both the original field and copy_to targets. Changes: - Add parseFieldWithCopyTo() method to handle field parsing with copy_to support for both simple and complex value types - Add parseChildToBytes() helper to serialize current parser state - Extend parseCopyFields() to accept byte array for reparsing - Add test case for geo_point with copy_to attribute Signed-off-by: krocky-cooky <kurotaku9679.sub@gmail.com> Signed-off-by: Takuo Kuroki <kurotaku9679.sub@gmail.com> Signed-off-by: Aparajita Pandey <aparajita31pandey@gmail.com>
…es (opensearch-project#20542) This change enables copy_to functionality for geo_point fields when they are represented as objects or arrays. Previously, copy_to would fail for geo_point fields because the parser consumed tokens during the initial parse, leaving nothing for the copy_to operation. The fix serializes complex field values (objects/arrays) to bytes before parsing, allowing the same data to be parsed multiple times for both the original field and copy_to targets. Changes: - Add parseFieldWithCopyTo() method to handle field parsing with copy_to support for both simple and complex value types - Add parseChildToBytes() helper to serialize current parser state - Extend parseCopyFields() to accept byte array for reparsing - Add test case for geo_point with copy_to attribute Signed-off-by: krocky-cooky <kurotaku9679.sub@gmail.com> Signed-off-by: Takuo Kuroki <kurotaku9679.sub@gmail.com>
Description
This PR adds support for the
copy_toattribute ongeo_pointfields when they are represented as complex structures (objects or arrays).Problem
Previously, when a
geo_pointfield with acopy_toattribute was indexed using object notation (e.g.,{"lat": 40.71, "lon": 74.00}) or array notation (e.g.,[74.00, 40.71]), the copy operation would fail. This occurred because the XContent parser consumed all tokens during the initial field parsing, leaving no data available for the subsequentcopy_tooperation. This is a similar issue to k-NN Plugin issue #2536 and is caused by values being defined across multiple tokens.Solution
This PR introduces a mechanism to preserve the field value for reuse:
copy_totarget field, allowing the same data to be parsed multiple times.Changes
parseFieldWithCopyTo()method to handle field parsing withcopy_tosupport for both simple and complex value typesparseChildToBytes()helper method to serialize the current parser state to a byte arrayparseCopyFields()to accept an optional byte array parameter for reparsing complex structuresgeo_pointfields withcopy_toin both object and array formatsTesting
testGeoPointWithCopyTo()to verify object notation ({"lat": ..., "lon": ...})testGeoPointArrayWithCopyTo()to verify array notation ([lon, lat])DocumentParserTests)Related Issues
Resolves #20540
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.