Skip to content

Add mapper_settings support and field_mapping mapper type for pull-based ingestion#20722

Merged
varunbharadwaj merged 5 commits intoopensearch-project:mainfrom
imRishN:field-mapping-pull-based
Feb 25, 2026
Merged

Add mapper_settings support and field_mapping mapper type for pull-based ingestion#20722
varunbharadwaj merged 5 commits intoopensearch-project:mainfrom
imRishN:field-mapping-pull-based

Conversation

@imRishN
Copy link
Copy Markdown
Member

@imRishN imRishN commented Feb 24, 2026

Description

Adds a new mapper_settings.* prefix setting to IngestionSource that allows mapper-specific configuration to be passed to message mappers in pull-based ingestion. Also adds field_mapping as a new MapperType enum value with a version compatibility check to prevent usage in mixed clusters.

This is a foundational change — the settings are stored but not yet consumed. The actual FieldMappingIngestionMessageMapper implementation will follow in a subsequent PR.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]
#20721

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…sed ingestion

Signed-off-by: Rishab Nahata <rishab.nahata@uber.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 24, 2026

PR Reviewer Guide 🔍

(Review updated until commit a94a723)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Add field_mapping mapper type enum and skeleton class

Relevant files:

  • server/src/main/java/org/opensearch/indices/pollingingest/mappers/IngestionMessageMapper.java
  • server/src/main/java/org/opensearch/indices/pollingingest/mappers/FieldMappingIngestionMessageMapper.java

Sub-PR theme: Add mapper_settings storage and plumbing

Relevant files:

  • server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java
  • server/src/main/java/org/opensearch/cluster/metadata/IngestionSource.java
  • server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
  • server/src/test/java/org/opensearch/cluster/metadata/IngestionSourceTests.java

Sub-PR theme: Add validation for mapper_settings and version compatibility

Relevant files:

  • server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java
  • server/src/test/java/org/opensearch/cluster/metadata/MetadataCreateIndexServiceTests.java

⚡ Recommended focus areas for review

Hardcoded Version

The version check uses a hardcoded Version.V_3_6_0. This should be verified against the actual release version where this feature is introduced to ensure correct mixed-cluster compatibility enforcement.

if (minNodeVersion.before(Version.V_3_6_0)) {
    throw new IllegalArgumentException(
        "mapper_type [field_mapping] requires all nodes in the cluster to be on version ["
            + Version.V_3_6_0
            + "] or later, but the minimum node version is ["
            + minNodeVersion
            + "]"
    );
Missing Validation

The INGESTION_SOURCE_MAPPER_SETTINGS setting has no validation logic (empty lambda returns value as-is). Consider adding validation for setting value format or structure, similar to how other settings validate their inputs.

public static final Setting.AffixSetting<Object> INGESTION_SOURCE_MAPPER_SETTINGS = Setting.prefixKeySetting(
    "index.ingestion_source.mapper_settings.",
    key -> new Setting<>(key, "", (value) -> {
        return value;
    }, Property.IndexScope, Property.Final)
);
Incomplete Error Message

The error message for unsupported mapper_settings doesn't specify which mapper types do support settings. Consider clarifying that only field_mapping currently supports mapper_settings.

// default and raw_payload mappers don't use mapper_settings
if (mapperSettings.isEmpty() == false) {
    throw new IllegalArgumentException("mapper_settings are not supported for mapper_type [" + mapperType.getName() + "]");
}
break;

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 24, 2026

PR Code Suggestions ✨

Latest suggestions up to a94a723

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Security
Create defensive copy before wrapping

The constructor creates an unmodifiable wrapper around the input map, but doesn't
create a defensive copy first. If the caller retains a reference to the original
mutable map and modifies it after construction, those changes will be visible
through the unmodifiable wrapper, breaking immutability guarantees.

server/src/main/java/org/opensearch/cluster/metadata/IngestionSource.java [74]

-this.mapperSettings = mapperSettings != null ? Collections.unmodifiableMap(mapperSettings) : Collections.emptyMap();
+this.mapperSettings = mapperSettings != null ? Collections.unmodifiableMap(new HashMap<>(mapperSettings)) : Collections.emptyMap();
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a security issue where the unmodifiable wrapper doesn't prevent external modifications if the caller retains a reference to the original map. Creating a defensive copy before wrapping ensures true immutability.

Medium
Possible issue
Validate mapper_settings presence without mapper_type

The validation logic rejects mapper_settings for all non-FIELD_MAPPING mapper types
in the default case. However, if mapper_settings are provided without specifying a
mapper_type, the method returns early and never validates that the settings are
inappropriate. This allows invalid configurations to pass validation.

server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java [1280-1320]

 static void validateIngestionSourceSettings(Settings settings, ClusterState state) {
+    Map<String, Object> mapperSettings = IndexMetadata.INGESTION_SOURCE_MAPPER_SETTINGS.getAsMap(settings);
+    
     if (IndexMetadata.INGESTION_SOURCE_MAPPER_TYPE_SETTING.exists(settings) == false) {
+        if (mapperSettings.isEmpty() == false) {
+            throw new IllegalArgumentException("mapper_settings cannot be specified without a mapper_type");
+        }
         return;
     }
 
     IngestionMessageMapper.MapperType mapperType = IndexMetadata.INGESTION_SOURCE_MAPPER_TYPE_SETTING.get(settings);
-    Map<String, Object> mapperSettings = IndexMetadata.INGESTION_SOURCE_MAPPER_SETTINGS.getAsMap(settings);
 
     switch (mapperType) {
         case FIELD_MAPPING:
             ...
             break;
         default:
             // default and raw_payload mappers don't use mapper_settings
             if (mapperSettings.isEmpty() == false) {
                 throw new IllegalArgumentException("mapper_settings are not supported for mapper_type [" + mapperType.getName() + "]");
             }
             break;
     }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion identifies a valid validation gap where mapper_settings can be provided without a mapper_type, which should be rejected. However, this is a moderate issue since the settings would simply be ignored rather than causing runtime errors.

Medium

Previous suggestions

Suggestions up to commit cbf8280
CategorySuggestion                                                                                                                                    Impact
General
Validate mapper settings values

The validation only checks if keys are recognized but doesn't validate that the
values are non-null or non-empty. Consider adding validation to ensure that
mapperSettings values are valid strings to prevent runtime errors during ingestion.

server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java [1301-1310]

-// Validate mapper_settings keys
-for (String key : mapperSettings.keySet()) {
+// Validate mapper_settings keys and values
+for (Map.Entry<String, Object> entry : mapperSettings.entrySet()) {
+    String key = entry.getKey();
     if (IngestionMessageMapper.FIELD_MAPPING_VALID_SETTINGS.contains(key) == false) {
         throw new IllegalArgumentException(
             "unknown mapper_settings key ["
                 + key
                 + "] for mapper_type [field_mapping]. Valid keys are: "
                 + IngestionMessageMapper.FIELD_MAPPING_VALID_SETTINGS
         );
     }
+    if (entry.getValue() == null || entry.getValue().toString().trim().isEmpty()) {
+        throw new IllegalArgumentException("mapper_settings key [" + key + "] cannot have null or empty value");
+    }
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion adds value validation for mapperSettings keys, which could prevent runtime errors. However, the validation might be too strict (empty strings could be valid in some contexts), and the actual value validation should ideally be done by the mapper implementation itself.

Low
Explicitly handle all mapper types

The validation logic should explicitly handle the RAW_PAYLOAD case in addition to
DEFAULT to ensure clarity and maintainability. If new mapper types are added in the
future, the default case may inadvertently apply incorrect validation rules.

server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java [1287-1318]

-static void validateIngestionSourceSettings(Settings settings, ClusterState state) {
-    if (IndexMetadata.INGESTION_SOURCE_MAPPER_TYPE_SETTING.exists(settings) == false) {
-        return;
-    }
-
-    IngestionMessageMapper.MapperType mapperType = IndexMetadata.INGESTION_SOURCE_MAPPER_TYPE_SETTING.get(settings);
-    Map<String, Object> mapperSettings = IndexMetadata.INGESTION_SOURCE_MAPPER_SETTINGS.getAsMap(settings);
-
-    switch (mapperType) {
-        case FIELD_MAPPING:
-            ...
-            break;
-        default:
-            // default and raw_payload mappers don't use mapper_settings
-            if (mapperSettings.isEmpty() == false) {
-                throw new IllegalArgumentException("mapper_settings are not supported for mapper_type [" + mapperType.getName() + "]");
-            }
-            break;
-    }
+switch (mapperType) {
+    case FIELD_MAPPING:
+        ...
+        break;
+    case DEFAULT:
+    case RAW_PAYLOAD:
+        // default and raw_payload mappers don't use mapper_settings
+        if (mapperSettings.isEmpty() == false) {
+            throw new IllegalArgumentException("mapper_settings are not supported for mapper_type [" + mapperType.getName() + "]");
+        }
+        break;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion improves code clarity by explicitly handling DEFAULT and RAW_PAYLOAD cases instead of relying on the default case. This makes the code more maintainable and self-documenting, though the current implementation is functionally correct.

Low
Ensure deep immutability of map

The mapperSettings map is made unmodifiable, but if the input map contains mutable
values (e.g., nested maps or lists), those values can still be modified. Consider
performing a deep copy if the values are mutable objects to ensure true
immutability.

server/src/main/java/org/opensearch/cluster/metadata/IngestionSource.java [74]

-this.mapperSettings = mapperSettings != null ? Collections.unmodifiableMap(mapperSettings) : Collections.emptyMap();
+this.mapperSettings = mapperSettings != null ? Collections.unmodifiableMap(new HashMap<>(mapperSettings)) : Collections.emptyMap();
Suggestion importance[1-10]: 3

__

Why: The suggestion addresses a potential issue with shallow immutability, but the current implementation using Collections.unmodifiableMap() is sufficient for most cases. The suggested new HashMap<>(mapperSettings) still doesn't provide deep immutability if values are mutable objects. The impact is low since mapperSettings typically contains simple string values.

Low
Suggestions up to commit 4655f27
CategorySuggestion                                                                                                                                    Impact
Possible issue
Create defensive copy before wrapping

The constructor creates an unmodifiable wrapper around the input mapperSettings map,
but doesn't create a defensive copy first. If the caller retains a reference to the
original map and modifies it after construction, those changes will be visible
through the unmodifiable wrapper, breaking immutability guarantees.

server/src/main/java/org/opensearch/cluster/metadata/IngestionSource.java [74]

-this.mapperSettings = mapperSettings != null ? Collections.unmodifiableMap(mapperSettings) : Collections.emptyMap();
+this.mapperSettings = mapperSettings != null ? Collections.unmodifiableMap(new HashMap<>(mapperSettings)) : Collections.emptyMap();
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a potential immutability issue. Without a defensive copy, external modifications to the original mapperSettings map would be visible through the unmodifiable wrapper, violating immutability guarantees. This is a significant correctness issue that could lead to unexpected behavior.

Medium
General
Validate mapper_settings without mapper_type

The validation only runs when INGESTION_SOURCE_MAPPER_TYPE_SETTING exists, but
mapper_settings can be specified without a mapper_type. This allows users to set
invalid mapper_settings that will be silently ignored, potentially causing
confusion. Validate that mapper_settings are only provided when a compatible
mapper_type is explicitly configured.

server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java [1279-1321]

 static void validateIngestionSourceSettings(Settings settings, ClusterState state) {
+    Map<String, Object> mapperSettings = IndexMetadata.INGESTION_SOURCE_MAPPER_SETTINGS.getAsMap(settings);
+    
     if (IndexMetadata.INGESTION_SOURCE_MAPPER_TYPE_SETTING.exists(settings) == false) {
+        if (mapperSettings.isEmpty() == false) {
+            throw new IllegalArgumentException(
+                "mapper_settings cannot be specified without setting mapper_type"
+            );
+        }
         return;
     }
 
     IngestionMessageMapper.MapperType mapperType = IndexMetadata.INGESTION_SOURCE_MAPPER_TYPE_SETTING.get(settings);
-    Map<String, Object> mapperSettings = IndexMetadata.INGESTION_SOURCE_MAPPER_SETTINGS.getAsMap(settings);
 
     switch (mapperType) {
         case FIELD_MAPPING:
             ...
             break;
         default:
             // default and raw_payload mappers don't use mapper_settings
             if (mapperSettings.isEmpty() == false) {
                 throw new IllegalArgumentException(
                     "mapper_settings are not supported for mapper_type [" + mapperType.getName() + "]"
                 );
             }
             break;
     }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion identifies a valid edge case where mapper_settings could be specified without mapper_type, which would be silently ignored. Adding validation for this scenario improves user experience by providing clear error messages. However, this is a minor usability improvement rather than a critical bug.

Medium
Suggestions up to commit eab96bb
CategorySuggestion                                                                                                                                    Impact
Security
Defensive copy before unmodifiable wrap

The mapperSettings map should be defensively copied before wrapping it in an
unmodifiable map. Currently, if the caller retains a reference to the original map
and modifies it, those changes will be reflected in the supposedly immutable
mapperSettings field, breaking encapsulation.

server/src/main/java/org/opensearch/cluster/metadata/IngestionSource.java [74]

-this.mapperSettings = mapperSettings != null ? Collections.unmodifiableMap(mapperSettings) : Collections.emptyMap();
+this.mapperSettings = mapperSettings != null ? Collections.unmodifiableMap(new HashMap<>(mapperSettings)) : Collections.emptyMap();
Suggestion importance[1-10]: 8

__

Why: This is a valid security concern. Without defensive copying, external modifications to the original mapperSettings map would be reflected in the supposedly immutable field, breaking encapsulation and potentially causing unexpected behavior.

Medium
General
Use enum constant instead of string

The validation uses a hardcoded string literal "field_mapping" instead of the enum
constant. This creates a maintenance risk if the enum name changes. Use
MapperType.FIELD_MAPPING.getName() for consistency and type safety.

server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java [1277-1293]

 static void validateIngestionSourceSettings(Settings settings, ClusterState state) {
     if (IndexMetadata.INGESTION_SOURCE_MAPPER_TYPE_SETTING.exists(settings)) {
-        String mapperType = IndexMetadata.INGESTION_SOURCE_MAPPER_TYPE_SETTING.get(settings).getName();
-        if ("field_mapping".equals(mapperType)) {
+        IngestionMessageMapper.MapperType mapperType = IndexMetadata.INGESTION_SOURCE_MAPPER_TYPE_SETTING.get(settings);
+        if (IngestionMessageMapper.MapperType.FIELD_MAPPING.equals(mapperType)) {
             Version minNodeVersion = state.nodes().getMinNodeVersion();
             if (minNodeVersion.before(Version.V_3_6_0)) {
                 throw new IllegalArgumentException(
-                    "mapper_type [field_mapping] requires all nodes in the cluster to be on version ["
+                    "mapper_type [" + mapperType.getName() + "] requires all nodes in the cluster to be on version ["
                         + Version.V_3_6_0
                         + "] or later, but the minimum node version is ["
                         + minNodeVersion
                         + "]"
                 );
             }
         }
     }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a maintainability issue. Using the enum constant MapperType.FIELD_MAPPING instead of the hardcoded string "field_mapping" improves type safety and reduces the risk of errors if the enum name changes. The improved code properly uses the enum for comparison and getName() for the error message.

Medium
Suggestions up to commit 4e13480
CategorySuggestion                                                                                                                                    Impact
Security
Defensive copy before unmodifiable wrap

The mapperSettings map should be defensively copied before wrapping it with
unmodifiableMap(). If the caller retains a reference to the original map and
modifies it, those changes will be reflected in the supposedly immutable map,
breaking encapsulation.

server/src/main/java/org/opensearch/cluster/metadata/IngestionSource.java [74]

-this.mapperSettings = mapperSettings != null ? Collections.unmodifiableMap(mapperSettings) : Collections.emptyMap();
+this.mapperSettings = mapperSettings != null ? Collections.unmodifiableMap(new HashMap<>(mapperSettings)) : Collections.emptyMap();
Suggestion importance[1-10]: 8

__

Why: This is a valid security concern. Without a defensive copy, external modifications to the original mapperSettings map would be reflected in the supposedly immutable field, breaking encapsulation and potentially causing unexpected behavior.

Medium
Possible issue
Add null check for cluster nodes

The validation method doesn't check if state.nodes() is null before calling
getMinNodeVersion(). In edge cases during cluster initialization or state
transitions, this could cause a NullPointerException. Add a null check for defensive
programming.

server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java [1277-1293]

 static void validateIngestionSourceSettings(Settings settings, ClusterState state) {
     if (IndexMetadata.INGESTION_SOURCE_MAPPER_TYPE_SETTING.exists(settings)) {
         String mapperType = IndexMetadata.INGESTION_SOURCE_MAPPER_TYPE_SETTING.get(settings).getName();
         if ("field_mapping".equals(mapperType)) {
+            if (state.nodes() == null) {
+                throw new IllegalStateException("Cannot validate ingestion source settings: cluster nodes information is unavailable");
+            }
             Version minNodeVersion = state.nodes().getMinNodeVersion();
             if (minNodeVersion.before(Version.V_3_6_0)) {
                 throw new IllegalArgumentException(
                     "mapper_type [field_mapping] requires all nodes in the cluster to be on version ["
                         + Version.V_3_6_0
                         + "] or later, but the minimum node version is ["
                         + minNodeVersion
                         + "]"
                 );
             }
         }
     }
 }
Suggestion importance[1-10]: 3

__

Why: While defensive null checking is generally good practice, ClusterState.nodes() is unlikely to return null in normal operation. The suggestion adds safety but may be overly cautious for this codebase's typical usage patterns.

Low

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 4e13480: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Rishab Nahata <rishab.nahata@uber.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit eab96bb

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for eab96bb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@imRishN imRishN marked this pull request as ready for review February 24, 2026 08:34
@imRishN imRishN requested a review from a team as a code owner February 24, 2026 08:34
Signed-off-by: Rishab Nahata <rishab.nahata@uber.com>
@imRishN imRishN self-assigned this Feb 24, 2026
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 4655f27: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Rishab Nahata <rishab.nahata@uber.com>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cbf8280: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Rishab Nahata <rishabnahata07@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a94a723

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for a94a723: SUCCESS

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 86.11111% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.28%. Comparing base (5c183b8) to head (a94a723).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...g/opensearch/cluster/metadata/IngestionSource.java 62.50% 0 Missing and 3 partials ⚠️
...st/mappers/FieldMappingIngestionMessageMapper.java 33.33% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20722      +/-   ##
============================================
+ Coverage     73.23%   73.28%   +0.04%     
- Complexity    72016    72019       +3     
============================================
  Files          5783     5784       +1     
  Lines        329438   329471      +33     
  Branches      47534    47541       +7     
============================================
+ Hits         241268   241440     +172     
+ Misses        68868    68613     -255     
- Partials      19302    19418     +116     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@varunbharadwaj varunbharadwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants