[FEATURE] PPL Support index mapping with dynamic=false #3995

@penghuo

Is your feature request related to a problem?

When querying OpenSearch indices that have object fields with dynamic: false mapping, PPL queries fail to access nested fields within those objects even though the data exists in the document's _source. This creates a significant usability gap between PPL and the native Query DSL.

For example, with the following index mapping:

{
  "mappings": {
    "properties": {
      "event": {
        "type": "object",
        "dynamic": false
      }
    }
  }
}

And documents containing nested fields under "event":

{
  "event": {
    "user": {
      "id": "u123",
      "name": "Alice",
      "location": {"city": "Seattle"}
    },
    "status": "ERROR"
  }
}

A PPL query attempting to access these fields fails:

source=testindex | fields event.user

With the error:

{
  "error": {
    "reason": "Invalid Query",
    "details": "{alias=event,fieldName=user} field not found; fields are: {aliases=[testindex],fieldName=event}{aliases=[testindex],fieldName=_id}{aliases=[testindex],fieldName=_index}{aliases=[testindex],fieldName=_score}{aliases=[testindex],fieldName=_maxscore}{aliases=[testindex],fieldName=_sort}{aliases=[testindex],fieldName=_routing}",
    "type": "IllegalArgumentException"
  },
  "status": 400
}
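The field list in the error contains only `event` and the metadata fields, which suggests that field names are resolved against the index mapping alone. The following is a minimal Python sketch of that reading, not the plugin's actual code; `mapped_field_names` is a hypothetical helper introduced only for illustration. It flattens the mapping above into dotted field names and shows that `event.user` never appears.

```python
from typing import Any, Dict, List


def mapped_field_names(properties: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten a mapping's `properties` tree into dotted field names.

    Hypothetical helper: it only illustrates what a mapping-driven
    resolver could see; it is not taken from the PPL plugin.
    """
    names: List[str] = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        names.append(path)
        # Only explicitly mapped sub-fields are listed under `properties`.
        names.extend(mapped_field_names(spec.get("properties", {}), prefix=f"{path}."))
    return names


# The mapping from this issue: `event` is an object with dynamic: false,
# so sub-fields present in documents are never added to the mapping.
mapping = {
    "mappings": {
        "properties": {
            "event": {"type": "object", "dynamic": False}
        }
    }
}

known = mapped_field_names(mapping["mappings"]["properties"])
print(known)                  # ['event']
print("event.user" in known)  # False
```

Under this assumption, any resolver that consults only the mapping rejects `event.user`, even though the value is present in `_source`.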

Meanwhile, the equivalent OpenSearch query succeeds by accessing _source:

GET testindex/_search
{
  "_source": ["event.user"]
}

This inconsistency forces users to switch between PPL and DSL queries based on their index mapping configuration, creating a fragmented user experience.

What solution would you like?

Enhance PPL to support accessing fields from _source even when they're not explicitly mapped, particularly for objects with dynamic: false mapping.

The solution should:

  1. Run the query in a schema-less manner when a field is not found in the mapping
  2. Automatically attempt to retrieve fields from _source when they're not found in the mapping

Example of desired behavior:

source=testindex | fields event.user.name, event.status

Should produce results like:

{"event.user.name": "Alice", "event.status": "ERROR"}
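The requested fallback can be sketched in plain Python. This is only an illustration of the desired semantics under stated assumptions, not a proposed implementation: `get_by_path` and `project_fields` are hypothetical names, the `schemaless` flag is an invented switch used to contrast the current and desired behavior, and returning `None` for a path that is absent from `_source` is an assumption.

```python
from typing import Any, Dict, Iterable, Set


def get_by_path(source: Dict[str, Any], path: str) -> Any:
    """Walk a dotted path through a nested _source dict; None if absent."""
    current: Any = source
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def project_fields(
    source: Dict[str, Any],
    mapped: Set[str],
    requested: Iterable[str],
    schemaless: bool = True,
) -> Dict[str, Any]:
    """Project the requested fields out of one document's _source.

    With schemaless=False an unmapped field is rejected (today's behavior
    as described in this issue). With schemaless=True the value is read
    from _source instead (the behavior requested here).
    """
    row: Dict[str, Any] = {}
    for field in requested:
        if field not in mapped and not schemaless:
            raise ValueError(f"{field} field not found")
        row[field] = get_by_path(source, field)
    return row


# The example document from this issue.
doc = {
    "event": {
        "user": {"id": "u123", "name": "Alice", "location": {"city": "Seattle"}},
        "status": "ERROR",
    }
}

# Only `event` is mapped because of dynamic: false.
row = project_fields(doc, {"event"}, ["event.user.name", "event.status"])
print(row)  # {'event.user.name': 'Alice', 'event.status': 'ERROR'}
```

Calling `project_fields(doc, {"event"}, ["event.user"], schemaless=False)` raises `ValueError`, which mirrors the failure shown earlier.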

What alternatives have you considered?

  1. Explicitly mapping all fields - While this would solve the immediate issue, it's impractical for many log analytics use cases where:

    • Schema may evolve over time
    • Different log sources may have varying field structures
    • Index mapping size would grow rapidly for complex event structures
  2. Document transformation before indexing - Flattening nested objects during indexing could make all fields accessible, but this would significantly impact indexing performance and storage requirements.

Do you have any additional context?

This feature is critical for log analytics use cases where:

  1. Complex event structures are common
  2. dynamic: false is used to prevent mapping explosion
  3. Users need to query across a mix of explicitly mapped and unmapped fields

Labels: PPL (Piped processing language), calcite (calcite migration related), enhancement (New feature or request)
