
Introduce Type-Driven Source Exclusion via auto_exclude_types#133377

Closed
Rassyan wants to merge 1 commit into elastic:main from
Rassyan:auto-exclude-types

Conversation

@Rassyan
Contributor

@Rassyan commented Aug 22, 2025

Summary

This PR introduces a new index-level setting:

PUT /my_index
{
  "settings": {
    "index.mapping.source.auto_exclude_types": ["dense_vector", "binary" ... ] 
  }
}

Key innovations

  1. Type-Driven Automation
    Automatically excludes all fields of specified types from _source at index time
  2. ⚡️ Zero-Overhead Integration
    • Dynamically injects exclusions during mapping parsing
    • Requires no manual field declarations
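
To make the type-driven idea concrete, here is a minimal Python sketch of how a list of types could be turned into a list of excluded field paths by walking a mapping. It is an illustration only and not the PR's implementation: the function name `fields_of_types` and the sample mapping are hypothetical, and the sketch assumes the usual mapping shape in which object fields nest their children under `properties`.

```python
from typing import Any, Dict, Iterable, List


def fields_of_types(
    properties: Dict[str, Any], types: Iterable[str], prefix: str = ""
) -> List[str]:
    """Return dotted paths of every field whose mapped type is in `types`.

    Hypothetical helper: it mirrors the idea behind auto_exclude_types,
    not the actual Elasticsearch code.
    """
    wanted = set(types)
    paths: List[str] = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") in wanted:
            paths.append(path)
        # Object fields carry their children under "properties".
        nested = spec.get("properties")
        if isinstance(nested, dict):
            paths.extend(fields_of_types(nested, wanted, prefix=path + "."))
    return paths


# Sample mapping, invented for this example.
sample_properties = {
    "title": {"type": "text"},
    "embedding": {"type": "dense_vector", "dims": 3},
    "attachment": {
        "properties": {
            "raw": {"type": "binary"},
            "name": {"type": "keyword"},
        }
    },
}

excluded = fields_of_types(sample_properties, ["dense_vector", "binary"])
print(excluded)
```

The point of the sketch is that the user supplies only type names; the per-field exclusion list is derived from the mapping, so newly added fields of those types are picked up without further configuration.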

Synergy with Related Work

Provides foundational infrastructure for #133337's hybrid source reconstruction:

auto_exclude_types → Pruned _source → Adaptive Hybrid Reconstruction
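
The "Pruned _source" step in the pipeline above can be sketched as follows. This is a simplified Python illustration, assuming the excluded field paths are already known as dotted strings; `prune_source` is a hypothetical name, and the sketch ignores arrays of objects and wildcard patterns.

```python
import copy
from typing import Any, Dict, Iterable


def prune_source(source: Dict[str, Any], excluded_paths: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `source` with the given dotted paths removed.

    Hypothetical helper for illustration; paths that do not exist are ignored.
    """
    pruned = copy.deepcopy(source)
    for path in excluded_paths:
        *parents, leaf = path.split(".")
        node: Any = pruned
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        else:
            # Reached the parent object without a missing step.
            if isinstance(node, dict):
                node.pop(leaf, None)
    return pruned


# Sample document, invented for this example.
doc = {
    "title": "hello",
    "embedding": [0.1, 0.2, 0.3],
    "attachment": {"raw": "aGVsbG8=", "name": "a.txt"},
}

pruned_doc = prune_source(doc, ["embedding", "attachment.raw", "missing.field"])
print(pruned_doc)
```

The original document is left untouched here; only the stored copy loses the excluded fields, which is why a later reconstruction step would be needed to return them.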

Why This Approach?

As discussed by @benwtrent:

@mayya-sharipova the idea is that you have one parameter to exclude all vector fields, by their type, instead of providing each unique vector field name.

I also suppose this unlocks future consideration of having an index level default for _source inclusion at query time type of setting that is applied by default for all queries.

But, your comment @mayya-sharipova makes me wonder if we should do something like:

{
  "_source": {
    "mapping_type_excludes": [ "dense_vector", "sparse_vector" ]
  }...
}

Instead of having something called include_vectors.

@jimczi would we add some index level setting that applies this default source filtering at query time? Is that the ultimate goal here?

Originally posted in #128735 (comment)

By implementing this through index.mapping.source.auto_exclude_types, we maintain backward compatibility with existing _source.includes/excludes. I believe this approach delivers immediate value while paving the way for more advanced source optimization techniques.

@Rassyan requested a review from a team as a code owner August 22, 2025 09:19
@elasticsearchmachine added the v9.2.0, needs:triage, and external-contributor labels Aug 22, 2025
@ldematte added the :Search Relevance/Search label Aug 22, 2025
@elasticsearchmachine added the Team:Search Relevance label and removed the needs:triage label Aug 22, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@john-wagster
Contributor

I really appreciate you taking a pass here. I talked this through with a few folks internally and this has other impacts that we'll have to consider and probably take this in a separate direction. Closing for now and we'll come back after some additional internal conversation here.


Labels

external-contributor, :Search Relevance/Search, Team:Search Relevance, v9.4.0
