[META] Context Aware Segments #19918
Description
Please describe the end goal of this project
To ensure that only relevant data is iterated during query execution, we propose collocating related data in the same segment or group of segments. A segment's group is determined by a grouping criteria function. The goal is to align segment boundaries with anticipated query patterns, so that documents frequently queried together reside in the same segments. For example, in log analytics scenarios, users often query for anomalies (logs with 4xx and 5xx status codes) rather than success logs (2xx). With a grouping criteria function based on status code, anomaly and success logs are segregated into distinct segments (or groups of segments). Search queries such as “number of faults in the last hour” or “number of errors in the last three hours” then become more efficient, because they only need to process segments with 4xx or 5xx status codes, which form a much smaller dataset.
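As a rough illustration of the idea, here is a minimal Python sketch. It is not the OpenSearch or Lucene implementation: the names `status_group`, `build_segments`, and `count_faults`, the `status` and `ts` fields, and the use of in-memory lists as stand-ins for segments are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, int]


def status_group(doc: Doc) -> str:
    """Hypothetical grouping criteria function: anomalies (4xx/5xx) vs. success."""
    return "anomaly" if 400 <= doc["status"] <= 599 else "success"


def build_segments(docs: Iterable[Doc], criteria: Callable[[Doc], str]) -> Dict[str, List[Doc]]:
    """Route each document to the segment group chosen by the criteria function."""
    segments: Dict[str, List[Doc]] = defaultdict(list)
    for doc in docs:
        segments[criteria(doc)].append(doc)
    return dict(segments)


def count_faults(segments: Dict[str, List[Doc]], since: int) -> int:
    """Count faults at or after `since`, scanning only the anomaly group."""
    # The success group is skipped entirely, which is the intended saving.
    return sum(1 for doc in segments.get("anomaly", []) if doc["ts"] >= since)


# Illustrative data only; `ts` is an arbitrary integer timestamp.
docs = [
    {"status": 200, "ts": 100},
    {"status": 503, "ts": 110},
    {"status": 404, "ts": 50},
    {"status": 201, "ts": 120},
]
segments = build_segments(docs, status_group)
fault_count = count_faults(segments, since=90)
```

In this sketch the fault query touches two of the four documents, because the success logs live in a group that the query never opens.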
Supporting References
RFC: #18576
Lucene: apache/lucene#13387
Issues
[] #19098 Indexing support for Context Aware Segments
[] #19233 Adding support for grouping criteria in Context Aware Segments
[] #19558 Skipping less relevant segments for Context Aware Segments
[] #19851 LeafReader removes all SubReaderWrappers in case IndexWriter encounters a non-aborting Exception
[] apache/lucene#15352 Efficient way to calculate hardLiveDocs count of a SegmentReader when both hard and soft deletes are present
[] #19917 Enabling integ test for context aware segments as a parameterised test case
[] #19919 Fix indexing regression for Context Aware Segments due to contention on DocumentWriterFlushControl
[] #19920 Add a merge policy provider for Criteria Based Merge policy
[] #19921 Remove child level directory during refresh
[] #19922 Adding Guardrail on file handle count and virtual memory address for context aware segments enabled
[] #19923 Add stats api for file handles
[] #19924 Add integ test case for Context Aware Segments
[] #19965 Recovery failure in case CompositeIndexWriter is unable to obtain lock on Active map during recovery
[] #20111 Monitor virtual file address and file handle count in DiskThresholdMonitor
[] #20380 Avoid iterating all child level writers inside CompositeIndexWriters
[] #20418 Add Chaos Testing for Context Aware Segments
Related component
Indexing