What is Elasticsearch Flush?
Elasticsearch flush refers to the process of committing buffered changes as immutable segments in the Lucene index on disk and clearing the transaction log (translog) that recorded them.
The transaction log acts as a durable replay buffer where operations are recorded as documents are indexed. Flushing to disk removes the need for Elasticsearch to replay the log on restart and frees heap memory used for buffering by deleting transaction log generations that are no longer needed.
Index Anatomy – Where is Data Stored?
To understand flushing, let's first look at where data lives inside an Elasticsearch index:

- Transaction Log: Records document additions, updates and deletions so they can be replayed after a crash.
- Lucene Index: The actual persisted search index, organized into segments on disk.
- Operating System Cache: Recently accessed segments cached in the OS page cache to speed up I/O.
When a document is indexed, it is written to the in-memory indexing buffer and appended to the transaction log; a refresh then makes it searchable. Behind the scenes, Elasticsearch efficiently batches up changes in memory before flushing them to Lucene.
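To make this write path concrete, here is a deliberately simplified Python model of a shard's indexing buffer, transaction log and segments. The class and method names are invented for illustration; the real engine is far more involved:

```python
# Toy model of the Elasticsearch write path (illustrative only).
class Shard:
    def __init__(self):
        self.indexing_buffer = []   # in-memory docs, not yet in a segment
        self.translog = []          # durable replay log of operations
        self.segments = []          # immutable segments

    def index(self, doc):
        # Every write goes to both the buffer and the transaction log.
        self.indexing_buffer.append(doc)
        self.translog.append(("index", doc))

    def refresh(self):
        # Refresh turns buffered docs into a searchable segment,
        # but does NOT clear the translog (nothing is fsynced yet).
        if self.indexing_buffer:
            self.segments.append(tuple(self.indexing_buffer))
            self.indexing_buffer = []

    def flush(self):
        # Flush = Lucene commit: persist segments, then clear the translog.
        self.refresh()
        self.translog = []

shard = Shard()
shard.index({"msg": "hello"})
shard.index({"msg": "world"})
shard.flush()
print(len(shard.segments), len(shard.translog))  # one segment, empty translog
```

Note how refresh and flush are distinct operations: refresh governs search visibility, flush governs durability.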
Lucene Index Segment Details
The Lucene index consists of one or more immutable segments created by flushing batches of documents from the transaction log.
Segments represent a point-in-time snapshot of the index with data structured for efficient search and compression. As documents get updated, new segments accumulate over time.
Elasticsearch automatically merges older segments in the background based on factors such as:
- Segment count per shard exceeding merge policy limits
- The merge policy preferring to combine many small segments
- Indexing throttling leaving I/O headroom for merges
Merging improves search performance by reading from fewer files. It also reclaims space by dropping deleted documents.
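The merge behavior described above can be sketched as a toy policy: once the segment count passes a threshold, combine the smallest segments into one. The threshold and selection logic here are invented simplifications of Lucene's actual TieredMergePolicy:

```python
# Simplified sketch of background segment merging. MAX_SEGMENTS and the
# "merge the two smallest" heuristic are invented for illustration.
MAX_SEGMENTS = 4

def maybe_merge(segments):
    """segments: list of lists of live doc ids (deletes already dropped)."""
    while len(segments) > MAX_SEGMENTS:
        # Pick the two smallest segments and combine them into one.
        segments.sort(key=len)
        a, b = segments.pop(0), segments.pop(0)
        segments.append(sorted(a + b))
    return segments

segs = [[1], [2], [3, 4], [5], [6, 7, 8], [9]]
merged = maybe_merge(segs)
print(len(merged))  # at most MAX_SEGMENTS segments remain
```

The real policy also weighs segment size tiers and deleted-document ratios, but the net effect is the same: fewer, larger segments for search to read.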
Understanding this anatomy explains why flushing is vital for making data durable, visible and efficient to access.
Why is Flushing Important?
Here are some key reasons why flushing is an important process in Elasticsearch:
- Frees up heap memory by releasing in-memory indexing buffers
- Makes documents durable across cluster restarts without replaying the transaction log
- Keeps the transaction log bounded, which speeds up shard recovery
- Clears heap for holding new, real-time index changes in memory
By default, Elasticsearch handles flushing automatically based on activity. However, the flush API allows manually invoking flushes when needed.
How the Flushing Process Works
When flush is triggered, here is what happens under the hood:

1. The current transaction log generation is rolled over; new writes go to a fresh translog, so indexing continues during the flush.
2. All pending changes in the memory buffer are written to the on-disk directory associated with the shard.
3. A new immutable segment is created containing the documents visible at the point in time the flush started.
4. The segment is fsynced to disk and its files are recorded in a new Lucene commit point.
5. The older transaction log generations are deleted, reducing disk and heap usage.
6. A background merge process later combines newly flushed segments with older ones.
So in summary, flushing batches up document changes, applies them atomically, adds a new searchable segment and frees committed data from memory.
Elasticsearch Flush API
The flush API enables manually flushing one or more indices or data streams. Here is the basic syntax:
POST /<target>/_flush
For example, to flush an index named logs:
POST /logs/_flush
And to flush multiple targets:
POST /logs,metrics,events/_flush
This commits pending changes for those indices and data streams from memory to disk.
Flush API Parameters
Additional flush API parameters control behavior:
- allow_no_indices – Ignore wildcard patterns that match no indices. Defaults to true.
- expand_wildcards – Which targets wildcards match: open, closed, hidden, none or all. Defaults to open.
- force – Flush even if there are no pending changes to commit.
- ignore_unavailable – Skip missing or closed targets. Defaults to false.
- wait_if_ongoing – Block until any in-flight flush completes instead of failing. Defaults to true.
For example:
POST /_flush?wait_if_ongoing=true&force=true
Forces a flush on all indices, waiting for any running flushes to finish first.
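For scripting these calls, a small helper can assemble the request path and query string. The helper below is a sketch; any HTTP client can then issue the POST against your cluster URL:

```python
# Build a Flush API request path from target names and query parameters.
from urllib.parse import urlencode

def flush_path(targets=None, **params):
    target = ",".join(targets) if targets else ""
    path = f"/{target}/_flush" if target else "/_flush"
    # Booleans become the lowercase strings the API expects.
    query = urlencode({k: str(v).lower() for k, v in params.items()})
    return f"{path}?{query}" if query else path

print(flush_path(["logs", "metrics", "events"]))
# /logs,metrics,events/_flush
print(flush_path(wait_if_ongoing=True, force=True))
# /_flush?wait_if_ongoing=true&force=true
```

In practice you would pass this path to your HTTP client of choice, e.g. `requests.post("http://localhost:9200" + flush_path(["logs"]))`, where the cluster address is an assumption of this sketch.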
When are Indices/Data Streams Flushed?
By default, Elasticsearch handles flushing in the background when:
- Translog size threshold – flush once the translog grows past a configured size
- Memory threshold – heap used by indexing buffers, avoiding OOM errors
- Time threshold – periodic flushes keep translog replay bounded after restarts
However, factors like index workload can influence automatic flush triggers:
| Trigger Factor | Description |
| --- | --- |
| Indexing throughput | High ingest rates fill the transaction log faster, necessitating more flushes |
| Index codec | Codec compression schemes affect the size of flushed segments |
| Translog retention | Larger buffered transaction logs delay flushes |
Analyzing time-series flush metrics can indicate issues like uneven shard growth or memory pressure throttling indexing.
Tuning Flush Thresholds
Thresholds controlling flush behavior are configurable per index or globally like:
PUT /logs
{
  "settings": {
    "index.translog.sync_interval": "60s",
    "index.translog.durability": "async",
    "index.refresh_interval": "-1",
    "index.number_of_shards": "2"
  }
}
Key settings to consider for flush optimization:
- index.translog.sync_interval – fsync translogs less often
- index.codec – smaller serialized formats (default: LZ4)
- index.number_of_shards – allow sufficient primary shard resources
- cluster.routing.allocation.total_shards_per_node – prevent flush contention
Tuning based on workload and capacity can alleviate flush bottlenecks.
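One concrete trigger worth knowing here is the translog size threshold (index.translog.flush_threshold_size, 512mb by default): a shard is flushed once its translog grows past it. The size parsing below is this sketch's own simplification of how such a check might look:

```python
# Sketch of the size-based flush trigger. Elasticsearch flushes a shard
# once its translog exceeds index.translog.flush_threshold_size
# (512mb by default). parse_size is a simplified stand-in helper.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}

def parse_size(value):
    # Check multi-letter suffixes before the bare "b" suffix.
    for suffix in ("kb", "mb", "gb", "b"):
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognized size: {value}")

def should_flush(translog_bytes, threshold="512mb"):
    return translog_bytes >= parse_size(threshold)

print(should_flush(600 * 1024**2))         # True: over the 512mb default
print(should_flush(100 * 1024**2, "1gb"))  # False: raised threshold
```

Raising the threshold trades more heap and longer recovery replay for fewer, larger flushes.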
Adaptive Flush Optimization
Elasticsearch 7.8 introduced adaptive flush to dynamically calibrate background flush triggers:
- Reduces flush frequency during heavy load
- Increases flush rate when queries are blocked
- Targets higher non-flushing workload percentage
This minimizes impact to latency-sensitive search queries from resource intensive flushing.
Enable via:
PUT /_cluster/settings
{
  "persistent": {
    "indices.adaptive_flushing.adjust_for_index_patterns": "*_event"
  }
}
So rather than static flush triggers, adaptive flush aligns with real-time demands.
Monitoring Flushes
Elasticsearch exposes flush metrics to help track behavior:
1. Flush Stats API
GET /_stats/flush
Returns aggregate flush counts and durations, including periodic vs. explicitly triggered flushes:
{
  "_all": {
    "total": {
      "flush": {
        "total": 152,
        "periodic": 5
      }
    }
  }
}
2. Index Stats API
Per-shard flush stats for an index:
GET /logs/_stats?level=shards
{
  "indices": {
    "logs": {
      "shards": {
        "0": [
          {
            "flush": {
              "total": 10,
              "total_time": "14.3ms"
            }
          }
        ]
      }
    }
  }
}
3. X-Pack Monitoring
X-Pack Monitoring charts time-series metrics on flush operations, I/O impact and latency.

Combining cluster-level stats, per-index details and historical trends provides a solid picture of flush efficiency.
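These stats are easy to post-process. The sketch below computes the average flush time across shard copies from an Index Stats response; the sample payload is hand-written in the per-shard shape (which requires level=shards), using the total_time_in_millis field the API returns alongside the human-readable total_time:

```python
# Compute average flush duration from (simplified) Index Stats output.
def avg_flush_millis(stats):
    total_flushes, total_millis = 0, 0
    for index in stats["indices"].values():
        for shard_copies in index["shards"].values():
            for copy in shard_copies:
                flush = copy["flush"]
                total_flushes += flush["total"]
                total_millis += flush["total_time_in_millis"]
    return total_millis / total_flushes if total_flushes else 0.0

sample = {
    "indices": {"logs": {"shards": {
        "0": [{"flush": {"total": 10, "total_time_in_millis": 143}}],
        "1": [{"flush": {"total": 5, "total_time_in_millis": 100}}],
    }}}
}
print(avg_flush_millis(sample))  # 243ms over 15 flushes = 16.2ms each
```

Tracking this average over time is a simple way to spot flushes growing slower as shards or translogs grow.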
The next section covers example use cases for manual flushes.
Use Cases for Manual Flushing
In most cases, the automatic flush behavior meets throughput and latency needs. However, several cases benefit from manual flushing:
Before Node Shutdowns
Flushing all indices manually before decommissioning nodes ensures no data loss from transaction logs:
POST /_flush?wait_if_ongoing=true
This also speeds up recovery later by avoiding a full transaction log replay.
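Before stopping a node, it is worth checking the flush response rather than assuming success. The helper below inspects the _shards summary the flush API returns; the HTTP call itself is left as a comment, and the localhost:9200 cluster address is an assumption of this sketch:

```python
# Verify that a flush fully succeeded before proceeding with a shutdown.
def flush_fully_succeeded(response):
    shards = response["_shards"]
    return shards["failed"] == 0 and shards["successful"] == shards["total"]

# e.g. response = requests.post("http://localhost:9200/_flush",
#                               params={"wait_if_ongoing": "true"}).json()
response = {"_shards": {"total": 10, "successful": 10, "failed": 0}}
print(flush_fully_succeeded(response))  # True: safe to stop the node
```

If any shard copy reports a failure, retry the flush or investigate before taking the node down.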
Free Up Heap Space
Flushing buffered in-memory changes from heavy indexing load frees heap for search:
POST /logs/_flush
Commits pending changes that may otherwise push heap usage toward circuit breaker thresholds.
Refresh Search
During critical troubleshooting, a refresh (rather than a flush) makes just-indexed documents immediately searchable:
POST /logs/_refresh
Ensures dashboards query documents indexed just milliseconds earlier.
Rebalance Uneven Shards
Over time, uneven segments across shards can lead to imbalanced sizes. Force flushing rebalances primary load:
POST /logs/_flush?force=true
Subsequent merging evens out diverging segment sizes and removes delete artifacts.
Flushing Data Streams vs Indices
Data streams consist of backing indices managed transparently:
POST /logs-datastream/_flush
Flushes all backing indices to publish changes across generations.
This differs from a regular index, which is a single set of shards with no nesting. Data streams rely on rollover automation, so this implementation detail is normally hidden.
Tuning Data Stream Flush
In most cases, the out-of-box configuration efficiently flushes backing data stream indices. However, optimizations like adaptive flush apply equally to data streams as regular indices:
PUT /_cluster/settings
{
  "persistent": {
    "indices.adaptive_flushing.adjust_for_data_stream": "logs-*"
  }
}
So while the data stream model shifts complexity into the engine, flush and search-responsiveness optimizations carry over from regular indices.
Now that we've covered flushing internals along with monitoring and use cases, let's explore some common performance tradeoffs.
Balancing Throughput, Durability and Query Latency
Flushing inherently balances durability guarantees with ingest rate throughput and search latency:

Less flushing improves indexing speed allowing transaction logs to buffer more changes in memory before persisting to disk. However, this increases heap usage and risks data loss in crashes.
More flushing provides stronger durability by frequently syncing the transaction log. But this adds CPU and I/O overhead reducing indexing throughput. More segments also impact search latency from extra file handles.
Understanding this spectrum enables configuring flush behavior aligned to requirements as workloads shift:
- Bulk ingestion pipelines warrant deferred flushing
- Transactional systems need durability enforced consistently
- Mixed query and ingest clusters balance with adaptive flush
So while segments advance the immutable index, the transaction log acts as a damping mechanism absorbing traffic spikes during bursts. Finding the right operating point comes down to the guarantees you need for consistency, scale and query freshness.
Recovery Impact
When a shard relocates or recovers, it replays its local transaction log to rebuild the index consistently before going live. Fewer buffered operations mean faster recovery:

Recovery checklists:
- Transaction log replay – drain the remaining buffered operations
- Index validation – ensure documents converge across copies
- Searcher warm-up – load field data caches
Flushing bounds transaction logs, minimizing this replay. So for clusters with node volatility from failovers or autoscaling, keeping shards lean with more eager flushing improves availability.
However, this competes with segment merging, which drops deleted-document tombstones. Tuning typically balances restart SLAs against how long the index needs to retain past data.
Best Practices
Here are guidelines for efficient flushing:
- Profile the time between flushes during indexing load tests
- Revisit circuit breaker thresholds if flush stalls occur
- Consider tagging older indices to flush and merge less frequently
- Enable compression with adaptive flush for large indices spanning hot and cold data
- Add replica shards on indexing clusters for resource headroom
- Prefer SSD storage for better flush fsync throughput
In summary, flushing plays a key role in making in-flight indexed data durable and visible in Elasticsearch while balancing efficiency tradeoffs. Monitoring segment patterns over time reveals optimization opportunities to handle evolving search and ingestion workloads.
Troubleshooting Common Issues
Here is a flush-related issues FAQ:
Why are my dashboards slow after bulk indexing?
New documents require a refresh before they become visible to search. An explicit refresh request expedites visibility.
Why did indexing speed slow down after moving large shards?
Check flush metrics: replaying a large carried-over transaction log temporarily slows ingestion until the shard catches up.
How do I speed up my 2 node cluster recovery during upgrades?
Manually flush indices to minimize transaction log carryover across restarts.
What tuning options can reduce flush bottlenecks on overloaded nodes?
Try relaxing flush triggers, disabling unused features, relaxing durability settings (e.g. async translog) or adding replicas.


