Skip to content

Make warmup settings dynamic for pull-based ingestion#20931

Closed
kaustubhbutte17 wants to merge 1 commit intoopensearch-project:mainfrom
kaustubhbutte17:feature/warmup-dynamic-settings
Closed

Make warmup settings dynamic for pull-based ingestion#20931
kaustubhbutte17 wants to merge 1 commit intoopensearch-project:mainfrom
kaustubhbutte17:feature/warmup-dynamic-settings

Conversation

@kaustubhbutte17
Copy link
Copy Markdown
Contributor

Description

Follow-up to #20526. Changes the two warmup settings from Final (immutable after index creation) to Dynamic (updatable at runtime via the update settings API).

This allows operators to adjust warmup behavior without recreating the index.

Settings Changed

Setting Before After
index.ingestion_source.warmup.timeout Final Dynamic
index.ingestion_source.warmup.lag_threshold Final Dynamic

Use Cases

  • Disable warmup mid-progress — emergency escape hatch if warmup is taking too long
  • Adjust lag threshold — tune without recreating the index
  • Change timeout — increase/decrease without recreating the index

Dynamic Update Behavior

Scenario Behavior
Disable warmup while in progress (timeout → -1) Warmup completes immediately, shard starts serving
Change threshold while warmup in progress New threshold takes effect on next check cycle
Change timeout while warmup in progress New timeout takes effect on next check cycle
Enable warmup after shard is already STARTED Does NOT re-trigger warmup (would be disruptive)

Settings Propagation Path

Update Settings API → IndexSettings (consumer) → IndexShard.onSettingsChanged() → IngestionEngine.updateWarmupConfig() → DefaultStreamPoller.updateWarmupConfig()

Testing

  • testUpdateWarmupConfigDisableWhileInProgress — disabling mid-warmup marks complete
  • testUpdateWarmupConfigThresholdWhileInProgress — threshold update during warmup
  • testUpdateWarmupConfigDoesNotReEnableAfterCompletion — no re-trigger after shard is serving
  • testUpdateWarmupConfigTimeoutWhileInProgress — timeout update during warmup

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 19, 2026

PR Reviewer Guide 🔍

(Review updated until commit 381320c)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Wrong Settings Source

In the constructor, warmupTimeout and warmupLagThreshold are initialized from nodeSettings instead of scopedSettings (index-scoped settings). This means the initial values will be read from node-level settings rather than the index-specific settings, which is inconsistent with how other index settings are initialized and may result in incorrect default values.

this.warmupTimeout = IndexMetadata.INGESTION_SOURCE_WARMUP_TIMEOUT_SETTING.get(nodeSettings);
this.warmupLagThreshold = IndexMetadata.INGESTION_SOURCE_WARMUP_LAG_THRESHOLD_SETTING.get(nodeSettings);
Duplicate getIndexerOrNull

getIndexerOrNull() is called twice in onSettingsChanged() — once at the top (stored as engineOrNull) and again for the warmup config block (stored as indexer). The second call is redundant and could simply reuse engineOrNull. This is a minor inefficiency but also a code clarity issue.

Indexer indexer = getIndexerOrNull();
if (indexer instanceof EngineBackedIndexer) {
    Engine engine = ((EngineBackedIndexer) indexer).getEngine();
    if (engine instanceof IngestionEngine) {
        IngestionSource.WarmupConfig newConfig = new IngestionSource.WarmupConfig(
            indexSettings.getWarmupTimeout(),
            indexSettings.getWarmupLagThreshold()
        );
        ((IngestionEngine) engine).updateWarmupConfig(newConfig);
    }
}
Race Condition

In updateWarmupConfig, the check oldConfig.isEnabled() && !newConfig.isEnabled() && !warmupComplete followed by setting warmupComplete = true and calling warmupLatch.countDown() is not atomic. A concurrent warmup completion in the polling thread could result in warmupLatch.countDown() being called twice (once here and once in the normal completion path), which is safe for CountDownLatch but warmupComplete could be set redundantly. More critically, the read-check-write on warmupComplete is not synchronized, potentially causing issues under concurrent access.

if (oldConfig.isEnabled() && !newConfig.isEnabled() && !warmupComplete) {
    warmupComplete = true;
    warmupLatch.countDown();
    logger.info("Warmup disabled for index {} shard {} via dynamic settings update", indexName, shardId);
}
Latch Reset Issue

warmupLatch is now volatile and initialized to new CountDownLatch(1). However, if warmup completes normally (latch counted down to 0) and then updateWarmupConfig is called to re-enable warmup (even though the current logic doesn't re-trigger), the latch is already at 0 and cannot be reset. If future logic ever needs to reset warmup state, this design will be problematic. Additionally, making warmupLatch volatile doesn't guarantee atomicity of the countDown() operation in all scenarios.

private volatile CountDownLatch warmupLatch = new CountDownLatch(1);

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 19, 2026

PR Code Suggestions ✨

Latest suggestions up to 381320c

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Read index settings from scoped settings

The warmup settings should be read from scopedSettings (which represents index-level
settings) rather than nodeSettings (which represents node-level settings). Using
nodeSettings will not correctly pick up index-specific values, and is inconsistent
with how other index settings like isStarTreeIndexEnabled are initialized just
above.

server/src/main/java/org/opensearch/index/IndexSettings.java [1252-1253]

-this.warmupTimeout = IndexMetadata.INGESTION_SOURCE_WARMUP_TIMEOUT_SETTING.get(nodeSettings);
-this.warmupLagThreshold = IndexMetadata.INGESTION_SOURCE_WARMUP_LAG_THRESHOLD_SETTING.get(nodeSettings);
+this.warmupTimeout = scopedSettings.get(IndexMetadata.INGESTION_SOURCE_WARMUP_TIMEOUT_SETTING);
+this.warmupLagThreshold = scopedSettings.get(IndexMetadata.INGESTION_SOURCE_WARMUP_LAG_THRESHOLD_SETTING);
Suggestion importance[1-10]: 9

__

Why: Using nodeSettings instead of scopedSettings to read index-level settings is a significant bug. Index-specific values won't be picked up correctly, and it's inconsistent with how other index settings (like isStarTreeIndexEnabled) are initialized just above using scopedSettings.get(...).

High
Fix unsafe volatile latch replacement

Making warmupLatch volatile does not guarantee atomicity of the countDown() and
await() operations across threads. If updateWarmupConfig races with
awaitWarmupComplete, a thread could read the old latch reference after countDown()
has been called on the new one, causing it to block indefinitely. The original final
field was safer; consider using AtomicReference if the latch truly needs to be
replaceable.

server/src/main/java/org/opensearch/indices/pollingingest/DefaultStreamPoller.java [69]

-private volatile CountDownLatch warmupLatch = new CountDownLatch(1);
+private final AtomicReference<CountDownLatch> warmupLatch = new AtomicReference<>(new CountDownLatch(1));
Suggestion importance[1-10]: 7

__

Why: Making warmupLatch volatile doesn't ensure safe concurrent usage between countDown() and await() operations. A race condition could cause a thread to block indefinitely if it reads the old latch reference. Using AtomicReference<CountDownLatch> would be safer, though the current PR doesn't appear to replace the latch reference, only calling countDown() on it.

Medium
General
Avoid redundant method call in settings update

getIndexerOrNull() is called twice in onSettingsChanged(). The second call is
redundant since engineOrNull already holds the same reference. Reuse engineOrNull
for the warmup config update block to avoid the duplicate call.

server/src/main/java/org/opensearch/index/shard/IndexShard.java [3344-3354]

 Indexer engineOrNull = getIndexerOrNull();
 if (engineOrNull != null) {
     ...
 }
 
 // Update warmup config if this is an ingestion engine
-Indexer indexer = getIndexerOrNull();
-if (indexer instanceof EngineBackedIndexer) {
+if (engineOrNull instanceof EngineBackedIndexer) {
Suggestion importance[1-10]: 5

__

Why: getIndexerOrNull() is called twice in onSettingsChanged(), which is redundant. Reusing the existing engineOrNull variable avoids the duplicate call and is a minor but valid code quality improvement.

Low

Previous suggestions

Suggestions up to commit 725179a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix settings read from wrong settings source

The warmup settings should be read from scopedSettings (which represents index-level
settings) rather than nodeSettings (which represents node-level settings). Using
nodeSettings will not correctly pick up the index-specific values set on the index
metadata, causing the initial values to be wrong.

server/src/main/java/org/opensearch/index/IndexSettings.java [1252-1253]

-this.warmupTimeout = IndexMetadata.INGESTION_SOURCE_WARMUP_TIMEOUT_SETTING.get(nodeSettings);
-this.warmupLagThreshold = IndexMetadata.INGESTION_SOURCE_WARMUP_LAG_THRESHOLD_SETTING.get(nodeSettings);
+this.warmupTimeout = scopedSettings.get(IndexMetadata.INGESTION_SOURCE_WARMUP_TIMEOUT_SETTING);
+this.warmupLagThreshold = scopedSettings.get(IndexMetadata.INGESTION_SOURCE_WARMUP_LAG_THRESHOLD_SETTING);
Suggestion importance[1-10]: 9

__

Why: Reading warmup settings from nodeSettings instead of scopedSettings is a significant bug — index-specific settings stored in index metadata won't be picked up correctly, causing incorrect initial values for warmupTimeout and warmupLagThreshold. The fix is straightforward and critical for correctness.

High
Avoid redundant method call with potential race condition

getIndexerOrNull() is called twice, which is unnecessary and could return different
values if there's a race condition. Reuse the already-retrieved engineOrNull
variable instead of calling getIndexerOrNull() again and storing it in a new indexer
variable.

server/src/main/java/org/opensearch/index/shard/IndexShard.java [3332-3354]

 Indexer engineOrNull = getIndexerOrNull();
 if (engineOrNull != null) {
     final boolean disableTranslogRetention = indexSettings.isSoftDeleteEnabled() && useRetentionLeasesInPeerRecovery;
     engineOrNull.onSettingsChanged(
         disableTranslogRetention ? TimeValue.MINUS_ONE : indexSettings.getTranslogRetentionAge(),
         disableTranslogRetention ? new ByteSizeValue(-1) : indexSettings.getTranslogRetentionSize(),
         indexSettings.getSoftDeleteRetentionOperations()
     );
 }
 
 // Update warmup config if this is an ingestion engine
-Indexer indexer = getIndexerOrNull();
+Indexer indexer = engineOrNull;
Suggestion importance[1-10]: 7

__

Why: getIndexerOrNull() is called twice in onSettingsChanged, which introduces a potential race condition where the two calls could return different values. Reusing engineOrNull for the warmup config update block is a valid and important correctness improvement.

Medium
Fix race condition in warmup state transition

Making warmupLatch volatile does not guarantee safe publication of the latch object
itself when countDown() is called concurrently with awaitWarmupComplete(). There is
a race condition in updateWarmupConfig where warmupComplete is set to true and
warmupLatch.countDown() is called without synchronization, which could allow a
thread in awaitWarmupComplete to see warmupComplete=false but miss the latch
countdown. Consider using synchronized or an AtomicBoolean to guard these state
transitions.

server/src/main/java/org/opensearch/indices/pollingingest/DefaultStreamPoller.java [69]

-private volatile CountDownLatch warmupLatch = new CountDownLatch(1);
+@Override
+public synchronized void updateWarmupConfig(IngestionSource.WarmupConfig newConfig) {
+    IngestionSource.WarmupConfig oldConfig = this.warmupConfig;
+    this.warmupConfig = newConfig;
 
+    // If warmup was enabled and is now disabled, mark as complete
+    if (oldConfig.isEnabled() && !newConfig.isEnabled() && !warmupComplete) {
+        warmupComplete = true;
+        warmupLatch.countDown();
+        logger.info("Warmup disabled for index {} shard {} via dynamic settings update", indexName, shardId);
+    }
+
+    logger.info(
+        "Warmup config updated for index {} shard {}: timeout={}, lagThreshold={}",
+        indexName,
+        shardId,
+        newConfig.timeout(),
+        newConfig.lagThreshold()
+    );
+}
+
Suggestion importance[1-10]: 6

__

Why: Making warmupLatch volatile doesn't protect against the race condition between warmupComplete and warmupLatch.countDown() in updateWarmupConfig. Adding synchronized to updateWarmupConfig is a reasonable mitigation, though the existing code already uses volatile for warmupComplete which provides some visibility guarantees.

Low
Suggestions up to commit cabc291
CategorySuggestion                                                                                                                                    Impact
Possible issue
Read index settings from scoped settings

The warmup settings should be read from scopedSettings (which represents index-level
settings) rather than nodeSettings (which represents node-level settings). Using
nodeSettings will not correctly pick up index-specific values, and is inconsistent
with how other index settings like isStarTreeIndexEnabled are initialized just
above.

server/src/main/java/org/opensearch/index/IndexSettings.java [1252-1253]

-this.warmupTimeout = IndexMetadata.INGESTION_SOURCE_WARMUP_TIMEOUT_SETTING.get(nodeSettings);
-this.warmupLagThreshold = IndexMetadata.INGESTION_SOURCE_WARMUP_LAG_THRESHOLD_SETTING.get(nodeSettings);
+this.warmupTimeout = scopedSettings.get(IndexMetadata.INGESTION_SOURCE_WARMUP_TIMEOUT_SETTING);
+this.warmupLagThreshold = scopedSettings.get(IndexMetadata.INGESTION_SOURCE_WARMUP_LAG_THRESHOLD_SETTING);
Suggestion importance[1-10]: 9

__

Why: Using nodeSettings instead of scopedSettings to read index-level settings is a significant bug. Index-specific values won't be picked up correctly, and it's inconsistent with how other settings like isStarTreeIndexEnabled are initialized just above using scopedSettings.get(...).

High
Fix non-atomic warmup state update

Making warmupLatch volatile does not make operations on it thread-safe. The
updateWarmupConfig method calls warmupLatch.countDown(), but if another thread is
concurrently calling awaitWarmupComplete which uses the same latch, there could be a
race condition. The latch itself is not replaced, so making it volatile provides no
additional safety here — the real concern is that warmupComplete and
warmupLatch.countDown() are not updated atomically, which could lead to threads
missing the signal.

server/src/main/java/org/opensearch/indices/pollingingest/DefaultStreamPoller.java [69]

-private volatile CountDownLatch warmupLatch = new CountDownLatch(1);
+private final CountDownLatch warmupLatch = new CountDownLatch(1);
Suggestion importance[1-10]: 4

__

Why: Making warmupLatch volatile doesn't provide additional thread safety since the latch itself is never replaced. Reverting to final is more semantically correct, though the real atomicity concern between warmupComplete and warmupLatch.countDown() remains regardless of this change.

Low
General
Avoid redundant method call

getIndexerOrNull() is called twice in onSettingsChanged(), which is redundant. The
second call should reuse the result from the first call (engineOrNull) instead of
calling getIndexerOrNull() again and storing it in a new variable indexer.

server/src/main/java/org/opensearch/index/shard/IndexShard.java [3344-3354]

 Indexer engineOrNull = getIndexerOrNull();
 if (engineOrNull != null) {
     ...
 }
 
 // Update warmup config if this is an ingestion engine
-Indexer indexer = getIndexerOrNull();
-if (indexer instanceof EngineBackedIndexer) {
+if (engineOrNull instanceof EngineBackedIndexer) {
Suggestion importance[1-10]: 5

__

Why: getIndexerOrNull() is called twice unnecessarily; the result from the first call (engineOrNull) could be reused. This is a minor code quality issue with no functional impact.

Low
Suggestions up to commit 7d78efa
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix settings read from wrong settings source

The warmup settings should be read from scopedSettings (which represents index-level
settings) rather than nodeSettings (which represents node-level settings). Using
nodeSettings will not correctly pick up the index-specific values set on the index
metadata, potentially causing incorrect initialization of warmup configuration.

server/src/main/java/org/opensearch/index/IndexSettings.java [1252-1253]

-this.warmupTimeout = IndexMetadata.INGESTION_SOURCE_WARMUP_TIMEOUT_SETTING.get(nodeSettings);
-this.warmupLagThreshold = IndexMetadata.INGESTION_SOURCE_WARMUP_LAG_THRESHOLD_SETTING.get(nodeSettings);
+this.warmupTimeout = scopedSettings.get(IndexMetadata.INGESTION_SOURCE_WARMUP_TIMEOUT_SETTING);
+this.warmupLagThreshold = scopedSettings.get(IndexMetadata.INGESTION_SOURCE_WARMUP_LAG_THRESHOLD_SETTING);
Suggestion importance[1-10]: 9

__

Why: Reading warmup settings from nodeSettings instead of scopedSettings is a significant bug - index-scoped settings should be read from scopedSettings to correctly pick up index-specific values. Other similar settings in the same constructor (e.g., isStarTreeIndexEnabled) correctly use scopedSettings.get(...).

High
Avoid unsafe volatile latch reassignment

Making warmupLatch volatile does not guarantee atomicity of the countDown()
operation relative to reassignment. If updateWarmupConfig is called concurrently
with awaitWarmupComplete, a thread could be waiting on the old latch while a new
latch is assigned, or countDown could be called on a stale reference. Since the
latch is only ever counted down (never replaced), keeping it final is safer; the
current code never reassigns it, so volatile provides no benefit and introduces a
false sense of thread safety.

server/src/main/java/org/opensearch/indices/pollingingest/DefaultStreamPoller.java [69]

-private volatile CountDownLatch warmupLatch = new CountDownLatch(1);
+private final CountDownLatch warmupLatch = new CountDownLatch(1);
Suggestion importance[1-10]: 6

__

Why: The warmupLatch field is changed from final to volatile but is never reassigned in the code, making volatile misleading and providing no benefit. Using final is safer and more correct since CountDownLatch is already thread-safe internally.

Low
General
Eliminate redundant method call

getIndexerOrNull() is called twice in onSettingsChanged(), which is redundant. The
second call should reuse the result from the first call (engineOrNull) instead of
calling getIndexerOrNull() again and storing it in a new variable indexer.

server/src/main/java/org/opensearch/index/shard/IndexShard.java [3344-3354]

 Indexer engineOrNull = getIndexerOrNull();
 if (engineOrNull != null) {
     ...
 }
 
 // Update warmup config if this is an ingestion engine
-Indexer indexer = getIndexerOrNull();
-if (indexer instanceof EngineBackedIndexer) {
+if (engineOrNull instanceof EngineBackedIndexer) {
Suggestion importance[1-10]: 5

__

Why: getIndexerOrNull() is called twice unnecessarily - the result from the first call (engineOrNull) could be reused directly, avoiding a redundant method call and an extra variable indexer.

Low

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 7d78efa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit cabc291

@kaustubhbutte17 kaustubhbutte17 force-pushed the feature/warmup-dynamic-settings branch from cabc291 to 725179a Compare March 20, 2026 02:41
@kaustubhbutte17 kaustubhbutte17 marked this pull request as draft March 20, 2026 02:41
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 725179a

Change warmup.timeout and warmup.lag_threshold from Final to Dynamic
settings so they can be updated at runtime via the update settings API.

This allows operators to:
- Disable warmup while it's in progress (emergency escape hatch)
- Adjust lag threshold without recreating the index
- Change timeout without recreating the index

Settings propagation: IndexSettings -> IndexShard.onSettingsChanged() ->
IngestionEngine.updateWarmupConfig() -> DefaultStreamPoller.updateWarmupConfig()

Signed-off-by: Kaustubh Butte <kaustubhbutte17@gmail.com>
@kaustubhbutte17 kaustubhbutte17 force-pushed the feature/warmup-dynamic-settings branch from 725179a to 381320c Compare March 20, 2026 02:57
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 381320c

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 381320c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant