
fix: critical epoch transition gap#2113

Merged
nimrod-teich merged 5 commits into main from fix_critical_epoch_transition_gap
Dec 7, 2025


@AnnaR-prog

Contributor

Description

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • read the contribution guide
  • included the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the main branch
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

@github-actions

github-actions Bot commented Nov 24, 2025


Test Results

3 094 tests  ±0   3 093 ✅ +1   32m 18s ⏱️ + 11m 39s
  126 suites ±0       1 💤 ±0 
    7 files   ±0       0 ❌ ±0 

Results for commit 9689d8d. ± Comparison against base commit 1832ff4.

♻️ This comment has been updated with latest results.

@avitenzer avitenzer changed the title Fix: critical epoch transition gap fix: critical epoch transition gap Nov 25, 2025

@avitenzer avitenzer Nov 25, 2025

Collaborator


Consider cleaning up previousEpochBlockedProviders at the end of the checkAndUnblockHealthyReBlockedProviders function:

defer func() {
	csm.previousEpochBlockedProviders = make(map[string]struct{})
}()

Also, you should check that the current epoch is still valid, i.e.:

if epoch != csm.atomicReadCurrentEpoch() {
	utils.LavaFormatDebug("Skipping re-blocked provider check due to epoch change",
		utils.Attribute{Key: "requestedEpoch", Value: epoch},
		utils.Attribute{Key: "currentEpoch", Value: csm.atomicReadCurrentEpoch()},
	)
	return
}
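
A minimal, self-contained sketch of how the two suggestions above might fit together. The manager type, the checkReBlocked method, and the fmt-based logging are hypothetical stand-ins for the real session manager and the utils logging helpers; only the field name previousEpochBlockedProviders and the helper name atomicReadCurrentEpoch come from the comment.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// manager is a hypothetical stand-in for the consumer session manager.
type manager struct {
	lock                          sync.Mutex
	currentEpoch                  uint64 // read and written atomically
	previousEpochBlockedProviders map[string]struct{}
}

func (m *manager) atomicReadCurrentEpoch() uint64 {
	return atomic.LoadUint64(&m.currentEpoch)
}

// checkReBlocked examines providers carried over from the previous epoch.
// It returns the addresses it examined, or nil when the epoch is stale.
func (m *manager) checkReBlocked(epoch uint64) []string {
	// Guard: a probe may complete after the epoch has already advanced.
	if current := m.atomicReadCurrentEpoch(); epoch != current {
		fmt.Printf("skipping check: requested epoch %d, current epoch %d\n", epoch, current)
		return nil
	}

	m.lock.Lock()
	defer m.lock.Unlock()
	// Deferred calls run last-in, first-out, so this cleanup executes
	// while the lock is still held, on every exit path below.
	defer func() {
		m.previousEpochBlockedProviders = make(map[string]struct{})
	}()

	examined := make([]string, 0, len(m.previousEpochBlockedProviders))
	for addr := range m.previousEpochBlockedProviders {
		examined = append(examined, addr)
	}
	return examined
}

func main() {
	m := &manager{
		currentEpoch:                  20,
		previousEpochBlockedProviders: map[string]struct{}{"provider-a": {}},
	}
	fmt.Println(len(m.checkReBlocked(10)))            // stale epoch: nothing examined
	fmt.Println(len(m.checkReBlocked(20)))            // current epoch: one provider examined
	fmt.Println(len(m.previousEpochBlockedProviders)) // set is empty after processing
}
```

In this sketch the guard comes before the deferred cleanup, so a stale call leaves the carried-over set untouched. Whether the real function should also clear the set on a stale epoch is a design choice the thread does not settle.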

Contributor Author


Done

@pull-request-size pull-request-size Bot added size/L and removed size/M labels Nov 30, 2025
Fixes a race condition at epoch boundaries where previously blocked providers
got a clean slate and users hit them with long timeouts before probes could re-block them.

Solution: Save and restore the blocking history across the epoch transition, and
immediately unblock a provider if its probe succeeds, by checking reportedProviders state (no duplicate probing).
- Refactor validateAndReturnBlockedProviderToValidAddressesList to separate locking logic, resolving a recursive lock deadlock in checkAndUnblockHealthyReBlockedProviders.
- Inject Unique Identifier (GUID) into context in checkAndUnblockHealthyReBlockedProviders to prevent probeProvider from failing during comprehensive probes.
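
The recursive lock deadlock mentioned above comes from a general property of Go: sync.Mutex and sync.RWMutex are not reentrant, so a method that takes the lock cannot call another method that takes the same lock. A common way to separate the locking logic is sketched below. All type and method names here are hypothetical and are not taken from the PR diff.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionManager is a hypothetical stand-in for the real manager.
type sessionManager struct {
	lock    sync.RWMutex
	valid   []string
	blocked map[string]struct{}
}

// unblockProvider is the public entry point: it takes the lock and delegates.
func (s *sessionManager) unblockProvider(addr string) bool {
	s.lock.Lock()
	defer s.lock.Unlock()
	return s.unblockProviderLocked(addr)
}

// unblockProviderLocked assumes the caller already holds s.lock.
func (s *sessionManager) unblockProviderLocked(addr string) bool {
	if _, ok := s.blocked[addr]; !ok {
		return false
	}
	delete(s.blocked, addr)
	s.valid = append(s.valid, addr)
	return true
}

// unblockHealthy holds the lock once for the whole batch and calls the
// Locked variant. Calling unblockProvider here instead would try to
// re-acquire the same mutex and block forever.
func (s *sessionManager) unblockHealthy(healthy []string) int {
	s.lock.Lock()
	defer s.lock.Unlock()
	count := 0
	for _, addr := range healthy {
		if s.unblockProviderLocked(addr) {
			count++
		}
	}
	return count
}

func main() {
	s := &sessionManager{blocked: map[string]struct{}{"provider-a": {}, "provider-b": {}}}
	fmt.Println(s.unblockHealthy([]string{"provider-a"}))
	fmt.Println(s.valid)
}
```

The convention of a thin locking wrapper around a "Locked" helper keeps the lock acquisition in one visible place per call path.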
@AnnaR-prog AnnaR-prog force-pushed the fix_critical_epoch_transition_gap branch from ae7f5d2 to f79c584 Compare December 2, 2025 10:12
…kedProviders

- Add epoch validity check to prevent processing stale data during rapid epoch transitions
- Add cleanup of previousEpochBlockedProviders after processing to free memory
- Prevents race condition where probe completes after epoch has already advanced
- Critical for test reliability with 10s epoch timers
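
The save-and-restore idea described in the commit messages above can be illustrated with a small single-threaded sketch. It assumes a simplified model: epochState, advanceEpoch, onProbeResult, and usable are hypothetical names, a plain boolean stands in for the reportedProviders state, and locking is omitted for brevity.

```go
package main

import "fmt"

// epochState is a hypothetical, simplified model of per-epoch provider state.
type epochState struct {
	epoch                         uint64
	blocked                       map[string]struct{}
	previousEpochBlockedProviders map[string]struct{}
}

// advanceEpoch installs a new pairing list. Providers that were blocked in
// the old epoch and are still paired start the new epoch blocked, instead
// of getting a clean slate.
func (e *epochState) advanceEpoch(newEpoch uint64, pairing []string) {
	carried := make(map[string]struct{})
	restored := make(map[string]struct{})
	for _, addr := range pairing {
		if _, wasBlocked := e.blocked[addr]; wasBlocked {
			carried[addr] = struct{}{}
			restored[addr] = struct{}{}
		}
	}
	e.epoch = newEpoch
	e.previousEpochBlockedProviders = carried
	e.blocked = restored
}

// onProbeResult unblocks a carried-over provider once a probe reports it
// healthy, so no second probe is needed.
func (e *epochState) onProbeResult(addr string, healthy bool) {
	if _, carried := e.previousEpochBlockedProviders[addr]; carried && healthy {
		delete(e.blocked, addr)
		delete(e.previousEpochBlockedProviders, addr)
	}
}

// usable reports whether a provider may be selected for a relay.
func (e *epochState) usable(addr string) bool {
	_, isBlocked := e.blocked[addr]
	return !isBlocked
}

func main() {
	e := &epochState{epoch: 1, blocked: map[string]struct{}{"provider-a": {}}}
	e.advanceEpoch(2, []string{"provider-a", "provider-b"})
	fmt.Println(e.usable("provider-a")) // still blocked right after the transition
	e.onProbeResult("provider-a", true)
	fmt.Println(e.usable("provider-a")) // unblocked once the probe succeeds
}
```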
@nimrod-teich nimrod-teich merged commit 248c649 into main Dec 7, 2025
48 of 49 checks passed
@nimrod-teich nimrod-teich deleted the fix_critical_epoch_transition_gap branch December 7, 2025 14:27

3 participants