Skip to content

Better fix for dataflow analysis caching#1353

Merged
msridhar merged 3 commits intomasterfrom
fix-caching-again
Nov 26, 2025
Merged

Better fix for dataflow analysis caching#1353
msridhar merged 3 commits intomasterfrom
fix-caching-again

Conversation

@msridhar
Copy link
Copy Markdown
Collaborator

@msridhar msridhar commented Nov 26, 2025

Fixes #1351

We get rid of having two related caches that can get out of sync. Instead, we create a new RunOnceForwardAnalysisImpl class that allows for running a dataflow analysis at most once.

Summary by CodeRabbit

  • Refactor
    • Improved analysis caching to simplify and speed reuse of analysis results.
    • Introduced a run-once analysis wrapper to ensure each analysis executes at most once, reducing redundant work.
    • Removed redundant per-parameter run-tracking and related invalidation logic, streamlining cache management and instantiation paths.

✏️ Tip: You can customize this high-level summary in your review settings.

@codecov
Copy link
Copy Markdown

codecov bot commented Nov 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.39%. Comparing base (119448e) to head (c320c3a).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #1353   +/-   ##
=========================================
  Coverage     88.39%   88.39%           
- Complexity     2592     2594    +2     
=========================================
  Files            97       98    +1     
  Lines          8702     8705    +3     
  Branches       1732     1732           
=========================================
+ Hits           7692     7695    +3     
  Misses          505      505           
  Partials        505      505           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown

Main Branch:

Benchmark                          Mode  Cnt   Score   Error  Units
AutodisposeBenchmark.compile      thrpt   25   9.951 ± 0.102  ops/s
CaffeineBenchmark.compile         thrpt   25   2.065 ± 0.013  ops/s
DFlowMicroBenchmark.compile       thrpt   25  35.200 ± 0.306  ops/s
NullawayReleaseBenchmark.compile  thrpt   25   1.562 ± 0.022  ops/s

With This PR:

Benchmark                          Mode  Cnt   Score   Error  Units
AutodisposeBenchmark.compile      thrpt   25   9.865 ± 0.100  ops/s
CaffeineBenchmark.compile         thrpt   25   2.015 ± 0.032  ops/s
DFlowMicroBenchmark.compile       thrpt   25  34.706 ± 0.289  ops/s
NullawayReleaseBenchmark.compile  thrpt   25   1.536 ± 0.047  ops/s

@msridhar msridhar marked this pull request as ready for review November 26, 2025 22:00
@msridhar msridhar requested a review from yuxincs November 26, 2025 22:01
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Nov 26, 2025

Walkthrough

The PR changes DataFlow to store RunOnceForwardAnalysisImpl instances in the analysis cache instead of generic Analysis objects, removes the separate alreadyRunAnalyses cache and its associated invalidation/lookup logic, and updates cache loader and dataflow invocation sites to use RunOnceForwardAnalysisImpl. A new RunOnceForwardAnalysisImpl class wraps ForwardAnalysisImpl and overrides performAnalysis to execute and cache results only on the first call.

Possibly related PRs

Suggested reviewers

  • lazaroclapp
  • yuxincs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Better fix for dataflow analysis caching' accurately describes the main change: improving the caching mechanism for dataflow analysis by fixing the issue with unsynchronized caches.
Linked Issues check ✅ Passed The PR successfully addresses #1351's root problem by replacing the dual-cache approach with RunOnceForwardAnalysisImpl, ensuring analyses execute at most once and eliminating cache inconsistency issues.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing the caching issue described in #1351; the new RunOnceForwardAnalysisImpl class and refactored DataFlow caching are within scope.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-caching-again

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e46ebe and c320c3a.

📒 Files selected for processing (2)
  • nullaway/src/main/java/com/uber/nullaway/dataflow/DataFlow.java (2 hunks)
  • nullaway/src/main/java/com/uber/nullaway/dataflow/RunOnceForwardAnalysisImpl.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build and test on windows-latest
  • GitHub Check: Build spring-framework with snapshot
  • GitHub Check: Build and test on ubuntu-latest
  • GitHub Check: Build and test on macos-latest
  • GitHub Check: Build caffeine with snapshot
🔇 Additional comments (1)
nullaway/src/main/java/com/uber/nullaway/dataflow/DataFlow.java (1)

80-90: New single-cache + RunOnce wrapper correctly fixes the invalidation bug

Moving analysisCache to store RunOnceForwardAnalysisImpl<?, ?, ?> and invoking performAnalysis(cfg) on that wrapper (which is idempotent per instance) removes the prior split between analysisCache and alreadyRunAnalyses. After eviction, any reloaded analysis instance starts with analysisPerformed == false and will be initialized when performAnalysis is requested, so a newly created analysis can no longer be mistakenly treated as “already run” with uninitialized internal state. The generics cast and call sites still match the previous usage patterns, so behavior is preserved while the cache inconsistency is eliminated.

Also applies to: 159-163

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 119448e and 5e46ebe.

📒 Files selected for processing (2)
  • nullaway/src/main/java/com/uber/nullaway/dataflow/DataFlow.java (2 hunks)
  • nullaway/src/main/java/com/uber/nullaway/dataflow/RunOnceForwardAnalysisImpl.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: benchmarking
  • GitHub Check: Build and test on windows-latest
  • GitHub Check: Build and test on macos-latest
  • GitHub Check: Build spring-framework with snapshot
  • GitHub Check: Build and test on ubuntu-latest
  • GitHub Check: Build caffeine with snapshot
🔇 Additional comments (3)
nullaway/src/main/java/com/uber/nullaway/dataflow/RunOnceForwardAnalysisImpl.java (1)

9-15: Run-once wrapper cleanly encapsulates the caching fix

The class-level design and Javadoc clearly move the “already run” state into the analysis instance itself, which should prevent the cache/set desynchronization that caused #1351. Generics and inheritance from ForwardAnalysisImpl look correct and consistent with the existing dataflow APIs.

nullaway/src/main/java/com/uber/nullaway/dataflow/DataFlow.java (2)

80-90: Good refactor: cache now stores run-once analyses keyed by (transfer, cfg)

Switching analysisCache to LoadingCache<AnalysisParams, RunOnceForwardAnalysisImpl<?, ?, ?>> with the loader constructing a fresh RunOnceForwardAnalysisImpl for each (transferFunction, cfg) key cleanly centralizes the “already run” state inside the analysis instance. This eliminates the possibility of the cache entry being evicted while a separate alreadyRunAnalyses structure still claims the key is initialized, which is exactly the inconsistency that led to the NPE in issue #1351. The key still includes cfg, so a single analysis instance is never shared across different CFGs.


159-163: performAnalysisIfNeeded(cfg) correctly replaces external bookkeeping

Using RunOnceForwardAnalysisImpl<A, S, T> here and calling analysis.performAnalysisIfNeeded(cfg) whenever performAnalysis is true ensures that:

  • Analyses are initialized exactly once per cached instance, even if dataflow is called multiple times.
  • After eviction, a newly created analysis will always run before getResult()/getRegularExitStore() is used, so cfg-dependent fields can’t remain null.

This aligns with the PR’s goal of fixing the out-of-sync cache state and removes the need for a separate “already run” cache.

@msridhar msridhar enabled auto-merge (squash) November 26, 2025 22:54
@msridhar msridhar merged commit 7cd8b1b into master Nov 26, 2025
11 checks passed
@msridhar msridhar deleted the fix-caching-again branch November 26, 2025 23:04
@github-actions
Copy link
Copy Markdown

Main Branch:

Benchmark                          Mode  Cnt   Score   Error  Units
AutodisposeBenchmark.compile      thrpt   25  10.016 ± 0.032  ops/s
CaffeineBenchmark.compile         thrpt   25   2.069 ± 0.014  ops/s
DFlowMicroBenchmark.compile       thrpt   25  35.227 ± 0.209  ops/s
NullawayReleaseBenchmark.compile  thrpt   25   1.573 ± 0.045  ops/s

With This PR:

Benchmark                          Mode  Cnt   Score   Error  Units
AutodisposeBenchmark.compile      thrpt   25   9.902 ± 0.062  ops/s
CaffeineBenchmark.compile         thrpt   25   2.055 ± 0.011  ops/s
DFlowMicroBenchmark.compile       thrpt   25  34.747 ± 0.416  ops/s
NullawayReleaseBenchmark.compile  thrpt   25   1.550 ± 0.041  ops/s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect Cache invalidation in Nullaway 0.12.12

2 participants