Skip to content

Fix lookupPathResolved thread safety#415

Merged
edolstra merged 2 commits intomainfrom
fix-412
Apr 7, 2026
Merged

Fix lookupPathResolved thread safety#415
edolstra merged 2 commits intomainfrom
fix-412

Conversation

@edolstra
Copy link
Copy Markdown
Collaborator

@edolstra edolstra commented Apr 7, 2026

Motivation

Cherry-picks upstream NixOS#15488 to make lookupPathResolved thread-safe.

Also cherry-picks upstream ff022d8. Our positionToDocComment was already thread-safe but the upstream version is (probably) more performant.

Fixes #412.

Context

Summary by CodeRabbit

  • Performance

    • Improved concurrent access patterns for internal caching mechanisms, enhancing evaluation performance in multi-threaded scenarios.
    • Optimized path resolution caching for faster lookups.
  • Bug Fixes

    • Strengthened thread safety of internal evaluation state caches to prevent race conditions during parallel operations.

xokdvium and others added 2 commits April 7, 2026 15:55
The lookupPathResolved cache was a plain unordered_flat_map with
unsynchronized find()/emplace() calls in resolveLookupPathPath().
Multiple threads resolving lookup paths concurrently would race on
this cache.

Replace it with a concurrent_flat_map following the same pattern used
for srcToStore, importResolutionCache, and fileEvalCache.

(cherry picked from commit 04516ac)
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

This PR converts two internal caches in EvalState from non-thread-safe map types to boost::concurrent_flat_map to eliminate data races during parallel evaluation. It updates all cache accesses from manual synchronization to concurrent-safe operations, addressing crashes observed with eval-cores > 1.

Changes

Cohort / File(s) Summary
Header declarations
src/libexpr/include/nix/expr/eval.hh
Replaced positionToDocComment and lookupPathResolved members from SharedSync<unordered_flat_map> and unguarded unordered_flat_map to const ref<concurrent_flat_map> for lock-free concurrent access.
Cache access patterns
src/libexpr/eval.cc
Updated resolveLookupPathPath, parse, and getDocCommentForPos functions to use getConcurrent(), emplace(), and visit() methods instead of manual find()/end() lookups and lock-based synchronization.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • PR #197: Converts additional EvalState internal caches from lock-guarded maps to boost::concurrent_flat_map with similar access pattern updates throughout the codebase.

Suggested reviewers

  • cole-h

Poem

🐰 A rabbit hops through concurrent streams,
Where data races once caused crashes and screams—
Now caches dance with lock-free grace,
No threads colliding in this safer place! 🌟

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Out of Scope Changes check ❓ Inconclusive The PR includes changes to positionToDocComment in addition to lookupPathResolved. While these are related performance/thread-safety improvements mentioned in the PR objectives, the title only references lookupPathResolved, creating a minor scope mismatch. Clarify whether the positionToDocComment changes are essential to this PR or should be split into a separate issue/PR for better change isolation and review focus.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix lookupPathResolved thread safety' clearly and concisely summarizes the main objective of the PR, which is to address thread safety issues in the lookupPathResolved cache.
Linked Issues check ✅ Passed The PR successfully addresses the primary coding requirement from issue #412: converting lookupPathResolved from a non-thread-safe boost::unordered_flat_map to boost::concurrent_flat_map, eliminating unsynchronized find()/emplace() races.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-412

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

@github-actions github-actions bot temporarily deployed to pull request April 7, 2026 14:08 Inactive
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/libexpr/eval.cc`:
- Around line 3223-3231: The current concurrent-resolve path uses
getConcurrent(*lookupPathResolved, value) and then
lookupPathResolved->emplace(std::string(value), res) inside finish, which allows
a race where an earlier finisher can insert std::nullopt and later successful
resolution is lost; change the insertion to use the concurrent-aware helper
try_emplace_or_visit() (mirroring fileEvalCache's try_emplace_and_cvisit
pattern) so that if a thread wins the slot with std::nullopt a later successful
resolution can replace or visit that entry appropriately; update the finish
lambda to call try_emplace_or_visit with the std::string(value) key and handle
the visitor callback to replace the stored std::optional<SourcePath> on success.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5db161b5-39c4-4a0b-bc03-b075f8a30305

📥 Commits

Reviewing files that changed from the base of the PR and between 6e20d9e and 985e07c.

📒 Files selected for processing (2)
  • src/libexpr/eval.cc
  • src/libexpr/include/nix/expr/eval.hh

Comment on lines +3223 to +3231
if (auto cached = getConcurrent(*lookupPathResolved, value))
return *cached;

auto finish = [&](std::optional<SourcePath> res) {
if (res)
debug("resolved search path element '%s' to '%s'", value, *res);
else
debug("failed to resolve search path element '%s'", value);
lookupPathResolved.emplace(value, res);
lookupPathResolved->emplace(std::string(value), res);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /root && find . -name "eval.cc" -path "*/libexpr/*" | head -5

Repository: DeterminateSystems/nix-src

Length of output: 125


🏁 Script executed:

# Get context around lines 3223-3231
sed -n '3220,3235p' src/libexpr/eval.cc

Repository: DeterminateSystems/nix-src

Length of output: 674


🏁 Script executed:

# Check lookupPathResolved declaration
rg "lookupPathResolved" src/libexpr/eval.cc -B 2 -A 2 | head -40

Repository: DeterminateSystems/nix-src

Length of output: 662


🏁 Script executed:

# Find fileEvalCache implementation to compare patterns
rg "fileEvalCache" src/libexpr/eval.cc -B 3 -A 3 | head -60

Repository: DeterminateSystems/nix-src

Length of output: 1326


🏁 Script executed:

# Check the concurrent_flat_map type and available methods
rg "concurrent_flat_map" src/libexpr/ -B 1 -A 1

Repository: DeterminateSystems/nix-src

Length of output: 2404


🏁 Script executed:

# Search for try_emplace_and_cvisit to understand the comparison
rg "try_emplace_and_cvisit" src/libexpr/eval.cc -B 2 -A 5

Repository: DeterminateSystems/nix-src

Length of output: 310


🏁 Script executed:

rg "emplace_or_visit|visit_or_visit" src/libexpr/ -l

Repository: DeterminateSystems/nix-src

Length of output: 89


🏁 Script executed:

# Search for Boost concurrent_flat_map method usage patterns in the codebase
rg "concurrent_flat_map" src/libexpr/ -A 5 | grep -E "->|try_emplace|emplace|visit" | head -30

Repository: DeterminateSystems/nix-src

Length of output: 180


🏁 Script executed:

# Check what methods are called on concurrent maps in general
ast-grep --pattern 'lookupPathResolved->$_'

Repository: DeterminateSystems/nix-src

Length of output: 156


🏁 Script executed:

# Check Boost documentation or comments about concurrent_flat_map methods
rg "concurrent_flat_map|try_emplace_and_cvisit" src/ --type cpp -B 3 -A 8 | head -80

Repository: DeterminateSystems/nix-src

Length of output: 5141


🌐 Web query:

Boost concurrent_flat_map emplace_or_visit method API

💡 Result:

In Boost.Unordered boost::unordered::concurrent_flat_map (header #include <boost/unordered/concurrent_flat_map.hpp>), the “emplace_or_visit” operation is provided as try_emplace_or_visit (plus a const-visitation variant try_emplace_or_cvisit). [1]

API (overloads)

template<class... Args, class F>
bool try_emplace_or_visit(const key_type& k, Args&&... args, F&& f);
template<class... Args, class F>
bool try_emplace_or_cvisit(const key_type& k, Args&&... args, F&& f);

template<class... Args, class F>
bool try_emplace_or_visit(key_type&& k, Args&&... args, F&& f);
template<class... Args, class F>
bool try_emplace_or_cvisit(key_type&& k, Args&&... args, F&& f);

template<class K, class... Args, class F>
bool try_emplace_or_visit(K&& k, Args&&... args, F&& f);
template<class K, class... Args, class F>
bool try_emplace_or_cvisit(K&& k, Args&&... args, F&& f);

[2]

Semantics

  • If key k is not present: inserts a new element constructed from k and args... (piecewise construction); no element is constructed if the key already exists. [2]
  • If key k is present: calls f(element_ref) on the existing element instead of inserting. With *_cvisit, the reference passed to f is const. [2]
  • Returns true if an insert happened; otherwise false. [2]
  • Concurrency note: this operation is blocking only on internal rehashing. [2]
  • Important restriction: a visitation function executed by a container must not call operations on that same container. [2][3]

Typical use (replacement for operator[]-style “insert or update”)

m.try_emplace_or_visit(key, /*mapped ctor args...*/ 1,
                       [](auto& kv){ ++kv.second; });

This pattern is explicitly called out as a way to replace missing operator[]/at in concurrent_flat_map. [3]


Fix negative-cache poisoning from concurrent resolution race.

This splits the miss path into getConcurrent() and emplace(). If two threads resolve the same entry concurrently and the first finisher inserts std::nullopt on line 3231, a later successful resolution is dropped and lookupPathResolved stays negative-cached for the lifetime of this EvalState. Switch to try_emplace_or_visit() so a success can replace an earlier std::nullopt, matching the pattern already used in fileEvalCache with try_emplace_and_cvisit().

Fix using try_emplace_or_visit
     auto finish = [&](std::optional<SourcePath> res) {
         if (res)
             debug("resolved search path element '%s' to '%s'", value, *res);
         else
             debug("failed to resolve search path element '%s'", value);
-        lookupPathResolved->emplace(std::string(value), res);
+        lookupPathResolved->try_emplace_or_visit(std::string(value), res, [&](auto & kv) {
+            if (res && !kv.second)
+                kv.second = res;
+            res = kv.second;
+        });
         return res;
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/libexpr/eval.cc` around lines 3223 - 3231, The current concurrent-resolve
path uses getConcurrent(*lookupPathResolved, value) and then
lookupPathResolved->emplace(std::string(value), res) inside finish, which allows
a race where an earlier finisher can insert std::nullopt and later successful
resolution is lost; change the insertion to use the concurrent-aware helper
try_emplace_or_visit() (mirroring fileEvalCache's try_emplace_and_cvisit
pattern) so that if a thread wins the slot with std::nullopt a later successful
resolution can replace or visit that entry appropriately; update the finish
lambda to call try_emplace_or_visit with the std::string(value) key and handle
the visitor callback to replace the stored std::optional<SourcePath> on success.

@edolstra edolstra enabled auto-merge April 7, 2026 14:15
@edolstra edolstra added this pull request to the merge queue Apr 7, 2026
Merged via the queue into main with commit 8661fde Apr 7, 2026
29 checks passed
@edolstra edolstra deleted the fix-412 branch April 7, 2026 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parallel eval: data race on lookupPathResolved crashes nix eval

4 participants