Skip to content

Streamline active job collection.#153650

Draft
nnethercote wants to merge 1 commit intorust-lang:mainfrom
nnethercote:collect_active-FIDDLING
Draft

Streamline active job collection.#153650
nnethercote wants to merge 1 commit intorust-lang:mainfrom
nnethercote:collect_active-FIDDLING

Conversation

@nnethercote
Copy link
Contributor

@nnethercote nnethercote commented Mar 10, 2026

collect_active_jobs_from_all_queries takes a require_complete bool, and then some callers expect a full map result while others allow a partial map result. The end result is four possible combinations, but only three of them are used/make sense.

This commit introduces CollectActiveJobsKind, a three-value enum that describes the three sensible combinations, and rewrites collect_active_jobs_from_all_queries around it. This makes it and its call sites much clearer, and removes the weird Option<()> and Result<QueryJobMap, QueryJobMap> return types.

Other changes of note.

  • active is removed. The comment about make_frame is out of date, and create_deferred_query_stack_frame is safe to call with the query state locked.
  • When shard locking failure is allowed, collection no longer stops on the first failed shard.

@rustbot rustbot added A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 10, 2026
@nnethercote
Copy link
Contributor Author

cc @Zalathar @Zoxc @zetanumbers

@Zalathar
Copy link
Member

Is it correct to radically change the shard locking behaviour like this?

@petrochenkov
Copy link
Contributor

Looks like this effectively reverts #148777, which was a bugfix specifically relying on calling lock_shards instead of try_lock_shards.
cc @ywxt to confirm

Having the same thing in both the Ok and Err cases was strange.

I disagree, the first thing is a good map, the second thing is an erroneous map for diagnostics.
Result here works as usual and doesn't allow forgetting about the error checking, unlike the boolean.
(Replacing Option<()> with bools seems ok.)

@nnethercote
Copy link
Contributor Author

Thanks for the info! Looks like making this a draft PR was the right decision :)

I still don't like the code structure. Even with the comments it was strange enough that I misunderstood what it was doing. I'll try to find another approach to improve that while preserving the try_lock/lock distinction.

I also wonder why a test wasn't added in #148777. My understanding of parallel front-end testing is poor, this might force me to look into it.

@petrochenkov
Copy link
Contributor

petrochenkov commented Mar 10, 2026

The testing isn't even merged yet, the relevant PR is #143953.
Zulip thread: #t-compiler/parallel-rustc > Add the parallel front-end test suite

@nnethercote
Copy link
Contributor Author

nnethercote commented Mar 10, 2026

The require_complete boolean plus the use/non-use of expect/unwrap on the result means there are four possible scenarios for this function.

// - unwrap: need the full map
// - true => lock_shards: willing to wait for locks, i.e. there might be
//   contention(?)
// - used from:
//   - depth_limit_error
collect_active_jobs_from_all_queries(tcx, true).unwrap()

// - unwrap: need the full map
// - false => try_lock_shards: not willing to wait for locks, i.e. we assume
//   there is no contention(?)
// - used from:
//   - the deadlock handler (in `rustc_in_thread_pool_with_globals`)
//   - cycle_error
collect_active_jobs_from_all_queries(tcx, false).unwrap()

// - unwrap_or_else: partial map is acceptable
// - true => lock_shards: willing to wait for locks, i.e. there might be
//   contention
// - used from:
//   - (nowhere)
collect_active_jobs_from_all_queries(tcx, true).unwrap_or_else(|m| m)

// - unwrap_or_else: partial map is acceptable
// - false => try_lock_shards: not willing to wait for locks, i.e. we assume
//   there is no contention(?)
// - used from:
//   - print_query_stack
collect_active_jobs_from_all_queries(tcx, true).unwrap_or_else(|m| m)

Only three of them are used in practice. Maybe require_complete should be replaced by a three-value enum like this:

enum CollectKind {
    Full,
    FullNoContention,
    PartialAllowed,
}

and collect_active_jobs_from_all_queries could just return the (full or partial) QueryJobMap instead of a Result.

@Zalathar
Copy link
Member

I think the bottom two cases are confused. The print_query_stack function passes false.

Also IIRC, when true is passed, the function will always return Ok, so the true/unwrap-or-else combination is unused because it's inherently useless.

@nnethercote
Copy link
Contributor Author

I think the bottom two cases are confused. The print_query_stack function passes false.

Fixed, thanks.

Also IIRC, when true is passed, the function will always return Ok, so the true/unwrap-or-else combination is unused because it's inherently useless.

Good point. All the more reason for the enum.

@ywxt
Copy link
Contributor

ywxt commented Mar 11, 2026

require_complete=true only is used in depth_limit_error, which ensures us can get the depth and root of queries from a complete job map.

Remove the require_complete parameter. We now always use try_lock_shards and let the caller abort on incompleteness

I don't think it's acceptable that the compiler panics during reporting the depth limit.

@ywxt
Copy link
Contributor

ywxt commented Mar 11, 2026

enum CollectKind {
    Full,
    FullNoContention,
    PartialAllowed,
}

that's better than a bool + Result 👍

@nnethercote
Copy link
Contributor Author

I have updated the code to use the three-value enum. I'm not 100% sure about the names, happy to hear alternative suggestions there.

@zetanumbers
Copy link
Contributor

I also wonder why a test wasn't added in #148777. My understanding of parallel front-end testing is poor, this might force me to look into it.

Tests for A-parallel-compiler bugs are nearly impossible to make consistent. They shouldn't be relied upon unless you can relatively consistently reproduce a test failure on your branch's base commit and then can no longer reproduce the test failure on the branch itself. I even have fixed bugs without being able to reproduce A-parallel-compiler issue to begin with, just by diagnosing the source problem.

@rust-bors

This comment has been minimized.

`collect_active_jobs_from_all_queries` takes a `require_complete` bool,
and then some callers `expect` a full map result while others allow a
partial map result. The end result is four possible combinations, but
only three of them are used/make sense.

This commit introduces `CollectActiveJobsKind`, a three-value enum that
describes the three sensible combinations, and rewrites
`collect_active_jobs_from_all_queries` around it. This makes it and its
call sites much clearer, and removes the weird `Option<()>` and
`Result<QueryJobMap, QueryJobMap>` return types.

Other changes of note.
- `active` is removed. The comment about `make_frame` is out of date,
  and `create_deferred_query_stack_frame` *is* safe to call with the
  query state locked.
- When shard locking failure is allowed, collection no longer stops on
  the first failed shard.
@nnethercote nnethercote force-pushed the collect_active-FIDDLING branch from 48bfc79 to 454e301 Compare March 11, 2026 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants