-
Notifications
You must be signed in to change notification settings - Fork 4.1k
util/taskset: introduce lightweight distributed task scheduler #156578
Description
This introduces a small, self-contained utility package for managing distributed task assignment and progress tracking. The new taskset package provides a simple, deterministic way to claim, complete, and distribute work units across workers in distributed flows such as bulk merge or ingest.
This PR supports the Merge and Ingest phases of distributed merge by providing a concurrency-safe structure that coordinates which node (or processor) works on which partition of data.
This has been implemented already in the prototype fork at: jeffswenson/cockroach@feature-distributed-merge. The goal of this issue is to pull it out into a smaller PR thats easier to review.
Goal
Provide a minimal API for assigning and tracking task progress in multi-worker environments without requiring coordination through SQL or KV.
Specifically, the package should:
- Maintain a pool of unclaimed tasks (
TaskSet). - Allow workers to claim, complete, and request additional work deterministically.
- Split task spans dynamically to maximize utilization (e.g., “work stealing”).
- Operate entirely in-memory — no RPCs, no persistent state.
This is used by:
- The merge coordinator, which distributes merge spans across nodes.
- The ingest coordinator, which assigns SST ingest tasks across leaseholders.
Implementation Highlights
Core Types
TaskId
A small wrapper type (int64) used to identify work units.
Includes a sentinel TaskIdDone to indicate that all work is complete.
TaskSet
Manages a set of unassigned task spans and supports concurrent claims.
Implements an algorithm similar to work stealing, where workers grab new tasks by splitting the largest available range.
Key methods:
ClaimFirst()— claim an initial task.ClaimNext(lastTask)— claim the next adjacent task or split a span if needed.IsDone()— helper for determining completion.
taskSpan
Internal helper representing a contiguous span of task IDs.
Design
- No locking is needed beyond slice modifications; each worker calls claim functions serially.
- Task distribution favors keeping workers on contiguous work units (better cache/locality).
- Splits the largest unassigned span in half to balance work when many tasks remain.
- Handles both sequential and parallel claim patterns.
Tests
task_span_test.goensures deterministic behavior in single-worker and multi-worker
Dependencies
- none
Jira issue: CRDB-56067
Epic CRDB-48845