Skip to content

util/taskset: introduce lightweight distributed task scheduler #156578

@spilchen

Description

@spilchen

This introduces a small, self-contained utility package for managing distributed task assignment and progress tracking. The new taskset package provides a simple, deterministic way to claim, complete, and distribute work units across workers in distributed flows such as bulk merge or ingest.

This PR supports the Merge and Ingest phases of distributed merge by providing a concurrency-safe structure that coordinates which node (or processor) works on which partition of data.

This has been implemented already in the prototype fork at: jeffswenson/cockroach@feature-distributed-merge. The goal of this issue is to pull it out into a smaller PR thats easier to review.

Goal

Provide a minimal API for assigning and tracking task progress in multi-worker environments without requiring coordination through SQL or KV.

Specifically, the package should:

  1. Maintain a pool of unclaimed tasks (TaskSet).
  2. Allow workers to claim, complete, and request additional work deterministically.
  3. Split task spans dynamically to maximize utilization (e.g., “work stealing”).
  4. Operate entirely in-memory — no RPCs, no persistent state.

This is used by:

  • The merge coordinator, which distributes merge spans across nodes.
  • The ingest coordinator, which assigns SST ingest tasks across leaseholders.

Implementation Highlights

Core Types

TaskId

A small wrapper type (int64) used to identify work units.
Includes a sentinel TaskIdDone to indicate that all work is complete.

TaskSet

Manages a set of unassigned task spans and supports concurrent claims.
Implements an algorithm similar to work stealing, where workers grab new tasks by splitting the largest available range.

Key methods:

  • ClaimFirst() — claim an initial task.
  • ClaimNext(lastTask) — claim the next adjacent task or split a span if needed.
  • IsDone() — helper for determining completion.

taskSpan

Internal helper representing a contiguous span of task IDs.

Design

  • No locking is needed beyond slice modifications; each worker calls claim functions serially.
  • Task distribution favors keeping workers on contiguous work units (better cache/locality).
  • Splits the largest unassigned span in half to balance work when many tasks remain.
  • Handles both sequential and parallel claim patterns.

Tests

  • task_span_test.go ensures deterministic behavior in single-worker and multi-worker

Dependencies

  • none

Jira issue: CRDB-56067

Epic CRDB-48845

Metadata

Metadata

Assignees

Labels

C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)T-sql-foundationsSQL Foundations Team (formerly SQL Schema + SQL Sessions)v26.1.0-prerelease

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions