Skip to content

feat(api): add singleton flag to enqueue API#1483

Merged
yohamta0 merged 4 commits into
mainfrom
max-active-run-check
Dec 14, 2025
Merged

feat(api): add singleton flag to enqueue API#1483
yohamta0 merged 4 commits into
mainfrom
max-active-run-check

Conversation

@yohamta0

@yohamta0 yohamta0 commented Dec 14, 2025

Copy link
Copy Markdown
Collaborator

cc: @ghansham, @kriyanshii

Summary by CodeRabbit

  • New Features

    • Added a singleton flag to DAG enqueue requests to prevent duplicate runs when a DAG is already running or queued.
  • API Changes

    • Enqueue endpoint now can return HTTP 409 Conflict for singleton-mode conflicts.
    • Added more specific authentication error codes.
  • Behavioral Changes

    • Enqueue/retry paths no longer perform prior maxActiveRuns queue-length enforcement unless singleton is used.
    • Queue concurrency now honors DAG-configured max active runs for DAG-based queues.
  • Tests

    • Added integration test validating max-active-runs behavior for DAG-based queues.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Dec 14, 2025

Copy link
Copy Markdown

Walkthrough

Adds an optional singleton flag to DAG enqueue requests with server-side singleton enforcement returning HTTP 409 on conflict, removes several pre-enqueue queue-length/maxActiveRuns checks across enqueue/retry/start flows, and adjusts scheduler queue concurrency initialization for DAG-scoped queues.

Changes

Cohort / File(s) Summary
API Schema & Codegen
api/v2/api.yaml, api/v2/api.gen.go, ui/src/api/v2/schema.ts
Added optional singleton boolean to POST /dags/{fileName}/enqueue request body and a 409 Conflict response type (EnqueueDAGDAGRun409JSONResponse) for singleton violations. Updated UI schema and error codes (added auth-related error codes).
Frontend API — Singleton Enforcement
internal/service/frontend/api/v2/dags.go
Added ensureSingletonEnqueue(ctx, dag) and integrated it into the enqueue flow; when request Singleton=true returns 409 if active or queued runs exist.
Removed Queue-Length Prechecks
internal/cmd/enqueue.go, internal/cmd/retry.go, internal/cmd/start.go, internal/service/frontend/api/v2/dagruns.go
Deleted logic that counted queued + live runs and returned errors when exceeding dag.MaxActiveRuns; enqueue/retry/start flows no longer pre-validate queue length and proceed to enqueue unconditionally (subject to other validations).
Scheduler queue concurrency init
internal/service/scheduler/queue_processor.go
Added updateQueueMaxConcurrency to set non-global queue maxConcurrency from the DAG's MaxActiveRuns (uses first queued item); called before computing available slots.
Tests
internal/integration/scheduler_test.go
Added an integration test TestDAGQueueMaxActiveRunsFirstBatch (appears duplicated in patch) verifying DAG-based queues start up to maxActiveRuns in the first scheduler batch.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as EnqueueDAGDAGRun
    participant Singleton as ensureSingletonEnqueue
    participant DAG as DAG Service
    participant Runs as Runs DB
    participant Queue as Queue Service

    Client->>API: POST /dags/{name}/enqueue (singleton=true)
    API->>Singleton: ensureSingletonEnqueue(ctx, dag)
    Singleton->>DAG: fetch DAG config
    DAG-->>Singleton: DAG details (MaxActiveRuns,...)
    Singleton->>Runs: query active runs
    Runs-->>Singleton: active count
    Singleton->>Runs: query queued runs
    Runs-->>Singleton: queued count

    alt conflict found
        Singleton-->>API: return 409 Conflict
        API-->>Client: 409 Error (Error payload)
    else no conflict
        Singleton-->>API: ok
        API->>Queue: enqueue DAG run
        Queue-->>API: enqueue success
        API-->>Client: 200/202 OK
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review areas needing attention:
    • ensureSingletonEnqueue() correctness (race conditions, query semantics) and proper HTTP 409 error construction
    • Removals in enqueue.go, retry.go, start.go, dagruns.go to ensure no behavioral gaps or dangling assumptions elsewhere
    • Scheduler updateQueueMaxConcurrency integration and its effect on non-global queues
    • Duplicated integration test entry in internal/integration/scheduler_test.go

Possibly related PRs

Poem

🐰 I hopped in code to guard the run,
One at a time, no two for fun.
If another's queued, I softly say "No"—
A 409 pause, then onward we go.
Tidy lanes and quieter nights, hop-hop, delight! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and specifically describes the main change: adding a singleton flag to the enqueue API, which aligns with the primary modifications across multiple files.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch max-active-run-check

Comment @coderabbitai help to get the list of available commands and usage tips.

@ghansham

Copy link
Copy Markdown

Enqueue should consider overriden name (and parameters, if any) while taking decision rather than using actual dag filename

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ebfa3c and 582ecc6.

📒 Files selected for processing (8)
  • api/v2/api.gen.go (3 hunks)
  • api/v2/api.yaml (2 hunks)
  • internal/cmd/enqueue.go (0 hunks)
  • internal/cmd/retry.go (1 hunks)
  • internal/cmd/start.go (0 hunks)
  • internal/service/frontend/api/v2/dagruns.go (0 hunks)
  • internal/service/frontend/api/v2/dags.go (2 hunks)
  • ui/src/api/v2/schema.ts (3 hunks)
💤 Files with no reviewable changes (3)
  • internal/service/frontend/api/v2/dagruns.go
  • internal/cmd/enqueue.go
  • internal/cmd/start.go
🧰 Additional context used
📓 Path-based instructions (4)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Backend entrypoint in cmd/ orchestrates the scheduler and CLI; runtime, persistence, and service layers sit under internal/* (for example internal/runtime, internal/persistence)
Keep Go files gofmt/goimports clean; use tabs, PascalCase for exported symbols (SchedulerClient), lowerCamelCase for locals, and Err... names for package-level errors
Repository linting relies on golangci-lint; prefer idiomatic Go patterns, minimal global state, and structured logging helpers in internal/common

Files:

  • internal/service/frontend/api/v2/dags.go
  • internal/cmd/retry.go
  • api/v2/api.gen.go
{api/v1,api/v2}/**/*.{proto,yaml,yml,json}

📄 CodeRabbit inference engine (AGENTS.md)

API definitions live in api/v1 and api/v2; generated server stubs land in internal/service, while matching TypeScript clients flow into ui/src/api

Files:

  • api/v2/api.yaml
ui/**/*.{ts,tsx,jsx,js}

📄 CodeRabbit inference engine (ui/CLAUDE.md)

ui/**/*.{ts,tsx,jsx,js}: Use developer-centric UI design with high information density, minimal whitespace, compact components, and no unnecessary decorations
Support both light and dark modes for all UI components using Tailwind CSS class pairs like dark:bg-slate-700
NEVER use full-page loading overlays or LoadingIndicator components that hide content - show stale data while fetching updates instead
Use compact modal design with small headers, minimal padding (p-2 or p-3), tight spacing, and support keyboard navigation (arrows, enter, escape)
Use small heights for form elements: select boxes h-7 or smaller, buttons h-7 or h-8, inputs with compact padding (py-0.5 or py-1)
Minimize row heights in tables and lists while maintaining readability, merge related columns, and always handle long text with whitespace-normal break-words
Use consistent metadata styling with bg-slate-200 dark:bg-slate-700 backgrounds and maintain text hierarchy with primary/secondary/muted text colors
Use flexbox-first layouts with min-h-0 and overflow-hidden to prevent layout breaks, account for fixed elements when setting heights
Maintain keyboard navigation support in all interactive components with appropriate focus indicators and ARIA labels

Files:

  • ui/src/api/v2/schema.ts
ui/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

ui/**/*.{ts,tsx}: The React + TypeScript frontend resides in ui/, with production bundles copied to internal/service/frontend/assets by make ui
UI code follows ESLint + Prettier (2-space indent) and Tailwind utilities; name React components in PascalCase (JobList.tsx) and hooks with use* (useJobs.ts)

Files:

  • ui/src/api/v2/schema.ts
🧬 Code graph analysis (2)
internal/cmd/retry.go (1)
internal/core/execution/proc.go (1)
  • ProcStore (8-27)
api/v2/api.gen.go (1)
api/v1/api.gen.go (1)
  • Error (343-352)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (8)
api/v2/api.yaml (1)

800-803: Singleton enforcement implementation verified and correct.

The singleton field implementation correctly prevents enqueuing when a DAG is already running or queued, returning a 409 conflict with ErrorCodeMaxRunReached:

  • ensureSingletonEnqueue() (lines 677–692 in dags.go) checks both live (running) and queued instances before allowing enqueue
  • Error messages distinguish between "already running" and "already queued" cases
  • Consistent error handling with the /dags/{fileName}/start endpoint, which uses the same isSingletonExecution() logic and error code
  • API spec description matches implementation behavior
api/v2/api.gen.go (3)

1474-1477: Singleton *bool is the right API shape for optional singleton-mode enqueue.
Using *bool + omitempty preserves “unset” vs “false”, and the comment clarifies 409 behavior.


6009-6016: 409 response wiring matches existing generated patterns (Content-Type + status + JSON body).
This should integrate cleanly with EnqueueDAGDAGRunResponseObject/strict handler responses the same way other ...409JSONResponse types do in this file.


8462-8653: swaggerSpec blob update is expected auto-generated output; do not hand-edit.
Verified that api/v2/api.yaml contains the new singleton field (lines 741, 759, 800, 818, 1238, 1264) and corresponding 409 conflict responses across multiple endpoints. The generated code correctly includes EnqueueDAGDAGRun409JSONResponse type at line 6009. Ensure CI regenerates this deterministically from api/v2/api.yaml.

internal/service/frontend/api/v2/dags.go (1)

831-835: LGTM! Clean integration of singleton mode.

The singleton check is correctly placed before the enqueue operation, allowing early rejection when conflicts are detected. The implementation follows the existing patterns in this file for optional request body fields.

ui/src/api/v2/schema.ts (3)

2469-2473: Singleton field properly defined.

The TypeScript schema correctly reflects the backend API: optional boolean field with clear documentation about 409 conflict behavior. The definition is consistent with the similar singleton field in the executeDAG operation.


2490-2498: 409 conflict response correctly added.

The conflict response properly documents singleton mode rejection when a DAG is already running or queued, maintaining consistency with the executeDAG endpoint's singleton handling.


4007-4010: Note: Additional auth-related error codes included.

The schema includes new auth-specific error codes (auth.unauthorized, auth.token_invalid, auth.forbidden) that appear unrelated to the singleton feature. Since this file is auto-generated from the OpenAPI spec, these likely reflect other API surface changes in the same generation.

Comment thread internal/cmd/retry.go
Comment thread internal/service/frontend/api/v2/dags.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/service/scheduler/queue_processor.go (1)

505-528: Consider caching the DAG read to avoid redundant I/O.

This helper reads the DAG from the first queued item. Later, processDAG (lines 405-412) reads the same DAG again when processing that item. For correctness this is fine, but it results in duplicate FindAttempt + ReadDAG calls for the first item in each batch.

A future optimization could cache the DAG reference or pass it through to avoid the redundant read.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 582ecc6 and 8fd8c78.

📒 Files selected for processing (2)
  • internal/integration/scheduler_test.go (1 hunks)
  • internal/service/scheduler/queue_processor.go (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Backend entrypoint in cmd/ orchestrates the scheduler and CLI; runtime, persistence, and service layers sit under internal/* (for example internal/runtime, internal/persistence)
Keep Go files gofmt/goimports clean; use tabs, PascalCase for exported symbols (SchedulerClient), lowerCamelCase for locals, and Err... names for package-level errors
Repository linting relies on golangci-lint; prefer idiomatic Go patterns, minimal global state, and structured logging helpers in internal/common

Files:

  • internal/integration/scheduler_test.go
  • internal/service/scheduler/queue_processor.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*_test.go: Co-locate Go tests as *_test.go; favour table-driven cases and cover failure paths
Use stretchr/testify/require and shared fixtures from internal/test instead of duplicating mocks

Files:

  • internal/integration/scheduler_test.go
🧬 Code graph analysis (2)
internal/integration/scheduler_test.go (5)
internal/test/helper.go (1)
  • WithConfigMutator (72-76)
internal/core/execution/dagrun.go (3)
  • DAGRunStore (23-47)
  • NewDAGRunAttemptOptions (106-111)
  • DAGRunRef (144-147)
internal/runtime/transform/status.go (4)
  • NewStatusBuilder (17-19)
  • WithLogFilePath (119-123)
  • WithAttemptID (43-47)
  • WithHierarchyRefs (25-30)
internal/core/status.go (1)
  • Queued (12-12)
internal/core/execution/queue.go (1)
  • QueueStore (16-35)
internal/service/scheduler/queue_processor.go (3)
internal/common/logger/context.go (2)
  • Warn (45-47)
  • Error (50-52)
internal/common/logger/tag/tag.go (1)
  • Error (20-22)
internal/core/execution/queue.go (1)
  • QueuedItemData (66-71)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test on ubuntu-latest
  • GitHub Check: Build
🔇 Additional comments (3)
internal/service/scheduler/queue_processor.go (1)

281-291: LGTM - Correct placement for first-batch concurrency fix.

The early update of maxConcurrency before calculating free slots ensures DAG-based queues process up to maxActiveRuns items in the first batch rather than defaulting to 1.

internal/integration/scheduler_test.go (2)

284-291: Good test coverage for the first-batch concurrency fix.

The test correctly validates that DAG-based queues initialize with the proper maxActiveRuns value before processing the first batch, which is the core behavior being fixed.


360-361: Shallow copy is intentional here.

The pattern thCopy := th followed by thCopy.Context = ctx is consistent with other tests in this file. The shallow copy is appropriate since only the Context field needs to differ for the goroutine's timeout, while other fields (stores, config) are intentionally shared.

Comment thread internal/integration/scheduler_test.go
@yohamta0 yohamta0 merged commit 9841e6e into main Dec 14, 2025
6 checks passed
@yohamta0 yohamta0 deleted the max-active-run-check branch December 14, 2025 10:05
@codecov

codecov Bot commented Dec 14, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 4.76190% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.91%. Comparing base (3ebfa3c) to head (8fd8c78).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/service/scheduler/queue_processor.go 0.00% 17 Missing and 1 partial ⚠️
internal/cmd/retry.go 33.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1483      +/-   ##
==========================================
- Coverage   59.92%   59.91%   -0.01%     
==========================================
  Files         195      195              
  Lines       21913    21918       +5     
==========================================
+ Hits        13131    13132       +1     
- Misses       7373     7381       +8     
+ Partials     1409     1405       -4     
Files with missing lines Coverage Δ
internal/cmd/enqueue.go 65.21% <ø> (+3.05%) ⬆️
internal/cmd/start.go 47.10% <ø> (+0.95%) ⬆️
internal/cmd/retry.go 49.61% <33.33%> (+0.36%) ⬆️
internal/service/scheduler/queue_processor.go 37.20% <0.00%> (-3.79%) ⬇️

... and 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ebfa3c...8fd8c78. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants