Skip to content

feat: add simplified secret management#2153

Merged
yohamta0 merged 11 commits into
mainfrom
feat/simplified-secret-management
May 16, 2026
Merged

feat: add simplified secret management#2153
yohamta0 merged 11 commits into
mainfrom
feat/simplified-secret-management

Conversation

@yohamta0

@yohamta0 yohamta0 commented May 15, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • add RFC 019-backed secret registry with a Dagu-managed encrypted file store
  • add REST API and UI for managing workspace-local secret refs and write-only values
  • resolve registry ref secrets at run start and mask managed secret values

Changes

  • introduce the internal secret model, store interface, resolver, and encrypted file-backed implementation
  • wire secret store access through CLI validation/dry-run paths, runtime, server, and worker execution
  • extend OpenAPI/generated clients and add the Secrets administration page
  • simplify the secret registry shape to workspace + ref, without path, stage, tags, or display name fields

Related Issues

N/A

Validation

  • git diff --check
  • make api
  • pnpm gen:api
  • go test ./internal/secret ./internal/persis/filesecret ./internal/service/frontend/api/v1
  • go test ./internal/runtime/agent -run 'TestAgent_RegistryRefSecretResolution|TestAgent_OutputSecretMasking'\n- go test ./internal/engine ./internal/service/worker -run '^$'\n- go test ./internal/cmd -run 'TestValidate|TestDry'\n- pnpm typecheck\n- pnpm build\n- Browser smoke check at /secrets\n\n## Checklist\n- [x] Secret metadata does not expose plaintext values.\n- [x] Dagu-managed values are write-only through the UI/API.\n- [x] Secret registry identity is scoped to workspace + ref only.\n- [x] Local focused validation completed.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added a complete Secrets Management API with endpoints to list, create, retrieve, update, and delete secrets.
    • Introduced workspace-scoped secret references for DAGs, allowing team-managed secret integrations.
    • Added a Secrets management page in the web UI for administrators to create, update, and rotate secrets.
    • Implemented secret versioning and status management (enable/disable).
    • Added support for multiple secret providers including Vault, Kubernetes, AWS Secrets Manager, Azure Key Vault, and GCP Secret Manager.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 15, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e8fa41a4-f09b-4576-9bb1-acdae7d4e0f0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR implements an enterprise-grade secret registry system enabling workspace-scoped team-managed secrets with encrypted file-backed storage, provider-agnostic reference resolution, REST API endpoints, and web UI. It introduces Dagu-managed secret type alongside existing external providers, with strict plaintext handling rules and provider capability contracts to prevent unintended secret reads during access checks.

Changes

Secret Registry Implementation

Layer / File(s) Summary
Secret domain model and persistence contract
internal/secret/secret.go, internal/secret/store.go, internal/secret/secret_test.go
Defines Secret struct with workspace/ref/provider metadata, status lifecycle, and audit timestamps; introduces VersionMetadata for encrypted value history; exports ProviderType and Status enums; provides Store interface contract for all storage implementations including CRUD, value versioning, and resolution.
Encrypted file-backed secret store
internal/persis/filesecret/store.go, internal/persis/filesecret/store_test.go
Implements persistent encrypted storage using JSON records with in-memory ID and workspace+ref indexes; supports atomic Create/Read/Update/Delete, encrypted value versioning with rotation tracking, and decryption on resolution; rebuilds indexes from disk on init.
In-memory test store
internal/secret/memory_store_test.go
Provides thread-safe mutex-protected test double implementing full Store interface without filesystem dependencies; used by reference resolver and API tests.
Provider capability checking
internal/cmn/secrets/resolver.go, internal/cmn/secrets/env.go, internal/cmn/secrets/file.go, internal/cmn/secrets/kubernetes.go, internal/cmn/secrets/vault.go, internal/cmn/secrets/resolver_test.go
Introduces CheckCapability contract to declare provider access-check requirements (no-fetch, metadata-only, requires-value-read, unsupported); all providers report capabilities; registry enforces capabilities before calling accessibility checks to avoid plaintext reads.
Registry reference resolver
internal/secret/resolver.go, internal/secret/resolver_test.go
Implements reference resolver validating workspace-scoped resolvability (enabled/Dagu-managed checks) and resolving registry refs through backing store; includes tests for cross-workspace isolation and disabled-secret enforcement.
DAG secret schema and validation
internal/cmn/schema/dag.schema.json, internal/core/dag.go, internal/core/spec/dag.go, internal/core/spec/builder.go, internal/core/spec/dag_test.go, internal/cmn/schema/dag_schema_test.go
Updates DAG secrets section to support both legacy provider+key and new workspace-local ref forms with mutual exclusivity validation; enforces env-var identifier naming (no DAGU_ prefix); detects collisions against DAG vars/params and reserved runtime env keys.
Agent integration
internal/runtime/agent/agent.go, internal/runtime/agent/agent_test.go
Adds SecretStore field to agent options; extends resolveSecrets to conditionally create reference resolver for registry refs when store is configured; includes test exercising registry ref resolution in command substitution.
Execution wiring across paths
internal/cmd/context.go, internal/cmd/start.go, internal/cmd/dry.go, internal/cmd/restart.go, internal/cmd/retry.go, internal/engine/run.go, internal/service/frontend/server.go, internal/service/worker/remote_handler.go, internal/service/worker/remote_handler_test.go
Creates encrypted file-backed secret store from data directory during initialization; injects store into agent options across all execution paths (start, dry-run, restart, retry, local engine, remote worker); logs warning on store creation failure but continues without store.
REST API endpoints
internal/service/frontend/api/v1/api.go, internal/service/frontend/api/v1/secrets.go, internal/service/audit/entry.go, internal/service/frontend/api/v1/secrets_test.go
Implements list/create/get/update/delete endpoints for secrets; supports status transitions (enable/disable) and value versioning; enforces workspace-scoped manage/audit permissions; validates provider types; derives provider-ref fingerprints from config key material; emits audit events with secret metadata; includes tests validating plaintext leak prevention, duplicate rejection, workspace filtering, and workspace-all rejection.
OpenAPI specification and schema
api/v1/api.yaml, ui/src/api/v1/schema.ts
Defines REST contract with /secrets collection endpoints, secret-scoped metadata/version/status endpoints; adds request/response schemas including CreateSecretRequest, UpdateSecretRequest, WriteSecretVersionRequest, SecretResponse, SecretListResponse; auto-generates TypeScript types for frontend.
Frontend UI and navigation
ui/src/App.tsx, ui/src/menu.tsx, ui/src/contexts/AuthContext.tsx, ui/src/pages/administration/index.tsx, ui/src/pages/secrets/index.tsx, ui/src/__tests__/menu.test.tsx
Adds useCanManageSecrets permission hook gating manager-or-above role; registers /secrets route with role protection; adds Secrets nav item and admin section; implements full secrets management page with list table, create/edit/rotate dialogs, status toggles, delete confirmation, and plaintext-free responses.
RFC 019 design specification
rfcs/019_enterprise_secret_management.md
Comprehensive design documenting team secret integration model, security boundaries, authorization (canAttachSecret / canUseSecret), provider capability semantics, run-scoped resolution, plaintext masking rules, audit/provenance schemas, API sketch, UI requirements, migration approach, and phased implementation plan.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • dagucloud/dagu#2006: Adds CheckCapability support to Kubernetes resolver that directly complements this PR's provider capability checking contract.
  • dagucloud/dagu#2108: Updates DAG spec secret-reference plumbing in parseSecretRefs for secretRef/ref-based secrets, directly related to this PR's DAG schema and validation changes.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/simplified-secret-management

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/service/worker/remote_handler.go (1)

391-414: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

agentStoresFromSnapshot does not initialize secretStore, leaving registry-ref secrets unresolvable on snapshot-path workers.

The snapshot execution path (line 570) calls agentStoresFromSnapshot, which returns an agentStoreBundle with secretStore left uninitialized. In contrast, the non-snapshot path (line 575) calls h.agentStores(ctx), which explicitly populates secretStore via filesecret.NewFromDataDir(h.config.Paths.DataDir) with a warning fallback (lines 343-346).

Since secretStore is workspace-local and not part of the snapshot payload, it must be constructed on the worker, just as all other execution paths do (engine/run.go, cmd/start.go, cmd/restart.go, cmd/retry.go, cmd/dry.go). Without it, opts.SecretStore at line 588 will be nil for snapshot-path DAG runs, causing any registry-ref secret resolution to fail.

Add context parameter and initialize secretStore in agentStoresFromSnapshot using the same pattern as agentStores():

-func (h *remoteTaskHandler) agentStoresFromSnapshot(snapshotPayload []byte) (agentStoreBundle, error) {
+func (h *remoteTaskHandler) agentStoresFromSnapshot(ctx context.Context, snapshotPayload []byte) (agentStoreBundle, error) {
 	snapshot, err := agent.UnmarshalSnapshot(snapshotPayload)
 	if err != nil {
 		return agentStoreBundle{}, err
 	}
 	if snapshot == nil {
 		return agentStoreBundle{}, fmt.Errorf("agent snapshot is empty")
 	}
 
 	stores := agent.NewSnapshotStores(snapshot)
 	if stores.ConfigStore == nil {
 		return agentStoreBundle{}, fmt.Errorf("agent snapshot is missing config")
 	}
 	if stores.ModelStore == nil {
 		return agentStoreBundle{}, fmt.Errorf("agent snapshot is missing models")
 	}
 
-	return agentStoreBundle{
+	bundle := agentStoreBundle{
 		configStore: stores.ConfigStore,
 		modelStore:  stores.ModelStore,
 		soulStore:   stores.SoulStore,
 		memoryStore: stores.MemoryStore,
-	}, nil
+	}
+	if store, err := filesecret.NewFromDataDir(h.config.Paths.DataDir); err != nil {
+		logger.Warn(ctx, "Failed to create secret store on snapshot path", tag.Error(err))
+	} else {
+		bundle.secretStore = store
+	}
+	return bundle, nil
 }

Update call site at line 570:

-agentStores, err = h.agentStoresFromSnapshot(agentSnapshot)
+agentStores, err = h.agentStoresFromSnapshot(ctx, agentSnapshot)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/worker/remote_handler.go` around lines 391 - 414,
agentStoresFromSnapshot currently omits initializing secretStore which breaks
registry-ref secret resolution for snapshot-path workers; change the signature
to accept a context (ctx) and, after creating stores :=
agent.NewSnapshotStores(snapshot), initialize secretStore the same way as in
agentStores() by calling filesecret.NewFromDataDir(h.config.Paths.DataDir)
(handle and log the error with a warning fallback if it fails) and include it in
the returned agentStoreBundle.secretStore; also update the caller site that
invokes agentStoresFromSnapshot to pass the context.
🧹 Nitpick comments (9)
internal/secret/secret.go (1)

208-222: ⚡ Quick win

Verify HMAC fingerprint behavior for empty inputs.

When providerRef is empty (line 212-214), the function returns an empty fingerprint without error. However, when the key is empty (line 209-211), it returns an error. This asymmetry may be intentional—empty ref suggests no provider connection, while empty key is a configuration error—but it should be verified.

Additionally, the HMAC construction includes null-byte separators (lines 217, 219) to prevent collision attacks where different field combinations could produce the same MAC. This is good practice.

Consider adding a test case that explicitly validates the empty providerRef behavior returns ("", nil) to document this design choice.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/secret/secret.go` around lines 208 - 222, Add unit tests for
ProviderRefFingerprint to document and lock in the current asymmetrical
behavior: create a test that calls ProviderRefFingerprint with a valid non-empty
key and providerType/providerConnectionID but an empty providerRef and assert it
returns ("", nil), and a test that calls it with an empty key and non-empty
providerRef and asserts an error is returned; reference the
ProviderRefFingerprint function and its parameters (key, providerType,
providerConnectionID, providerRef) so reviewers can see the intended behavior is
explicit.
internal/persis/filesecret/store.go (2)

89-119: ⚡ Quick win

Consider logging corruption details for investigation.

During index rebuild (lines 107-111), corrupted secret files are logged as warnings but silently skipped. While this prevents startup failure, it might hide data integrity issues. Consider adding more diagnostic information to the log message, such as the specific parsing error or file size, to aid debugging.

💡 Enhanced logging for troubleshooting
 		slog.Warn("Failed to load secret file during index rebuild",
 			slog.String("file", filePath),
+			slog.Int64("size", entry.Size()),
 			slog.String("error", err.Error()))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/persis/filesecret/store.go` around lines 89 - 119, The rebuildIndex
function currently warns on failed loads but omits useful corruption details;
update the slog.Warn call inside the loop (where loadRecordFromFile returns err)
to include the actual parsing error value (err), the file size (via
os.Stat(filePath).Size()), and optionally a short hex checksum (e.g., sha256 of
the file bytes) to aid investigation; use the existing symbols
loadRecordFromFile, slog.Warn, filePath, and refKey context when adding these
fields so s.byID and s.byRef logic remains unchanged.

331-361: ⚖️ Poor tradeoff

Timestamp updates during ResolveValue may cause contention.

ResolveValue updates LastResolvedAt and UpdatedAt (lines 355-356) and writes the record back to disk (line 357) on every resolution. For frequently accessed secrets, this causes unnecessary write amplification and file-lock contention. Consider whether these timestamp updates should be:

  1. Asynchronous/batched to reduce I/O
  2. Periodic rather than on every access
  3. Optional based on configuration

This is a design tradeoff: precise access tracking vs. performance. If audit trails require exact resolution timestamps, the current approach is correct but expensive.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/persis/filesecret/store.go` around lines 331 - 361, ResolveValue
currently updates record.Secret.LastResolvedAt and UpdatedAt and calls
writeRecordToFile on every access, which causes write amplification; modify
Store.ResolveValue (and the Store struct) so timestamp writes are optional or
deferred: add a configuration flag or enum on Store (e.g.,
UpdateResolveTimestamps bool or ResolveUpdateStrategy {Immediate, Async,
Periodic, Disabled}), then in ResolveValue only update and call
writeRecordToFile when strategy == Immediate; for Async spawn a non-blocking
goroutine or enqueue an update to an internal worker that serializes writes
(ensure you avoid holding s.mu while doing async I/O), for Periodic record the
last-resolved time in-memory and flush it on a timer or during other mutations;
reference the ResolveValue method, record.Secret.LastResolvedAt/UpdatedAt, and
writeRecordToFile when implementing the conditional or deferred update handling.
internal/cmn/secrets/resolver.go (1)

154-156: 💤 Low value

Consider extracting registry ref validation to reduce duplication.

The validation logic checking that registry refs don't include provider/key/options appears in both Resolve (lines 155-156) and CheckAccessibility (lines 220-221). Consider extracting to a helper method for maintainability.

♻️ Example refactor
+func validateRegistryRef(ref core.SecretRef) error {
+	if ref.Provider != "" || ref.Key != "" || len(ref.Options) > 0 {
+		return fmt.Errorf("secret %q registry ref cannot include provider, key, or options", ref.Name)
+	}
+	return nil
+}
+
 func (r *Registry) Resolve(ctx context.Context, ref core.SecretRef) (string, error) {
 	if ref.Ref != "" {
-		if ref.Provider != "" || ref.Key != "" || len(ref.Options) > 0 {
-			return "", fmt.Errorf("secret %q registry ref cannot include provider, key, or options", ref.Name)
-		}
+		if err := validateRegistryRef(ref); err != nil {
+			return "", err
+		}

Also applies to: 220-221

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cmn/secrets/resolver.go` around lines 154 - 156, Extract the
duplicated registry ref validation into a shared helper (e.g.,
validateRegistryRef or SecretRef.ValidateRegistryRef) and call it from both
Resolve and CheckAccessibility: move the logic that checks if ref.Ref != "" then
ensures ref.Provider == "" && ref.Key == "" && len(ref.Options) == 0, returning
the same error fmt.Errorf("secret %q registry ref cannot include provider, key,
or options", ref.Name) when violated; replace the original inline checks in
Resolve and CheckAccessibility with calls to this new helper to remove
duplication and keep behavior identical.
internal/service/frontend/api/v1/secrets.go (2)

396-396: ⚖️ Poor tradeoff

Semantic coupling: secret management reuses audit permission check.

The secret management permission check uses role.CanManageAudit() (line 396), which semantically grants Manager+ roles. This works today since both audit and secrets require Manager+ access, but creates coupling between two unrelated features. If secret permissions ever need to diverge from audit permissions (e.g., a dedicated "secret manager" role), this logic will need refactoring.

Consider adding a role.CanManageSecrets() method in the domain layer that currently delegates to CanManageAudit(), making future permission changes easier to implement.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/frontend/api/v1/secrets.go` at line 396, The permission
check for secret management currently calls role.CanManageAudit(), coupling
secrets to audit; add a new domain method role.CanManageSecrets() that for now
returns role.CanManageAudit() (delegate) and replace the call in
internal/service/frontend/api/v1/secrets.go (the if !role.CanManageAudit()
check) with if !role.CanManageSecrets() so future permission changes for secrets
can be implemented by changing CanManageSecrets() only.

84-106: 💤 Low value

Consider store-level pagination for large secret registries.

The current implementation fetches all secrets, filters by workspace permission, then paginates in-memory. For deployments with thousands of secrets across many workspaces, this pattern could consume significant memory and CPU. Consider adding offset/limit to the ListOptions and pushing pagination into the store layer if performance becomes a concern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/frontend/api/v1/secrets.go` around lines 84 - 106, The code
currently loads all secrets then paginates in-memory; change this to push
pagination into the store by adding Offset and Limit to secretpkg.ListOptions
and passing the request offset/limit into a.secretStore.List(ctx,
secretpkg.ListOptions{Workspace: workspaceFilter, Offset: offset, Limit:
limit}); update the secret store implementations to apply offset/limit at the
DB/query layer and (optionally) return a total count so the handler can set
total without slicing; after that remove the in-memory slicing logic around
visible (keep the permission check via a.canManageSecretWorkspace and mapping
via toSecretResponse) and rely on the store to return only the requested page.
internal/service/frontend/api/v1/secrets_test.go (1)

21-91: 🏗️ Heavy lift

Consider expanding test coverage in follow-up work.

The current tests verify critical security properties (write-only values, duplicate rejection), but several scenarios remain untested:

  • Update, Delete, Enable, Disable operations
  • Permission enforcement (workspace access, manager role requirement)
  • Workspace filtering and pagination edge cases
  • External provider secrets (non-DaguManaged flows)
  • Error handling (invalid provider type, empty values, malformed requests)

Since this is a new feature, having smoke tests for the critical security property is acceptable for initial merge. Consider adding comprehensive test coverage as the feature matures.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/frontend/api/v1/secrets_test.go` around lines 21 - 91, Add
follow-up tests expanding coverage around secrets: create new test functions
alongside TestSecretsAPI_CreateWriteListDoesNotReturnPlaintext and
TestSecretsAPI_CreateRejectsDuplicateWorkspaceRef to cover UpdateSecret,
DeleteSecret, EnableSecret, DisableSecret flows (exercise api.UpdateSecret,
api.DeleteSecret, api.EnableSecret/DisableSecret), permission enforcement
(workspace access and manager role checks when calling
CreateSecret/WriteSecretVersion/ListSecrets), workspace filtering and pagination
edge cases for ListSecrets, external provider flows (use non-DaguManaged
provider types such as SecretProviderTypeExternal and provider-specific
handling), and error cases (invalid provider type, empty/malformed values) while
reusing store.ResolveValue to assert secret material is never present in API
responses; implement these as separate focused tests in
internal/service/frontend/api/v1/secrets_test.go.
ui/src/pages/secrets/index.tsx (1)

575-576: ⚡ Quick win

Use compact control heights to match the frontend density guideline.

New form/select inputs use h-9; this page should use the compact sizing pattern (h-7 or smaller for selects/inputs).

As per coding guidelines: “Use information-dense, compact UI design… small form element heights (h-7 or smaller for select boxes)”.

Also applies to: 593-594, 618-619, 636-637, 655-656, 670-671, 776-777

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ui/src/pages/secrets/index.tsx` around lines 575 - 576, Replace the oversized
form control height classes (className="h-9") in ui/src/pages/secrets/index.tsx
with the compact size (use className="h-7" or smaller) for all input/select
components to follow the dense UI guideline; search for every occurrence of
className="h-9" in this file (including the instances around the current diff
and the other reported spots) and update those input/select/select-like
components (the JSX elements that render form controls) to use h-7 so selects
and inputs render with the compact height.
ui/src/__tests__/menu.test.tsx (1)

132-144: ⚡ Quick win

Add explicit assertions for Secrets nav visibility (allowed vs denied).

The new useCanManageSecrets path is mocked but not asserted. Please add one test where it’s true (Secrets link visible) and one where it’s false (hidden).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ui/src/__tests__/menu.test.tsx` around lines 132 - 144, The test suite's
beforeEach sets useCanManageSecretsMock but there are no explicit tests
asserting the Secrets navigation visibility; add two tests in
ui/src/__tests__/menu.test.tsx that toggle useCanManageSecretsMock to true and
false and assert the Secrets link appears or is absent accordingly.
Specifically, create one test that sets
useCanManageSecretsMock.mockReturnValue(true), renders the menu (using the
existing render helper in the file), and expects a DOM node with text "Secrets"
(or the same selector used for other nav links) to be in the document; create
another test that sets useCanManageSecretsMock.mockReturnValue(false), renders
the menu, and expects that same "Secrets" node is not present. Use the existing
beforeEach and other auth mocks to keep context consistent and reference the
useCanManageSecretsMock and the menu render helper when adding the tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/v1/api.yaml`:
- Around line 12369-12394: CreateSecretRequest currently allows invalid field
combinations; update the OpenAPI schema to enforce conditional validation based
on providerType (the CreateSecretRequest object and its providerType property)
so that when providerType is "dagu-managed" only value (writeOnly) is accepted
and external provider fields (providerConnectionId, providerRef) are
prohibited/omitted, and when providerType is an external provider type require
the appropriate identifiers (providerConnectionId and/or providerRef) and
disallow the write-only value; implement this using OpenAPI conditional schemas
(oneOf/anyOf with discriminators or if/then/else keyed on providerType) so
generated server/client types and validation will only allow valid combinations.
- Around line 7205-7423: The OpenAPI spec endpoints for secret operations are
missing 401/403 responses; update each operation (operationId: createSecret,
getSecret, updateSecret, deleteSecret, writeSecretVersion, disableSecret,
enableSecret) to include 401 and 403 response entries that reference the
existing Error schema (same pattern as 400/404/409 entries) so auth failures are
explicitly modeled; after updating api.yaml, run the generator (make api) to
regenerate server/client code per project guidelines.

In `@internal/core/spec/builder.go`:
- Around line 265-271: The current validation uses hasRef, hasProvider, hasKey
and only errors when both provider and key are present with ref missing; it
doesn't reject cases where ref is combined with only provider or only key.
Change the logic to require exactly one of these shapes: either ref alone
(hasRef true and both hasProvider/hasKey false) or provider+key together (hasRef
false and hasProvider && hasKey); compute a boolean like valid := (hasRef &&
!hasProvider && !hasKey) || (!hasRef && hasProvider && hasKey) and return the
core.NewValidationError (using def and def.Name) when !valid, preserving the
existing error message and use of core.NewValidationError.

In `@internal/persis/filesecret/store.go`:
- Line 190: You're calling s.GetByID(context.Background(), id) while already
holding s.mu.RLock() which causes nested read-lock acquisition because GetByID
calls s.recordByID(id); fix by avoiding the nested call—either use the
already-held read lock to call s.recordByID(id) directly and then perform the
remaining GetByID logic (decryption/validation) inline, or release the read lock
before invoking s.GetByID; update the implementation around the call site to
reference recordByID and continue the necessary processing without re-entering
the store's locking path.

In `@internal/secret/memory_store_test.go`:
- Around line 66-75: The test memory store List implementation ignores the
provided ListOptions; update memoryStoreForTest.List to read the opts parameter
(ListOptions) and apply workspace filtering: when opts.Workspace is non-empty
only include secrets whose Workspace (e.g., sec.Workspace) matches
opts.Workspace, otherwise include all secrets; keep the existing RLock/RUnlock
and return clones (sec.Clone()) of the matched secrets to preserve behavior of
the filesecret store’s workspace filtering.
- Around line 77-86: The Update method on memoryStoreForTest currently replaces
the entry in s.secrets without updating the byRef index when a secret's
Workspace or Ref changes; modify Update (memoryStoreForTest.Update) to first
load the existing secret from s.secrets, compute its old byRef key (e.g.,
old.Workspace+"|"+old.Ref), compute the new byRef key from sec, and if the keys
differ remove the old key from s.byRef and add the new key mapping to sec.ID,
then store sec.Clone() into s.secrets; keep existing ErrNotFound check and mutex
usage to ensure atomicity so GetByRef will reflect ref/workspace changes.

In `@ui/src/pages/secrets/index.tsx`:
- Around line 349-353: The icon-only row action Button (the Button with props
variant="ghost" size="icon" and disabled={actionSecretId === secret.id}) is
missing an accessible label; add an aria-label that describes the action and
includes the secret identifier (e.g., aria-label={`Actions for ${secret.name ||
secret.id}`}) so screen readers can announce it—update the Button JSX to include
this aria-label using the available secret fields.
- Around line 160-167: The SWR key for the secrets list call is missing the
workspace so workspace-scoped cache invalidation won't work; update the
useQuery('/secrets', { params: { query: { remoteNode, limit: 500 } } })
invocation to include the current workspace (e.g., add workspace: workspace or
workspaceId variable) inside params.query so the key becomes workspace-aware and
isWorkspaceScopedSWRKey can detect it.

---

Outside diff comments:
In `@internal/service/worker/remote_handler.go`:
- Around line 391-414: agentStoresFromSnapshot currently omits initializing
secretStore which breaks registry-ref secret resolution for snapshot-path
workers; change the signature to accept a context (ctx) and, after creating
stores := agent.NewSnapshotStores(snapshot), initialize secretStore the same way
as in agentStores() by calling filesecret.NewFromDataDir(h.config.Paths.DataDir)
(handle and log the error with a warning fallback if it fails) and include it in
the returned agentStoreBundle.secretStore; also update the caller site that
invokes agentStoresFromSnapshot to pass the context.

---

Nitpick comments:
In `@internal/cmn/secrets/resolver.go`:
- Around line 154-156: Extract the duplicated registry ref validation into a
shared helper (e.g., validateRegistryRef or SecretRef.ValidateRegistryRef) and
call it from both Resolve and CheckAccessibility: move the logic that checks if
ref.Ref != "" then ensures ref.Provider == "" && ref.Key == "" &&
len(ref.Options) == 0, returning the same error fmt.Errorf("secret %q registry
ref cannot include provider, key, or options", ref.Name) when violated; replace
the original inline checks in Resolve and CheckAccessibility with calls to this
new helper to remove duplication and keep behavior identical.

In `@internal/persis/filesecret/store.go`:
- Around line 89-119: The rebuildIndex function currently warns on failed loads
but omits useful corruption details; update the slog.Warn call inside the loop
(where loadRecordFromFile returns err) to include the actual parsing error value
(err), the file size (via os.Stat(filePath).Size()), and optionally a short hex
checksum (e.g., sha256 of the file bytes) to aid investigation; use the existing
symbols loadRecordFromFile, slog.Warn, filePath, and refKey context when adding
these fields so s.byID and s.byRef logic remains unchanged.
- Around line 331-361: ResolveValue currently updates
record.Secret.LastResolvedAt and UpdatedAt and calls writeRecordToFile on every
access, which causes write amplification; modify Store.ResolveValue (and the
Store struct) so timestamp writes are optional or deferred: add a configuration
flag or enum on Store (e.g., UpdateResolveTimestamps bool or
ResolveUpdateStrategy {Immediate, Async, Periodic, Disabled}), then in
ResolveValue only update and call writeRecordToFile when strategy == Immediate;
for Async spawn a non-blocking goroutine or enqueue an update to an internal
worker that serializes writes (ensure you avoid holding s.mu while doing async
I/O), for Periodic record the last-resolved time in-memory and flush it on a
timer or during other mutations; reference the ResolveValue method,
record.Secret.LastResolvedAt/UpdatedAt, and writeRecordToFile when implementing
the conditional or deferred update handling.

In `@internal/secret/secret.go`:
- Around line 208-222: Add unit tests for ProviderRefFingerprint to document and
lock in the current asymmetrical behavior: create a test that calls
ProviderRefFingerprint with a valid non-empty key and
providerType/providerConnectionID but an empty providerRef and assert it returns
("", nil), and a test that calls it with an empty key and non-empty providerRef
and asserts an error is returned; reference the ProviderRefFingerprint function
and its parameters (key, providerType, providerConnectionID, providerRef) so
reviewers can see the intended behavior is explicit.

In `@internal/service/frontend/api/v1/secrets_test.go`:
- Around line 21-91: Add follow-up tests expanding coverage around secrets:
create new test functions alongside
TestSecretsAPI_CreateWriteListDoesNotReturnPlaintext and
TestSecretsAPI_CreateRejectsDuplicateWorkspaceRef to cover UpdateSecret,
DeleteSecret, EnableSecret, DisableSecret flows (exercise api.UpdateSecret,
api.DeleteSecret, api.EnableSecret/DisableSecret), permission enforcement
(workspace access and manager role checks when calling
CreateSecret/WriteSecretVersion/ListSecrets), workspace filtering and pagination
edge cases for ListSecrets, external provider flows (use non-DaguManaged
provider types such as SecretProviderTypeExternal and provider-specific
handling), and error cases (invalid provider type, empty/malformed values) while
reusing store.ResolveValue to assert secret material is never present in API
responses; implement these as separate focused tests in
internal/service/frontend/api/v1/secrets_test.go.

In `@internal/service/frontend/api/v1/secrets.go`:
- Line 396: The permission check for secret management currently calls
role.CanManageAudit(), coupling secrets to audit; add a new domain method
role.CanManageSecrets() that for now returns role.CanManageAudit() (delegate)
and replace the call in internal/service/frontend/api/v1/secrets.go (the if
!role.CanManageAudit() check) with if !role.CanManageSecrets() so future
permission changes for secrets can be implemented by changing CanManageSecrets()
only.
- Around line 84-106: The code currently loads all secrets then paginates
in-memory; change this to push pagination into the store by adding Offset and
Limit to secretpkg.ListOptions and passing the request offset/limit into
a.secretStore.List(ctx, secretpkg.ListOptions{Workspace: workspaceFilter,
Offset: offset, Limit: limit}); update the secret store implementations to apply
offset/limit at the DB/query layer and (optionally) return a total count so the
handler can set total without slicing; after that remove the in-memory slicing
logic around visible (keep the permission check via a.canManageSecretWorkspace
and mapping via toSecretResponse) and rely on the store to return only the
requested page.

In `@ui/src/__tests__/menu.test.tsx`:
- Around line 132-144: The test suite's beforeEach sets useCanManageSecretsMock
but there are no explicit tests asserting the Secrets navigation visibility; add
two tests in ui/src/__tests__/menu.test.tsx that toggle useCanManageSecretsMock
to true and false and assert the Secrets link appears or is absent accordingly.
Specifically, create one test that sets
useCanManageSecretsMock.mockReturnValue(true), renders the menu (using the
existing render helper in the file), and expects a DOM node with text "Secrets"
(or the same selector used for other nav links) to be in the document; create
another test that sets useCanManageSecretsMock.mockReturnValue(false), renders
the menu, and expects that same "Secrets" node is not present. Use the existing
beforeEach and other auth mocks to keep context consistent and reference the
useCanManageSecretsMock and the menu render helper when adding the tests.

In `@ui/src/pages/secrets/index.tsx`:
- Around line 575-576: Replace the oversized form control height classes
(className="h-9") in ui/src/pages/secrets/index.tsx with the compact size (use
className="h-7" or smaller) for all input/select components to follow the dense
UI guideline; search for every occurrence of className="h-9" in this file
(including the instances around the current diff and the other reported spots)
and update those input/select/select-like components (the JSX elements that
render form controls) to use h-7 so selects and inputs render with the compact
height.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bdc10bb8-a19f-42cd-b1fe-f2d7448b3a11

📥 Commits

Reviewing files that changed from the base of the PR and between 4638232 and a3ebedd.

📒 Files selected for processing (45)
  • api/v1/api.gen.go
  • api/v1/api.yaml
  • internal/cmd/context.go
  • internal/cmd/dry.go
  • internal/cmd/restart.go
  • internal/cmd/retry.go
  • internal/cmd/start.go
  • internal/cmn/schema/dag.schema.json
  • internal/cmn/schema/dag_schema_test.go
  • internal/cmn/secrets/env.go
  • internal/cmn/secrets/file.go
  • internal/cmn/secrets/kubernetes.go
  • internal/cmn/secrets/resolver.go
  • internal/cmn/secrets/resolver_test.go
  • internal/cmn/secrets/vault.go
  • internal/core/dag.go
  • internal/core/spec/builder.go
  • internal/core/spec/dag.go
  • internal/core/spec/dag_test.go
  • internal/engine/run.go
  • internal/intg/secret_test.go
  • internal/persis/filesecret/store.go
  • internal/persis/filesecret/store_test.go
  • internal/runtime/agent/agent.go
  • internal/runtime/agent/agent_test.go
  • internal/secret/memory_store_test.go
  • internal/secret/resolver.go
  • internal/secret/resolver_test.go
  • internal/secret/secret.go
  • internal/secret/secret_test.go
  • internal/secret/store.go
  • internal/service/audit/entry.go
  • internal/service/frontend/api/v1/api.go
  • internal/service/frontend/api/v1/secrets.go
  • internal/service/frontend/api/v1/secrets_test.go
  • internal/service/frontend/server.go
  • internal/service/worker/remote_handler.go
  • rfcs/019_enterprise_secret_management.md
  • ui/src/App.tsx
  • ui/src/__tests__/menu.test.tsx
  • ui/src/api/v1/schema.ts
  • ui/src/contexts/AuthContext.tsx
  • ui/src/menu.tsx
  • ui/src/pages/administration/index.tsx
  • ui/src/pages/secrets/index.tsx

Comment thread api/v1/api.yaml
Comment thread api/v1/api.yaml
Comment on lines +12369 to +12394
CreateSecretRequest:
type: object
required:
- ref
- providerType
properties:
workspace:
type: string
description: "Workspace selector. Omit or use default for the unlabelled default scope."
ref:
type: string
description: "Secret ref used from DAG YAML, for example prod/db-password."
pattern: "^[a-z0-9][a-z0-9-]*(/[a-z0-9][a-z0-9-]*)*$"
description:
type: string
providerType:
$ref: "#/components/schemas/SecretProviderType"
providerConnectionId:
type: string
providerRef:
type: string
value:
type: string
writeOnly: true
description: "Initial Dagu-managed value. Write-only; never returned by the API."

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Constrain CreateSecretRequest by provider type.

This schema currently accepts impossible combinations, such as providerType: dagu-managed with external-provider fields, or an external provider without the identifiers it needs. That pushes basic contract validation to runtime and makes the generated request types advertise invalid payloads as valid.

As per coding guidelines, "Generate REST API server code from OpenAPI spec at api/v1/api.yaml using oapi-codegen via make api."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/v1/api.yaml` around lines 12369 - 12394, CreateSecretRequest currently
allows invalid field combinations; update the OpenAPI schema to enforce
conditional validation based on providerType (the CreateSecretRequest object and
its providerType property) so that when providerType is "dagu-managed" only
value (writeOnly) is accepted and external provider fields
(providerConnectionId, providerRef) are prohibited/omitted, and when
providerType is an external provider type require the appropriate identifiers
(providerConnectionId and/or providerRef) and disallow the write-only value;
implement this using OpenAPI conditional schemas (oneOf/anyOf with
discriminators or if/then/else keyed on providerType) so generated server/client
types and validation will only allow valid combinations.

Comment thread internal/core/spec/builder.go
Comment thread internal/persis/filesecret/store.go Outdated
Comment thread internal/secret/memory_store_test.go Outdated
Comment thread internal/secret/memory_store_test.go
Comment thread ui/src/pages/secrets/index.tsx
Comment thread ui/src/pages/secrets/index.tsx
@yohamta0

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 15, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
internal/core/spec/builder.go (1)

212-230: ⚖️ Poor tradeoff

Consider adding a test to enforce synchronization between this list and internal/core/exec constants.

The reserved names list is currently in sync with internal/core/exec/env.go, but there's no enforcement mechanism to prevent divergence if env keys are added, removed, or renamed in the future. A test that validates this list against the actual constants in internal/core/exec would prevent collision bugs from silent mismatches.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/core/spec/builder.go` around lines 212 - 230, Add a unit test that
fails if the reservedSecretEnvNames slice diverges from the env key constants
defined in the internal/core/exec package: write a test (e.g.,
TestReservedSecretEnvNamesMatchExecConstants) that imports internal/core/exec,
collects the exported env key constants from that package into a set, and
compares that set to the reservedSecretEnvNames slice; if there is any missing
or extra name, the test should fail with a clear message. Ensure the test
references the reservedSecretEnvNames symbol and the exec package constants so
future edits to either will be caught.
internal/service/frontend/api/v1/secrets.go (2)

383-395: ⚡ Quick win

Clarify why secret management requires audit permission.

The method name CanManageAudit() suggests it checks audit log management capability, but here it gates access to secret management operations. Either add a comment explaining this permission mapping or introduce a more semantically appropriate check like role.CanManageSecrets() if available.

💡 Suggested comment to add
 func (a *API) requireSecretManageForWorkspace(ctx context.Context, workspaceName string) error {
 	role, ok, err := a.effectiveRoleForWorkspace(ctx, workspaceName)
 	if err != nil {
 		return err
 	}
 	if !ok {
 		return workspaceResourceNotFound()
 	}
+	// Secret management requires audit management capability (manager role).
 	if !role.CanManageAudit() {
 		return errInsufficientPermissions
 	}
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/frontend/api/v1/secrets.go` around lines 383 - 395, The
permission check in requireSecretManageForWorkspace uses role.CanManageAudit(),
which is confusing because it suggests audit-log permissions are being used to
gate secret operations; either replace that call with a semantically correct
permission check (e.g., role.CanManageSecrets()) if such a method exists on the
Role type, or if business logic intentionally reuses the audit permission to
control secret management, add a concise comment above
requireSecretManageForWorkspace explaining that mapping and why CanManageAudit()
is the authoritative gate for secret management to avoid confusion for future
readers.

64-107: ⚡ Quick win

Potentially redundant permission filter in ListSecrets.

Line 73 validates that the user can manage the workspace specified in workspaceFilter, and line 82 lists secrets scoped to that workspace. The subsequent per-secret permission check on lines 88-93 should always succeed since all returned secrets belong to the already-validated workspace. Consider removing the redundant filter or adding a comment if this is intentional defensive programming.

♻️ Option 1: Remove redundant filter
 	secrets, err := a.secretStore.List(ctx, secretpkg.ListOptions{Workspace: workspaceFilter})
 	if err != nil {
 		return nil, fmt.Errorf("failed to list secrets: %w", err)
 	}

-	visible := make([]api.SecretResponse, 0, len(secrets))
+	visible := make([]api.SecretResponse, len(secrets))
 	for _, sec := range secrets {
-		if !a.canManageSecretWorkspace(ctx, sec.Workspace) {
-			continue
-		}
-		visible = append(visible, toSecretResponse(sec))
+		visible[i] = toSecretResponse(sec)
 	}

 	total := len(visible)

Option 2: Add clarifying comment if intentional

 	secrets, err := a.secretStore.List(ctx, secretpkg.ListOptions{Workspace: workspaceFilter})
 	if err != nil {
 		return nil, fmt.Errorf("failed to list secrets: %w", err)
 	}

+	// Defensive filter: although workspaceFilter was already validated,
+	// re-check permissions for each secret to guard against store bugs.
 	visible := make([]api.SecretResponse, 0, len(secrets))
 	for _, sec := range secrets {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/frontend/api/v1/secrets.go` around lines 64 - 107,
ListSecrets performs requireSecretManageForWorkspace after
parseSecretWorkspaceSelector and then calls a.secretStore.List scoped by that
workspace, so the per-secret permission check using a.canManageSecretWorkspace
inside the loop is redundant; either remove that check and directly append to
visible using toSecretResponse(secrets...), or if you intended defensive
programming, keep the check but add a clarifying comment above the loop
explaining why per-secret permission re-check is required, referencing the
ListSecrets method and the requireSecretManageForWorkspace, secretStore.List,
and canManageSecretWorkspace symbols so future readers understand the intent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/core/spec/builder.go`:
- Around line 212-230: Add a unit test that fails if the reservedSecretEnvNames
slice diverges from the env key constants defined in the internal/core/exec
package: write a test (e.g., TestReservedSecretEnvNamesMatchExecConstants) that
imports internal/core/exec, collects the exported env key constants from that
package into a set, and compares that set to the reservedSecretEnvNames slice;
if there is any missing or extra name, the test should fail with a clear
message. Ensure the test references the reservedSecretEnvNames symbol and the
exec package constants so future edits to either will be caught.

In `@internal/service/frontend/api/v1/secrets.go`:
- Around line 383-395: The permission check in requireSecretManageForWorkspace
uses role.CanManageAudit(), which is confusing because it suggests audit-log
permissions are being used to gate secret operations; either replace that call
with a semantically correct permission check (e.g., role.CanManageSecrets()) if
such a method exists on the Role type, or if business logic intentionally reuses
the audit permission to control secret management, add a concise comment above
requireSecretManageForWorkspace explaining that mapping and why CanManageAudit()
is the authoritative gate for secret management to avoid confusion for future
readers.
- Around line 64-107: ListSecrets performs requireSecretManageForWorkspace after
parseSecretWorkspaceSelector and then calls a.secretStore.List scoped by that
workspace, so the per-secret permission check using a.canManageSecretWorkspace
inside the loop is redundant; either remove that check and directly append to
visible using toSecretResponse(secrets...), or if you intended defensive
programming, keep the check but add a clarifying comment above the loop
explaining why per-secret permission re-check is required, referencing the
ListSecrets method and the requireSecretManageForWorkspace, secretStore.List,
and canManageSecretWorkspace symbols so future readers understand the intent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 373ccb10-f387-40ad-bcfd-21d86401aeba

📥 Commits

Reviewing files that changed from the base of the PR and between a3ebedd and da3c64b.

📒 Files selected for processing (15)
  • api/v1/api.gen.go
  • api/v1/api.yaml
  • internal/cmn/secrets/resolver.go
  • internal/core/spec/builder.go
  • internal/core/spec/dag_test.go
  • internal/persis/filesecret/store.go
  • internal/secret/memory_store_test.go
  • internal/secret/resolver_test.go
  • internal/secret/secret.go
  • internal/service/frontend/api/v1/secrets.go
  • internal/service/frontend/api/v1/secrets_test.go
  • internal/service/worker/remote_handler.go
  • internal/service/worker/remote_handler_test.go
  • ui/src/api/v1/schema.ts
  • ui/src/pages/secrets/index.tsx
✅ Files skipped from review due to trivial changes (1)
  • ui/src/api/v1/schema.ts
🚧 Files skipped from review as they are similar to previous changes (6)
  • internal/secret/secret.go
  • api/v1/api.yaml
  • ui/src/pages/secrets/index.tsx
  • internal/service/worker/remote_handler.go
  • internal/cmn/secrets/resolver.go
  • internal/persis/filesecret/store.go

@yohamta0 yohamta0 merged commit 49ab4a8 into main May 16, 2026
11 checks passed
@yohamta0 yohamta0 deleted the feat/simplified-secret-management branch May 16, 2026 07:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant