TaskFlow: restore managed substrate#58930
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c0ede2adbb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| normalizeComparableText(task.parentFlowId) === | ||
| normalizeComparableText(params.parentFlowId), |
There was a problem hiding this comment.
Keep runId dedupe independent of parent flow linkage
Including parentFlowId in runScopeMatches breaks idempotent createQueuedTaskRun/createRunningTaskRun behavior for ACP/subagent runs. The first create call typically has no parentFlowId, then ensureSingleTaskFlow links it afterward; a retry with the same runId still arrives without parentFlowId, so it no longer matches the existing task and creates a duplicate task (and often a duplicate one-task flow). That can produce duplicate status updates, notifications, and cancellation handling for one logical run.
Useful? React with 👍 / 👎.
| restoreAttempted = true; | ||
| const restored = getFlowRegistryStore().loadSnapshot(); | ||
| flows.clear(); |
There was a problem hiding this comment.
Handle restore errors before marking flow registry initialized
restoreAttempted is flipped before loadSnapshot() and this function does not catch restore failures. If loading SQLite throws (for example from a transient permission/corruption issue), the first call fails and subsequent calls will skip restore entirely because the flag is already set, leaving the in-memory registry empty for the rest of the process and hiding persisted flows.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR restores the core flow substrate that underpins background orchestration in openclaw. It introduces a Key changes:
Findings:
Confidence Score: 5/5Safe to merge — all findings are P2; no data corruption, security bypass, or runtime error risk identified. The security boundaries (owner-scoped access, cross-owner task/flow link rejection, import boundary enforcement) are all correctly implemented and tested. The SQLite migrations are incremental and safe. The two flagged issues are both minor: stale-snapshot terminal-status check in cancelFlowById causes redundant writes but not incorrect state, and the managed-flow retry gap is a documented design boundary rather than a bug. All other code paths have corresponding test coverage. src/tasks/task-executor.ts — cancelFlowById stale terminal check and retryBlockedFlowAsQueuedTaskRun managed-flow gap.
|
c0ede2a to
7c6f535
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7c6f535b84
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| tasks: refreshedTasks, | ||
| }; | ||
| } | ||
| if (isTerminalFlowStatus(flow.status)) { |
There was a problem hiding this comment.
Guard cancellation against stale flow status
cancelFlowById checks terminality using the initial flow snapshot from before child-task cancellation. If one of those tasks finishes and syncs the flow to failed/succeeded while this function is awaiting cancels, this stale check still falls through and the subsequent CAS patch overwrites the terminal status to cancelled, losing the real outcome. Re-check terminal status on the refreshed flow snapshot before applying the cancellation patch.
Useful? React with 👍 / 👎.
| const flow = getFlowById(params.flowId); | ||
| if (!flow) { |
There was a problem hiding this comment.
Revalidate blocked state before retry task creation
After resolveRetryableBlockedFlowTask confirms the flow is blocked, retryBlockedFlowTask fetches the flow again but only checks for existence. A concurrent retry/cancel can move the flow out of blocked between these two reads, and this path will still create another retry task (especially when runId is omitted), producing duplicate retries for a single blocked step. Re-check flow.status === "blocked" on the second read before calling createTaskRecord.
Useful? React with 👍 / 👎.
🔒 Aisle Security AnalysisWe found 5 potential security issue(s) in this PR:
1. 🟡 Symlink/TOCTOU risk when chmod-ing SQLite registry files derived from OPENCLAW_STATE_DIR
DescriptionThe flow registry SQLite store attempts to harden file permissions by calling
Vulnerable code: for (const suffix of FLOW_REGISTRY_SIDECAR_SUFFIXES) {
const candidate = `${pathname}${suffix}`;
if (!existsSync(candidate)) {
continue;
}
chmodSync(candidate, FLOW_REGISTRY_FILE_MODE);
}RecommendationAvoid following symlinks and remove the TOCTOU window. Recommended mitigations:
Example hardening (best-effort in Node): import { chmodSync, lstatSync, mkdirSync, realpathSync } from "node:fs";
import path from "node:path";
function safeChmodFile(candidate: string, rootDir: string, mode: number) {
const resolvedRoot = realpathSync(rootDir);
const resolvedCandidate = realpathSync(candidate);
if (!resolvedCandidate.startsWith(resolvedRoot + path.sep)) {
throw new Error("Refusing to chmod path outside state dir");
}
const st = lstatSync(candidate);
if (st.isSymbolicLink() || !st.isFile()) {
throw new Error("Refusing to chmod non-regular file");
}
chmodSync(candidate, mode);
}Also consider treating 2. 🟡 Sensitive data exposure via `openclaw flows --json` and `flows show --json` dumping full flow/task records
DescriptionThe CLI commands This is risky because these structures can contain sensitive data and identifiers, including:
In environments where CLI output is captured (CI logs, support bundles, telemetry, shell history, redirected output), this can lead to unintended disclosure. Vulnerable code: flows: flows.map((flow) => ({
...flow,
tasks: listTasksForFlowId(flow.flowId),
taskSummary: getFlowTaskSummary(flow.flowId),
})),And: JSON.stringify({ ...flow, tasks, taskSummary }, null, 2)RecommendationRedact or minimize sensitive fields in JSON output by default. Options (can be combined):
Example (default-safe): function toPublicFlow(flow: FlowRecord) {
return {
flowId: flow.flowId,
syncMode: flow.syncMode,
status: flow.status,
revision: flow.revision,
controllerId: flow.controllerId ?? null,
goal: flow.goal,
currentStep: flow.currentStep ?? null,
blockedTaskId: flow.blockedTaskId ?? null,
blockedSummary: flow.blockedSummary ?? null,
cancelRequestedAt: flow.cancelRequestedAt ?? null,
createdAt: flow.createdAt,
updatedAt: flow.updatedAt,
endedAt: flow.endedAt ?? null,
};
}
// usage
flows: flows.map((flow) => ({
...toPublicFlow(flow),
taskSummary: getFlowTaskSummary(flow.flowId),
})),If task details are required, similarly emit a minimal task projection (taskId/status/label) rather than full task records. 3. 🟡 Terminal escape sequence injection in `flows` CLI output (untrusted flow/task fields logged unsanitized)
DescriptionThe An attacker who can influence task metadata (e.g.,
Although the repo contains Vulnerable code: lines.push([
shortToken(flow.flowId).padEnd(ID_PAD),
...,
truncate(flow.controllerId ?? "n/a", CTRL_PAD).padEnd(CTRL_PAD),
counts.padEnd(14),
truncate(flow.goal, 80),
].join(" "));
...
runtime.log(`goal: ${flow.goal}`);
...
runtime.log(`- ${task.taskId} ${task.status} ${task.runId ?? "n/a"} ${task.label ?? task.task}`);RecommendationSanitize any untrusted text before rendering to the terminal. Use the existing helper Example fix: import { sanitizeTerminalText } from "../terminal/safe-text.js";
const safeGoal = sanitizeTerminalText(flow.goal);
const safeController = sanitizeTerminalText(flow.controllerId ?? "n/a");
runtime.log(`goal: ${safeGoal}`);
...
runtime.log(
`- ${sanitizeTerminalText(task.taskId)} ${task.status} ${task.runId ?? "n/a"} ` +
`${sanitizeTerminalText(task.label ?? task.task)}`,
);Optionally, update 4. 🟡 Missing owner authorization when cancelling flows (cancelFlowById used by CLI)
DescriptionThe flow cancellation API has an owner-scoped variant (
Vulnerable code: export async function cancelFlowById(params: { cfg: OpenClawConfig; flowId: string }) {
const flow = getFlowById(params.flowId);
// ... cancels tasks and updates flow status without owner check
}const flow = resolveFlowForLookupToken(opts.lookup);
const result = await cancelFlowById({ cfg: loadConfig(), flowId: flow.flowId });RecommendationEnforce owner scoping at the call boundary(s) that can be influenced by untrusted input. Option A (preferred): require export async function cancelFlowByIdForOwner(params: {
cfg: OpenClawConfig;
flowId: string;
callerOwnerKey: string;
}) {
const flow = getFlowByIdForOwner({ flowId: params.flowId, callerOwnerKey: params.callerOwnerKey });
if (!flow) return { found: false, cancelled: false, reason: "Flow not found." };
return cancelFlowById({ cfg: params.cfg, flowId: flow.flowId });
}Then update const flow = resolveFlowForLookupTokenForOwner({ token: opts.lookup, callerOwnerKey });
const result = await cancelFlowByIdForOwner({ cfg: loadConfig(), flowId: flow.flowId, callerOwnerKey });Option B: if unscoped cancellation is required for internal/admin use, make it non-exported or clearly mark it as privileged and ensure only admin-only paths can call it. 5. 🔵 Terminal/log injection in doctor workspace status recovery hints (flow IDs & blockedTaskId in notes)
Description
If an attacker can influence these identifiers (or other linked metadata persisted in the flow registry), they can embed control characters / ANSI escape sequences, resulting in terminal escape injection within the interactive doctor output.
Vulnerable code: findings.push(
`${flow.flowId}: blocked flow points at missing task ${flow.blockedTaskId}; inspect before retrying.`,
);
...
note([...suspicious.slice(0, 5), ...].join("\n"), "TaskFlow recovery");RecommendationSanitize any flow/task identifiers before embedding them in terminal notes. Example: import { sanitizeTerminalText } from "../terminal/safe-text.js";
findings.push(
`${sanitizeTerminalText(flow.flowId)}: blocked flow points at missing task ` +
`${sanitizeTerminalText(flow.blockedTaskId)}; inspect before retrying.`,
);Alternatively, enforce sanitization inside Analyzed PR: #58930 at commit Last updated on: 2026-04-02T11:06:22Z |
7c6f535 to
9c946d5
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1e5122946b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } catch (error) { | ||
| log.warn("Failed to create one-task flow for detached run", { | ||
| taskId: params.task.taskId, | ||
| runId: params.task.runId, | ||
| error, |
There was a problem hiding this comment.
Clean up provisional flow when task-linking throws
If createFlowForTask succeeds but linkTaskToFlowById throws (for example, a transient write failure while updating the task row), this catch path returns the original task without deleting the newly created flow. Unlike the explicit !linked/mismatch branches, this leaves an orphan task_mirrored flow with no linked task, which can pollute flow listings and cancellation/retry behavior on subsequent operations.
Useful? React with 👍 / 👎.
1e51229 to
c990938
Compare
|
Merged via squash.
Thanks @mbelinky! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c99093838f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| try { | ||
| syncFlowFromTask(next); | ||
| } catch (error) { |
There was a problem hiding this comment.
Ignore stale task updates when mirroring flow state
Every task update now calls syncFlowFromTask, even updates that only touch delivery metadata. In flows that accumulate retries (old blocked task + new queued/running task), a later update on the older task (for example setTaskRunDeliveryStatusByRunId from subagent cleanup) will re-sync the shared task_mirrored flow from that stale task and can flip the flow back to blocked/succeeded while a newer retry task is active. This produces incorrect flow status and can drive wrong cancel/retry decisions; flow sync should be gated to the newest linked task (or at least lifecycle-changing patches).
Useful? React with 👍 / 👎.
Summary
What This PR Does
TaskFlow comes back into core as the parent job record for detached work.
Tasks still do the actual work and remain the core execution substrate.
TaskFlow adds the higher-level record that can own one or more tasks, keep shared state, and be inspected or cancelled as a unit.
This PR adds:
task_mirroredvsmanagedsync modesopenclaw flowsinspection and operator recovery surfacesWhat This PR Does Not Do
Tasks vs TaskFlow
Validation
mb-server:pnpm buildmb-server: focused Vitest slice across flow/task/doctor surfaces