workflais

Declarative workflow primitives for Cloudflare Workflows.

npm install workflais

Native CF Workflows vs workflais

|  | Native CF Workflows | workflais |
| --- | --- | --- |
| Step definition | step.do("name", { retries: { limit: 3, delay: "10s", backoff: "exponential" }, timeout: "30s" }, async () => {...}) | step("name", fn).retry(3).timeout("30s") |
| Result chaining | Manual variables between steps | Automatic ctx.prev pipeline |
| Saga compensation | Manual implementation | .compensate(fn) with automatic LIFO rollback via step.do("⟲ name") |
| Parallel execution | Manual self-spawn + waitForEvent orchestration | parallel(step1, step2, step3) |
| waitForEvent | Imperative step.waitForEvent() call | waitForEvent("name", opts) in pipeline with ctx.prev |
| Compile-time validation | Runtime errors only | Duplicate step names, step count limits, timeout limits, event type validation |
| Error types | Generic errors | NonRetryableError, TimeoutTooLongError, DuplicateStepNameError, etc. |

Quick Start

import { step, compile, execute } from "workflais";
import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from "cloudflare:workers";
import type { WorkflowStep as WfStep, WorkflowEvent as WfEvent } from "workflais";

export class MyWorkflow extends WorkflowEntrypoint {
  async run(event: WorkflowEvent, cfStep: WorkflowStep) {
    const plan = compile([
      step("fetch-data", async (ctx) => {
        return { userId: ctx.event.payload.userId, name: "Alice" };
      }),

      step("process", async (ctx) => {
        const data = ctx.prev; // automatic chaining
        return { ...data, processed: true };
      })
        .retry(3, "1 minute")
        .timeout("30 seconds"),

      step("save", async (ctx) => {
        return { ...ctx.prev, saved: true };
      }).compensate(async () => {
        // runs automatically on failure, wrapped in step.do for CF durability
      }),
    ]);

    return execute(plan, cfStep as unknown as WfStep, event as unknown as WfEvent, this.env);
  }
}

API

DSL

step(name, fn)                    // durable step
  .retry(limit, delay?)           // retry config (default: exponential backoff)
  .timeout(duration)              // step timeout (max 30 min)
  .compensate(fn)                 // saga rollback handler

parallel(step1, step2, ...)       // fan-out/fan-in execution
waitForEvent(name, { type, timeout })  // pause for external event

compile(nodes)                    // validate + build execution plan
execute(plan, step, event, env)   // run against CF Workflows runtime
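
The chain above replaces the options object that a native step.do call takes (see the comparison table). As a rough illustration of how a chainable builder could accumulate that object, here is a minimal sketch. StepBuilder and toStepConfig are hypothetical names, not the workflais source, and the "10s" default delay is an assumption borrowed from the native example.

```typescript
// Hypothetical sketch: illustrates the chaining idea, not the workflais implementation.
type Backoff = "constant" | "linear" | "exponential";

interface StepConfig {
  retries?: { limit: number; delay: string; backoff: Backoff };
  timeout?: string;
}

class StepBuilder {
  readonly name: string;
  private config: StepConfig = {};

  constructor(name: string) {
    this.name = name;
  }

  // Assumed defaults: "10s" delay and exponential backoff.
  retry(limit: number, delay: string = "10s"): this {
    this.config.retries = { limit, delay, backoff: "exponential" };
    return this;
  }

  timeout(duration: string): this {
    this.config.timeout = duration;
    return this;
  }

  // Returns a copy shaped like the options object passed to a native step.do.
  toStepConfig(): StepConfig {
    return { ...this.config };
  }
}

const built = new StepBuilder("name").retry(3).timeout("30s").toStepConfig();
console.log(JSON.stringify(built));
```

Returning this from each method is what keeps the calls chainable in any order.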

Context

Every step callback receives ctx:

  • ctx.prev — previous step's return value (or undefined for the first step)
  • ctx.event — workflow event (frozen, immutable)
  • ctx.env — CF bindings

After parallel(), ctx.prev is a tuple of results in declaration order.
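
To make the chaining rule concrete, the sketch below threads each step's return value into the next step's ctx.prev and freezes the event. It is a standalone illustration; runPipeline, Ctx, and StepFn are hypothetical names, and the real runtime additionally wraps each step in step.do.

```typescript
// Hypothetical sketch of ctx.prev chaining; not the workflais runtime.
interface Ctx<E> {
  prev: unknown;
  event: Readonly<E>;
}

type StepFn<E> = (ctx: Ctx<E>) => Promise<unknown>;

async function runPipeline<E extends object>(event: E, fns: StepFn<E>[]): Promise<unknown> {
  const frozen = Object.freeze({ ...event }); // steps cannot mutate the event
  let prev: unknown = undefined; // the first step sees undefined
  for (const fn of fns) {
    prev = await fn({ prev, event: frozen });
  }
  return prev;
}

const pipelineResult = runPipeline({ userId: "u1" }, [
  async (ctx) => ({ userId: ctx.event.userId, name: "Alice" }),
  async (ctx) => ({ ...(ctx.prev as object), processed: true }),
]);

pipelineResult.then((out) => console.log(JSON.stringify(out)));
```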

Examples

Each example is a standalone, deploy-ready CF Workers project. Pick one and run:

cd examples/ecommerce-checkout
npm install
npx wrangler dev

Then test it:

# Start a checkout workflow
curl -X POST http://localhost:8787/checkout \
  -H "Content-Type: application/json" \
  -d '{"cartId": "cart-42"}'

# Check status
curl "http://localhost:8787/status?id=<instanceId>"

| Example | Pattern | Test command |
| --- | --- | --- |
| ecommerce-checkout | Cart → Payment → Invoice | curl -X POST localhost:8787/checkout -d '{"cartId":"42"}' |
| user-onboarding | Saga compensation | curl -X POST localhost:8787/onboard -d '{"email":"a@b.com"}' |
| image-tagging | Human-in-the-loop | curl -X POST localhost:8787/upload -d '{"imageKey":"photo.jpg"}' |
| parallel-fan-out | Parallel fan-out/fan-in (child DO isolation) | curl -X POST localhost:8787/notify -d '{"userId":"u1","message":"hi"}' |

All examples include console.log at every step — use npx wrangler tail to see execution flow in real time. Hit GET / on any example to see available endpoints.

parallel-fan-out is the only example that demonstrates child workflow DO isolation. Each parallel() branch spawns a separate Durable Object with its own 128 MB memory, CPU budget, and retry policy. See the Resource Isolation Problem section for why this matters.

Why Child Workflows? The Resource Isolation Problem

CF Workflows runs each instance inside a single Durable Object. A DO has hard limits:

| Resource | Limit |
| --- | --- |
| Memory | 128 MB per DO |
| CPU | 5 min per invocation |
| Retry budget | Shared across the entire instance |

If you run three heavy steps with Promise.all inside one DO, you get:

┌─ Single Durable Object (128 MB shared) ──────────────┐
│  Promise.all([                                        │
│    mlInference(),    ← 80 MB   ← 3 min CPU           │
│    imageProcess(),   ← 60 MB   ← 2 min CPU           │
│    videoTranscode(), ← 50 MB   ← 4 min CPU           │
│  ])                                                   │
│  Total: 190 MB → OOM CRASH                            │
│  Total CPU: 9 min → TIMEOUT                           │
│  If imageProcess fails → all three die                │
└───────────────────────────────────────────────────────┘

The problem is threefold:

  1. Memory — All branches share 128 MB. Two 80 MB allocations = OOM crash, killing the entire workflow.
  2. CPU — All branches share 5 min. Three 2-minute tasks = timeout, even though each one is well under the limit.
  3. Blast radius — One branch throwing an unhandled error kills Promise.all, terminating siblings mid-execution. No partial results, no independent retry.
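
The memory and CPU arithmetic from the diagram can be checked with a few lines. This sketch follows the accounting used above (branch costs add up inside one DO) and compares it with giving each branch its own budget. The Branch type and both helper functions are hypothetical and exist only for illustration.

```typescript
// Illustrative budget check using the figures from the diagram above.
interface Branch {
  name: string;
  memoryMb: number;
  cpuMin: number;
}

const DO_LIMIT = { memoryMb: 128, cpuMin: 5 };

const branches: Branch[] = [
  { name: "mlInference", memoryMb: 80, cpuMin: 3 },
  { name: "imageProcess", memoryMb: 60, cpuMin: 2 },
  { name: "videoTranscode", memoryMb: 50, cpuMin: 4 },
];

// Single DO: every branch draws from the same budget.
function fitsSharedBudget(bs: Branch[]): boolean {
  const memory = bs.reduce((sum, b) => sum + b.memoryMb, 0);
  const cpu = bs.reduce((sum, b) => sum + b.cpuMin, 0);
  return memory <= DO_LIMIT.memoryMb && cpu <= DO_LIMIT.cpuMin;
}

// One DO per branch: each branch is checked against its own budget.
function fitsIsolatedBudgets(bs: Branch[]): boolean {
  return bs.every((b) => b.memoryMb <= DO_LIMIT.memoryMb && b.cpuMin <= DO_LIMIT.cpuMin);
}

console.log(fitsSharedBudget(branches));    // false: 190 MB and 9 min exceed 128 MB and 5 min
console.log(fitsIsolatedBudgets(branches)); // true: each branch is under both limits
```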

The Solution: Child Workflow Spawning

parallel() compiles to the self-spawn pattern — each branch becomes a separate workflow instance running in its own DO:

Parent DO                          Child DO #1           Child DO #2           Child DO #3
──────────                        ───────────           ───────────           ───────────
step.do("⊕ spawn") ─────────►   128 MB own memory     128 MB own memory     128 MB own memory
  binding.create(child1)          5 min own CPU         5 min own CPU         5 min own CPU
  binding.create(child2)          own retry budget      own retry budget      own retry budget
  binding.create(child3)
                                  step.do("ml", fn)     step.do("img", fn)    step.do("vid", fn)
waitForEvent("ml:cb")  ◄─ $0 ─  sendEvent(result)
waitForEvent("img:cb") ◄─ $0 ─                        sendEvent(result)
waitForEvent("vid:cb") ◄─ $0 ─                                              sendEvent(result)

ctx.prev = [mlResult, imgResult, vidResult]  // tuple in declaration order

|  | Promise.all (single DO) | parallel() (child DOs) |
| --- | --- | --- |
| Memory | 128 MB shared | 128 MB each |
| CPU | 5 min shared | 5 min each |
| Retry | All-or-nothing | Per-branch |
| Failure | One kills all | Isolated |
| Parent cost while waiting | N/A | $0 (hibernated) |

// workflais — each branch gets its own DO
parallel(
  step("ml-inference", mlFn).retry(5, "exponential").timeout("25m"),
  step("image-process", imgFn).retry(3).timeout("10m"),
  step("video-transcode", vidFn).retry(2).timeout("20m"),
)

The parent spawns all children in a single step.do, then hibernates via waitForEvent. Zero CPU, zero memory, zero cost. When all children report back, the parent wakes up with ctx.prev = [result1, result2, result3].

If a failure occurs after the parallel group has completed, workflais runs .compensate() for every child in the group. Each compensation is wrapped in its own step.do so that CF can retry it durably.
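
The rollback order matters: the comparison table describes it as LIFO, so the most recently completed step is undone first. Below is a minimal standalone sketch of that ordering. runSaga and SagaStep are hypothetical names; the sketch assumes only completed steps are compensated and omits the step.do wrapping that workflais applies.

```typescript
// Hypothetical sketch of LIFO saga compensation; not the workflais runtime.
interface SagaStep {
  name: string;
  run: () => Promise<void>;
  compensate?: () => Promise<void>;
}

async function runSaga(steps: SagaStep[], log: string[]): Promise<boolean> {
  const completed: SagaStep[] = [];
  try {
    for (const s of steps) {
      await s.run();
      log.push(s.name);
      completed.push(s);
    }
    return true;
  } catch {
    // Undo completed steps in reverse order (last in, first out).
    for (const s of completed.reverse()) {
      if (s.compensate) {
        await s.compensate();
        log.push("⟲ " + s.name);
      }
    }
    return false;
  }
}

const sagaLog: string[] = [];
const sagaOutcome = runSaga(
  [
    { name: "reserve", run: async () => {}, compensate: async () => {} },
    { name: "charge", run: async () => {}, compensate: async () => {} },
    { name: "ship", run: async () => { throw new Error("carrier unavailable"); } },
  ],
  sagaLog,
);

sagaOutcome.then((ok) => console.log(ok, sagaLog.join(", ")));
```

In this run the log ends with "⟲ charge" followed by "⟲ reserve", the reverse of the order in which the steps completed.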

Production Verification

The parallel-fan-out example is deployed and verified on Cloudflare Workers.

Result:

{
  "status": "complete",
  "output": {
    "notified": true,
    "channelCount": 3,
    "results": [
      { "channel": "email", "sent": true, "to": "user@example.com" },
      { "channel": "sms",   "sent": true, "to": "+1234567890" },
      { "channel": "crm",   "updated": true, "userId": "u1" }
    ]
  }
}

Runtime Metrics (wrangler tail)

| Metric | HTTP Trigger | Workflow DO (parent) |
| --- | --- | --- |
| executionModel | stateless | stateless |
| wallTime | 517ms | ~3s (complete) |
| cpuTime | 0 | 0 |
| outcome | ok | ok |

Key observations:

  • Parent hibernation confirmed: cpuTime: 0 while wallTime: 300168ms (5 min) on a stale instance shows that the parent DO genuinely hibernates at $0 cost during waitForEvent
  • Each step is a separate DO invocation: executionModel: "stateless" on every log entry confirms that CF treats each step as an independent call
  • Results arrive in declaration order: the [email, sms, crm] tuple matches the parallel() child order, regardless of which child finishes first
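
The ordering guarantee can be modeled locally: wait on one callback per child, in declaration order, and the result tuple keeps that order no matter which callback fires first. The sketch below uses in-memory promises as stand-ins for the waitForEvent callbacks. deferred and waitAll are hypothetical helpers, and nothing here runs in separate DOs.

```typescript
// Local model of fan-in ordering; in-memory promises stand in for child callbacks.
interface Deferred<T> {
  promise: Promise<T>;
  resolve: (value: T) => void;
}

function deferred<T>(): Deferred<T> {
  let resolve: (value: T) => void = () => {};
  // The executor runs synchronously, so resolve is assigned before we return.
  const promise = new Promise<T>((res) => { resolve = res; });
  return { promise, resolve };
}

const childOrder = ["email", "sms", "crm"];
const callbacks = new Map<string, Deferred<string>>();
for (const name of childOrder) callbacks.set(name, deferred<string>());

// Wait for every child, keeping results in declaration order.
function waitAll(order: string[], pending: Map<string, Deferred<string>>): Promise<string[]> {
  return Promise.all(
    order.map((name) => {
      const d = pending.get(name);
      if (!d) throw new Error("unknown child: " + name);
      return d.promise;
    }),
  );
}

const fanIn = waitAll(childOrder, callbacks);

// Children report back out of order.
callbacks.get("crm")?.resolve("crm done");
callbacks.get("email")?.resolve("email done");
callbacks.get("sms")?.resolve("sms done");

fanIn.then((results) => console.log(results.join(", "))); // email done, sms done, crm done
```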

How It Works

step("a", fn).retry(3)  →  compile([...])  →  execute(plan, cfStep, event, env)
     DSL                     Validation          CF step.do() / waitForEvent()

  1. DSL — Declarative step definitions with chainable config
  2. Compiler — Validates names, limits, timeouts; builds execution plan
  3. Runtime — Translates plan into CF Workflows API calls with saga compensation
