DurableDeferred.raceAll([await(deferred), DurableClock.sleep + fail]) hangs when sleep should fire first #6190

@schickling

Summary

Under ClusterWorkflowEngine + TestRunner, racing a never-completed DurableDeferred.await against a DurableClock.sleep-then-Effect.fail arm hangs forever. The clock branch never fires, even though it should settle the race.

This happens with both Effect.race and DurableDeferred.raceAll (the upstream-recommended pattern for mixed-durability arms), and with both the in-memory sleep path (default inMemoryThreshold: 60s) and the durable-timer path (inMemoryThreshold: Duration.zero to force engine.scheduleClock).

Versions

  • effect 3.21.1
  • @effect/cluster 0.58.1
  • @effect/workflow 0.18.0
  • @effect/platform 0.96.0
  • runtime: bun test 1.3.11

Minimal repro

import { ClusterWorkflowEngine, TestRunner } from '@effect/cluster'
import { DurableClock, DurableDeferred } from '@effect/workflow'
import * as Workflow from '@effect/workflow/Workflow'
import { describe, expect, it } from 'bun:test'
import { Duration, Effect, Layer, Schema } from 'effect'

const Ack = Schema.Struct({ ackedBy: Schema.NonEmptyTrimmedString })
const Timeout = Schema.TaggedStruct('Timeout', { waitedMs: Schema.Int })

const Wf = Workflow.make({
  name: 'RaceTimeoutRepro',
  payload: Schema.Struct({}),
  success: Ack,
  error: Timeout,
  idempotencyKey: () => 'race-timeout-repro',
})

const Sig = DurableDeferred.make('repro.ack', { success: Ack })

const body = () =>
  DurableDeferred.raceAll({
    name: 'repro.race',
    success: Ack,
    error: Timeout,
    effects: [
      DurableDeferred.await(Sig),
      Effect.gen(function* () {
        yield* DurableClock.sleep({
          name: 'timeout',
          duration: Duration.seconds(1),
          inMemoryThreshold: Duration.zero, // also tried default — same result
        })
        return yield* Effect.fail({ _tag: 'Timeout' as const, waitedMs: 1_000 })
      }),
    ],
  })

const Runtime = ClusterWorkflowEngine.layer.pipe(Layer.provide(TestRunner.layer))

it('should fail with Timeout after ~1s; instead hangs forever', async () => {
  const program = Effect.gen(function* () {
    yield* Wf.toLayer(body).pipe(Layer.launch, Effect.forkScoped)
    const exec = yield* Effect.fork(Wf.execute({}))
    return yield* exec.await
  })
  const exit = await Effect.runPromise(Effect.scoped(program.pipe(Effect.provide(Runtime))))
  expect(exit._tag).toBe('Failure') // never gets here
}, 30_000)

Expected

After ~1s the timeout arm settles and raceAll reports failure; workflow exits with Timeout.

Actual

The test hangs past its 30s timeout. Both the in-memory and the durable-clock variants hang.

Likely cause (speculative)

raceAll is into(Effect.raceAll(effects), deferred). DurableDeferred.await calls Workflow.suspend(instance) on first poll when the deferred is unresolved. That suspension throws out of the workflow body and likely prevents the sibling sleep arm from progressing: the workflow is left suspended, awaiting the never-completed deferred, and the timeout never gets a chance to settle the race.
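
To make the hypothesis concrete, here is a toy model in plain TypeScript. It is not @effect/workflow code and makes no claim about how the engine is implemented. Every name in it (VirtualClock, Arm, awaitSignal, sleepThenFail, runBody) is hypothetical. It assumes only what is speculated above: an arm that requests suspension ends the whole body and tears down its siblings, including a pending timer.

```typescript
// Toy model of the suspected mechanism. Nothing here is an @effect/workflow API.
type Outcome = "acked" | "timeout" | "suspended"

// Minimal virtual clock so the model is deterministic and needs no real timers.
class VirtualClock {
  private now = 0
  private timers: Array<{ at: number; fire: () => void; cancelled: boolean }> = []

  // Registers a timer and returns a function that cancels it.
  schedule(delayMs: number, fire: () => void): () => void {
    const timer = { at: this.now + delayMs, fire, cancelled: false }
    this.timers.push(timer)
    return () => {
      timer.cancelled = true
    }
  }

  // Moves time forward and fires every timer that is due and not cancelled.
  advance(ms: number): void {
    this.now += ms
    const due = this.timers.filter((t) => !t.cancelled && t.at <= this.now)
    this.timers = this.timers.filter((t) => !t.cancelled && t.at > this.now)
    for (const t of due) t.fire()
  }
}

// An arm starts running, may settle the body, and returns its own cancel function.
type Arm = (settle: (o: Outcome) => void, clock: VirtualClock) => () => void

// Stand-in for awaiting a signal that never arrives: it asks to suspend at once.
const awaitSignal: Arm = (settle) => {
  settle("suspended")
  return () => {}
}

// Stand-in for "sleep, then fail with Timeout".
const sleepThenFail = (ms: number): Arm => (settle, clock) =>
  clock.schedule(ms, () => settle("timeout"))

// Runs the arms; the first arm to settle wins and every other arm is cancelled.
function runBody(arms: ReadonlyArray<Arm>, clock: VirtualClock): { outcome: Outcome | "running" } {
  const state: { outcome: Outcome | "running" } = { outcome: "running" }
  const cancels: Array<() => void> = []
  const settle = (o: Outcome): void => {
    if (state.outcome !== "running") return
    state.outcome = o
    cancels.forEach((cancel) => cancel()) // assumed: suspension tears down siblings too
  }
  for (const arm of arms) {
    if (state.outcome !== "running") break
    cancels.push(arm(settle, clock))
  }
  return state
}

const clock = new VirtualClock()
const raced = runBody([awaitSignal, sleepThenFail(1_000)], clock)
const timerOnly = runBody([sleepThenFail(1_000)], clock)
clock.advance(30_000)
console.log(raced.outcome) // "suspended": the timer never settles the race
console.log(timerOnly.outcome) // "timeout"
```

In this model the arm order does not matter: if the sleep arm starts first, its timer is cancelled as soon as the await arm suspends. Whether the real engine behaves this way is exactly what the repro above is meant to establish.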

Workarounds tried

  • Effect.race(await, sleep+fail) — hangs.
  • DurableDeferred.raceAll([await, sleep+fail]) (per recommended pattern) — hangs.
  • inMemoryThreshold: Duration.zero to force engine.scheduleClock — hangs.

If there's a canonical pattern for "workflow waits for external signal OR durable timeout, whichever first", please point me at it — happy to upstream a docs example.

Context

Third sighting in our codebase. We've now hit it in (1) early POC research, (2) an activity-mapper exercise, (3) a walking-skeleton smoke test. Strong signal worth a closer look.
