
fix(dsl): use workflow-safe logging to avoid temporal deadlocks #2170

Merged
daryllimyt merged 4 commits into main from fix/workflow-deadlock-logging
Feb 25, 2026

@daryllimyt daryllimyt commented Feb 24, 2026

Checklist

  • Read CONTRIBUTING.md.
  • PR title is short and non-generic (see previously merged PRs for examples).
  • PR only implements a single feature or fixes a single bug.
  • Tests passing (uv run pytest tests)?
  • Lint / pre-commits passing (pre-commit run --all-files)?

Description

This PR fixes the TMPRL1101 workflow deadlock risk caused by using the process-wide Loguru logger inside Temporal workflow threads.

Changes:

  • Add WorkflowRuntimeLogger adapter in tracecat/dsl/workflow_logging.py.
  • Route DSL workflow/scheduler logs through Temporal workflow logger when in workflow context.
  • Keep process logger fallback for non-workflow contexts (unit tests/local usage).
  • Reduce high-frequency workflow/scheduler log volume (info -> debug) in hot paths.
  • Add unit tests for workflow logging adapter behavior.
  • Add deterministic local repro harness: scripts/temporal/repro_workflow_logging_deadlock.py.
  • Remove unused workflow-side ctx_logger/process_logger wiring from DSLWorkflow.
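
For readers who have not opened the diff, the routing idea in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in tracecat/dsl/workflow_logging.py: the constructor signature, the `in_workflow` hook, and the use of stdlib `logging` loggers as stand-ins for both the Temporal workflow logger and the Loguru process logger are all assumptions made for the example.

```python
import logging
from typing import Any, Callable


class WorkflowRuntimeLogger:
    """Send records to the workflow logger inside a workflow, else to the process logger."""

    def __init__(
        self,
        workflow_logger: logging.Logger,
        process_logger: logging.Logger,
        in_workflow: Callable[[], bool],
    ) -> None:
        self._workflow_logger = workflow_logger
        self._process_logger = process_logger
        # Hypothetical hook: real code would ask the Temporal SDK whether
        # the current call is running inside a workflow.
        self._in_workflow = in_workflow

    def _target(self) -> logging.Logger:
        # Resolved on every call, so one adapter instance works in both contexts.
        return self._workflow_logger if self._in_workflow() else self._process_logger

    def debug(self, msg: str, **fields: Any) -> None:
        self._target().debug(msg, extra={"fields": fields})

    def info(self, msg: str, **fields: Any) -> None:
        self._target().info(msg, extra={"fields": fields})

    def warning(self, msg: str, **fields: Any) -> None:
        self._target().warning(msg, extra={"fields": fields})
```

Resolving the target per call rather than at construction time is what lets the same adapter fall back to the process logger in unit tests and local usage.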

Related Issues

N/A

Screenshots / Recordings

N/A

Steps to QA

  1. Run targeted checks:
    • uv run ruff check tracecat/dsl/workflow.py tracecat/dsl/scheduler.py tracecat/dsl/workflow_logging.py tests/unit/test_dsl_workflow_logging.py scripts/temporal/repro_workflow_logging_deadlock.py
    • uv run basedpyright tracecat/dsl/workflow.py tracecat/dsl/scheduler.py tracecat/dsl/workflow_logging.py tests/unit/test_dsl_workflow_logging.py scripts/temporal/repro_workflow_logging_deadlock.py
  2. Run unit tests:
    • TRACECAT__SERVICE_KEY=dummy uv run pytest tests/unit/test_dsl_workflow_logging.py
    • TRACECAT__SERVICE_KEY=dummy uv run pytest tests/unit/test_schedule_role_healing.py
    • TRACECAT__SERVICE_KEY=dummy uv run pytest tests/unit/test_dsl_scheduler_determinism.py
  3. Run deadlock repro harness:
    • uv run python scripts/temporal/repro_workflow_logging_deadlock.py --mode both --block-seconds 2.5
  4. Verify expected repro output:
    • Legacy path reports TMPRL1101 in workflow task history.
    • Safe path completes with no TMPRL1101.

Summary by cubic

Fixes Temporal workflow deadlocks (TMPRL1101) by switching DSL logs to a workflow‑safe logger, deferring formatting with lazy callables, and gating formatting behind level checks to prevent blocking in workflow threads. Adds tests and a hardened local repro script to reliably reproduce and validate the fix.

  • Bug Fixes

    • Added WorkflowRuntimeLogger and get_workflow_logger to use Temporal’s logger in workflow context with a process logger fallback; maps trace to debug, skips field formatting when the level is disabled, and uses Loguru lazy callables to defer message/field formatting.
    • Routed DSL workflow and Scheduler logs through the workflow‑safe logger; added optional logger injection; lowered hot‑path info logs to debug and adjusted warnings where appropriate.
    • Hardened repro harness: checks workflow history for TMPRL1101, installs/removes a blocking sink safely, supports mode/block‑seconds flags; expanded tests for context detection, trace mapping, safe repr, disabled‑level gating, and lazy callables.
  • Refactors

    • Removed installed site‑packages tarball build; always build the venv tarball from the builtin source path, and deleted the TRACECAT__REGISTRY_SYNC_BUILTIN_USE_INSTALLED_SITE_PACKAGES flag.
    • Removed unused ctx_logger wiring from DSLWorkflow and routed logging through the new workflow‑safe adapter.
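
The level gating, lazy callables, trace-to-debug mapping, and safe repr mentioned in the summary can be illustrated with a small sketch. It uses stdlib `logging` as a stand-in, and `log_gated`, `safe_repr`, and the `_LEVELS` table are hypothetical names for this example rather than the adapter's actual API. The point is only that no message or field formatting happens when the level is disabled.

```python
import logging
from typing import Any, Callable, Union

LazyMessage = Union[str, Callable[[], str]]

# Assumption: the target logger has no TRACE level, so trace is folded into DEBUG.
_LEVELS = {
    "trace": logging.DEBUG,
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def safe_repr(value: Any) -> str:
    """repr() that never raises, so a bad __repr__ cannot break a log call."""
    try:
        return repr(value)
    except Exception:
        return f"<unrepresentable {type(value).__name__}>"


def log_gated(logger: logging.Logger, level: str, message: LazyMessage, **fields: Any) -> bool:
    """Emit only if the level is enabled; return whether a record was emitted."""
    levelno = _LEVELS[level]
    if not logger.isEnabledFor(levelno):
        return False  # nothing was formatted and the lazy callable was not invoked
    text = message() if callable(message) else message
    if fields:
        text += " " + " ".join(f"{key}={safe_repr(val)}" for key, val in fields.items())
    logger.log(levelno, text)
    return True
```

With Loguru itself, deferred evaluation is usually requested through `logger.opt(lazy=True)`; the callable-based gate above is a library-agnostic approximation of the same idea.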

Written for commit b04e23d. Summary will update on new commits.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc71868bc1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 9 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tracecat/registry/sync/tarball.py">

<violation number="1" location="tracecat/registry/sync/tarball.py:222">
P2: The new uncompressed_size calculation performs synchronous filesystem traversal/stat calls inside an async function, which can block the event loop for large site-packages trees. Offload this calculation to a thread, similar to the tarball creation and hashing steps.

(Based on your team's feedback about offloading heavy sync file I/O in async functions.) [FEEDBACK_USED]</violation>
</file>
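
A rough sketch of the offloading the reviewer suggests, assuming the size is computed by walking a directory tree. The function names `directory_size_bytes` and `uncompressed_size` are hypothetical and the actual code at tracecat/registry/sync/tarball.py:222 is not shown in this thread. The blocking walk stays a plain function and the coroutine hands it to a worker thread with `asyncio.to_thread`.

```python
import asyncio
import os
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Blocking walk: sum the sizes of all non-symlink files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


async def uncompressed_size(root: Path) -> int:
    # Run the filesystem traversal in a worker thread so the event loop
    # stays free to service other tasks while stat calls are in flight.
    return await asyncio.to_thread(directory_size_bytes, root)
```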

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@daryllimyt daryllimyt force-pushed the fix/workflow-deadlock-logging branch from 279d6ce to 197a1cc on February 24, 2026 18:13
@daryllimyt commented

@cubic rereview

@daryllimyt commented

@codex rereview

cubic-dev-ai bot commented Feb 24, 2026

@cubic rereview

@daryllimyt I have started the AI code review. It will take a few minutes to complete.

@chatgpt-codex-connector commented

Codex Review: Didn't find any major issues. More of your lovely PRs please.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 5 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tracecat/dsl/workflow_logging.py">

<violation number="1" location="tracecat/dsl/workflow_logging.py:74">
P2: Process logging loses structured fields by flattening them into the message string. Loguru supports structured fields via bind/kwargs, so this adapter should pass fields to the process logger instead of discarding them.</violation>
</file>
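
The difference the reviewer points at, flattening fields into the message versus carrying them alongside it, can be sketched as below. Stdlib `logging` with `extra=` is used here only as a stand-in so the example has no third-party dependency; with Loguru the analogous call would be along the lines of `logger.bind(**fields).info(msg)`. The helper names are hypothetical and this is not the adapter's code at workflow_logging.py:74.

```python
import logging
from typing import Any


def log_flattened(logger: logging.Logger, msg: str, **fields: Any) -> None:
    """The flagged pattern: fields end up as text inside the message."""
    suffix = " ".join(f"{key}={val!r}" for key, val in fields.items())
    logger.info(f"{msg} {suffix}" if suffix else msg)


def log_structured(logger: logging.Logger, msg: str, **fields: Any) -> None:
    """The suggested pattern: the message stays clean and fields travel as data."""
    # "fields" becomes an attribute on the LogRecord, so handlers and
    # formatters can serialize or filter on individual keys.
    logger.info(msg, extra={"fields": fields})
```

Keeping the fields as data means a JSON sink or a filter can read `task` or `attempt` directly instead of parsing them back out of the message string.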

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@daryllimyt daryllimyt merged commit 7e40a39 into main Feb 25, 2026
17 checks passed
@daryllimyt daryllimyt deleted the fix/workflow-deadlock-logging branch February 25, 2026 00:00