[Feature] /process — extract business process from code (multi-repo), augment via interview, generate lint-clean BPMN 2.0 #256

@atlas-apex

Description

User Story

As an adopter mapping a business process that already exists in code (state machines, job chains, multi-step API flows — often spanning multiple repos in a microservice setup), I want a /process skill that first reads the codebase(s) to draft a candidate process model, then interviews me only on the gaps and ambiguities, then emits a lint-clean BPMN 2.0 file — so the BPMN reflects what's actually in production, not what I remembered while typing answers.

Acceptance Criteria

Discovery (read first, ask later — mirrors /extract-features pattern)

  • Skill scans across seven process-discovery axes and reports what it found before any question is asked:
    1. Explicit workflow definitions: XState machines (.machine.ts, createMachine()), Temporal workflows (@workflow decorators), Step Functions (*.asl.json), Cadence/Camunda workflow files (*.bpmn, *.cmmn)
    2. Queue/job orchestration: BullMQ flows, Celery chains/chords/groups, Sidekiq workflows, Resque pipelines, Bull queue .add() chains
    3. Cron + scheduled triggers: cron definitions, GitHub Actions schedule:, Vercel cron, scheduled Lambda EventBridge rules, including which handler each one fires and what that handler dispatches downstream
    4. State-column transitions: DB columns named status, state, phase, etc.; grep for the literal value set + the service-layer code that writes each transition
    5. API choreography: endpoint A emits an event/queues a job that endpoint B handles; trace the chain across route files
    6. Existing BPMN/diagram files: .bpmn, .cmmn, docs/processes/*.md Mermaid sequence diagrams; use as starting state, don't overwrite blindly
    7. Documented process steps: README sections, docs/ files with headings like "Onboarding Flow", numbered-step lists
  • Output of discovery: structured candidate model — [{step_id, label, type (task/gateway/event), source_evidence: <file:line>, downstream_steps: [...]}] — printed for operator review before BPMN is generated
  • Discovery is read-only; no files written until operator approves the candidate
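
To make the candidate-model shape concrete, here is a minimal TypeScript sketch. The field names come from the criterion above; the concrete types, the `findDanglingEdges` helper, and the sample data are illustrative assumptions, not a settled design.

```typescript
// Hypothetical shape of one candidate-model entry. Field names follow the
// acceptance criterion; the concrete types are assumptions.
type StepType = "task" | "gateway" | "event";

interface CandidateStep {
  step_id: string;
  label: string;
  type: StepType;
  source_evidence: string; // "<file:line>", e.g. "src/onboarding/state.ts:42"
  downstream_steps: string[]; // step_ids this step hands off to
}

// Returns every downstream reference that points at a step_id missing from
// the model, so the operator review can flag incomplete traces.
function findDanglingEdges(model: CandidateStep[]): string[] {
  const known = new Set(model.map((s) => s.step_id));
  const dangling: string[] = [];
  for (const step of model) {
    for (const target of step.downstream_steps) {
      if (!known.has(target)) {
        dangling.push(`${step.step_id} -> ${target}`);
      }
    }
  }
  return dangling;
}

// Sample data (made up for illustration).
const candidate: CandidateStep[] = [
  {
    step_id: "signup",
    label: "Sign up",
    type: "task",
    source_evidence: "src/onboarding/state.ts:42",
    downstream_steps: ["verify_email"],
  },
  {
    step_id: "verify_email",
    label: "Verify email",
    type: "task",
    source_evidence: "src/onboarding/state.ts:58",
    downstream_steps: ["profile_complete"],
  },
];

console.log(findDanglingEdges(candidate)); // [ 'verify_email -> profile_complete' ]
```

A check like this could run before the candidate is printed, so a step that points at something discovery never found shows up as an interview gap rather than a broken sequence flow.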

Scoping (an anchor is required — no exhaustive scans)

  • Skill requires an anchor before scanning: either a process slug + short description ("onboarding flow — signup through email verify through profile complete") OR an explicit entry point (--from-endpoint POST /signup, --from-machine OnboardingMachine, --from-job ProcessOrderJob, --scope src/onboarding/). Multiple anchors can combine
  • If no anchor given, skill asks the operator first: "Which process are we mapping? Give me a one-line description and (optionally) one of: an HTTP endpoint, a state machine class name, a queue job name, or a directory path"
  • Discovery is reachability-scoped: starting from the anchor, follow only what's connected — endpoint → handler → queues it dispatches → their handlers → state transitions → downstream endpoints. Stop at the connected-component boundary
  • Branches into unrelated subsystems (e.g. shared /api/audit-log used by 50 places) get marked as external touchpoints (single shape, not expanded) — skill asks operator: "Expand into a sub-process, or keep as black-box?"
  • Discovery report prints the anchor + scope size ("12 nodes reachable from OnboardingMachine, 3 external touchpoints not expanded")
  • Re-running /process onboarding later regenerates the same scope (same anchor + reachability), not the whole repo
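
The reachability rule can be sketched as a breadth-first walk that refuses to expand shared nodes. This is only an illustration: the `CallGraph` shape, the `shared` set (standing in for however the skill decides a node is an unrelated subsystem), and the sample graph are all assumptions.

```typescript
// Hypothetical call graph: each key is a node, each value the nodes it dispatches to.
type CallGraph = Record<string, string[]>;

interface ScopeResult {
  reachable: string[]; // nodes expanded as part of this process
  externalTouchpoints: string[]; // nodes seen but kept as black boxes
}

// Breadth-first walk from the anchor. Nodes in `shared` are recorded as
// external touchpoints and never expanded, so nothing behind them is scanned.
function scopeFromAnchor(graph: CallGraph, anchor: string, shared: Set<string>): ScopeResult {
  const reachable: string[] = [];
  const externalTouchpoints: string[] = [];
  const seen = new Set<string>([anchor]);
  const queue: string[] = [anchor];

  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (shared.has(current)) {
      externalTouchpoints.push(current);
      continue;
    }
    reachable.push(current);
    for (const next of graph[current] ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return { reachable, externalTouchpoints };
}

// Sample graph (made up for illustration).
const graph: CallGraph = {
  "POST /signup": ["SendVerifyEmailJob", "/api/audit-log"],
  "SendVerifyEmailJob": ["POST /verify"],
  "POST /verify": ["/api/audit-log"],
  "/api/audit-log": ["AuditArchiveJob"], // shared, so never expanded
  "POST /billing": ["ChargeJob"], // not connected to the anchor
};

const scope = scopeFromAnchor(graph, "POST /signup", new Set(["/api/audit-log"]));
console.log(
  `${scope.reachable.length} nodes reachable from POST /signup, ` +
    `${scope.externalTouchpoints.length} external touchpoints not expanded`
);
```

Because the walk is deterministic for a given graph and anchor, re-running with the same anchor yields the same scope, which is the property the last criterion asks for.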

Cross-repo traversal (microservice architectures)

  • When discovery encounters an outbound call whose target maps to another registered project in apexyard.projects.yaml, the skill follows the trace into that repo — same reachability rules apply, just one connected component spanning N repos
  • Cross-repo handoff detection patterns:
    • HTTP/gRPC calls whose hostname or service-discovery name matches a registered project
    • Message-broker publishes (SNS, Kafka, RabbitMQ, SQS, Redis streams) whose topic/queue name matches a topic another registered repo subscribes to
    • Shared event-bus / outbox-pattern tables with producers and consumers in different repos
  • Each repo encountered in the trace becomes a swimlane in the BPMN output (or pool with message-flow arrows if operator prefers; ask at interview, default to swimlanes within one pool for readability)
  • Cross-repo handoffs render as message flows (dashed arrows) with broker + topic on the arrow label
  • Discovery report enumerates the repo trail: OnboardingMachine in signup-svc → POST /verify-identity in identity-svc → emit "identity.verified" → onboarding-svc subscribes → completes profile
  • When a cross-repo handoff lands on a registered repo whose workspace/ clone is missing, skill OFFERS (default-yes) to clone on-demand. If declined or unclonable, renders as external touchpoint with a <bpmn:documentation> explaining why
  • Cross-repo handoff to a target NOT in the registry (Stripe, SendGrid, etc.) renders as an external participant pool — clearly marked out-of-org so the trust boundary is visible
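
The registry-lookup decision above could look roughly like the sketch below. The schema of apexyard.projects.yaml is not specified in this issue, so `RegisteredProject`, its fields, and the `.internal` hostnames are hypothetical; only the swimlane-vs-external-pool outcome comes from the criteria.

```typescript
// Hypothetical registry entry; the real apexyard.projects.yaml schema may differ.
interface RegisteredProject {
  repo: string;
  hostnames: string[]; // service-discovery names that route to this repo
  subscribedTopics: string[]; // broker topics this repo consumes
}

type Handoff =
  | { kind: "http"; hostname: string }
  | { kind: "message"; broker: string; topic: string };

type HandoffTarget =
  | { render: "swimlane"; repo: string } // in-registry: follow the trace
  | { render: "external-pool"; label: string }; // out-of-org participant

// Matches an outbound call against the registry. If several repos subscribe
// to the same topic, this sketch returns only the first match.
function resolveHandoff(handoff: Handoff, registry: RegisteredProject[]): HandoffTarget {
  const match = registry.find((p) =>
    handoff.kind === "http"
      ? p.hostnames.includes(handoff.hostname)
      : p.subscribedTopics.includes(handoff.topic)
  );
  if (match) {
    return { render: "swimlane", repo: match.repo };
  }
  const label =
    handoff.kind === "http" ? handoff.hostname : `${handoff.broker}:${handoff.topic}`;
  return { render: "external-pool", label };
}

// Sample registry (made up for illustration).
const registry: RegisteredProject[] = [
  { repo: "signup-svc", hostnames: ["signup-svc.internal"], subscribedTopics: [] },
  { repo: "identity-svc", hostnames: ["identity-svc.internal"], subscribedTopics: [] },
  {
    repo: "onboarding-svc",
    hostnames: ["onboarding-svc.internal"],
    subscribedTopics: ["identity.verified"],
  },
];

const viaHttp = resolveHandoff({ kind: "http", hostname: "identity-svc.internal" }, registry);
const viaTopic = resolveHandoff(
  { kind: "message", broker: "kafka", topic: "identity.verified" },
  registry
);
const outOfOrg = resolveHandoff({ kind: "http", hostname: "api.stripe.com" }, registry);
console.log(viaHttp, viaTopic, outOfOrg);
```

A real implementation would also need the missing-clone case (offer to clone, else fall back to an external touchpoint), which this sketch leaves out.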

Interview (gap-fill only — don't re-ask what code already says)

  • Skill identifies ambiguous nodes and asks operator to disambiguate: "step processPayment in src/payments/service.ts:42 — user-driven, service task, or external API call?"
  • Skill identifies invisible lanes/pools that code doesn't reveal: approver roles, external systems that touch the flow but aren't called from this repo, manual fallbacks
  • Skill identifies missing labels — code uses step1, step2 — asks "what should step1 be called in the BPMN?"
  • Skill never asks questions whose answer is already in the discovery report
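
A rough sketch of gap-only questioning follows: questions are generated only for steps whose type the code did not reveal or whose label is a placeholder. The `DiscoveredStep` shape, the "unknown" marker, and the placeholder regex are assumptions made for illustration.

```typescript
interface DiscoveredStep {
  step_id: string;
  label: string;
  kind: "task" | "gateway" | "event" | "unknown"; // "unknown": code did not reveal it
  source_evidence: string;
}

// Assumed heuristic: labels such as "step1" or "step_2" are placeholders.
const PLACEHOLDER_LABEL = /^step[_-]?\d+$/i;

// Produces questions only for gaps; fully described steps yield nothing.
function interviewQuestions(steps: DiscoveredStep[]): string[] {
  const questions: string[] = [];
  for (const s of steps) {
    if (s.kind === "unknown") {
      questions.push(
        `step ${s.step_id} in ${s.source_evidence}: user-driven, service task, or external API call?`
      );
    }
    if (PLACEHOLDER_LABEL.test(s.label)) {
      questions.push(`what should ${s.label} be called in the BPMN?`);
    }
  }
  return questions;
}

// Sample discovery output (made up for illustration).
const discovered: DiscoveredStep[] = [
  {
    step_id: "processPayment",
    label: "Process payment",
    kind: "unknown",
    source_evidence: "src/payments/service.ts:42",
  },
  { step_id: "step1", label: "step1", kind: "task", source_evidence: "src/flow.ts:10" },
  { step_id: "verifyEmail", label: "Verify email", kind: "task", source_evidence: "src/flow.ts:25" },
];

console.log(interviewQuestions(discovered));
```

Invisible lanes and pools (approver roles, manual fallbacks) cannot be derived this way, since by definition they leave no trace in the discovery report; those would still need open-ended prompts.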

Generation + lint

  • Emits valid BPMN 2.0 XML at projects/<project>/processes/<slug>.bpmn with <bpmn:definitions> root, embedded <bpmndi:BPMNDiagram> with auto-layout coords via bpmn-auto-layout npm package
  • Runs bpmnlint against the emitted file; on violations, surface with file-line context and offer (a) auto-fix where possible, (b) re-interview the relevant step, or (c) accept-as-is with documented exception
  • Default ruleset bpmnlint:recommended + opt-in (label-required, no-disconnected, no-implicit-split); overridable via .bpmnlintrc in project root
  • Final file passes bpmnlint --max-warnings 0 before the skill exits successfully
  • Opens cleanly in Camunda Modeler (manual smoke verified by QA Engineer on first feature merge)
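
As a sketch of the serialisation step only, the function below turns an approved list of steps into a bare BPMN 2.0 XML string. It deliberately omits the <bpmndi:BPMNDiagram> section, on the assumption that the auto-layout step adds it afterwards. The `FlowStep` shape and the sample IDs are hypothetical, and step IDs are assumed to already be valid XML names (no spaces or slashes).

```typescript
interface FlowStep {
  id: string; // assumed to be a valid XML NCName
  label: string;
  element: "startEvent" | "task" | "exclusiveGateway" | "endEvent";
  next: string[]; // ids of the steps this one flows into
}

function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Emits flow nodes first, then one sequenceFlow per edge. No diagram
// interchange (DI) is produced here.
function toBpmnXml(processId: string, steps: FlowStep[]): string {
  const lines: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" ' +
      'id="Definitions_1" targetNamespace="http://bpmn.io/schema/bpmn">',
    `  <bpmn:process id="${escapeXml(processId)}" isExecutable="false">`,
  ];
  for (const s of steps) {
    lines.push(`    <bpmn:${s.element} id="${escapeXml(s.id)}" name="${escapeXml(s.label)}" />`);
  }
  for (const s of steps) {
    for (const target of s.next) {
      lines.push(
        `    <bpmn:sequenceFlow id="Flow_${escapeXml(s.id)}_${escapeXml(target)}" ` +
          `sourceRef="${escapeXml(s.id)}" targetRef="${escapeXml(target)}" />`
      );
    }
  }
  lines.push("  </bpmn:process>", "</bpmn:definitions>");
  return lines.join("\n");
}

// Sample process (made up for illustration).
const xml = toBpmnXml("onboarding", [
  { id: "start", label: "Signup requested", element: "startEvent", next: ["signup"] },
  { id: "signup", label: "Sign up", element: "task", next: ["verify_email"] },
  { id: "verify_email", label: "Verify email", element: "task", next: ["done"] },
  { id: "done", label: "Profile complete", element: "endEvent", next: [] },
]);
console.log(xml);
```

Hand-building strings keeps the sketch dependency-free; a real implementation might prefer a BPMN object model library so that lanes, pools, and message flows are serialised consistently.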

Output + provenance

  • Each BPMN element carries a <bpmn:documentation> child citing its source evidence (src/onboarding/state.ts:42-58, cron config, operator input)
  • Skill writes sibling <slug>.process-source.md with the full discovery report + interview answers, so re-runs can replay or refresh
  • Re-runs OFFER (default-no) to overwrite — same UX as /extract-features
  • projects/<project>/processes/README.md is maintained as an index — one row per BPMN file with the anchor, last-generated date, one-line description
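
The provenance criterion could be met with a small helper that renders the <bpmn:documentation> child from a list of evidence items. The `Evidence` union and the "; "-joined text format are assumptions; the issue only requires that the source evidence be cited.

```typescript
// Hypothetical evidence kinds, mirroring the three examples in the criterion.
type Evidence =
  | { source: "code"; location: string } // e.g. "src/onboarding/state.ts:42-58"
  | { source: "config"; description: string } // e.g. "cron config"
  | { source: "operator"; note: string }; // an interview answer

function escapeXmlText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Joins all evidence into one documentation element, escaping operator text
// so free-form answers cannot break the XML.
function documentationElement(evidence: Evidence[]): string {
  const parts = evidence.map((e): string => {
    switch (e.source) {
      case "code":
        return e.location;
      case "config":
        return e.description;
      case "operator":
        return `operator input: ${e.note}`;
    }
  });
  return `<bpmn:documentation>${escapeXmlText(parts.join("; "))}</bpmn:documentation>`;
}

const docXml = documentationElement([
  { source: "code", location: "src/onboarding/state.ts:42-58" },
  { source: "operator", note: "approver is the <ops> lead & backup" },
]);
console.log(docXml);
// <bpmn:documentation>src/onboarding/state.ts:42-58; operator input: approver is the &lt;ops&gt; lead &amp; backup</bpmn:documentation>
```

Keeping the evidence structured until the last moment also makes it easy to write the same items into the sibling <slug>.process-source.md file.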

Docs + AgDR

  • SKILL.md documents the seven discovery axes + cross-repo traversal flow + one end-to-end worked example
  • AgDR captures: BPMN 2.0 over alternatives (DMN, CMMN, Mermaid sequence, raw flowchart); bpmn-auto-layout over manual coords; bpmnlint as the gate; read-first-then-ask over operator-only authoring; swimlanes vs separate pools; registry-lookup as the cross-repo signal vs heuristic URL matching; on-demand cloning vs black-box fallback
  • README discloses Node + npm runtime requirement (same shape as the LSP-plugin disclosure)

Design Notes

Pairs with /c4 (static topology) and /extract-features (feature inventory) as the "what we already have" tooling family:

| Skill | Produces | Source |
|---|---|---|
| /extract-features | Feature Inventory | Exhaustive codebase scan across 6 axes |
| /c4 | C4 L1+L2 Mermaid | System-boundary scan (services + deployments) |
| /process | BPMN 2.0 process file | Anchor-scoped scan across 7 axes, optionally cross-repo |
| /threat-model --format=dragon (#255) | Threat Dragon JSON | DFD section from /threat-model |

All four are read-first, ask-only-when-the-code-doesn't-say.

Out of Scope

  • BPMN execution (running the model on a BPM engine)
  • DMN, CMMN — separate formats, separate skills if demanded
  • Round-trip import (reading a hand-edited .bpmn back into the candidate-model interview state) — though re-running the scan and diffing against the previous run IS in scope as part of the OFFER-to-overwrite flow
  • Live syncing — one-off scan-then-generate, not a continuous-update tool
  • Camunda 7 vs Camunda 8 extension elements (vanilla BPMN 2.0 only)
  • DSL output for Cadence / Temporal / Step Functions (BPMN-only target in v1)

Effort Estimate

TBD — sizeable (estimate L → XL). Discovery + multi-repo trace + BPMN serialiser + bpmnlint pipeline + interview UX.

Glossary

| Term | Definition |
|---|---|
| BPMN 2.0 | Business Process Model and Notation, OMG standard version 2.0; an XML format for business processes, openable in Camunda Modeler, bpmn.io, Cawemo, etc. |
| Anchor | An operator-supplied or discovered entry point for the process trace (endpoint, state machine, job, directory); bounds the scope of the scan |
| Reachability-scoped | Discovery follows only what's connected to the anchor; stops at the connected-component boundary; sister subsystems aren't scanned |
| Swimlane | A horizontal partition inside a single BPMN pool, used here to distinguish per-service responsibilities within one process |
| Pool | A standalone process boundary in BPMN; used here for fully-separated participants where message-flow arrows are the only inter-pool connection |
| Message flow | A dashed arrow in BPMN representing communication between pools (e.g. queue publish, HTTP call, event emit) |
| bpmnlint | Open-source linter for BPMN files; npm package; enforces well-formedness + readability rules |
| bpmn-auto-layout | npm package that generates <bpmndi> coordinates so the file opens with a readable diagram in Camunda Modeler |
| Process anchor slug | Short kebab-case identifier for a named process; used as the BPMN file name and the process index key |

Metadata

Assignees

No one assigned

    Labels

    P2 (Medium; plan-worthy, not urgent)
    enhancement (New feature or request)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests