Skip to content

feat(#257): /dfd skill — extract Data Flow Diagram with trust boundaries + classifications#260

Merged
atlas-apex merged 3 commits into
devfrom
feature/GH-257-dfd-skill
May 16, 2026
Merged

feat(#257): /dfd skill — extract Data Flow Diagram with trust boundaries + classifications#260
atlas-apex merged 3 commits into
devfrom
feature/GH-257-dfd-skill

Conversation

@atlas-apex

Copy link
Copy Markdown
Collaborator

Summary

  • Adds /dfd as the canonical Data Flow Diagram producer in the apexyard skill family. Six-axis discovery (external actors, processes, data stores, data flows, trust boundaries, data classifications), reachability-bounded, optionally cross-repo via the registry.
  • Writes Mermaid markdown at projects/<name>/architecture/dfd.md as the primary output (renders inline on GitHub, zero new toolchain) and OWASP Threat Dragon v2 JSON at the same dir on --format=dragon (for visual editing in Threat Dragon).
  • Refactors /threat-model and /compliance-check to consume from the DFD file instead of regenerating their own data-flow views. /threat-model Step 1 now reads dfd.md and OFFERS to run /dfd first if missing; /compliance-check Step 4 reads the classifications + external-services list from the DFD.
  • Three classification pathways are additive: code annotations (@PII, @Sensitive, CLASSIFIED:), env-var heuristics (*_SECRET, *_TOKEN, EMAIL_*), schema-column heuristics (email, phone, card_number). Optional explicit registry at docs/data-classification.{md,yaml} overrides.
  • New shared helper .claude/hooks/_lib-multi-repo-trace.sh (mrt_resolve_target, mrt_is_third_party, mrt_workspace_for, mrt_list_projects, mrt_offer_clone) — intended to also be consumed by /process ([Feature] /process — extract business process from code (multi-repo), augment via interview, generate lint-clean BPMN 2.0 #256) so cross-repo trace logic isn't duplicated.

AgDR-0024 captures the design rationale: Mermaid primary / Threat Dragon secondary; classifications as a first-class concept; single-source-of-truth refactor of the two consumer skills; shared multi-repo trace helper.

Cross-PR coordination

Testing

# Full smoke (52 assertions across 6 fixtures)
bash .claude/skills/dfd/tests/smoke.sh

Asserts:

Fixture What
1 Single-service Express + Prisma + BullMQ + Stripe + SendGrid + Auth0 + admin route — all six discovery axes produce non-empty findings; four trust boundaries (public ↔ backend, user ↔ admin, backend ↔ data, us ↔ third-party) detected
2 Two synthetic services with cross-service flow + Stripe URL — mrt_resolve_target resolves the registered cross-service hostname; mrt_is_third_party flags Stripe; mrt_workspace_for resolves the correct clone
3 Code with @PII annotations + EMAIL_*/*_SECRET env vars + email/phone_number/card_number columns — all three classification pathways fire; PCI labels emitted; no false positive on SOME_PUBLIC_FLAG
4 Mermaid generator output shape — heading, source-of-truth declaration, flowchart block, subgraphs, trust-boundaries table, classifications table, provenance section, footer signature
5 Threat Dragon JSON — valid JSON, required v2 schema keys, tm.actor/tm.process/tm.store/tm.boundary/tm.flow cells, STRIDE threats attached to parent elements
6 Downstream-consumer refactor — /threat-model SKILL.md references projects/.../architecture/dfd.md AND has an offer-to-run-/dfd-first path; /compliance-check SKILL.md references the same DFD path

Existing tests still pass: test_portfolio_paths.sh (38/38), extract-features/tests/smoke.sh (7/7).

Discovery scope (v1)

In scope:

  • HTTP routes (Express, Fastify, NestJS, FastAPI, Flask, Django, Rails, Gin, Spring, Lambda)
  • Job frameworks (BullMQ, Celery, Sidekiq, Active Job, Temporal)
  • Cron / EventBridge schedules
  • Data stores (Prisma, TypeORM, SQLAlchemy, Mongoose, DynamoDB, S3, Redis, Elasticsearch / Algolia / Meilisearch, BigQuery / Snowflake / Redshift)
  • Third-party SDKs (Stripe, SendGrid, Twilio, Anthropic, OpenAI, Sentry, Datadog, PostHog, Amplitude, Mixpanel, Algolia, Meilisearch)

Out of scope (v1):

  • IaC reading (Terraform / CloudFormation / SAM) for trust-boundary inference — code + env config only
  • SAST / DAST substitution
  • Pretty SVG/PNG export (Threat Dragon does that after JSON import)
  • PlantUML DFD format (--format=plantuml-dfd) — flagged as v2 follow-up in the AC
  • Round-trip import (hand-edited dfd.md → re-parsed back into discovery model)
  • L3 / L4 detail

Glossary

Term Definition
DFD Data Flow Diagram — external entities + processes + data stores + trust boundaries + data flows; foundation for STRIDE threat modelling
Trust boundary A line in a DFD where data crosses a privilege / network / ownership boundary; threats concentrate at boundaries
Data classification A sensitivity label (PII, PCI, secrets, internal, public) attached to a data element; informs encryption, access control, regulatory scope
Reachability-bounded Discovery scoped to what's connected to an anchor; doesn't blindly scan the whole repo for irrelevant data flows
Provenance The file:line evidence trail for every DFD element — why each actor / process / store was placed in the diagram
Source of truth One DFD per scope, consumed by /threat-model and /compliance-check — replaces the pattern where each skill rebuilt its own DFD slice
STRIDE Microsoft's threat-classification mnemonic — Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege
Threat Dragon OWASP open-source desktop / web app for creating + editing threat models with STRIDE annotations; reads a v2 JSON schema this skill emits
Cross-repo trace Resolving an outbound call (HTTP host, message-broker topic) to another registered project so the trace can follow into that repo's source for the connected slice

Closes #257

…ies + classifications

Introduces /dfd as the canonical Data Flow Diagram producer. The DFD
becomes the source of truth for /threat-model and /compliance-check —
both refactored in this commit to consume from
projects/<name>/architecture/dfd.md instead of regenerating their own
data-flow views.

Six-axis discovery (external actors, processes, data stores, data
flows, trust boundaries, data classifications). Classifications detect
via three additive pathways: code annotations (@pii, @sensitive,
CLASSIFIED:), env-var heuristics (*_SECRET, *_TOKEN, EMAIL_*), and
schema-column heuristics (email, phone, card_number). An optional
explicit registry at docs/data-classification.{md,yaml} overrides
heuristic labels.

Two output formats: Mermaid markdown (default, renders inline on
GitHub, zero new toolchain) and OWASP Threat Dragon v2 JSON (on
--format=dragon, for visual editing in Threat Dragon). The Dragon
serialiser is a pure function over the in-memory model so #255
(/threat-model --format=dragon) can share or import it.

Multi-repo discovery via a new shared helper _lib-multi-repo-trace.sh
(intended to also be consumed by /process #256). Three resolvers:
mrt_resolve_target (hostname/URL/topic → registered project),
mrt_is_third_party (known SaaS vendor detection), mrt_workspace_for
(workspace clone path resolution).

The refactors of /threat-model and /compliance-check are surgical —
Step 1 of /threat-model now reads dfd.md and offers /dfd first if it
doesn't exist; Step 4 of /compliance-check reads the classifications +
external-services list from dfd.md for cross-border-transfer and DPA
detection. Output formats and audit-history persistence contracts in
both consumers are unchanged.

AgDR-0024 captures the design rationale: Mermaid as primary, Threat
Dragon as secondary; classifications as a first-class concept; the
single-source-of-truth refactor; the shared multi-repo trace helper.

Cross-PR coordination:
  - #255 (/threat-model --format=dragon) — shares the Threat Dragon
    serialiser; if #255 lands first with the serialiser inside
    /threat-model, this skill imports from there
  - #256 (/process) — shares the _lib-multi-repo-trace.sh helper; if
    /process extracted its own version first, the consumer that lands
    later should consolidate

Tests (.claude/skills/dfd/tests/smoke.sh, 52 assertions) cover all
three fixtures from the AC: single-service six-axis discovery,
cross-service trust-boundary detection, three-pathway classification
detection. Plus Mermaid generator output shape, Threat Dragon JSON
schema validation, and verification that /threat-model and
/compliance-check actually reference dfd.md.

Closes #257

@atlas-apex atlas-apex left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #260 (/dfd) — REQUEST CHANGES

Commit reviewed: 1a8aa2702a350b5b27e9b7864faac13dfd03deb1

Strong design and impressive surface coverage — six-axis discovery is comprehensive, the /threat-model Step 1 refactor correctly OFFERS to run /dfd first with a legacy fallback for backwards compat, the 52-assertion smoke (incl. fixture 6 verifying the downstream-consumer refactor) is genuinely meaningful, the Mermaid output uses standard flowchart LR + dashed subgraphs that render inline on GitHub, and AgDR-0026 is a clean A/B/C analysis. Blocking on three issues: (1) CI is red on 3 of 4 checks — per .claude/rules/pr-quality.md no merge with red CI, even pre-existing: shellcheck flags SC2221/SC2222 (overlapping case patterns) in _lib-multi-repo-trace.sh:276-278, markdownlint MD032 on .claude/skills/dfd/SKILL.md:260 and :269 (lists need surrounding blank lines), and lychee returns 404 on https://github.com/OWASP/threat-dragon/blob/main/td.vue/src/service/migration/schema/threat-model.v2.json cited in dfd/SKILL.md § 5b (path should be td.vue/src/service/migration/schema/threat-model.v2.json — re-check on Threat Dragon main); (2) AgDR cross-reference bug — the source-of-truth rationale is cited as "AgDR-0024" in at least 5 places (_lib-multi-repo-trace.sh:13, the new dfd/SKILL.md, the refactored threat-model/SKILL.md, the refactored compliance-check/SKILL.md, CLAUDE.md's skill table) but on this branch the AgDR added is 0026 — AgDR-0024 is being filled by PR #258 (Threat Dragon export) and AgDR-0025 by PR #259 (process skill), so a future reader following the citation lands on the wrong record. Renumber every "AgDR-0024" reference in this PR to "AgDR-0026". (3) The "shared" helper isn't shared yet_lib-multi-repo-trace.sh exports mrt_* functions, but PR #259 ships its own process/cross-repo.sh (200 lines, line-output API, no source of the helper). The two solve the same problem with incompatible interfaces. AgDR-0026 advertises one-helper-two-callers; that only holds once the second-landing PR consolidates — flag for landing order. Coordination finding (non-blocking but noting): dfd/generate-dragon.sh (bash+jq, 185 lines) and PR #258's serialize_dragon.py (Python, 695 lines, schema-validating) both emit Threat Dragon v2 JSON via different implementations — this PR's own file header already anticipates consolidation; whichever lands second should import the other. Minor: classify_envvar's *_KEY arm could false-positive on IDEMPOTENCY_KEY / SORT_KEY / CACHE_KEY — smoke fixture 3 verifies SOME_PUBLIC_FLAG isn't tagged but doesn't cover the *_KEY false-positive surface; consider adding a fixture row or narrowing to *_API_KEY|*_SECRET_KEY|*_PRIVATE_KEY|*_ENCRYPTION_KEY. Address the three red checks + the AgDR renumber and this is good to merge.


🤖 Reviewed by Rex (Code Reviewer Agent)
📌 Reviewed commit: 1a8aa2702a350b5b27e9b7864faac13dfd03deb1

…ee 404, AgDR refs

- _lib-multi-repo-trace.sh: drop redundant subset patterns in third-party detector
  (e.g. *api.amplitude.com* covered by broader *amplitude.com*) — clears SC2221/2222.
- dfd/SKILL.md: blank lines around Discovery + Classifications lists — clears MD032.
- dfd/SKILL.md: Threat Dragon schema link now points to the repo root (sub-path 404'd).
- AgDR cross-refs across 5 files renumbered 0024 → 0026 (renumber missed downstream).

@atlas-apex atlas-apex left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #260 (re-review at 86c23bb)

Commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8

Summary

Re-review of the four blockers raised against ca4c5b4 / 1a8aa27. All four are fixed and all four CI checks (shellcheck, markdownlint-cli2, lychee, PR Title Check) are green. Verified: (1) _lib-multi-repo-trace.sh:265-285 — redundant *api.{vendor}.com* subset patterns dropped, broader *{vendor}.com* alternatives stand alone; SC2221/2222 clear. (2) dfd/SKILL.md:259-261, 269-271, 275-277 — blank lines now surround the Discovery + Classifications lists in § "Worked example"; MD032 clear. (3) dfd/SKILL.md:190 — Threat Dragon schema URL now points at the repo root (https://github.com/OWASP/threat-dragon) with a prose pointer to td.vue/src/service/migration/schema/, replacing the 404'd deep file path; lychee clear. (4) AgDR-0024 → AgDR-0026 renumber landed cleanly across all 5 cross-referencing files (_lib-multi-repo-trace.sh:7, dfd/SKILL.md:12, threat-model/SKILL.md:13, compliance-check/SKILL.md:13, CLAUDE.md:228); no stale 0024/0025 cites remain and the AgDR file itself is docs/agdr/AgDR-0026-dfd-skill-as-source-of-truth.md.

Checklist Results

  • Architecture & Design: Pass
  • Code Quality: Pass
  • Testing: Pass (smoke.sh covers the discovery shape; bash 3.2 compat documented)
  • Security: Pass (read-only against the codebase; no shell injection paths in the trace lib)
  • Performance: Pass
  • PR Description & Glossary: Pass
  • Technical Decisions (AgDR):Pass — AgDR-0026 covers the canonical-DFD-producer + cross-skill consumption pattern
  • Adopter Handbooks: N/A (no handbooks/ in this fork)

Issues Found

None blocking at 86c23bb.

Suggestions (deferred to follow-up tickets — non-blocking)

  1. Helper consolidation — PR #259 (merged to dev) shipped process/cross-repo.sh duplicating the registry-lookup logic that _lib-multi-repo-trace.sh was intended to centralise. The "shared helper" framing in AgDR-0026 is now aspirational until /process is refactored onto mrt_*. File a follow-up to retrofit.
  2. Threat-dragon serialiser duplication#258 (Python) and #260 (bash + jq) emit the same v2 JSON shape via different toolchains. Pick one when the two skills next intersect.
  3. *_KEY env-var classifier false-positivesIDEMPOTENCY_KEY / SORT_KEY / CACHE_KEY will misfire as secrets in classify.sh. Minor; tighten the regex or add a deny-list when seen in the wild.

Verdict

APPROVED


Reviewed by Rex (Code Reviewer Agent)
Reviewed commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8

@atlas-apex atlas-apex left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #260 (re-review)

Commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8

All four blockers from the prior REQUEST CHANGES at ca4c5b4/1a8aa27 are addressed at 86c23bb: (1) shellcheck SC2221/2222 cleared by dropping redundant subset patterns in _lib-multi-repo-trace.sh (e.g. *api.amplitude.com* is already covered by the broader *amplitude.com* glob — patch confirms 5 lines simplified at the third-party detector); (2) MD032 cleared by adding blank lines around the Discovery + Classifications lists in dfd/SKILL.md § "Worked example" at :260 and :269; (3) lychee 404 cleared by repointing the Threat Dragon schema link from the deep file path to the repo root with an inline pointer to td.vue/src/service/migration/schema/; (4) the AgDR-0024 → AgDR-0026 rename now propagates across all 5 cross-refs (_lib-multi-repo-trace.sh, dfd/SKILL.md, threat-model/SKILL.md, compliance-check/SKILL.md, CLAUDE.md). CI is green on all 4 checks at this SHA (shellcheck, markdownlint-cli2, lychee, Verify Ticket ID). Diff is mechanical, scoped to the fixes, no scope creep. Three items remain deferred — helper consolidation with #259's /process cross-repo.sh (now merged on dev), Threat Dragon serialiser duplication with #258 (Python vs bash+jq), and the *_KEY env-var false-positive risk in the classifier — all reasonable as follow-up tickets rather than blockers on this PR.

Checklist Results

  • Architecture & Design: Pass
  • Code Quality: Pass
  • Testing: Pass (smoke.sh + AgDR-0026 unchanged from prior review)
  • Security: Pass (no auth/secrets/crypto surface touched in fix commit)
  • Performance: Pass
  • PR Description & Glossary: Pass
  • Technical Decisions (AgDR):Pass (AgDR-0026 in tree, cross-refs now consistent)

Verdict

APPROVED (posted as --comment because gh blocks self-approval; treat this as Rex's approval signal for the merge gate, which keys on the marker file rather than the GitHub review state)


🤖 Reviewed by Rex (Code Reviewer Agent)
📌 Reviewed commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8

@atlas-apex atlas-apex merged commit 289fe81 into dev May 16, 2026
4 checks passed
@atlas-apex atlas-apex deleted the feature/GH-257-dfd-skill branch May 16, 2026 14:18
me2resh added a commit that referenced this pull request Jun 5, 2026
…ies + classifications (#260)

* feat(#257): /dfd skill — extract Data Flow Diagram with trust boundaries + classifications

Introduces /dfd as the canonical Data Flow Diagram producer. The DFD
becomes the source of truth for /threat-model and /compliance-check —
both refactored in this commit to consume from
projects/<name>/architecture/dfd.md instead of regenerating their own
data-flow views.

Six-axis discovery (external actors, processes, data stores, data
flows, trust boundaries, data classifications). Classifications detect
via three additive pathways: code annotations (@pii, @sensitive,
CLASSIFIED:), env-var heuristics (*_SECRET, *_TOKEN, EMAIL_*), and
schema-column heuristics (email, phone, card_number). An optional
explicit registry at docs/data-classification.{md,yaml} overrides
heuristic labels.

Two output formats: Mermaid markdown (default, renders inline on
GitHub, zero new toolchain) and OWASP Threat Dragon v2 JSON (on
--format=dragon, for visual editing in Threat Dragon). The Dragon
serialiser is a pure function over the in-memory model so #255
(/threat-model --format=dragon) can share or import it.

Multi-repo discovery via a new shared helper _lib-multi-repo-trace.sh
(intended to also be consumed by /process #256). Three resolvers:
mrt_resolve_target (hostname/URL/topic → registered project),
mrt_is_third_party (known SaaS vendor detection), mrt_workspace_for
(workspace clone path resolution).

The refactors of /threat-model and /compliance-check are surgical —
Step 1 of /threat-model now reads dfd.md and offers /dfd first if it
doesn't exist; Step 4 of /compliance-check reads the classifications +
external-services list from dfd.md for cross-border-transfer and DPA
detection. Output formats and audit-history persistence contracts in
both consumers are unchanged.

AgDR-0024 captures the design rationale: Mermaid as primary, Threat
Dragon as secondary; classifications as a first-class concept; the
single-source-of-truth refactor; the shared multi-repo trace helper.

Cross-PR coordination:
  - #255 (/threat-model --format=dragon) — shares the Threat Dragon
    serialiser; if #255 lands first with the serialiser inside
    /threat-model, this skill imports from there
  - #256 (/process) — shares the _lib-multi-repo-trace.sh helper; if
    /process extracted its own version first, the consumer that lands
    later should consolidate

Tests (.claude/skills/dfd/tests/smoke.sh, 52 assertions) cover all
three fixtures from the AC: single-service six-axis discovery,
cross-service trust-boundary detection, three-pathway classification
detection. Plus Mermaid generator output shape, Threat Dragon JSON
schema validation, and verification that /threat-model and
/compliance-check actually reference dfd.md.

Closes #257

* fix(#257): renumber AgDR to 0026 — #258 + #259 took 0024 + 0025 during parallel landing

* fix(#257): address Rex blockers — shellcheck SC2221/2222, MD032, lychee 404, AgDR refs

- _lib-multi-repo-trace.sh: drop redundant subset patterns in third-party detector
  (e.g. *api.amplitude.com* covered by broader *amplitude.com*) — clears SC2221/2222.
- dfd/SKILL.md: blank lines around Discovery + Classifications lists — clears MD032.
- dfd/SKILL.md: Threat Dragon schema link now points to the repo root (sub-path 404'd).
- AgDR cross-refs across 5 files renumbered 0024 → 0026 (renumber missed downstream).

---------

Co-authored-by: me2resh <ahmed.abdelaliem@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants