feat(#257): /dfd skill — extract Data Flow Diagram with trust boundaries + classifications#260
Conversation
…ies + classifications Introduces /dfd as the canonical Data Flow Diagram producer. The DFD becomes the source of truth for /threat-model and /compliance-check — both refactored in this commit to consume from projects/<name>/architecture/dfd.md instead of regenerating their own data-flow views. Six-axis discovery (external actors, processes, data stores, data flows, trust boundaries, data classifications). Classifications detect via three additive pathways: code annotations (@pii, @sensitive, CLASSIFIED:), env-var heuristics (*_SECRET, *_TOKEN, EMAIL_*), and schema-column heuristics (email, phone, card_number). An optional explicit registry at docs/data-classification.{md,yaml} overrides heuristic labels. Two output formats: Mermaid markdown (default, renders inline on GitHub, zero new toolchain) and OWASP Threat Dragon v2 JSON (on --format=dragon, for visual editing in Threat Dragon). The Dragon serialiser is a pure function over the in-memory model so #255 (/threat-model --format=dragon) can share or import it. Multi-repo discovery via a new shared helper _lib-multi-repo-trace.sh (intended to also be consumed by /process #256). Three resolvers: mrt_resolve_target (hostname/URL/topic → registered project), mrt_is_third_party (known SaaS vendor detection), mrt_workspace_for (workspace clone path resolution). The refactors of /threat-model and /compliance-check are surgical — Step 1 of /threat-model now reads dfd.md and offers /dfd first if it doesn't exist; Step 4 of /compliance-check reads the classifications + external-services list from dfd.md for cross-border-transfer and DPA detection. Output formats and audit-history persistence contracts in both consumers are unchanged. AgDR-0024 captures the design rationale: Mermaid as primary, Threat Dragon as secondary; classifications as a first-class concept; the single-source-of-truth refactor; the shared multi-repo trace helper. Cross-PR coordination: - #255 (/threat-model --format=dragon) — shares the Threat Dragon serialiser; if #255 lands first with the serialiser inside /threat-model, this skill imports from there - #256 (/process) — shares the _lib-multi-repo-trace.sh helper; if /process extracted its own version first, the consumer that lands later should consolidate Tests (.claude/skills/dfd/tests/smoke.sh, 52 assertions) cover all three fixtures from the AC: single-service six-axis discovery, cross-service trust-boundary detection, three-pathway classification detection. Plus Mermaid generator output shape, Threat Dragon JSON schema validation, and verification that /threat-model and /compliance-check actually reference dfd.md. Closes #257
…g parallel landing
atlas-apex
left a comment
There was a problem hiding this comment.
Code Review: PR #260 (/dfd) — REQUEST CHANGES
Commit reviewed: 1a8aa2702a350b5b27e9b7864faac13dfd03deb1
Strong design and impressive surface coverage — six-axis discovery is comprehensive, the /threat-model Step 1 refactor correctly OFFERS to run /dfd first with a legacy fallback for backwards compat, the 52-assertion smoke (incl. fixture 6 verifying the downstream-consumer refactor) is genuinely meaningful, the Mermaid output uses standard flowchart LR + dashed subgraphs that render inline on GitHub, and AgDR-0026 is a clean A/B/C analysis. Blocking on three issues: (1) CI is red on 3 of 4 checks — per .claude/rules/pr-quality.md no merge with red CI, even pre-existing: shellcheck flags SC2221/SC2222 (overlapping case patterns) in _lib-multi-repo-trace.sh:276-278, markdownlint MD032 on .claude/skills/dfd/SKILL.md:260 and :269 (lists need surrounding blank lines), and lychee returns 404 on https://github.com/OWASP/threat-dragon/blob/main/td.vue/src/service/migration/schema/threat-model.v2.json cited in dfd/SKILL.md § 5b (path should be td.vue/src/service/migration/schema/threat-model.v2.json — re-check on Threat Dragon main); (2) AgDR cross-reference bug — the source-of-truth rationale is cited as "AgDR-0024" in at least 5 places (_lib-multi-repo-trace.sh:13, the new dfd/SKILL.md, the refactored threat-model/SKILL.md, the refactored compliance-check/SKILL.md, CLAUDE.md's skill table) but on this branch the AgDR added is 0026 — AgDR-0024 is being filled by PR #258 (Threat Dragon export) and AgDR-0025 by PR #259 (process skill), so a future reader following the citation lands on the wrong record. Renumber every "AgDR-0024" reference in this PR to "AgDR-0026". (3) The "shared" helper isn't shared yet — _lib-multi-repo-trace.sh exports mrt_* functions, but PR #259 ships its own process/cross-repo.sh (200 lines, line-output API, no source of the helper). The two solve the same problem with incompatible interfaces. AgDR-0026 advertises one-helper-two-callers; that only holds once the second-landing PR consolidates — flag for landing order. Coordination finding (non-blocking but noting): dfd/generate-dragon.sh (bash+jq, 185 lines) and PR #258's serialize_dragon.py (Python, 695 lines, schema-validating) both emit Threat Dragon v2 JSON via different implementations — this PR's own file header already anticipates consolidation; whichever lands second should import the other. Minor: classify_envvar's *_KEY arm could false-positive on IDEMPOTENCY_KEY / SORT_KEY / CACHE_KEY — smoke fixture 3 verifies SOME_PUBLIC_FLAG isn't tagged but doesn't cover the *_KEY false-positive surface; consider adding a fixture row or narrowing to *_API_KEY|*_SECRET_KEY|*_PRIVATE_KEY|*_ENCRYPTION_KEY. Address the three red checks + the AgDR renumber and this is good to merge.
🤖 Reviewed by Rex (Code Reviewer Agent)
📌 Reviewed commit: 1a8aa2702a350b5b27e9b7864faac13dfd03deb1
…ee 404, AgDR refs - _lib-multi-repo-trace.sh: drop redundant subset patterns in third-party detector (e.g. *api.amplitude.com* covered by broader *amplitude.com*) — clears SC2221/2222. - dfd/SKILL.md: blank lines around Discovery + Classifications lists — clears MD032. - dfd/SKILL.md: Threat Dragon schema link now points to the repo root (sub-path 404'd). - AgDR cross-refs across 5 files renumbered 0024 → 0026 (renumber missed downstream).
atlas-apex
left a comment
There was a problem hiding this comment.
Code Review: PR #260 (re-review at 86c23bb)
Commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8
Summary
Re-review of the four blockers raised against ca4c5b4 / 1a8aa27. All four are fixed and all four CI checks (shellcheck, markdownlint-cli2, lychee, PR Title Check) are green. Verified: (1) _lib-multi-repo-trace.sh:265-285 — redundant *api.{vendor}.com* subset patterns dropped, broader *{vendor}.com* alternatives stand alone; SC2221/2222 clear. (2) dfd/SKILL.md:259-261, 269-271, 275-277 — blank lines now surround the Discovery + Classifications lists in § "Worked example"; MD032 clear. (3) dfd/SKILL.md:190 — Threat Dragon schema URL now points at the repo root (https://github.com/OWASP/threat-dragon) with a prose pointer to td.vue/src/service/migration/schema/, replacing the 404'd deep file path; lychee clear. (4) AgDR-0024 → AgDR-0026 renumber landed cleanly across all 5 cross-referencing files (_lib-multi-repo-trace.sh:7, dfd/SKILL.md:12, threat-model/SKILL.md:13, compliance-check/SKILL.md:13, CLAUDE.md:228); no stale 0024/0025 cites remain and the AgDR file itself is docs/agdr/AgDR-0026-dfd-skill-as-source-of-truth.md.
Checklist Results
- Architecture & Design: Pass
- Code Quality: Pass
- Testing: Pass (smoke.sh covers the discovery shape; bash 3.2 compat documented)
- Security: Pass (read-only against the codebase; no shell injection paths in the trace lib)
- Performance: Pass
- PR Description & Glossary: Pass
- Technical Decisions (AgDR):Pass — AgDR-0026 covers the canonical-DFD-producer + cross-skill consumption pattern
- Adopter Handbooks: N/A (no
handbooks/in this fork)
Issues Found
None blocking at 86c23bb.
Suggestions (deferred to follow-up tickets — non-blocking)
- Helper consolidation — PR #259 (merged to dev) shipped
process/cross-repo.shduplicating the registry-lookup logic that_lib-multi-repo-trace.shwas intended to centralise. The "shared helper" framing in AgDR-0026 is now aspirational until/processis refactored ontomrt_*. File a follow-up to retrofit. - Threat-dragon serialiser duplication — #258 (Python) and #260 (bash + jq) emit the same v2 JSON shape via different toolchains. Pick one when the two skills next intersect.
*_KEYenv-var classifier false-positives —IDEMPOTENCY_KEY/SORT_KEY/CACHE_KEYwill misfire as secrets inclassify.sh. Minor; tighten the regex or add a deny-list when seen in the wild.
Verdict
APPROVED
Reviewed by Rex (Code Reviewer Agent)
Reviewed commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8
atlas-apex
left a comment
There was a problem hiding this comment.
Code Review: PR #260 (re-review)
Commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8
All four blockers from the prior REQUEST CHANGES at ca4c5b4/1a8aa27 are addressed at 86c23bb: (1) shellcheck SC2221/2222 cleared by dropping redundant subset patterns in _lib-multi-repo-trace.sh (e.g. *api.amplitude.com* is already covered by the broader *amplitude.com* glob — patch confirms 5 lines simplified at the third-party detector); (2) MD032 cleared by adding blank lines around the Discovery + Classifications lists in dfd/SKILL.md § "Worked example" at :260 and :269; (3) lychee 404 cleared by repointing the Threat Dragon schema link from the deep file path to the repo root with an inline pointer to td.vue/src/service/migration/schema/; (4) the AgDR-0024 → AgDR-0026 rename now propagates across all 5 cross-refs (_lib-multi-repo-trace.sh, dfd/SKILL.md, threat-model/SKILL.md, compliance-check/SKILL.md, CLAUDE.md). CI is green on all 4 checks at this SHA (shellcheck, markdownlint-cli2, lychee, Verify Ticket ID). Diff is mechanical, scoped to the fixes, no scope creep. Three items remain deferred — helper consolidation with #259's /process cross-repo.sh (now merged on dev), Threat Dragon serialiser duplication with #258 (Python vs bash+jq), and the *_KEY env-var false-positive risk in the classifier — all reasonable as follow-up tickets rather than blockers on this PR.
Checklist Results
- Architecture & Design: Pass
- Code Quality: Pass
- Testing: Pass (smoke.sh + AgDR-0026 unchanged from prior review)
- Security: Pass (no auth/secrets/crypto surface touched in fix commit)
- Performance: Pass
- PR Description & Glossary: Pass
- Technical Decisions (AgDR):Pass (AgDR-0026 in tree, cross-refs now consistent)
Verdict
APPROVED (posted as --comment because gh blocks self-approval; treat this as Rex's approval signal for the merge gate, which keys on the marker file rather than the GitHub review state)
🤖 Reviewed by Rex (Code Reviewer Agent)
📌 Reviewed commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8
…ies + classifications (#260) * feat(#257): /dfd skill — extract Data Flow Diagram with trust boundaries + classifications Introduces /dfd as the canonical Data Flow Diagram producer. The DFD becomes the source of truth for /threat-model and /compliance-check — both refactored in this commit to consume from projects/<name>/architecture/dfd.md instead of regenerating their own data-flow views. Six-axis discovery (external actors, processes, data stores, data flows, trust boundaries, data classifications). Classifications detect via three additive pathways: code annotations (@pii, @sensitive, CLASSIFIED:), env-var heuristics (*_SECRET, *_TOKEN, EMAIL_*), and schema-column heuristics (email, phone, card_number). An optional explicit registry at docs/data-classification.{md,yaml} overrides heuristic labels. Two output formats: Mermaid markdown (default, renders inline on GitHub, zero new toolchain) and OWASP Threat Dragon v2 JSON (on --format=dragon, for visual editing in Threat Dragon). The Dragon serialiser is a pure function over the in-memory model so #255 (/threat-model --format=dragon) can share or import it. Multi-repo discovery via a new shared helper _lib-multi-repo-trace.sh (intended to also be consumed by /process #256). Three resolvers: mrt_resolve_target (hostname/URL/topic → registered project), mrt_is_third_party (known SaaS vendor detection), mrt_workspace_for (workspace clone path resolution). The refactors of /threat-model and /compliance-check are surgical — Step 1 of /threat-model now reads dfd.md and offers /dfd first if it doesn't exist; Step 4 of /compliance-check reads the classifications + external-services list from dfd.md for cross-border-transfer and DPA detection. Output formats and audit-history persistence contracts in both consumers are unchanged. AgDR-0024 captures the design rationale: Mermaid as primary, Threat Dragon as secondary; classifications as a first-class concept; the single-source-of-truth refactor; the shared multi-repo trace helper. Cross-PR coordination: - #255 (/threat-model --format=dragon) — shares the Threat Dragon serialiser; if #255 lands first with the serialiser inside /threat-model, this skill imports from there - #256 (/process) — shares the _lib-multi-repo-trace.sh helper; if /process extracted its own version first, the consumer that lands later should consolidate Tests (.claude/skills/dfd/tests/smoke.sh, 52 assertions) cover all three fixtures from the AC: single-service six-axis discovery, cross-service trust-boundary detection, three-pathway classification detection. Plus Mermaid generator output shape, Threat Dragon JSON schema validation, and verification that /threat-model and /compliance-check actually reference dfd.md. Closes #257 * fix(#257): renumber AgDR to 0026 — #258 + #259 took 0024 + 0025 during parallel landing * fix(#257): address Rex blockers — shellcheck SC2221/2222, MD032, lychee 404, AgDR refs - _lib-multi-repo-trace.sh: drop redundant subset patterns in third-party detector (e.g. *api.amplitude.com* covered by broader *amplitude.com*) — clears SC2221/2222. - dfd/SKILL.md: blank lines around Discovery + Classifications lists — clears MD032. - dfd/SKILL.md: Threat Dragon schema link now points to the repo root (sub-path 404'd). - AgDR cross-refs across 5 files renumbered 0024 → 0026 (renumber missed downstream). --------- Co-authored-by: me2resh <ahmed.abdelaliem@gmail.com>
Summary
/dfdas the canonical Data Flow Diagram producer in the apexyard skill family. Six-axis discovery (external actors, processes, data stores, data flows, trust boundaries, data classifications), reachability-bounded, optionally cross-repo via the registry.projects/<name>/architecture/dfd.mdas the primary output (renders inline on GitHub, zero new toolchain) and OWASP Threat Dragon v2 JSON at the same dir on--format=dragon(for visual editing in Threat Dragon)./threat-modeland/compliance-checkto consume from the DFD file instead of regenerating their own data-flow views./threat-modelStep 1 now readsdfd.mdand OFFERS to run/dfdfirst if missing;/compliance-checkStep 4 reads the classifications + external-services list from the DFD.@PII,@Sensitive,CLASSIFIED:), env-var heuristics (*_SECRET,*_TOKEN,EMAIL_*), schema-column heuristics (email,phone,card_number). Optional explicit registry atdocs/data-classification.{md,yaml}overrides..claude/hooks/_lib-multi-repo-trace.sh(mrt_resolve_target,mrt_is_third_party,mrt_workspace_for,mrt_list_projects,mrt_offer_clone) — intended to also be consumed by/process([Feature] /process — extract business process from code (multi-repo), augment via interview, generate lint-clean BPMN 2.0 #256) so cross-repo trace logic isn't duplicated.AgDR-0024 captures the design rationale: Mermaid primary / Threat Dragon secondary; classifications as a first-class concept; single-source-of-truth refactor of the two consumer skills; shared multi-repo trace helper.
Cross-PR coordination
/threat-model --format=dragon) —generate-dragon.shis a pure function over the in-memory DFD model. If [Feature] /threat-model --format=dragon flag for OWASP Threat Dragon JSON export #255 lands first with the Threat Dragon serialiser inside/threat-model,/dfdshould import from there during code review. If this PR lands first, [Feature] /threat-model --format=dragon flag for OWASP Threat Dragon JSON export #255 imports from here. End-state: one serialiser, two callers./process— multi-repo BPMN) —/processneeds the same cross-service trace logic. This PR ships_lib-multi-repo-trace.shfor both consumers. If [Feature] /process — extract business process from code (multi-repo), augment via interview, generate lint-clean BPMN 2.0 #256 has already extracted its own version, the consumer that lands later should consolidate to a single file (Rex will flag the duplication).Testing
# Full smoke (52 assertions across 6 fixtures) bash .claude/skills/dfd/tests/smoke.shAsserts:
mrt_resolve_targetresolves the registered cross-service hostname;mrt_is_third_partyflags Stripe;mrt_workspace_forresolves the correct clone@PIIannotations +EMAIL_*/*_SECRETenv vars +email/phone_number/card_numbercolumns — all three classification pathways fire; PCI labels emitted; no false positive onSOME_PUBLIC_FLAGtm.actor/tm.process/tm.store/tm.boundary/tm.flowcells, STRIDE threats attached to parent elements/threat-modelSKILL.md referencesprojects/.../architecture/dfd.mdAND has an offer-to-run-/dfd-first path;/compliance-checkSKILL.md references the same DFD pathExisting tests still pass:
test_portfolio_paths.sh(38/38),extract-features/tests/smoke.sh(7/7).Discovery scope (v1)
In scope:
Out of scope (v1):
--format=plantuml-dfd) — flagged as v2 follow-up in the ACdfd.md→ re-parsed back into discovery model)Glossary
/threat-modeland/compliance-check— replaces the pattern where each skill rebuilt its own DFD sliceCloses #257