feat(#257): /dfd skill — extract Data Flow Diagram with trust boundaries + classifications by atlas-apex · Pull Request #260 · me2resh/apexyard

atlas-apex · 2026-05-16T08:33:21Z

Summary

Adds /dfd as the canonical Data Flow Diagram producer in the apexyard skill family. Six-axis discovery (external actors, processes, data stores, data flows, trust boundaries, data classifications), reachability-bounded, optionally cross-repo via the registry.
Writes Mermaid markdown at projects/<name>/architecture/dfd.md as the primary output (renders inline on GitHub, zero new toolchain) and OWASP Threat Dragon v2 JSON at the same dir on --format=dragon (for visual editing in Threat Dragon).
Refactors /threat-model and /compliance-check to consume from the DFD file instead of regenerating their own data-flow views. /threat-model Step 1 now reads dfd.md and OFFERS to run /dfd first if missing; /compliance-check Step 4 reads the classifications + external-services list from the DFD.
Three classification pathways are additive: code annotations (@PII, @Sensitive, CLASSIFIED:), env-var heuristics (*_SECRET, *_TOKEN, EMAIL_*), schema-column heuristics (email, phone, card_number). Optional explicit registry at docs/data-classification.{md,yaml} overrides.
New shared helper .claude/hooks/_lib-multi-repo-trace.sh (mrt_resolve_target, mrt_is_third_party, mrt_workspace_for, mrt_list_projects, mrt_offer_clone) — intended to also be consumed by /process ([Feature] /process — extract business process from code (multi-repo), augment via interview, generate lint-clean BPMN 2.0 #256) so cross-repo trace logic isn't duplicated.

AgDR-0024 captures the design rationale: Mermaid primary / Threat Dragon secondary; classifications as a first-class concept; single-source-of-truth refactor of the two consumer skills; shared multi-repo trace helper.

Cross-PR coordination

[Feature] /threat-model --format=dragon flag for OWASP Threat Dragon JSON export #255 (/threat-model --format=dragon) — generate-dragon.sh is a pure function over the in-memory DFD model. If [Feature] /threat-model --format=dragon flag for OWASP Threat Dragon JSON export #255 lands first with the Threat Dragon serialiser inside /threat-model, /dfd should import from there during code review. If this PR lands first, [Feature] /threat-model --format=dragon flag for OWASP Threat Dragon JSON export #255 imports from here. End-state: one serialiser, two callers.
[Feature] /process — extract business process from code (multi-repo), augment via interview, generate lint-clean BPMN 2.0 #256 (/process — multi-repo BPMN) — /process needs the same cross-service trace logic. This PR ships _lib-multi-repo-trace.sh for both consumers. If [Feature] /process — extract business process from code (multi-repo), augment via interview, generate lint-clean BPMN 2.0 #256 has already extracted its own version, the consumer that lands later should consolidate to a single file (Rex will flag the duplication).

Testing

# Full smoke (52 assertions across 6 fixtures)
bash .claude/skills/dfd/tests/smoke.sh

Asserts:

Fixture	What
1	Single-service Express + Prisma + BullMQ + Stripe + SendGrid + Auth0 + admin route — all six discovery axes produce non-empty findings; four trust boundaries (public ↔ backend, user ↔ admin, backend ↔ data, us ↔ third-party) detected
2	Two synthetic services with cross-service flow + Stripe URL — `mrt_resolve_target` resolves the registered cross-service hostname; `mrt_is_third_party` flags Stripe; `mrt_workspace_for` resolves the correct clone
3	Code with `@PII` annotations + `EMAIL_`/`_SECRET` env vars + `email`/`phone_number`/`card_number` columns — all three classification pathways fire; PCI labels emitted; no false positive on `SOME_PUBLIC_FLAG`
4	Mermaid generator output shape — heading, source-of-truth declaration, flowchart block, subgraphs, trust-boundaries table, classifications table, provenance section, footer signature
5	Threat Dragon JSON — valid JSON, required v2 schema keys, `tm.actor`/`tm.process`/`tm.store`/`tm.boundary`/`tm.flow` cells, STRIDE threats attached to parent elements
6	Downstream-consumer refactor — `/threat-model` SKILL.md references `projects/.../architecture/dfd.md` AND has an offer-to-run-/dfd-first path; `/compliance-check` SKILL.md references the same DFD path

Existing tests still pass: test_portfolio_paths.sh (38/38), extract-features/tests/smoke.sh (7/7).

Discovery scope (v1)

In scope:

HTTP routes (Express, Fastify, NestJS, FastAPI, Flask, Django, Rails, Gin, Spring, Lambda)
Job frameworks (BullMQ, Celery, Sidekiq, Active Job, Temporal)
Cron / EventBridge schedules
Data stores (Prisma, TypeORM, SQLAlchemy, Mongoose, DynamoDB, S3, Redis, Elasticsearch / Algolia / Meilisearch, BigQuery / Snowflake / Redshift)
Third-party SDKs (Stripe, SendGrid, Twilio, Anthropic, OpenAI, Sentry, Datadog, PostHog, Amplitude, Mixpanel, Algolia, Meilisearch)

Out of scope (v1):

IaC reading (Terraform / CloudFormation / SAM) for trust-boundary inference — code + env config only
SAST / DAST substitution
Pretty SVG/PNG export (Threat Dragon does that after JSON import)
PlantUML DFD format (--format=plantuml-dfd) — flagged as v2 follow-up in the AC
Round-trip import (hand-edited dfd.md → re-parsed back into discovery model)
L3 / L4 detail

Glossary

Term	Definition
DFD	Data Flow Diagram — external entities + processes + data stores + trust boundaries + data flows; foundation for STRIDE threat modelling
Trust boundary	A line in a DFD where data crosses a privilege / network / ownership boundary; threats concentrate at boundaries
Data classification	A sensitivity label (PII, PCI, secrets, internal, public) attached to a data element; informs encryption, access control, regulatory scope
Reachability-bounded	Discovery scoped to what's connected to an anchor; doesn't blindly scan the whole repo for irrelevant data flows
Provenance	The file:line evidence trail for every DFD element — why each actor / process / store was placed in the diagram
Source of truth	One DFD per scope, consumed by `/threat-model` and `/compliance-check` — replaces the pattern where each skill rebuilt its own DFD slice
STRIDE	Microsoft's threat-classification mnemonic — Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege
Threat Dragon	OWASP open-source desktop / web app for creating + editing threat models with STRIDE annotations; reads a v2 JSON schema this skill emits
Cross-repo trace	Resolving an outbound call (HTTP host, message-broker topic) to another registered project so the trace can follow into that repo's source for the connected slice

Closes #257

@pii

…ies + classifications Introduces /dfd as the canonical Data Flow Diagram producer. The DFD becomes the source of truth for /threat-model and /compliance-check — both refactored in this commit to consume from projects/<name>/architecture/dfd.md instead of regenerating their own data-flow views. Six-axis discovery (external actors, processes, data stores, data flows, trust boundaries, data classifications). Classifications detect via three additive pathways: code annotations (@pii, @sensitive, CLASSIFIED:), env-var heuristics (*_SECRET, *_TOKEN, EMAIL_*), and schema-column heuristics (email, phone, card_number). An optional explicit registry at docs/data-classification.{md,yaml} overrides heuristic labels. Two output formats: Mermaid markdown (default, renders inline on GitHub, zero new toolchain) and OWASP Threat Dragon v2 JSON (on --format=dragon, for visual editing in Threat Dragon). The Dragon serialiser is a pure function over the in-memory model so #255 (/threat-model --format=dragon) can share or import it. Multi-repo discovery via a new shared helper _lib-multi-repo-trace.sh (intended to also be consumed by /process #256). Three resolvers: mrt_resolve_target (hostname/URL/topic → registered project), mrt_is_third_party (known SaaS vendor detection), mrt_workspace_for (workspace clone path resolution). The refactors of /threat-model and /compliance-check are surgical — Step 1 of /threat-model now reads dfd.md and offers /dfd first if it doesn't exist; Step 4 of /compliance-check reads the classifications + external-services list from dfd.md for cross-border-transfer and DPA detection. Output formats and audit-history persistence contracts in both consumers are unchanged. AgDR-0024 captures the design rationale: Mermaid as primary, Threat Dragon as secondary; classifications as a first-class concept; the single-source-of-truth refactor; the shared multi-repo trace helper. Cross-PR coordination: - #255 (/threat-model --format=dragon) — shares the Threat Dragon serialiser; if #255 lands first with the serialiser inside /threat-model, this skill imports from there - #256 (/process) — shares the _lib-multi-repo-trace.sh helper; if /process extracted its own version first, the consumer that lands later should consolidate Tests (.claude/skills/dfd/tests/smoke.sh, 52 assertions) cover all three fixtures from the AC: single-service six-axis discovery, cross-service trust-boundary detection, three-pathway classification detection. Plus Mermaid generator output shape, Threat Dragon JSON schema validation, and verification that /threat-model and /compliance-check actually reference dfd.md. Closes #257

…g parallel landing

atlas-apex

Code Review: PR #260 (`/dfd`) — REQUEST CHANGES

Commit reviewed: 1a8aa2702a350b5b27e9b7864faac13dfd03deb1

Strong design and impressive surface coverage — six-axis discovery is comprehensive, the /threat-model Step 1 refactor correctly OFFERS to run /dfd first with a legacy fallback for backwards compat, the 52-assertion smoke (incl. fixture 6 verifying the downstream-consumer refactor) is genuinely meaningful, the Mermaid output uses standard flowchart LR + dashed subgraphs that render inline on GitHub, and AgDR-0026 is a clean A/B/C analysis. Blocking on three issues: (1) CI is red on 3 of 4 checks — per .claude/rules/pr-quality.md no merge with red CI, even pre-existing: shellcheck flags SC2221/SC2222 (overlapping case patterns) in _lib-multi-repo-trace.sh:276-278, markdownlint MD032 on .claude/skills/dfd/SKILL.md:260 and :269 (lists need surrounding blank lines), and lychee returns 404 on https://github.com/OWASP/threat-dragon/blob/main/td.vue/src/service/migration/schema/threat-model.v2.json cited in dfd/SKILL.md § 5b (path should be td.vue/src/service/migration/schema/threat-model.v2.json — re-check on Threat Dragon main); (2) AgDR cross-reference bug — the source-of-truth rationale is cited as "AgDR-0024" in at least 5 places (_lib-multi-repo-trace.sh:13, the new dfd/SKILL.md, the refactored threat-model/SKILL.md, the refactored compliance-check/SKILL.md, CLAUDE.md's skill table) but on this branch the AgDR added is 0026 — AgDR-0024 is being filled by PR #258 (Threat Dragon export) and AgDR-0025 by PR #259 (process skill), so a future reader following the citation lands on the wrong record. Renumber every "AgDR-0024" reference in this PR to "AgDR-0026". (3) The "shared" helper isn't shared yet — _lib-multi-repo-trace.sh exports mrt_* functions, but PR #259 ships its own process/cross-repo.sh (200 lines, line-output API, no source of the helper). The two solve the same problem with incompatible interfaces. AgDR-0026 advertises one-helper-two-callers; that only holds once the second-landing PR consolidates — flag for landing order. Coordination finding (non-blocking but noting): dfd/generate-dragon.sh (bash+jq, 185 lines) and PR #258's serialize_dragon.py (Python, 695 lines, schema-validating) both emit Threat Dragon v2 JSON via different implementations — this PR's own file header already anticipates consolidation; whichever lands second should import the other. Minor: classify_envvar's *_KEY arm could false-positive on IDEMPOTENCY_KEY / SORT_KEY / CACHE_KEY — smoke fixture 3 verifies SOME_PUBLIC_FLAG isn't tagged but doesn't cover the *_KEY false-positive surface; consider adding a fixture row or narrowing to *_API_KEY|*_SECRET_KEY|*_PRIVATE_KEY|*_ENCRYPTION_KEY. Address the three red checks + the AgDR renumber and this is good to merge.

🤖 Reviewed by Rex (Code Reviewer Agent)
📌 Reviewed commit: 1a8aa2702a350b5b27e9b7864faac13dfd03deb1

…ee 404, AgDR refs - _lib-multi-repo-trace.sh: drop redundant subset patterns in third-party detector (e.g. *api.amplitude.com* covered by broader *amplitude.com*) — clears SC2221/2222. - dfd/SKILL.md: blank lines around Discovery + Classifications lists — clears MD032. - dfd/SKILL.md: Threat Dragon schema link now points to the repo root (sub-path 404'd). - AgDR cross-refs across 5 files renumbered 0024 → 0026 (renumber missed downstream).

atlas-apex

Code Review: PR #260 (re-review at `86c23bb`)

Commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8

Summary

Re-review of the four blockers raised against ca4c5b4 / 1a8aa27. All four are fixed and all four CI checks (shellcheck, markdownlint-cli2, lychee, PR Title Check) are green. Verified: (1) _lib-multi-repo-trace.sh:265-285 — redundant *api.{vendor}.com* subset patterns dropped, broader *{vendor}.com* alternatives stand alone; SC2221/2222 clear. (2) dfd/SKILL.md:259-261, 269-271, 275-277 — blank lines now surround the Discovery + Classifications lists in § "Worked example"; MD032 clear. (3) dfd/SKILL.md:190 — Threat Dragon schema URL now points at the repo root (https://github.com/OWASP/threat-dragon) with a prose pointer to td.vue/src/service/migration/schema/, replacing the 404'd deep file path; lychee clear. (4) AgDR-0024 → AgDR-0026 renumber landed cleanly across all 5 cross-referencing files (_lib-multi-repo-trace.sh:7, dfd/SKILL.md:12, threat-model/SKILL.md:13, compliance-check/SKILL.md:13, CLAUDE.md:228); no stale 0024/0025 cites remain and the AgDR file itself is docs/agdr/AgDR-0026-dfd-skill-as-source-of-truth.md.

Checklist Results

Architecture & Design: Pass
Code Quality: Pass
Testing: Pass (smoke.sh covers the discovery shape; bash 3.2 compat documented)
Security: Pass (read-only against the codebase; no shell injection paths in the trace lib)
Performance: Pass
PR Description & Glossary: Pass
Technical Decisions (AgDR):Pass — AgDR-0026 covers the canonical-DFD-producer + cross-skill consumption pattern
Adopter Handbooks: N/A (no handbooks/ in this fork)

Issues Found

None blocking at 86c23bb.

Suggestions (deferred to follow-up tickets — non-blocking)

Helper consolidation — PR #259 (merged to dev) shipped process/cross-repo.sh duplicating the registry-lookup logic that _lib-multi-repo-trace.sh was intended to centralise. The "shared helper" framing in AgDR-0026 is now aspirational until /process is refactored onto mrt_*. File a follow-up to retrofit.
Threat-dragon serialiser duplication — #258 (Python) and #260 (bash + jq) emit the same v2 JSON shape via different toolchains. Pick one when the two skills next intersect.
*_KEY env-var classifier false-positives — IDEMPOTENCY_KEY / SORT_KEY / CACHE_KEY will misfire as secrets in classify.sh. Minor; tighten the regex or add a deny-list when seen in the wild.

Verdict

APPROVED

Reviewed by Rex (Code Reviewer Agent)
Reviewed commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8

atlas-apex

Code Review: PR #260 (re-review)

Commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8

All four blockers from the prior REQUEST CHANGES at ca4c5b4/1a8aa27 are addressed at 86c23bb: (1) shellcheck SC2221/2222 cleared by dropping redundant subset patterns in _lib-multi-repo-trace.sh (e.g. *api.amplitude.com* is already covered by the broader *amplitude.com* glob — patch confirms 5 lines simplified at the third-party detector); (2) MD032 cleared by adding blank lines around the Discovery + Classifications lists in dfd/SKILL.md § "Worked example" at :260 and :269; (3) lychee 404 cleared by repointing the Threat Dragon schema link from the deep file path to the repo root with an inline pointer to td.vue/src/service/migration/schema/; (4) the AgDR-0024 → AgDR-0026 rename now propagates across all 5 cross-refs (_lib-multi-repo-trace.sh, dfd/SKILL.md, threat-model/SKILL.md, compliance-check/SKILL.md, CLAUDE.md). CI is green on all 4 checks at this SHA (shellcheck, markdownlint-cli2, lychee, Verify Ticket ID). Diff is mechanical, scoped to the fixes, no scope creep. Three items remain deferred — helper consolidation with #259's /process cross-repo.sh (now merged on dev), Threat Dragon serialiser duplication with #258 (Python vs bash+jq), and the *_KEY env-var false-positive risk in the classifier — all reasonable as follow-up tickets rather than blockers on this PR.

Checklist Results

Architecture & Design: Pass
Code Quality: Pass
Testing: Pass (smoke.sh + AgDR-0026 unchanged from prior review)
Security: Pass (no auth/secrets/crypto surface touched in fix commit)
Performance: Pass
PR Description & Glossary: Pass
Technical Decisions (AgDR):Pass (AgDR-0026 in tree, cross-refs now consistent)

Verdict

APPROVED (posted as --comment because gh blocks self-approval; treat this as Rex's approval signal for the merge gate, which keys on the marker file rather than the GitHub review state)

🤖 Reviewed by Rex (Code Reviewer Agent)
📌 Reviewed commit: 86c23bbce2d4b15a15fff426e18154df3e08bde8

@pii

…ies + classifications (#260) * feat(#257): /dfd skill — extract Data Flow Diagram with trust boundaries + classifications Introduces /dfd as the canonical Data Flow Diagram producer. The DFD becomes the source of truth for /threat-model and /compliance-check — both refactored in this commit to consume from projects/<name>/architecture/dfd.md instead of regenerating their own data-flow views. Six-axis discovery (external actors, processes, data stores, data flows, trust boundaries, data classifications). Classifications detect via three additive pathways: code annotations (@pii, @sensitive, CLASSIFIED:), env-var heuristics (*_SECRET, *_TOKEN, EMAIL_*), and schema-column heuristics (email, phone, card_number). An optional explicit registry at docs/data-classification.{md,yaml} overrides heuristic labels. Two output formats: Mermaid markdown (default, renders inline on GitHub, zero new toolchain) and OWASP Threat Dragon v2 JSON (on --format=dragon, for visual editing in Threat Dragon). The Dragon serialiser is a pure function over the in-memory model so #255 (/threat-model --format=dragon) can share or import it. Multi-repo discovery via a new shared helper _lib-multi-repo-trace.sh (intended to also be consumed by /process #256). Three resolvers: mrt_resolve_target (hostname/URL/topic → registered project), mrt_is_third_party (known SaaS vendor detection), mrt_workspace_for (workspace clone path resolution). The refactors of /threat-model and /compliance-check are surgical — Step 1 of /threat-model now reads dfd.md and offers /dfd first if it doesn't exist; Step 4 of /compliance-check reads the classifications + external-services list from dfd.md for cross-border-transfer and DPA detection. Output formats and audit-history persistence contracts in both consumers are unchanged. AgDR-0024 captures the design rationale: Mermaid as primary, Threat Dragon as secondary; classifications as a first-class concept; the single-source-of-truth refactor; the shared multi-repo trace helper. Cross-PR coordination: - #255 (/threat-model --format=dragon) — shares the Threat Dragon serialiser; if #255 lands first with the serialiser inside /threat-model, this skill imports from there - #256 (/process) — shares the _lib-multi-repo-trace.sh helper; if /process extracted its own version first, the consumer that lands later should consolidate Tests (.claude/skills/dfd/tests/smoke.sh, 52 assertions) cover all three fixtures from the AC: single-service six-axis discovery, cross-service trust-boundary detection, three-pathway classification detection. Plus Mermaid generator output shape, Threat Dragon JSON schema validation, and verification that /threat-model and /compliance-check actually reference dfd.md. Closes #257 * fix(#257): renumber AgDR to 0026 — #258 + #259 took 0024 + 0025 during parallel landing * fix(#257): address Rex blockers — shellcheck SC2221/2222, MD032, lychee 404, AgDR refs - _lib-multi-repo-trace.sh: drop redundant subset patterns in third-party detector (e.g. *api.amplitude.com* covered by broader *amplitude.com*) — clears SC2221/2222. - dfd/SKILL.md: blank lines around Discovery + Classifications lists — clears MD032. - dfd/SKILL.md: Threat Dragon schema link now points to the repo root (sub-path 404'd). - AgDR cross-refs across 5 files renumbered 0024 → 0026 (renumber missed downstream). --------- Co-authored-by: me2resh <ahmed.abdelaliem@gmail.com>

github-actions Bot added github-issue has-ticket labels May 16, 2026

fix(#257): renumber AgDR to 0026 — #258 + #259 took 0024 + 0025 durin…

1a8aa27

…g parallel landing

atlas-apex mentioned this pull request May 16, 2026

feat(#256): /process skill — extract process from code, BPMN 2.0 output #259

Merged

atlas-apex commented May 16, 2026

View reviewed changes

atlas-apex merged commit 289fe81 into dev May 16, 2026
4 checks passed

atlas-apex deleted the feature/GH-257-dfd-skill branch May 16, 2026 14:18

atlas-apex mentioned this pull request May 16, 2026

[Feature] /dfd — extract Data Flow Diagram from code (multi-repo), with trust boundaries + classifications, emit Mermaid + Threat Dragon JSON #257

Closed

25 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(#257): /dfd skill — extract Data Flow Diagram with trust boundaries + classifications#260

feat(#257): /dfd skill — extract Data Flow Diagram with trust boundaries + classifications#260
atlas-apex merged 3 commits into
devfrom
feature/GH-257-dfd-skill

atlas-apex commented May 16, 2026

Uh oh!

atlas-apex left a comment

Uh oh!

atlas-apex left a comment

Uh oh!

atlas-apex left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

atlas-apex commented May 16, 2026

Summary

Cross-PR coordination

Testing

Discovery scope (v1)

Glossary

Uh oh!

atlas-apex left a comment

Choose a reason for hiding this comment

Code Review: PR #260 (/dfd) — REQUEST CHANGES

Uh oh!

atlas-apex left a comment

Choose a reason for hiding this comment

Code Review: PR #260 (re-review at 86c23bb)

Summary

Checklist Results

Issues Found

Suggestions (deferred to follow-up tickets — non-blocking)

Verdict

Uh oh!

atlas-apex left a comment

Choose a reason for hiding this comment

Code Review: PR #260 (re-review)

Checklist Results

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Code Review: PR #260 (`/dfd`) — REQUEST CHANGES

Code Review: PR #260 (re-review at `86c23bb`)