User Story
As an adopter doing security review, compliance work, or onboarding a new team to a service, I want a /dfd skill that scans the codebase(s) to build a Data Flow Diagram showing external actors, processes, data stores, trust boundaries, and data classifications — so I get a single canonical DFD for the system that downstream skills (/threat-model, /compliance-check) can consume instead of each rebuilding their own.
Acceptance Criteria
Discovery (read first, ask later — same pattern as /process and /extract-features)
Skill scans across six DFD-discovery axes and reports findings before any question:
External actors — user-facing endpoints (HTTP routes with public auth scope), auth providers (Auth0, Cognito, Clerk), admin interfaces, third-party API callers detected via webhook signatures, SDK keys, OAuth client registrations
Processes — service handlers that transform data (HTTP route handlers, queue consumers, scheduled jobs, message-broker subscribers, gRPC service methods)
Data stores — RDBMSes (Postgres, MySQL, SQLite — detected via ORM config + connection strings), document stores (Mongo, Dynamo), caches (Redis, Memcached), object storage (S3, GCS), file systems (local disk paths used as persistence), data warehouses (BigQuery, Snowflake, Redshift), search indexes (Elasticsearch, Algolia, Meilisearch)
Data flows — what crosses what: request payload → handler → DB write, queue payload → consumer → external API call, scheduled job → DB read → S3 export, etc. Traced via the same call-graph reachability used by /process
Trust boundaries: network boundaries (public ↔ internal VPC subnet, detected from env config in v1; IaC-based detection is a follow-up), authentication transitions (anonymous → authenticated, user → admin, internal user → service account), org boundaries (us → third-party SaaS), data-classification transitions (PII enters a service that doesn't handle PII elsewhere)
Data classifications: annotations (@PII, @Sensitive, // CLASSIFIED: comments), env-var naming heuristics (*_SECRET, *_TOKEN, *_KEY, *_PASSWORD), schema column names matching PII patterns (email, phone, ssn, dob, address, name, ip_address), PCI-relevant patterns (card_number, cvv, exp_month), explicit data-classification registries if the project maintains one (docs/data-classification.{md,yaml})
Output of discovery: structured candidate model — [{actor_id, type, evidence: <file:line>, classification?}] + flows + boundaries — printed for operator review before final DFD is generated
Discovery is read-only; no files written until operator approves
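To make the candidate model concrete, here is a minimal Python sketch of one record plus the name-based classification heuristics this issue mentions (env-var suffixes such as *_SECRET, schema columns such as email or card_number). The `Candidate` class, the two `classify_*` helpers, the label strings, and the example path are all assumptions for illustration, not the skill's actual implementation.

```python
import fnmatch
from dataclasses import dataclass
from typing import Optional

# Pattern tables copied from the heuristics named in this issue.
SECRET_ENV_GLOBS = ("*_SECRET", "*_TOKEN", "*_KEY", "*_PASSWORD")
PII_COLUMNS = {"email", "phone", "ssn", "dob", "address", "name", "ip_address"}
PCI_COLUMNS = {"card_number", "cvv", "exp_month"}


@dataclass
class Candidate:
    """One entry of the candidate model printed for operator review."""
    actor_id: str
    type: str
    evidence: str  # "file:line" where the element was discovered
    classification: Optional[str] = None


def classify_env_var(name: str) -> Optional[str]:
    """Label an env var as "secrets" when its name ends in a secret-like suffix."""
    upper = name.upper()
    if any(fnmatch.fnmatchcase(upper, glob) for glob in SECRET_ENV_GLOBS):
        return "secrets"
    return None


def classify_column(name: str) -> Optional[str]:
    """Label a schema column as PCI or PII by exact (case-insensitive) name match."""
    key = name.lower()
    if key in PCI_COLUMNS:
        return "PCI"
    if key in PII_COLUMNS:
        return "PII"
    return None


# Hypothetical finding; the path and line number are made up.
candidate = Candidate(
    actor_id="users.email",
    type="data_store_field",
    evidence="db/schema.sql:14",
    classification=classify_column("email"),
)
```

Exact-name matching keeps false positives low but will miss variants such as `user_email`; anything the heuristics cannot label would fall through to the interview step.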
Scoping
Skill takes an optional anchor to scope the DFD: --scope <service-name> (single service), --scope-all (every registered project — full system DFD), or default (asks operator: "Scope this DFD to one service, or the whole system?")
Single-service scope: reachability-bounded, the same as /process. It includes only external actors that talk to the service, data stores the service touches, and data flows that cross the service's boundary
System-wide scope: walks the registry, builds per-service sub-DFDs, then composes one master DFD with trust boundaries between services
Cross-repo traversal (microservice architectures — same registry pattern as /process)
When the trace lands on another registered project (cross-service data flow), follow into that repo's source for the connected slice
Each managed service becomes a trust-boundary box in the DFD (services are isolation units — every cross-service flow crosses a boundary)
Unmanaged third parties (Stripe, SendGrid, Salesforce, etc.) render as external entities with their own trust-boundary marker
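The traversal rule above (follow a flow into another registered project, stop at anything unmanaged) can be sketched as a breadth-first walk. The registry contents, the `FLOWS` table, and the `trace` function are hypothetical stand-ins; the real registry format belongs to /process and is not shown in this issue.

```python
from collections import deque

# Hypothetical registry of managed projects (source is available to follow into).
REGISTERED = {"orders", "billing", "reports"}

# Hypothetical per-service flows as (source, target) pairs.
FLOWS = {
    "orders": [("orders", "billing"), ("orders", "orders-db")],
    "billing": [("billing", "stripe"), ("billing", "billing-db")],
    "reports": [("reports", "warehouse")],
}


def trace(anchor):
    """Walk flows from the anchor service, entering only registered projects."""
    services = set()   # managed services: each becomes a trust-boundary box
    terminals = set()  # data stores and unmanaged third parties: not traversed
    flows = []
    queue = deque([anchor])
    while queue:
        service = queue.popleft()
        if service in services:
            continue
        services.add(service)
        for src, dst in FLOWS.get(service, []):
            flows.append((src, dst))
            if dst in REGISTERED:
                queue.append(dst)   # cross-service flow: follow into that repo
            else:
                terminals.add(dst)  # stop here; render as a store or external entity
    return services, terminals, flows


services, terminals, flows = trace("orders")
```

In this sketch `reports` is registered but never reached from `orders`, so it stays out of the scoped DFD, which is the reachability-bounded behaviour described under Scoping.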
Interview (gap-fill only)
Skill asks operator to disambiguate when the code is silent: "this field user.identifier — is this an email, a UUID, or something else? Classification?"
Skill asks operator to confirm inferred trust boundaries: "I'm placing a trust boundary between the public API gateway and the internal services — confirm or override?"
Skill asks operator about external actors the code doesn't reveal: "any human admin actors who interact via a console / runbook outside this codebase?"
Output formats
Default: Mermaid flowchart at projects/<project>/architecture/dfd.md — renders inline on GitHub, the same format /threat-model embeds today (chore(#223): add Data Flow Diagram section to threat-model template #225). This becomes the single source of truth DFD — /threat-model consumes from here instead of regenerating its own
--format=dragon emits OWASP Threat Dragon v2 JSON (shares the serialiser added in "[Feature] /threat-model --format=dragon flag for OWASP Threat Dragon JSON export", #255), for visual editing in Threat Dragon
--format=plantuml-dfd (optional, v2): PlantUML DFD syntax for adopters who use PlantUML toolchain
--format=all writes all configured formats in one go
Each DFD element carries provenance comments / metadata: source file:line where it was discovered, classification labels, trust-boundary rationale
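As an illustration of the default output, the sketch below renders a small model as a Mermaid flowchart, with trust boundaries as subgraphs and provenance as `%%` comments. The input shape, the `render_mermaid` name, and the example paths are assumptions; only the Mermaid constructs themselves (`flowchart`, `subgraph … end`, `-->|label|`, `%%` comments) are standard syntax.

```python
def render_mermaid(boundaries, flows):
    """Render trust boundaries as subgraphs and data flows as labelled edges."""
    lines = ["flowchart LR"]
    for boundary_id, boundary in boundaries.items():
        title = boundary["title"]
        lines.append(f'  subgraph {boundary_id}["{title}"]')
        for node in boundary["nodes"]:
            node_id, label = node["id"], node["label"]
            # Provenance travels with the element as a Mermaid comment line.
            lines.append(f"    %% evidence: {node['evidence']}")
            lines.append(f'    {node_id}["{label}"]')
        lines.append("  end")
    for flow in flows:
        # Quote the edge label so parentheses do not confuse the Mermaid parser.
        label = f'{flow["data"]} ({flow["classification"]})'
        lines.append(f'  {flow["src"]} -->|"{label}"| {flow["dst"]}')
    return "\n".join(lines)


# Hypothetical model; ids, labels, and file paths are made up.
boundaries = {
    "public": {
        "title": "Public internet",
        "nodes": [{"id": "user", "label": "End user", "evidence": "api/routes.py:12"}],
    },
    "internal": {
        "title": "Internal services",
        "nodes": [
            {"id": "signup", "label": "POST /signup handler", "evidence": "api/routes.py:40"},
            {"id": "users_db", "label": "users table", "evidence": "db/models.py:8"},
        ],
    },
}
flows = [
    {"src": "user", "dst": "signup", "data": "email", "classification": "PII"},
    {"src": "signup", "dst": "users_db", "data": "email", "classification": "PII"},
]

diagram = render_mermaid(boundaries, flows)
```

Keeping provenance in comments means the rendered diagram stays uncluttered while a re-run can still read back where each element came from.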
Re-runs + cache
Re-running /dfd on the same scope OFFERS (default-no) to overwrite — same UX as /extract-features and /process
Stores discovery output at projects/<project>/architecture/dfd-source.{yaml,md} so re-runs can diff "what changed since last time" — useful for spotting newly-introduced data flows that may warrant security review
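The "what changed since last time" comparison could be as simple as a set difference over flow keys. This sketch assumes the stored discovery output has already been parsed into lists of dicts with `src`, `dst`, and an optional `classification`; those field names and the `diff_flows` helper are hypothetical.

```python
def flow_key(flow):
    """Identity of a flow for diffing: endpoints plus classification label."""
    return (flow["src"], flow["dst"], flow.get("classification") or "")


def diff_flows(previous, current):
    """Return flows added and removed between two discovery runs."""
    before = {flow_key(f) for f in previous}
    after = {flow_key(f) for f in current}
    return {"added": sorted(after - before), "removed": sorted(before - after)}


# Hypothetical runs: the second one introduces a PII flow to a third party.
previous_run = [
    {"src": "signup", "dst": "users_db", "classification": "PII"},
    {"src": "cron", "dst": "reports_bucket"},
]
current_run = [
    {"src": "signup", "dst": "users_db", "classification": "PII"},
    {"src": "signup", "dst": "sendgrid", "classification": "PII"},
]

changes = diff_flows(previous_run, current_run)
```

Because the classification is part of the key, a flow whose label changes (for example from unclassified to PII) shows up as one removal plus one addition, which is the kind of change that may warrant security review.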
Refactor: extract /threat-model's DFD into /dfd as the producer
PR refactors /threat-model so its DFD section comes from /dfd output (read from projects/<project>/architecture/dfd.md), not regenerated internally. If /dfd hasn't been run, /threat-model OFFERS to run it first
/compliance-check (the GDPR/ePrivacy audit skill) gains a discovery step: read the DFD's data classifications and flow-targets to find cross-border data transfers, third-party processors, etc.
Coordinate the /threat-model and /compliance-check updates with the existing skill maintainers (mostly cosmetic given the framework is single-org)
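A downstream consumer such as /compliance-check might query the DFD roughly as follows. This is a sketch under assumed field names (`type`, `dst`, `classification`) and an assumed `external_entity` node type; the real consumer interface is not defined in this issue.

```python
def third_party_transfers(nodes, flows, sensitive=("PII", "PCI")):
    """List flows that carry sensitive data to an external entity."""
    return [
        flow
        for flow in flows
        if nodes[flow["dst"]]["type"] == "external_entity"
        and flow.get("classification") in sensitive
    ]


# Hypothetical DFD slice.
nodes = {
    "signup": {"type": "process"},
    "users_db": {"type": "data_store"},
    "sendgrid": {"type": "external_entity"},
}
flows = [
    {"src": "signup", "dst": "users_db", "classification": "PII"},
    {"src": "signup", "dst": "sendgrid", "classification": "PII"},
    {"src": "signup", "dst": "sendgrid", "classification": "public"},
]

transfers = third_party_transfers(nodes, flows)
```

Each hit is a candidate third-party processor relationship that the audit would then examine; the DFD only surfaces it.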
Docs + AgDR
SKILL.md documents the six discovery axes + scoping + one worked example
AgDR captures: Mermaid as the primary output (renders on GitHub, no toolchain dep), Threat Dragon JSON as the secondary; classifications as a first-class concept (annotations + heuristics + explicit registry); single-source-of-truth refactor of /threat-model and /compliance-check
Index in projects/<project>/architecture/README.md updated to list the DFD alongside the C4 diagrams
Design Notes
Completes the "what we already have" visualization tooling family:
Out of Scope
Manual hand-authoring of DFDs in the skill (operator should edit the Mermaid file directly OR re-run /dfd with corrected anchor/answers)
Real-time DFD updates (one-shot, not continuous — same as siblings)
Replacing dedicated SAST/DAST tools — /dfd informs threat modelling and compliance, doesn't substitute for actual security scans
Pretty graphical export (SVG/PNG) — that's what Threat Dragon does after JSON import
Reading IaC (Terraform, CloudFormation) for trust boundaries: v1 uses a single signal source (code + env config); IaC integration is a follow-up
Effort Estimate
TBD — L → XL. Discovery engine + classification heuristics + cross-service trust-boundary inference + Mermaid + Threat Dragon serialisers + /threat-model and /compliance-check refactor to consume.
Glossary
| Term | Definition |
| --- | --- |
| DFD | Data Flow Diagram: shows external entities, processes, data stores, trust boundaries, and the data that crosses between them; foundation for STRIDE threat modelling |
| Trust boundary | A line in a DFD where data crosses a privilege/network/ownership boundary (anonymous → authenticated, public ↔ internal, us → third-party); threats concentrate at boundaries |
| Data classification | A label applied to data identifying its sensitivity tier (PII, PCI, secrets, internal, public); informs encryption-at-rest, access controls, and regulatory scope (GDPR, HIPAA, PCI-DSS) |
| Reachability-bounded | Discovery scopes to what's connected to an anchor; doesn't blindly scan the whole repo for irrelevant data flows |
| Provenance | The file:line evidence trail for every DFD element (why this actor/process/store was placed in the diagram), so re-runs can refresh accurately |
| Single source of truth | One DFD per scope, consumed by /threat-model and /compliance-check; replaces the current pattern where each skill rebuilds its own DFD slice |