[Feature] /dfd: extract Data Flow Diagram from code (multi-repo), with trust boundaries + classifications, emit Mermaid + Threat Dragon JSON (#257)

## User Story
As an adopter doing security review, compliance work, or onboarding a new team to a service, I want a `/dfd` skill that scans the codebase(s) to build a Data Flow Diagram showing external actors, processes, data stores, trust boundaries, and data classifications — so I get a single canonical DFD for the system that downstream skills (`/threat-model`, `/compliance-check`) can consume instead of each rebuilding their own.

## Acceptance Criteria

### Discovery (read first, ask later — same pattern as /process and /extract-features)

- [ ] Skill scans across **six DFD-discovery axes** and reports findings before any question:
  1. **External actors** — user-facing endpoints (HTTP routes with public auth scope), auth providers (Auth0, Cognito, Clerk), admin interfaces, third-party API callers detected via webhook signatures, SDK keys, OAuth client registrations
  2. **Processes** — service handlers that transform data (HTTP route handlers, queue consumers, scheduled jobs, message-broker subscribers, gRPC service methods)
  3. **Data stores** — RDBMSes (Postgres, MySQL, SQLite — detected via ORM config + connection strings), document stores (Mongo, Dynamo), caches (Redis, Memcached), object storage (S3, GCS), file systems (local disk paths used as persistence), data warehouses (BigQuery, Snowflake, Redshift), search indexes (Elasticsearch, Algolia, Meilisearch)
  4. **Data flows** — what crosses what: request payload → handler → DB write, queue payload → consumer → external API call, scheduled job → DB read → S3 export, etc. Traced via the same call-graph reachability used by /process
  5. **Trust boundaries**: network boundaries (public ↔ internal VPC subnet, detected from env config in v1; IaC-based detection is a follow-up, see Out of Scope), authentication transitions (anonymous → authenticated, user → admin, internal user → service account), org boundaries (us → third-party SaaS), data-classification transitions (PII enters a service that doesn't handle PII elsewhere)
  6. **Data classifications** — detected via: field-level annotations (`@PII`, `@Sensitive`, `// CLASSIFIED:` comments), env-var naming heuristics (`*_SECRET`, `*_TOKEN`, `*_KEY`, `*_PASSWORD`), schema column names matching PII patterns (`email`, `phone`, `ssn`, `dob`, `address`, `name`, `ip_address`), PCI-relevant patterns (`card_number`, `cvv`, `exp_month`), explicit data-classification registries if the project maintains one (`docs/data-classification.{md,yaml}`)
- [ ] Output of discovery: structured candidate model — `[{actor_id, type, evidence: <file:line>, classification?}]` + flows + boundaries — printed for operator review before final DFD is generated
- [ ] Discovery is read-only; no files written until operator approves
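
The name-based classification heuristics in axis 6, and the candidate model they feed, could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Candidate` dataclass, `classify_name`, the label strings, the regex, and the example file paths are all hypothetical, and the pattern tables simply copy the examples listed in this ticket.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern tables, copied from the examples in this ticket.
PII_COLUMNS = {"email", "phone", "ssn", "dob", "address", "name", "ip_address"}
PCI_COLUMNS = {"card_number", "cvv", "exp_month"}
SECRET_ENV = re.compile(r".*_(SECRET|TOKEN|KEY|PASSWORD)$")


@dataclass
class Candidate:
    """One discovered element, shaped like the candidate model above."""
    actor_id: str
    type: str
    evidence: str  # "<file>:<line>"
    classification: Optional[str] = None


def classify_name(name: str) -> Optional[str]:
    """Return a sensitivity label for a column or env-var name, or None."""
    if SECRET_ENV.match(name):  # case-sensitive: env vars are upper-case
        return "secret"
    lowered = name.lower()
    if lowered in PCI_COLUMNS:
        return "PCI"
    if lowered in PII_COLUMNS:
        return "PII"
    return None  # the code is silent, so leave it for the interview step


# Made-up findings; the evidence paths are placeholders.
candidates = [
    Candidate("users.email", "field", "models/user.py:12", classify_name("email")),
    Candidate("STRIPE_SECRET", "env_var", ".env.example:3", classify_name("STRIPE_SECRET")),
    Candidate("user.identifier", "field", "models/user.py:9", classify_name("identifier")),
]
for c in candidates:
    print(c.actor_id, c.classification or "unclassified (ask operator)")
```

Anything that comes back as `None` is a natural input for the gap-fill interview rather than a guess.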

### Scoping

- [ ] Skill takes an optional **anchor** to scope the DFD: `--scope <service-name>` (single service), `--scope-all` (every registered project — full system DFD), or default (asks operator: "Scope this DFD to one service, or the whole system?")
- [ ] Single-service scope: reachability-bounded same as /process — only external actors that talk TO the service, data stores the service touches, data flows that cross the service's boundary
- [ ] System-wide scope: walks the registry, builds per-service sub-DFDs, then composes one master DFD with trust boundaries between services
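
Single-service scoping can be pictured as a graph filter over discovered flows. The sketch below is one possible reading of "reachability-bounded", not the /process implementation: it keeps everything downstream of the anchor plus direct callers of the anchor. The `scope_flows` function and all service names are hypothetical.

```python
from collections import deque

# Hypothetical discovered flows as (source, target) pairs.
FLOWS = [
    ("browser", "orders-api"),
    ("orders-api", "orders-db"),
    ("orders-api", "stripe"),
    ("billing-worker", "billing-db"),
]


def scope_flows(flows, anchor):
    """Keep flows downstream of the anchor, plus flows that call it directly."""
    outgoing = {}
    for src, dst in flows:
        outgoing.setdefault(src, []).append(dst)

    # Breadth-first walk over outgoing edges, starting at the anchor.
    reached = {anchor}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        for nxt in outgoing.get(node, []):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)

    return [(s, d) for s, d in flows if s in reached or d == anchor]


print(scope_flows(FLOWS, "orders-api"))
```

With this sample data the `billing-worker` flow is dropped because nothing connects it to the anchor.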

### Cross-repo traversal (microservice architectures — same registry pattern as /process)

- [ ] When the trace lands on another registered project (cross-service data flow), follow into that repo's source for the connected slice
- [ ] Each managed service becomes a **trust-boundary box** in the DFD (services are isolation units — every cross-service flow crosses a boundary)
- [ ] Unmanaged third parties (Stripe, SendGrid, Salesforce, etc.) render as **external entities** with their own trust-boundary marker

### Interview (gap-fill only)

- [ ] Skill asks operator to disambiguate when the code is silent: "this field `user.identifier`: is this an email, a UUID, or something else? Classification?"
- [ ] Skill asks operator to confirm inferred trust boundaries: "I'm placing a trust boundary between the public API gateway and the internal services. Confirm or override?"
- [ ] Skill asks operator about external actors the code doesn't reveal: "any human admin actors who interact via a console / runbook outside this codebase?"

### Output formats

- [ ] Default: **Mermaid flowchart** at `projects/<project>/architecture/dfd.md` — renders inline on GitHub, the same format /threat-model embeds today (#225). This becomes the *single source of truth* DFD — /threat-model consumes from here instead of regenerating its own
- [ ] `--format=dragon` emits OWASP Threat Dragon v2 JSON (shares the serialiser added in #255) — for visual editing in Threat Dragon
- [ ] `--format=plantuml-dfd` (optional, v2): PlantUML DFD syntax for adopters who use the PlantUML toolchain
- [ ] `--format=all` writes all configured formats in one go
- [ ] Each DFD element carries provenance comments / metadata: source file:line where it was discovered, classification labels, trust-boundary rationale
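
As a rough illustration of the default output, the sketch below renders a small model as a Mermaid flowchart, with trust boundaries as `subgraph` blocks and provenance as `%%` comments. The `Node` and `Flow` types, the shape mapping, and the sample data are assumptions made for this example; the ticket does not prescribe an emitter design.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: str        # Mermaid-safe identifier
    label: str
    kind: str      # "actor" | "process" | "store"
    boundary: str  # trust-boundary name, "" when outside any boundary
    evidence: str  # file:line provenance


@dataclass
class Flow:
    src: str
    dst: str
    label: str


# Hypothetical shape choice per DFD element kind.
SHAPES = {"actor": '(["{}"])', "process": '["{}"]', "store": '[("{}")]'}


def to_mermaid(nodes, flows):
    """Render nodes grouped by trust boundary, then the flows between them."""
    lines = ["flowchart LR"]
    boundaries = {}
    for n in nodes:
        boundaries.setdefault(n.boundary, []).append(n)
    for i, (name, members) in enumerate(boundaries.items()):
        indent = "  "
        if name:
            lines.append(f'  subgraph b{i}["{name}"]')
            indent = "    "
        for n in members:
            lines.append(f"{indent}%% source: {n.evidence}")
            lines.append(f"{indent}{n.id}{SHAPES[n.kind].format(n.label)}")
        if name:
            lines.append("  end")
    for f in flows:
        lines.append(f'  {f.src} -->|"{f.label}"| {f.dst}')
    return "\n".join(lines)


# Made-up sample model; paths and names are placeholders.
nodes = [
    Node("user", "Browser user", "actor", "", "routes/orders.py:10"),
    Node("create_order", "POST /orders", "process", "orders-api", "routes/orders.py:14"),
    Node("orders_db", "orders-db", "store", "orders-api", "db/config.py:3"),
]
flows = [
    Flow("user", "create_order", "order payload (PII)"),
    Flow("create_order", "orders_db", "order row (PII)"),
]
print(to_mermaid(nodes, flows))
```

Keeping provenance in comments keeps the rendered diagram uncluttered while the evidence stays in the source.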

### Re-runs + cache

- [ ] Re-running /dfd on the same scope OFFERS (default-no) to overwrite — same UX as /extract-features and /process
- [ ] Stores discovery output at `projects/<project>/architecture/dfd-source.{yaml,md}` so re-runs can diff "what changed since last time", which helps spot newly introduced data flows that may warrant security review
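
The re-run diff could be as simple as a set comparison over stored flows. This is a minimal sketch that assumes flows are kept as `(source, target, classification)` tuples; the tuple shape, the `diff_flows` name, and the sample data are hypothetical, and loading the stored file is left out.

```python
def diff_flows(previous, current):
    """Report flows that appeared or disappeared between two discovery runs."""
    prev, curr = set(previous), set(current)
    return {"added": sorted(curr - prev), "removed": sorted(prev - curr)}


# Made-up data standing in for two discovery runs.
last_run = [
    ("browser", "orders-api", "PII"),
    ("orders-api", "orders-db", "PII"),
]
this_run = [
    ("browser", "orders-api", "PII"),
    ("orders-api", "orders-db", "PII"),
    ("orders-api", "sendgrid", "PII"),
]

changes = diff_flows(last_run, this_run)
for flow in changes["added"]:
    print("new flow, may warrant review:", flow)
```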

### Refactor: extract /threat-model's DFD into /dfd as the producer

- [ ] PR refactors /threat-model so its DFD section comes from /dfd output (read from `projects/<project>/architecture/dfd.md`), not regenerated internally. If /dfd hasn't been run, /threat-model OFFERS to run it first
- [ ] /compliance-check (the GDPR/ePrivacy audit skill) gains a discovery step: read the DFD's data classifications and flow-targets to find cross-border data transfers, third-party processors, etc.
- [ ] Coordinate the /threat-model and /compliance-check updates with the existing skill maintainers (expected to be a formality, since the framework is single-org)
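
To make the /compliance-check discovery step concrete, here is a hedged sketch of how a consumer might pick out third-party processors from DFD data. It assumes the DFD has already been parsed into dictionaries; the field names, the `third_party_transfers` function, and the sample entities are hypothetical.

```python
# Hypothetical in-memory view of a parsed DFD; field names are assumptions.
FLOWS = [
    {"source": "orders-api", "target": "stripe", "classifications": ["PCI"]},
    {"source": "orders-api", "target": "sendgrid", "classifications": ["PII"]},
    {"source": "orders-api", "target": "orders-db", "classifications": ["PII"]},
    {"source": "orders-api", "target": "statuspage", "classifications": []},
]
EXTERNAL_ENTITIES = {"stripe", "sendgrid", "statuspage"}
REGULATED = {"PII", "PCI"}


def third_party_transfers(flows, external_entities, regulated=REGULATED):
    """List flows that send regulated data to an external entity."""
    hits = []
    for flow in flows:
        labels = regulated.intersection(flow["classifications"])
        if flow["target"] in external_entities and labels:
            hits.append((flow["source"], flow["target"], sorted(labels)))
    return hits


for source, target, labels in third_party_transfers(FLOWS, EXTERNAL_ENTITIES):
    print(f"{source} -> {target}: {', '.join(labels)}")
```

Flows to internal stores and unclassified flows to external entities are left out, so the audit can focus on regulated data that leaves the organisation.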

### Docs + AgDR

- [ ] SKILL.md documents the six discovery axes + scoping + one worked example
- [ ] AgDR captures: Mermaid as the primary output (renders on GitHub, no toolchain dep), Threat Dragon JSON as the secondary; classifications as a first-class concept (annotations + heuristics + explicit registry); single-source-of-truth refactor of /threat-model and /compliance-check
- [ ] Index in `projects/<project>/architecture/README.md` updated to list the DFD alongside the C4 diagrams

## Design Notes
Completes the **"what we already have"** visualization tooling family:

| Skill | Produces | View | Source |
|-------|----------|------|--------|
| /extract-features | Feature Inventory | What | Exhaustive code scan |
| /c4 | C4 L1+L2 Mermaid | Static topology | System-boundary scan |
| /process (#256) | BPMN 2.0 | Dynamic control flow | Anchor-scoped multi-repo trace |
| /dfd (this ticket) | Mermaid + Threat Dragon JSON | Data movement + trust + classification | Reachability + classification heuristics |
| /threat-model | STRIDE markdown | Security analysis ON the DFD | Consumes /dfd output |
| /compliance-check | GDPR/ePrivacy audit | Compliance analysis on data flows | Consumes /dfd classifications |
| /threat-model --format=dragon (#255) | Threat Dragon JSON | Editable threat model | Uses /dfd + STRIDE findings |

## Out of Scope
- Manual hand-authoring of DFDs in the skill (operator should edit the Mermaid file directly OR re-run /dfd with corrected anchor/answers)
- Real-time DFD updates (one-shot, not continuous — same as siblings)
- Replacing dedicated SAST/DAST tools — /dfd informs threat modelling and compliance, doesn't substitute for actual security scans
- Pretty graphical export (SVG/PNG) — that's what Threat Dragon does after JSON import
- Reading IaC (Terraform, CloudFormation) for trust boundaries in v1 — single signal source (code + env config) for v1; IaC integration is a follow-up

## Effort Estimate
TBD — L → XL. Discovery engine + classification heuristics + cross-service trust-boundary inference + Mermaid + Threat Dragon serialisers + /threat-model and /compliance-check refactor to consume.

## Glossary
| Term | Definition |
|------|------------|
| DFD | Data Flow Diagram — shows external entities, processes, data stores, trust boundaries, and the data that crosses between them; foundation for STRIDE threat modelling |
| Trust boundary | A line in a DFD where data crosses a privilege/network/ownership boundary (anonymous → authenticated, public ↔ internal, us → third-party); threats concentrate at boundaries |
| Data classification | A label applied to data identifying its sensitivity tier (PII, PCI, secrets, internal, public); informs encryption-at-rest, access controls, and regulatory scope (GDPR, HIPAA, PCI-DSS) |
| Reachability-bounded | Discovery scopes to what's connected to an anchor; doesn't blindly scan the whole repo for irrelevant data flows |
| Provenance | The file:line evidence trail for every DFD element — why this actor/process/store was placed in the diagram, so re-runs can refresh accurately |
| Single source of truth | One DFD per scope, consumed by /threat-model and /compliance-check — replaces the current pattern where each skill rebuilds its own DFD slice |
