Replies: 8 comments 7 replies
---
@niksacdev - moved this over to a discussion for now given that we're weighing Claude Code integration and a shim layer to make that work with the existing migration to skills. I'll have an update on this shortly, and I'm going to start laying out how we plan to do persona-based distribution of hve-core components so that we can provide different users different prompt suites. One of the issues I have with SpecKit and Awesome Copilot is that they have turned into a bit of a grab bag, and it's hard to sort the cruft from the gold.
---
I would probably expand the security frameworks to include other AI-related OWASP Top 10s. Aside from those two, you may also want to consider adding other non-AI-related OWASP Top 10s.

Adding these would cover a wider breadth of topics a user may be working on and need reviewed by the agent. I've gotten a head start on this: I've extracted documentation from OWASP and created skills based on it in this repo: https://github.com/JasonTheDeveloper/owasp-skills. You'll also find the source documentation in the repo.
---
## Architectural Proposal: Security Review Agent Composition

I've been thinking through how we might architect the security review agent work and wanted to share a proposal for the community to react to. This isn't set in stone; I'm looking for feedback on the approach before we start building.

### The Core Idea

My thinking is that a single monolithic security instruction can't provide meaningful interrogation of a codebase. Each security standard (OWASP Web vs. OWASP LLM vs. STRIDE, for example) asks fundamentally different questions, and trying to cram them all into one file forces you into surface-level pattern matching rather than deep analysis.

I'd like to propose that each security standard gets its own instruction file, and an agent selects the right ones based on what kind of code it's looking at. This same pattern would cover OWASP's 11 frameworks and adjacent standards like Microsoft RAI, STRIDE, MITRE ATLAS, Zero Trust, and NIST AI RMF.

### Proposed Architecture: Layered Composition with Shared Specification

Here's how I'm imagining the layers fitting together:

```mermaid
graph TB
  subgraph "Entry Points (Prompts)"
    P1["security-review.prompt.md<br/>(general — agent auto-selects)"]
    P2["security-review-web.prompt.md"]
    P3["security-review-ai.prompt.md"]
    P4["security-review-devops.prompt.md"]
    P5["security-review-compliance.prompt.md"]
  end
  subgraph "Agent Layer"
    A["security-review.agent.md<br/>Codebase Classification<br/>+ Standard Selection"]
  end
  subgraph "Shared Specification"
    SPEC["security-review-planning.instructions.md<br/>Review protocol · Severity taxonomy<br/>Output format · Cross-standard dedup<br/>CWE mapping conventions<br/>(~650–800 lines)"]
  end
  subgraph "OWASP Framework Instructions (11)"
    OW["owasp-web · owasp-llm<br/>owasp-agentic · owasp-mcp<br/>owasp-cicd · owasp-docker<br/>owasp-infrastructure · owasp-mobile<br/>owasp-ml · owasp-oss<br/>owasp-serverless"]
  end
  subgraph "Adjacent Standard Instructions (5)"
    AS1["rai.instructions.md<br/>Microsoft Responsible AI<br/>6 principles + evaluation"]
    AS2["stride.instructions.md<br/>STRIDE threat modeling<br/>protocol"]
    AS3["mitre-atlas.instructions.md<br/>Adversarial ML threat<br/>evaluation"]
    AS4["zero-trust.instructions.md<br/>Zero Trust architecture<br/>assessment"]
    AS5["nist-ai-rmf.instructions.md<br/>NIST AI Risk Mgmt<br/>Framework compliance"]
  end
  subgraph "Skills Layer (on-demand deep reference)"
    SK1["11 OWASP vulnerability skills<br/>(adopted from owasp-skills)"]
    SK2["rai-principles/<br/>Fairness · Reliability · Privacy<br/>Inclusiveness · Transparency<br/>Accountability"]
    SK3["stride-methodology/<br/>Threat modeling patterns<br/>+ data flow analysis"]
    SK4["mitre-atlas-techniques/<br/>Adversarial ML technique<br/>catalog"]
  end
  P1 --> A
  P2 --> A
  P3 --> A
  P4 --> A
  P5 --> A
  A --> SPEC
  A -->|"selects 2-4<br/>standards"| OW
  A -->|"selects"| AS1
  A -->|"selects"| AS2
  A -->|"selects"| AS3
  A -->|"selects"| AS4
  A -->|"selects"| AS5
  SPEC -.->|"shared contract"| OW
  SPEC -.->|"shared contract"| AS1
  SPEC -.->|"shared contract"| AS2
  OW -.->|"deep reference"| SK1
  AS1 -.->|"deep reference"| SK2
  AS2 -.->|"deep reference"| SK3
  AS3 -.->|"deep reference"| SK4
```

The key insight is that the agent classifies the codebase first, then pulls in only the 2-4 standards that actually apply. You never load everything at once.

### Why Per-Standard Instead of Monolithic?

I considered three approaches and wanted to lay out the tradeoffs transparently so we can discuss:
I think per-standard hits the sweet spot, but I'd love to hear if anyone sees a different balance here.

### Standards Coverage

I'm proposing three tiers. The first two tiers get dedicated instruction files; the third tier lives as reference mappings inside the shared specification.

#### Tier 1: OWASP Frameworks (11 instruction files)

Each framework would get its own instruction with detection checklists, severity guidance, and skill references for all 10 categories within that framework.
#### Tier 2: Adjacent Security Standards (5 instruction files)

These complement OWASP's vulnerability focus with methodology, compliance, and ethical dimensions. I think this is where the architecture gets interesting, because vulnerability lists alone don't cover everything a thorough review needs.
If the community sees other standards that should be in this tier, I'm open to expanding it. The pattern is the same regardless of count.

#### Tier 3: Reference Mappings (in shared specification)

These standards serve as cross-reference taxonomies rather than evaluation protocols, so I'd suggest embedding them in the shared specification rather than giving them dedicated instruction files.
### How RAI Differs from OWASP

I want to call this out because I think it's worth understanding why both are needed. They ask fundamentally different questions.

RAI assessments should be included in threat models, like we do here in hve-core. We should consider RAI assessment to be more than just adjacent to core security.

### Proposed File Tree

Here's what the file layout could look like. Happy to iterate on naming conventions or organization if people have preferences:

### Token Budget Estimates

One concern I wanted to address head-on is context window cost. Here's my rough estimate of what each review scenario would consume:
The per-standard approach keeps any single review well within context limits. The monolithic alternative would consume most of the context window before the agent even starts looking at code.

### Composition Pattern

The idea is that the agent selects standards based on codebase classification, not manual selection. A developer shouldn't need to know which OWASP framework applies to their code.
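To make the classification-driven selection concrete, here is a rough sketch of that step. The signal names, marker files, and signal-to-standard mapping are all illustrative assumptions for discussion, not the actual agent logic:

```python
# Hypothetical sketch of the classify-then-select step.
# Signal names, marker files, and the mapping below are illustrative only.
from pathlib import Path

# Illustrative mapping from detected codebase signals to standards
SIGNAL_TO_STANDARDS = {
    "web": ["owasp-web"],
    "llm": ["owasp-llm", "nist-ai-rmf"],
    "agents": ["owasp-agentic", "mitre-atlas"],
    "docker": ["owasp-docker"],
    "ci": ["owasp-cicd"],
}

# Illustrative file-presence heuristics for each signal
SIGNAL_MARKERS = {
    "web": ["package.json", "requirements.txt"],
    "docker": ["Dockerfile"],
    "ci": [".github/workflows"],
}

def classify(repo_root: str) -> set[str]:
    """Return the set of signals detected in the repo (file-presence heuristic)."""
    root = Path(repo_root)
    return {
        signal
        for signal, markers in SIGNAL_MARKERS.items()
        if any((root / m).exists() for m in markers)
    }

def select_standards(signals: set[str], cap: int = 4) -> list[str]:
    """Map signals to standards, deduplicate, and cap at the 2-4 per review."""
    selected: list[str] = []
    for signal in sorted(signals):
        for std in SIGNAL_TO_STANDARDS.get(signal, []):
            if std not in selected:
                selected.append(std)
    return selected[:cap]
```

A real implementation would classify on richer signals than file presence (imports, frameworks, deployment manifests), but the shape is the same: classification output drives which instruction files get loaded.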
### Suggested Implementation Path

I'd suggest phasing this so each phase delivers standalone value. No need to build everything before any of it is useful:
### Thoughts?

A few open questions I'd love the community's input on:
---
Love the phased approach and the per-standard modularity — Phase 1 makes total sense.

**Why not instructions?**
| Approach | Always-loaded | Per standard (on-demand) | 4-standard review |
|---|---|---|---|
| Instructions (`applyTo: '**'`) | ~25,600 | 0 | 25,600 |
| Instructions (no `applyTo`) | 0 | ~1,600 | ~6,400 |
| Skills (proposed) | ~1,600 (metadata) | ~1,600 | ~8,000 |
Skills add ~1,600 tokens baseline for discovery awareness (the model knows all 16
standards exist) while keeping any single review well within context limits.
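As a sanity check, the table's arithmetic can be reproduced directly, using the ~1,600 tokens-per-standard and 16-standard figures from the surrounding discussion:

```python
# Back-of-envelope check of the comparison table, assuming ~1,600 tokens
# per standard, 16 standards total, and a 4-standard review.
TOKENS_PER_STANDARD = 1_600
TOTAL_STANDARDS = 16
REVIEW_STANDARDS = 4

# Instructions with applyTo: '**' load every standard up front
always_loaded_all = TOTAL_STANDARDS * TOKENS_PER_STANDARD    # 25,600

# Instructions without applyTo load only the selected standards
on_demand_review = REVIEW_STANDARDS * TOKENS_PER_STANDARD    # 6,400

# Skills pay a flat metadata cost (discovery awareness) plus the selection
skills_review = TOKENS_PER_STANDARD + on_demand_review       # 8,000
```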
**Cross-platform portability**
SKILL.md follows the open Agent Skills standard, supported
by VS Code, Copilot CLI, Claude Code, and 15+ other AI tools. Instructions are VS Code
and GitHub.com only. For a distributable artifact like this, skills reach the broadest
audience.
On Claude Code specifically, the subagent equivalent can declare `skills: [owasp-top-10]` in its frontmatter, which preloads the full skill content at startup — the runtime injects the domain knowledge automatically.
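As a rough illustration of what that could look like, here is a hedged sketch of a subagent file. The `name`, `description`, and body text are placeholders; only the `skills:` declaration reflects the preloading mechanism described above:

```markdown
---
name: security-reviewer
description: Classifies the codebase and runs per-standard security reviews
skills: [owasp-top-10]
---

<!-- Placeholder body: the runtime has already injected the owasp-top-10
     skill content, so the agent body can stay thin. -->
Classify the changed files, then review them against each preloaded standard.
```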
**Phasing (same as proposed, different packaging)**
- Phase 1: OWASP Top 10 + STRIDE + RAI as skills, plus the security reviewer agent with subagent composition
- Phase 2: LLM + Agentic + MCP + ML + MITRE ATLAS skills
- Phase 3: CI/CD + Docker + Infrastructure + Zero Trust skills
- Phase 4: Mobile + OSS + Serverless + NIST AI RMF + remaining standards
Same phasing, same per-standard modularity, same token-efficient selective loading —
just packaged as skills rather than instructions for better discovery, portability, and
progressive disclosure.
Tagging @agreaves-ms for insights into SKILLs usage and experience.
And @obrocki for joining with his initial work on a Security champion agent and skills.
Thoughts?
---
Adding another +1 to the per-standard approach, and a +1 to the use of skills.

One of the ideas @JasonTheDeveloper had was around adding an evaluation piece for skills. For each of the different standards we could create a golden data set of example code snippets and evaluate how effective the skill is at identifying vulnerabilities in them. The benefit is that we can track regressions and compare different versions of a skill over time. I wondered if there is any existing capability in HVE-core that would support this, or if that functionality would need to be designed? And finally, the reason for raising it here: where might that fit in the phased approach (if it does)?

Another point is that I feel the RAI topic is a big one. I don't know everyone's backgrounds here, but it would be good to include input from RAI champ(s) to help shape that.
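To make the evaluation idea concrete, here is a rough sketch of what scoring a skill against a golden data set might look like. The `run_skill` callable, snippet format, and metric are all assumptions for discussion — nothing here reflects existing HVE-core functionality:

```python
# Hypothetical golden-dataset evaluation for a skill.
# run_skill is a placeholder for "invoke the skill and collect finding IDs".
from dataclasses import dataclass
from typing import Callable

@dataclass
class GoldenSnippet:
    code: str
    expected_findings: set[str]   # e.g. {"A03"} for an injection example

def evaluate_skill(
    run_skill: Callable[[str], set[str]],
    dataset: list[GoldenSnippet],
) -> float:
    """Fraction of expected findings the skill actually flags (recall)."""
    expected = sum(len(s.expected_findings) for s in dataset)
    found = sum(
        len(s.expected_findings & run_skill(s.code)) for s in dataset
    )
    return found / expected if expected else 1.0
```

Tracking this score per skill version would give exactly the regression signal described above; precision (false findings on clean snippets) would likely need a second metric.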
---
@willvelida you mentioned you were keen to contribute STRIDE-related content. See above.
---
Had a sync with @katriendg and @JasonTheDeveloper this week. We converged on OWASP Web/API as the MVP — start narrow, deliver value, iterate. Security is a minefield, and trying to land all 16 standards at once will stall us.

**Where PR #408 fits:** @katriendg's analysis identified the key gaps: the monolithic skill needs splitting into per-standard packages, the agent needs subagent delegation, we need pre-built prompts for domain-specific reviews, and hardcoded paths should give way to semantic discovery. All agreed.

**Proposed MVP (Phase 1) scope:** STRIDE, RAI, Secure by Design, and the remaining OWASP frameworks slot into Phase 2+ once the skeleton is proven. The foundation matters more than completeness — tooling will evolve anyway, so the value is in the composition pattern, not coverage breadth.

**Next steps:**
---
Here's our current thinking for MVP @JasonTheDeveloper @obrocki - we start planning this in the backlog.

## Security Reviewer — MVP Scope & Phased Approach

Based on our discussion consensus, here's the proposed MVP and phased roadmap.

### Key Approach

Skills-first, agent-thin. OWASP domain knowledge lives entirely in self-contained skills (SKILL.md + reference files). A single thin orchestrator agent classifies the codebase, selects relevant skills, and delegates per-skill reviews via inline subagent prompts — adopting @JasonTheDeveloper's proven vulnerability-scanner pattern from owasp-skills. No OWASP knowledge lives in the agent body itself.

Output is structured and consistent: a two-layer taxonomy (assessment status: PASS/FAIL/PARTIAL/NOT_ASSESSED × severity: CRITICAL/HIGH/MEDIUM/LOW) with findings written to a report file.

### MVP (Phase 1)
The 3 skills were chosen because Web covers the most common case, and LLM + Agentic directly align with the team's daily work building AI-integrated systems. Taking this approach in MVP form with a limited set allows for testing. Depending on success rates, subagents may be created in addition to skills. Testing feedback and evolution of tooling will play a role.

### Phase 2: AI + Methodology Skills
### Phase 3: DevOps + Infrastructure
### Phase 4: Remaining Standards + Evaluation
### Implementation Order
Issues 2–4 (skills) can run in parallel after the collection rename (1). The agent (5) can start in parallel with the skills (it references skill paths; content can be stubbed). Prompts (6) depend on the agent. Steps 7 and 8 are final integration steps.

The full research document with detailed agent frontmatter, skill adaptation tables, output format templates, and subagent prompt templates is available internally for reference.
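To make the two-layer output taxonomy proposed above (assessment status × severity) concrete, here is a rough model of a finding record. The field names and helper are illustrative, not a fixed schema:

```python
# Illustrative model of the proposed two-layer finding taxonomy.
# Field names are a sketch for discussion, not a committed output format.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    PARTIAL = "PARTIAL"
    NOT_ASSESSED = "NOT_ASSESSED"

class Severity(Enum):
    CRITICAL = 4
    HIGH = 3
    MEDIUM = 2
    LOW = 1

@dataclass
class Finding:
    standard: str                   # e.g. "owasp-web"
    category: str                   # e.g. "A03: Injection"
    status: Status
    severity: Optional[Severity]    # severity only applies to FAIL/PARTIAL
    detail: str = ""

def worst_severity(findings: list[Finding]) -> Optional[Severity]:
    """Highest severity among FAIL/PARTIAL findings, if any."""
    sevs = [f.severity for f in findings
            if f.status in (Status.FAIL, Status.PARTIAL) and f.severity]
    return max(sevs, key=lambda s: s.value) if sevs else None
```

Keeping status and severity as separate axes lets a report distinguish "this category passed" from "this category wasn't assessed," which a severity-only scheme cannot express.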
---
Issue Description
Labels:
enhancement, agents, security, code-quality, size: large

Epic: #63 - Engineering Agents Integration
Related Issues: Part of breaking down #63 into focused implementation tasks
Epic Context
This issue is part of the larger Engineering Agents Integration epic (#63), which aims to integrate 6 collaborative engineering agents from the engineering-team-agents repository into HVE Core. The epic provides multi-platform support (GitHub Copilot + Claude Code + AGENTS.md) while enhancing the existing research → plan → implement workflow with quality gates at strategic points.
This specific issue focuses on the during-implementation security validation phase, adding the Code Reviewer agent that catches OWASP vulnerabilities early with specific fixes before reaching the `@pr-review` final gate.

Overview

Integrate the Code Reviewer agent to provide OWASP security pattern validation and code quality checks during development, with consideration for integration into existing PR workflows or as a new command.
User Story
As a developer on the HVE Core team, I want security validation during development with specific code fixes, so that I catch OWASP vulnerabilities early when fixes are small rather than during final PR review when context is lost.
Context
- `@pr-review` (= large refactors)
- `@pr-review` final gate

Source Repository
Acceptance Criteria
- `.claude/agents/`
- `.github/agents/` and `.github/chatmodes/`
- `docs/code-review/`
- `docs/templates/`

Technical Requirements
- `docs/code-review/[date]-[component]-review.md`
- `docs/templates/code-review-report-template.md`

Implementation Phases
Phase 1: Analysis & Design
- `@code-review`

Phase 2: Port Agent
- `.claude/agents/`
- `.github/agents/`
- `.github/chatmodes/`

Phase 3: Documentation Setup
- `docs/code-review/` directory

Phase 4: Integration & Testing
Definition of Done
Dependencies
Related Documentation
Notes
Additional Context
No response