💡 Inspiration
Penetration testing costs $15,000–$40,000 per engagement and takes weeks, because every step requires a skilled human: choosing the right tools, running them, parsing dense terminal output, separating real findings from noise, and writing a coherent report. Meanwhile, most software teams ship code that has never been properly tested.
We wanted to find out: can an LLM-orchestrated agentic pipeline close that gap? Not a wrapper around a single tool, but a genuine end-to-end pipeline that plans its own approach, runs 12 industry-standard security tools, interprets the results, constructs a causal exploit graph, and delivers a professional vulnerability report, all without a human in the chair.
The constraint we set: the entire system must run offline (including the LLMs) on ARM architecture. When you're scanning a target, you're mapping its attack surface, and that data can't go to a cloud API. Pulse defaults to Ollama with llama3.1:8b, meaning zero bytes leave your machine.
With Pulse, revive your application before it becomes a headline!
⚙️ What it does
Pulse accepts a URL or a local repository path and runs a fully automated penetration testing pipeline end-to-end:
- Dynamic planning: An LLM reads a structural fingerprint of the target and decides which security agents to invoke. For repositories, this fingerprint is built from file tree/extensions/dependency manifests. For URLs, Pulse first runs a quick pre-scan fingerprint (httpx + whatweb) and then selects relevant web agents. Under weak/uncertain URL signal, guardrails keep a conservative baseline (recon + SQLi + XSS) to avoid under-testing.
- Parallel tool execution per agent: Each agent node runs its toolset, drops the LLM call if tools produce no output, and accumulates structured findings into shared state. Operational tool errors — missing binaries, timeouts — are logged but never promoted into findings.
- Live LLM streaming: Every LLM call — planner reasoning, tool interpretation, attack chain construction, and report writing — streams individual tokens to the UI in real time via Server-Sent Events.
- Attack chain synthesis: After all agents complete, a dedicated node reasons over the full findings set using a MITRE ATT&CK-aligned prompt to produce a causal exploit graph with nodes, directed edges, justifications, and a rendered Mermaid diagram.
- Structured vulnerability report: A final agent synthesises everything into a Markdown report: executive summary, findings table, detailed per-finding breakdown with evidence and remediation, attack chain narrative, and a risk score out of 10.
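The repository fingerprint the planner reads can be sketched as a small walk over the file tree; this is a minimal illustration, and the actual field names and cap used in Pulse are assumptions here:

```python
from collections import Counter
from pathlib import Path

def build_fingerprint(repo: str, max_files: int = 1000) -> dict:
    """Compact structural fingerprint: root-level files plus
    per-extension file counts, capped at max_files."""
    root = Path(repo)
    root_files = sorted(p.name for p in root.iterdir() if p.is_file())
    ext_counts: Counter = Counter()
    seen = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        ext_counts[path.suffix or "<none>"] += 1  # group extensionless files
        seen += 1
        if seen >= max_files:
            break
    return {"root_files": root_files, "extensions": dict(ext_counts)}
```

A fingerprint like `{"root_files": ["package.json"], "extensions": {".ts": 240}}` gives the planner enough signal to select JavaScript-stack agents without reading any file contents.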
✨ Features
- LangGraph state machine with 12 registered agent nodes
- LLM-driven dynamic agent selection for both repository and URL targets
- Real-time LLM token streaming to the UI via SSE
- MITRE ATT&CK-aligned multi-step attack chain graph with causal edge validation
- Live execution pipeline panel with per-agent running/done/queued status
- Findings sorted by severity with expandable evidence snippets
- Findings-by-component bar chart and interactive attack chain visualisation
- Exportable Markdown vulnerability report
- Swappable LLM backend: Ollama (local/offline), OpenAI, or Anthropic Claude
- Fully Dockerised backend — no security tools installed on the host machine
- Hot-reload of backend Python code via Docker volume mounts
- OWASP Juice Shop bundled as a ready-to-scan test target
- Dual target mode: web URLs and source code repositories, with distinct planning pipelines for each
- Full evidence drawer: click any finding to open a slide-in panel with untruncated raw tool output
- Port map visualisation: nmap results rendered as a scannable table with risk-tiered colour coding
Tools our Agents Use
🌐 Recon
httpx probes live HTTP endpoints for status codes, titles, redirect chains, and detected technologies. nmap performs a real TCP port scan across 10,000 ports with service and version detection. whatweb fingerprints server software, frameworks, and CMS versions.
💉 SQL Injection
sqlmap fires real injection payloads against discovered forms and endpoints in batch detection mode. If a parameter is injectable, sqlmap finds it — these are confirmed exploitable injections, not theoretical warnings.
🎯 XSS
dalfox injects real reflected XSS payloads using its built-in payload library against every discovered input surface. Findings are confirmed hits against the live target.
🔬 Static Analysis
semgrep runs its full auto-detect ruleset across the repository source. bandit applies Python-specific security linting. cppcheck checks C/C++ code for memory safety violations and undefined behaviour.
📦 Dependency Auditing
pip-audit queries the OSV database for CVEs in Python packages. npm audit queries the npm advisory registry for Node.js packages. Only agents relevant to the detected stack are selected — a pure-Python repo never runs npm audit.
🔐 Secrets Scanning
trufflehog scans files and commit history for high-entropy strings and known credential patterns. detect-secrets runs a second, independent pass with its own pattern matcher. Two tools, two passes, no single point of failure.
⛓️ Attack Chain Synthesis
After all agents complete, a dedicated reasoning node receives the full confirmed findings set and constructs a causal exploit graph aligned to MITRE ATT&CK. A "no phantom edges" rule is enforced — an edge from A → B is only drawn if exploiting A is a necessary prerequisite for B. Output includes typed nodes, justified directed edges, a plain-English narrative, and a rendered Mermaid diagram.
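The "no phantom edges" rule can also be enforced mechanically before rendering; a minimal sketch, assuming a simple node/edge dict schema (the real field names in Pulse may differ):

```python
def validate_edges(nodes: list[dict], edges: list[dict]) -> list[dict]:
    """Drop phantom edges: keep an A -> B edge only when both endpoints
    are real finding nodes and the edge carries a non-empty justification
    for the prerequisite claim."""
    known = {n["id"] for n in nodes}
    return [
        e for e in edges
        if e["source"] in known
        and e["target"] in known
        and e.get("justification", "").strip()
    ]
```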
📄 Report Generation
A final agent synthesises everything into a structured Markdown report: executive summary, findings table sorted by severity, per-finding breakdown with evidence and remediation, the full attack chain narrative, and a risk score out of 10.
Every finding Pulse reports was produced by an industry-standard offensive tool executing against the real target. sqlmap found the injection. dalfox confirmed the reflection. trufflehog found the secret. The LLM interprets and synthesises — the ground truth comes from the tools.
🛠️ How we built it
Pulse is split into a Python backend and a Next.js frontend, connected by a FastAPI REST + SSE API.
Orchestration layer — LangGraph
The core of Pulse is a compiled StateGraph. All nodes share a single GraphState (Pydantic model) that accumulates findings, the attack chain, and the report as execution progresses. Routing between nodes is fully deterministic — driven by the agents_plan list the planner emits — using conditional edges that advance through the plan without any further LLM routing decisions.
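The deterministic routing described above can be sketched without the LangGraph API as a pure function over shared state; names here are illustrative, not Pulse's actual identifiers:

```python
END = "END"

def route_next(state: dict) -> str:
    """Conditional-edge router: advance deterministically through the
    planner's agents_plan; no LLM is consulted after planning."""
    completed = set(state.get("completed", []))
    for agent in state.get("agents_plan", []):
        if agent not in completed:
            return agent
    return END
```

In LangGraph terms, a function like this would be attached via `add_conditional_edges`, so execution order is fully reproducible from the plan alone.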
Planner
For repository targets the planner walks the file tree (up to 1,000 files), builds a compact fingerprint of root files and per-directory extension counts, and passes it to the LLM with a strict JSON schema prompt. For URL targets it first performs a fast pre-scan fingerprint (httpx + whatweb) and then asks the LLM to pick relevant web agents, with conservative guardrails when signal quality is weak. Planner responses are parsed with brace-depth JSON extraction, validated against allowlists, and retried once with a JSON-repair prompt before a safe fallback is used.
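Brace-depth extraction pulls the first balanced JSON object out of chatty model output. A minimal sketch that also tracks string literals, so braces inside quoted values don't confuse the depth counter:

```python
import json

def extract_json(text: str):
    """Return the first balanced {...} object in text, parsed,
    or None if nothing parseable is found."""
    start = text.find("{")
    while start != -1:
        depth, in_str, esc = 0, False, False
        for i in range(start, len(text)):
            ch = text[i]
            if in_str:
                if esc:
                    esc = False
                elif ch == "\\":
                    esc = True
                elif ch == '"':
                    in_str = False
            elif ch == '"':
                in_str = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start : i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next "{"
        start = text.find("{", start + 1)
    return None
```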
Tool layer
Each tool is a `@tool`-decorated function wrapping a `subprocess.run` call. Tools return a typed dict with `results`, `total`, and `error`. Every agent node calls `_has_real_output()` before passing anything to the LLM — operational errors never become findings.
Tools in use: httpx, nmap, whatweb, sqlmap, dalfox, cppcheck, semgrep, bandit, pip-audit, npm audit, trufflehog, detect-secrets.
LLM interpretation
Each agent node calls a shared _llm_interpret() helper with the combined raw tool output. The LLM returns a JSON array of findings (severity, title, description, evidence, remediation, component). A markdown-fence stripper and brace-depth JSON extractor handle minor formatting deviations before parsing.
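The markdown-fence stripper is a small pre-parse step; a sketch of it plus the findings parse (helper names here are illustrative):

```python
import json
import re

def strip_fences(text: str) -> str:
    """Remove a surrounding ```json ... ``` (or bare ```) fence, if present."""
    m = re.search(r"```(?:json)?\s*(.*?)\s*```", text, flags=re.DOTALL)
    return m.group(1) if m else text.strip()

def parse_findings(raw: str) -> list:
    """Parse the LLM's JSON array of findings, tolerating a fenced reply."""
    data = json.loads(strip_fences(raw))
    return data if isinstance(data, list) else []
```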
Streaming
ScanStreamCallback (a LangChain callback) intercepts tokens on every LLM call and appends them — prefixed with null-byte sentinels to delimit agent blocks — to scan.llm_log. The frontend consumes this via an EventSource on the SSE endpoint, re-parsing the sentinel structure to render per-agent reasoning blocks with a live cursor animation.
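The frontend's reconstruction of per-agent blocks can be sketched as a split on the sentinel; the exact sentinel format (a null byte, then the agent name, then a newline) is an assumption for illustration:

```python
SENTINEL = "\x00"

def parse_llm_log(log: str) -> list:
    """Split the accumulated token log into (agent, text) blocks,
    assuming each block starts with '\\x00<agent>\\n'."""
    blocks = []
    for chunk in log.split(SENTINEL):
        if not chunk:
            continue  # leading empty segment before the first sentinel
        header, _, body = chunk.partition("\n")
        blocks.append((header, body))
    return blocks
```

Because the sentinel never appears in model output, re-parsing the whole log on every SSE event is race-free even while tokens are still arriving.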
Frontend
Next.js 15 app router, Tailwind CSS, shadcn/ui components, Motion for animations, ReactFlow for the attack chain graph, Recharts for the findings bar chart, and ReactMarkdown with remark-gfm for the vulnerability report.
🚧 Challenges we ran into
- LLM output reliability: Smaller local models (llama3.1:8b on Ollama) struggle to emit clean JSON on the first pass, especially for the planner. We layered three recovery strategies: markdown-fence stripping, brace-depth JSON extraction, and a second-pass JSON-repair LLM call, before falling back to a broad safe plan.
- Tool error vs. real findings: Security tools produce a lot of noise — installation errors, timeouts, warnings on stderr — that an LLM can mistake for vulnerabilities. We built `_has_real_output()` to gate every LLM call and added explicit false-positive suppression rules in the interpreter system prompt.
- Shared accumulating state across nodes: LangGraph's state passing meant we had to be deliberate about how findings grow. Each node returns `{"findings": state.findings + new_findings}` to extend, not overwrite, the list as the graph progresses.
- Docker networking for local repo scanning: The backend runs inside Linux containers but needs to scan folders on the macOS host. The dev compose mounts `/tmp` and `/Users` read-only into the container, and the planner validates path accessibility before attempting a scan.
- Streaming across a thread boundary: Piping per-token callbacks through SSE to the frontend while the graph blocks inside `asyncio.to_thread()` required careful sentinel design so the frontend can reconstruct which agent emitted which tokens without a race condition.
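The extend-not-overwrite pattern for shared state can be made concrete with a small sketch (a plain dataclass stands in for the Pydantic GraphState; the node body is illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class GraphState:
    findings: list = field(default_factory=list)

def sqli_node(state: GraphState) -> dict:
    """Return a partial state update that EXTENDS the shared findings
    list; returning only the new findings would silently drop the
    results of every earlier agent."""
    new_findings = [{"severity": "critical", "title": "SQL injection"}]
    return {"findings": state.findings + new_findings}
```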
🏆 Accomplishments we're proud of
- Entirely local by default: Pulse runs the full pipeline — LLM, all security tools, and the frontend — without sending a single byte to the cloud, using Ollama as the default backend.
- Genuinely dynamic agent selection: The planner doesn't run every tool blindly. A pure-C repo will never waste time on `pip-audit`. A JavaScript project won't get `cppcheck`. The LLM reads file evidence and the routing plan reflects it, with hallucination grounding to prevent the model from claiming languages it didn't actually see.
- Attack chain with real causal reasoning: The attack chain node applies MITRE ATT&CK categories, enforces an explicit "no phantom edges" rule (an edge from A → B is only valid if exploiting A is a necessary prerequisite for B), and produces a plain-English narrative alongside a rendered graph.
- Zero-latency-feel streaming: Watching the LLM reason through tool output token-by-token in the Agent Reasoning Console while the scan is still in progress makes the system feel genuinely live in a way that polling-based approaches can't match.
📚 What we learned
- Prompt grounding is load-bearing. An LLM told to summarise a repo's architecture will confidently invent languages it didn't see. Adding explicit grounding rules — cross-checking LLM output against the actual file-tree signals — cut hallucinated architecture summaries to near zero.
- The gap between "agentic" and "multi-agent" is real. Pulse is one orchestrator with specialist stages, not independent agents with their own memory and goals. That constraint makes it more predictable and auditable, but true multi-agent delegation would unlock significantly deeper exploitation reasoning.
- Tool-use reliability requires defensive engineering at every layer. FileNotFoundError, subprocess timeout, malformed JSON, LLM false positives, HTML-entity escaping in cppcheck XML — each required its own guard. Security tooling is not clean.
- LangGraph is well-suited to sequential agentic pipelines. Compiled state machines with conditional edges give you reproducible execution order, clean state passing, and out-of-the-box traceability — all things that matter when a scan takes several minutes and must not silently skip a step.
🚀 Future work
- Parallel agent execution: The current pipeline is sequential. Independent agents (e.g., `static_c` and `deps_py` on a repo) have no data dependency and could run concurrently using LangGraph's parallel branches, halving scan time.
- Iterative exploitation loops: After the attack chain is built, dispatch targeted follow-up agents to probe the most critical link. For example, if SQLi is identified, automatically attempt schema extraction and report what was accessible.
- Authenticated scan support: Add session cookie / API key injection so agents can scan authenticated routes and internal API surfaces, not just public-facing endpoints.
- CVE enrichment: Cross-reference dependency findings against the NVD and GitHub advisories API to pull real CVSS scores, PoC links, and patch versions directly into findings.
- Persistent scan history: Replace the in-memory scan store with a database so scans survive backend restarts and users can compare reports across time.
- CI/CD integration: Expose a scan-trigger API and publish a GitHub Action so Pulse can run on every pull request and block merges when critical-severity findings are introduced.
🧱 Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 15, Tailwind CSS, shadcn/ui, Motion, ReactFlow, Recharts |
| Backend API | FastAPI, Python 3.13, uv |
| Orchestration | LangGraph, LangChain |
| LLM (default) | Ollama — llama3.1:8b (local, offline) |
| LLM (optional) | OpenAI GPT-4o, Anthropic Claude Sonnet |
| Security tools | httpx, nmap, whatweb, sqlmap, dalfox, cppcheck, semgrep, bandit, pip-audit, npm audit, trufflehog, detect-secrets |
| Streaming | Server-Sent Events via sse-starlette |
| Containerisation | Docker, Docker Compose |
| Test target | OWASP Juice Shop |

