Spider-Sense

Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

An event-driven defense framework allowing agents to maintain latent vigilance and trigger defenses only upon risk perception.

📖 Abstract

As large language models (LLMs) evolve into autonomous agents, most existing defense mechanisms adopt a mandatory checking paradigm, which forcibly triggers security validation at predefined stages regardless of actual risk. This approach leads to high latency and computational redundancy.

We propose Spider-Sense, an event-driven defense framework based on Intrinsic Risk Sensing (IRS). It allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, Spider-Sense invokes a Hierarchical Adaptive Screening (HAS) mechanism that trades off efficiency and precision: resolving known patterns via lightweight similarity matching while escalating ambiguous cases to deep internal reasoning.

⚡ Framework Comparison

The Problem: Mandatory Checking

Existing frameworks rely on forced, repetitive external security checks at every stage (Plan, Action, Observation), leading to rapidly accumulating latency and disrupting normal user interaction.

The Solution: Intrinsic Risk Sensing (IRS)

Spider-Sense utilizes proactive, endogenous risk awareness to dynamically trigger targeted analysis only when anomalies are sensed, significantly reducing overhead.

🔬 Method Overview

The framework operates on a Detect-Audit-Respond cycle:

Intrinsic Risk Sensing (IRS): The agent maintains a latent state of vigilance. It continuously monitors artifacts across four stages (Query, Plan, Action, Observation).
Sensing Indicator: Upon perceiving a risk, the agent generates a specific indicator (e.g., <|verify_user_intent|>), pausing execution.
Hierarchical Adaptive Screening (HAS):
- Coarse-grained Detection: Fast vector matching against a database of known attack patterns.
- Fine-grained Analysis: Deep reasoning by an LLM for ambiguous or low-similarity cases.
Autonomous Decision: The agent decides to Resume execution (if safe) or Refuse/Sanitize (if unsafe).
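
The two-tier screening step can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the 0.9 similarity threshold, and the `deep_analysis` callback (standing in for the LLM-based fine-grained analysis) are all assumptions made for illustration, and embeddings are represented as plain lists of floats.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hierarchical_screen(
    artifact_vec: Vector,
    known_attacks: Sequence[Vector],
    deep_analysis: Callable[[], bool],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the paper
) -> str:
    """Return "refuse" or "resume" for one artifact flagged by IRS."""
    # Coarse-grained detection: similarity to the closest known attack pattern.
    best = max(
        (cosine_similarity(artifact_vec, v) for v in known_attacks),
        default=0.0,
    )
    if best >= threshold:
        return "refuse"
    # Fine-grained analysis: escalate ambiguous or low-similarity cases.
    # deep_analysis() returns True when the artifact is judged unsafe.
    return "refuse" if deep_analysis() else "resume"
```

In this sketch the expensive callback runs only when the cheap vector match is inconclusive, which is the efficiency argument behind the hierarchy.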

🛡️ Defense Modules

Spider-Sense protects four security-critical stages using specialized defense tags:

Stage	Module Tag	Threat Addressed	Trigger Condition
Query	`<|verify_user_intent|>`	Agent Logic Hijacking	When user input attempts to jailbreak or override instructions.
Plan	`<|validate_memory_plan|>`	Thought-Process Manipulation	When reasoning or retrieved memories show signs of poisoning.
Action	`<|audit_action_parameters|>`	Tool-Use Exploitation	Before executing high-risk tools with suspicious parameters.
Observation	`<|sanitize_observation|>`	Indirect Prompt Injection	Upon receiving tool outputs containing hidden malicious commands.
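
As a rough illustration of how a harness might notice these tags in agent output, the sketch below maps each tag to its stage and searches the text with a regular expression. The tag names and stages come from the table above; the function name `find_sensing_indicator` and the parsing approach are assumptions, not the repository's code.

```python
import re
from typing import Optional, Tuple

# Tag name -> lifecycle stage, as listed in the Defense Modules table.
STAGE_BY_TAG = {
    "verify_user_intent": "Query",
    "validate_memory_plan": "Plan",
    "audit_action_parameters": "Action",
    "sanitize_observation": "Observation",
}

# Matches any of the four tags written as <|tag_name|>.
TAG_PATTERN = re.compile(r"<\|(" + "|".join(STAGE_BY_TAG) + r")\|>")


def find_sensing_indicator(agent_output: str) -> Optional[Tuple[str, str]]:
    """Return (tag, stage) for the first sensing indicator, or None."""
    match = TAG_PATTERN.search(agent_output)
    if match is None:
        return None
    tag = match.group(1)
    return tag, STAGE_BY_TAG[tag]
```

A caller could pause execution whenever this returns a value and route the flagged artifact to the screening step for the reported stage.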

📊 S²Bench

To facilitate rigorous evaluation, we introduce S²Bench, a lifecycle-aware benchmark featuring:

Multi-stage Attacks: Covers Query, Plan, Action, and Observation stages.
Realistic Tool Execution: Involves actual tool selection and parameter generation (approx. 300 functions).
Hard Benign Prompts: Includes 153 carefully constructed benign samples to test for over-defense (False Positives).

📈 Performance Results

Spider-Sense achieves state-of-the-art defense performance with minimal latency overhead on S²Bench, Mind2Web, and eICU.

Lowest Attack Success Rate (ASR): Effectively blocks sophisticated multi-stage attacks.
Lowest False Positive Rate (FPR): Distinguishes subtle benign intents from attacks.
Marginal Latency Overhead: Only ~8.3%, compared to >200% for some mandatory checking baselines.
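
For readers reproducing these metrics, the sketch below uses the conventional definitions of ASR, FPR, and relative latency overhead. This README does not spell out the paper's exact formulas, so treat these as assumptions; the function names are hypothetical, and any numbers passed to them in examples are illustrative inputs rather than reported results.

```python
def attack_success_rate(successful_attacks: int, total_attacks: int) -> float:
    """Fraction of attack cases that achieved their goal despite the defense."""
    return successful_attacks / total_attacks


def false_positive_rate(benign_flagged: int, total_benign: int) -> float:
    """Fraction of benign cases that the defense wrongly refused."""
    return benign_flagged / total_benign


def latency_overhead(defended_seconds: float, baseline_seconds: float) -> float:
    """Extra time relative to the undefended run; 0.083 means 8.3% slower."""
    return (defended_seconds - baseline_seconds) / baseline_seconds
```

For instance, a hypothetical run that takes 10.83 s with the defense and 10.0 s without it has an overhead of about 0.083, while a run that triples the baseline time has an overhead of 2.0 (200%).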

Ablation Studies

IRS Importance: Removing sensing at any stage leads to a significant increase in ASR, especially at the Action stage.
HAS Balance: Combining coarse detection and fine analysis yields the best trade-off between safety and efficiency.

🔍 Case Study

Scenario: A clinical analysis agent retrieving patient records via a utility tool.

Attack: The tool's return content is poisoned with injected code (`import fake_module`) to induce unauthorized execution.
Detection: The agent's IRS activates the sensing indicator <|sanitize_observation|>.
HAS Screening: The content is routed to the inspector. It is identified as contextually unjustified.
Response: The agent autonomously terminates execution, intercepting the attack.
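
To make the scenario concrete, here is a deliberately simplified stand-in for the inspector. In Spider-Sense the judgment comes from similarity matching and LLM reasoning; the sketch below only shows the flavor of the check by flagging import statements in a tool output that name modules outside an allowlist. The function name and the allowlist are hypothetical.

```python
import re
from typing import List, Set

# A line that starts with "import X" or "from X ..." (leading spaces allowed).
IMPORT_PATTERN = re.compile(
    r"^[ \t]*(?:import|from)[ \t]+([A-Za-z_][A-Za-z0-9_]*)",
    re.MULTILINE,
)


def unexpected_imports(tool_output: str, allowed_modules: Set[str]) -> List[str]:
    """List top-level module names imported in tool_output but not allowed."""
    found = IMPORT_PATTERN.findall(tool_output)
    return [name for name in found if name not in allowed_modules]
```

A non-empty result for a record-retrieval tool, whose output should be data rather than code, would be one signal that the content is contextually unjustified.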

🚀 Quick Start

1. Installation

git clone https://github.com/your-repo/SpiderSense.git
cd SpiderSense
pip install -r requirements.txt

2. Configuration

Set up your API keys. We recommend using OpenRouter for broad model support.

# Create .env file
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxxxxxxxxxx
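
If you want to confirm the key is readable before launching a run, a small standard-library check is enough. This README does not say how the repository itself loads `.env`, so the parser below is an assumption that handles only simple `KEY=value` lines; `load_env_file` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    env = load_env_file()
    if not env.get("OPENROUTER_API_KEY"):
        raise SystemExit("OPENROUTER_API_KEY is missing from .env")
    print("OPENROUTER_API_KEY found")
```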

3. Run Defense Test

Enable Intrinsic Risk Sensing (IRS) using the defense template:

python main_attacker.py \
    --llm_name qwen-max \
    --defense_template template/spider_template.txt \
    --template_position system \
    --res_file results/spider_sense_test.csv

📂 Project Structure

SpiderSense/
├── config/                         # Experiment configs
│   ├── Defense_2.yml               # Default SpiderSense config
│   ├── DPI.yml                     # Direct Prompt Injection config
│   ├── OPI.yml                     # Indirect Prompt Injection config
│   └── MP.yml                      # Memory Poisoning config
├── data/                           # Benchmarks & Attack Datasets
├── pyopenagi/
│   └── agents/                     # Agent implementations (inc. sandbox.py)
├── template/
│   ├── spider_template.txt         # Core Defense-y1 Protocol
│   └── sandbox_judge_*.txt         # Judge prompts for each stage
├── main_attacker.py                # Entry point for single-case testing
└── scripts/                        # Batch execution scripts

🔧 Configuration

SpiderSense uses YAML files in the config/ directory to manage experiment settings.

Config File	Description
Defense_2.yml	Standard configuration enabling IRS and all defense modules.
DPI.yml	Specialized for testing Direct Prompt Injection attacks.
OPI.yml	Specialized for Indirect Prompt Injection (Observation stage).
MP.yml	Specialized for Memory Poisoning (Retrieval stage).
DPI_MP.yml	Combined DPI + Memory Poisoning attack.
DPI_OPI.yml	Combined DPI + Indirect Prompt Injection attack.
OPI_MP.yml	Combined OPI + Memory Poisoning attack.
Tool_Injection.yml	Tool Description Injection: Manipulating tool definitions to mislead agents.
Adv_Tools.yml	Adversarial Tools: Injecting decoy tools with confusing names.
Lies_Loop.yml	Lies in the Loop: Forcing agents to deceive simulated human approvers.
Logic_Backdoor.yml	Reasoning Backdoor: Triggering logical fallacies or suspended safety protocols.

To use a specific config:

python scripts/agent_attack.py --cfg_path config/DPI.yml
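
To sweep several configs in one go, the command above can be generated per file. The sketch below only builds the argument lists and does not run anything; `build_attack_commands` is a hypothetical helper, and it assumes every `.yml` file in the directory is a valid input for `--cfg_path`.

```python
from pathlib import Path
from typing import List


def build_attack_commands(config_dir: str = "config") -> List[List[str]]:
    """Build one agent_attack.py command per YAML file, in sorted order."""
    commands: List[List[str]] = []
    for cfg in sorted(Path(config_dir).glob("*.yml")):
        commands.append(
            ["python", "scripts/agent_attack.py", "--cfg_path", cfg.as_posix()]
        )
    return commands


if __name__ == "__main__":
    for cmd in build_attack_commands():
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the printed commands look right.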

🚀 Advanced Usage

For large-scale benchmarking, use the scripts in the scripts/ directory.

Run Serial Benchmark (Stage 1)

Executes attacks sequentially. Useful for debugging or smaller models.

python scripts/run_stage_1_serial.py \
    --llm_name gpt-4o-mini \
    --num_attack 10 \
    --num_fp 10

Run Parallel Benchmark (Stage 4)

Executes attacks in parallel to speed up evaluation.

python scripts/run_stage_4_parallel.py \
    --llm_name qwen-max \
    --num_attack 50

📖 Citation

If you use Spider-Sense or S²Bench in your research, please cite our paper:

@misc{yu2026spidersenseintrinsicrisksensing,
      title={Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening}, 
      author={Zhenxiong Yu and Zhi Yang and Zhiheng Jin and Shuhe Wang and Heng Zhang and Yanlin Fei and Lingfeng Zeng and Fangqi Lou and Shuo Zhang and Tu Hu and Jingping Liu and Rongze Chen and Xingyu Zhu and Kunyi Wang and Chaofa Yuan and Xin Guo and Zhaowei Liu and Feipeng Zhang and Jie Huang and Huacan Wang and Ronghao Chen and Liwen Zhang},
      year={2026},
      eprint={2602.05386},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2602.05386}, 
}

🙏 Acknowledgments

This work was supported by AIFin Lab and QuantaAlpha.

About AIFin Lab

Initiated by Professor Liwen Zhang from Shanghai University of Finance and Economics (SUFE), AIFin Lab is deeply rooted in the interdisciplinary fields of AI + Finance, Statistics, and Data Science. The team brings together cutting-edge scholars from top institutions such as SUFE, FDU, SEU, CMU, and CUHK. We are dedicated to building a comprehensive "full-link" system covering data, models, benchmarks, and intelligent prompting. We are actively looking for talented students (UG/Master/PhD) and researchers worldwide who are passionate about AI Agent security and financial intelligence to join AIFin Lab!

If you are interested in contributing to this project or exploring research collaborations, please send your CV/introduction to: 📩 aifinlab.sufe@gmail.com and CC to: 📧 zhang.liwen@shufe.edu.cn

We look forward to hearing from you!

About QuantaAlpha

Founded in April 2025, QuantaAlpha is composed of professors, postdoctoral researchers, PhDs, and Master's students from prestigious universities including THU, PKU, CAS, CMU, and HKUST. Our mission is to explore the "Quanta" of intelligence and lead the "Alpha" frontier of agent research—ranging from CodeAgents and self-evolving intelligence to financial and cross-domain specialized agents—dedicated to reshaping the boundaries of artificial intelligence.

In 2026, we will continue to produce high-quality research results in areas such as CodeAgent (end-to-end autonomous execution of real-world tasks), DeepResearch, Agentic Reasoning/Agentic RL, and Self-Evolution & Collaborative Learning. We welcome students interested in our research directions to join us!
Team Homepage: https://quantaalpha.github.io/

