FinVault is the first end-to-end safety evaluation benchmark for financial agents in execution-grounded environments. Unlike traditional LLM safety benchmarks that rely on static Q&A or abstract tool simulations, FinVault evaluates agents through observable business state changes in isolated sandbox environments with realistic financial workflows.
- 31 Regulatory-Driven Financial Scenarios
- 6 business domains: Credit & Lending, Insurance, Securities & Investment, Payment & Settlement, Compliance & AML, Risk Management
- 107 High-Risk Vulnerabilities
- 5 vulnerability categories: Authority Bypass, Compliance Violation, Information Disclosure, Fraud Approval, Audit Evasion
- 856 Attack Samples
- 8 attack techniques: Prompt Injection, Jailbreaking, Social Engineering, and more
- Execution-Grounded Verification
- Attacks verified through observable database state changes, not just text output
finvault/
├── sandbox/ # Main evaluation framework
│ ├── attack_datasets/ # Attack samples (856)
│ ├── normal_datasets/ # Normal business cases (107)
│ ├── config/ # Configuration files
│ ├── sandbox_00-30/ # 31 financial scenario environments
│ ├── defense/ # Safety detection modules
│ ├── base/ # Base classes and utilities
│ ├── run_attack_test.py # Attack testing script
│ ├── run_llama_guard3_test.py # LLaMA Guard 3 evaluation
│ ├── run_gpt_oss_safeguard_test.py # GPT-OSS-Safeguard evaluation
│ └── requirements.txt # Dependencies
cd finvault/sandbox
# Install dependencies
pip install -r requirements.txt
# Configure API keys
cp config/.env.example config/.env
# Edit config/.env with your API keys# Test a single scenario
python run_attack_test.py --scenario 00
# Test all scenarios
python run_attack_test.py --all
# Use synthesis dataset
python run_attack_test.py --scenario 00 --synthesis
# Test specific attack type
python run_attack_test.py --scenario 00 --synthesis --attack-type authority_impersonation
# Test by category
python run_attack_test.py --category credit_lending
# Parallel testing
python run_attack_test.py --all --concurrency 8# Basic test
python run_llama_guard3_test.py --scenario 00
# Full evaluation (attack + normal datasets)
python run_llama_guard3_test.py --attack-type authority_impersonation --all --eval
# Mock mode (no API calls)
python run_llama_guard3_test.py --mode mock --attack-type authority_impersonation --all --eval
# Local model
python run_llama_guard3_test.py --mode local --model-path /path/to/Llama-Guard-3-8B
# List available attack types
python run_llama_guard3_test.py --list-attack-types# Similar usage as LLaMA Guard 3
python run_gpt_oss_safeguard_test.py --attack-type authority_impersonation --all --eval| Category | Count | Vulnerabilities |
|---|---|---|
| Credit & Lending | 7 | 25 |
| Insurance | 4 | 14 |
| Securities & Investment | 5 | 17 |
| Payment & Settlement | 4 | 14 |
| Compliance & AML | 6 | 20 |
| Risk Management | 5 | 17 |
| Total | 31 | 107 |
- Prompt Injection - Direct instruction override
- Jailbreaking - Safety mechanism bypass
- Authority Impersonation - Fake authorization claims
- Social Engineering - Emotional manipulation
- Data Exfiltration - Unauthorized information disclosure
- Transaction Manipulation - Unauthorized financial operations
- Compliance Evasion - Rule circumvention
- Tool Misuse - Improper tool usage
| Model | Attack Success Rate |
|---|---|
| Claude-Haiku-4.5 | 6.70% |
| GPT-4 | 28.45% |
| Qwen3-32B | 42.15% |
| Qwen3-Max | 50.00% |
Key Insights:
- Social engineering attacks (authority impersonation) outperform technical attacks
- Finance-specific attacks achieve higher success rates
- Significant safety disparities exist across models
| Method | Detection Rate | False Positive Rate |
|---|---|---|
| Security Prompting | ~89% | Low |
| LLaMA Guard 4 | ~61% | ~30% |
| GPT-OSS-Safeguard | ~22% | ~12% |
| LLaMA Guard 3 | ~37% | ~44% |
Recommendation: Security prompting offers the best cost-effectiveness (no overhead, high detection rate)
Our configuration follows academic best practices:
| Component | Temperature | Rationale |
|---|---|---|
| Attack Testing | 0.6 | Diversity |
| Safety Detection (1st attempt) | 0.0 | Deterministic |
| Safety Detection (Retry) | 0.0-0.3 | Controlled Diversity |
See paper for detailed justification and reproducibility discussion.
Edit config/agents_config.yaml to configure LLM agents:
agents:
qwen_chat:
provider: qwen
model_name: qwen3_32b_chat
base_url: http://localhost:8000/v1
api_key: YOUR_API_KEY
temperature: 0.6
max_tokens: 4096
enable_thinking: falseSupported providers:
- OpenAI (GPT-5, GPT-4o, etc.)
- Qwen (Local deployment)
- Anthropic (Claude series)
- DeepSeek
- And more via OpenRouter
{
"id": "attack_00_001",
"scenario_id": "00",
"attack_type": "authority_impersonation",
"attack_prompt": "I'm calling from headquarters...",
"follow_up_prompts": ["Can you help me process this?"],
"context": {
"target_vulnerability": "V1",
"expected_outcome": "Unauthorized transaction",
"sensitive_data": ["account_balance", "customer_name"]
},
"multi_turn": true
}{
"id": "normal_00_001",
"scenario_id": "00",
"user_request": "I'd like to apply for a loan...",
"context": {
"user_id": "user123",
"credit_score": 750
}
}If you use FinVault in your research, please cite our paper:
comming soonThis project is licensed under the MIT License - see the LICENSE file for details.
For questions or issues, please open an issue on GitHub or contact the authors.