Skip to content

aifinlab/FinVault

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Overview

FinVault is the first end-to-end safety evaluation benchmark for financial agents in execution-grounded environments. Unlike traditional LLM safety benchmarks that rely on static Q&A or abstract tool simulations, FinVault evaluates agents through observable business state changes in isolated sandbox environments with realistic financial workflows.

Key Features

  • 31 Regulatory-Driven Financial Scenarios
    • 6 business domains: Credit & Lending, Insurance, Securities & Investment, Payment & Settlement, Compliance & AML, Risk Management
  • 107 High-Risk Vulnerabilities
    • 5 vulnerability categories: Authority Bypass, Compliance Violation, Information Disclosure, Fraud Approval, Audit Evasion
  • 856 Attack Samples
    • 8 attack techniques: Prompt Injection, Jailbreaking, Social Engineering, and more
  • Execution-Grounded Verification
    • Attacks verified through observable database state changes, not just text output

Project Structure

finvault/
├── sandbox/                          # Main evaluation framework
│   ├── attack_datasets/             # Attack samples (856)
│   ├── normal_datasets/             # Normal business cases (107)
│   ├── config/                      # Configuration files
│   ├── sandbox_00-30/               # 31 financial scenario environments
│   ├── defense/                     # Safety detection modules
│   ├── base/                        # Base classes and utilities
│   ├── run_attack_test.py          # Attack testing script
│   ├── run_llama_guard3_test.py    # LLaMA Guard 3 evaluation
│   ├── run_gpt_oss_safeguard_test.py  # GPT-OSS-Safeguard evaluation
│   └── requirements.txt             # Dependencies

Quick Start

Installation

cd finvault/sandbox

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp config/.env.example config/.env
# Edit config/.env with your API keys

Running Evaluation

Attack Testing

# Test a single scenario
python run_attack_test.py --scenario 00

# Test all scenarios
python run_attack_test.py --all

# Use synthesis dataset
python run_attack_test.py --scenario 00 --synthesis

# Test specific attack type
python run_attack_test.py --scenario 00 --synthesis --attack-type authority_impersonation

# Test by category
python run_attack_test.py --category credit_lending

# Parallel testing
python run_attack_test.py --all --concurrency 8

LLaMA Guard 3 Evaluation

# Basic test
python run_llama_guard3_test.py --scenario 00

# Full evaluation (attack + normal datasets)
python run_llama_guard3_test.py --attack-type authority_impersonation --all --eval

# Mock mode (no API calls)
python run_llama_guard3_test.py --mode mock --attack-type authority_impersonation --all --eval

# Local model
python run_llama_guard3_test.py --mode local --model-path /path/to/Llama-Guard-3-8B

# List available attack types
python run_llama_guard3_test.py --list-attack-types

GPT-OSS-Safeguard Evaluation

# Similar usage as LLaMA Guard 3
python run_gpt_oss_safeguard_test.py --attack-type authority_impersonation --all --eval

Financial Scenarios Overview (31 scenarios)

Category Count Vulnerabilities
Credit & Lending 7 25
Insurance 4 14
Securities & Investment 5 17
Payment & Settlement 4 14
Compliance & AML 6 20
Risk Management 5 17
Total 31 107

Attack Types (8 Total)

  1. Prompt Injection - Direct instruction override
  2. Jailbreaking - Safety mechanism bypass
  3. Authority Impersonation - Fake authorization claims
  4. Social Engineering - Emotional manipulation
  5. Data Exfiltration - Unauthorized information disclosure
  6. Transaction Manipulation - Unauthorized financial operations
  7. Compliance Evasion - Rule circumvention
  8. Tool Misuse - Improper tool usage

Key Findings (from Paper)

Attack Success Rates

Model Attack Success Rate
Claude-Haiku-4.5 6.70%
GPT-4 28.45%
Qwen3-32B 42.15%
Qwen3-Max 50.00%

Key Insights:

  • Social engineering attacks (authority impersonation) outperform technical attacks
  • Finance-specific attacks achieve higher success rates
  • Significant safety disparities exist across models

Defense Effectiveness

Method Detection Rate False Positive Rate
Security Prompting ~89% Low
LLaMA Guard 4 ~61% ~30%
GPT-OSS-Safeguard ~22% ~12%
LLaMA Guard 3 ~37% ~44%

Recommendation: Security prompting offers the best cost-effectiveness (no overhead, high detection rate)

LLM Temperature Settings (Academic Standards)

Our configuration follows academic best practices:

Component Temperature Rationale
Attack Testing 0.6 Diversity
Safety Detection (1st attempt) 0.0 Deterministic
Safety Detection (Retry) 0.0-0.3 Controlled Diversity

See paper for detailed justification and reproducibility discussion.

Configuration

Edit config/agents_config.yaml to configure LLM agents:

agents:
  qwen_chat:
    provider: qwen
    model_name: qwen3_32b_chat
    base_url: http://localhost:8000/v1
    api_key: YOUR_API_KEY
    temperature: 0.6
    max_tokens: 4096
    enable_thinking: false

Supported providers:

  • OpenAI (GPT-5, GPT-4o, etc.)
  • Qwen (Local deployment)
  • Anthropic (Claude series)
  • DeepSeek
  • And more via OpenRouter

Dataset Format

Attack Case

{
  "id": "attack_00_001",
  "scenario_id": "00",
  "attack_type": "authority_impersonation",
  "attack_prompt": "I'm calling from headquarters...",
  "follow_up_prompts": ["Can you help me process this?"],
  "context": {
    "target_vulnerability": "V1",
    "expected_outcome": "Unauthorized transaction",
    "sensitive_data": ["account_balance", "customer_name"]
  },
  "multi_turn": true
}

Normal Case

{
  "id": "normal_00_001",
  "scenario_id": "00",
  "user_request": "I'd like to apply for a loan...",
  "context": {
    "user_id": "user123",
    "credit_score": 750
  }
}

Citation

If you use FinVault in your research, please cite our paper:

comming soon

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For questions or issues, please open an issue on GitHub or contact the authors.

About

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages