[FEATURE][SECURITY]: Implement database-level security constraints and SQL injection prevention

# [FEATURE][SECURITY]: Database-Level Security Constraints and SQL Injection Prevention

## Goal

Add database-level CHECK constraints and validation to ensure data integrity and provide defense-in-depth against SQL injection attacks. While SQLAlchemy ORM provides parameterized queries, these additional constraints prevent malicious content storage even if application-layer validation is bypassed.

## Why Now?

1. **Defense-in-Depth**: Multiple validation layers reduce attack surface
2. **Data Integrity**: Ensure only valid data formats stored in database
3. **Compliance**: Security audits require database-level protections
4. **Path Traversal Prevention**: Block `../` patterns in URIs
5. **XSS Prevention**: Block script tags in user-controlled fields

---

## 📖 User Stories

<details>
<summary><strong>US-1: Security Engineer - Prevent Malicious Data Storage</strong></summary>

**As a** security engineer
**I want** CHECK constraints to reject malicious input
**So that** even bypassed validation can't store harmful data

**Acceptance Criteria:**

```gherkin
Scenario: Reject HTML in prompt names
  Given a database with CHECK constraints
  When inserting a prompt with name "<script>alert(1)</script>"
  Then the database should reject the insert
  And raise an IntegrityError

Scenario: Reject path traversal in resource URIs
  Given a database with CHECK constraints
  When inserting a resource with uri "../../../etc/passwd"
  Then the database should reject the insert
  And log the attempt

Scenario: Reject JavaScript URLs in tools
  Given a database with CHECK constraints
  When inserting a tool with url "javascript:alert(1)"
  Then the database should reject the insert
  And the error should indicate invalid URL format
```

**Technical Requirements:**
- CHECK constraints on prompts, tools, resources, gateways, servers tables
- Pattern matching for HTML tags, JavaScript URLs, path traversal
- SQLite and PostgreSQL compatible regex patterns

</details>

<details>
<summary><strong>US-2: Developer - Safe Query Construction</strong></summary>

**As a** developer
**I want** query sanitization utilities
**So that** I can safely build dynamic queries

**Acceptance Criteria:**

```gherkin
Scenario: Sanitize LIKE patterns
  Given a search term "test%drop"
  When sanitizing for LIKE query
  Then the result should be "test\\%drop"
  And special characters should be escaped

Scenario: Validate column names
  Given a sort column from user input
  When validating against whitelist ["name", "created_at"]
  Then unknown columns should raise ValueError
  And valid columns should pass through

Scenario: Build safe IN clauses
  Given a list of IDs [1, 2, 3]
  When building an IN clause
  Then the result should use parameterized placeholders
  And no raw values should be in the SQL
```

**Technical Requirements:**
- QuerySanitizer class with utility methods
- Column whitelist validation
- LIKE pattern escaping
- Parameterized IN clause builder

</details>

<details>
<summary><strong>US-3: Operator - Database Security Configuration</strong></summary>

**As a** platform operator
**I want** configurable database security settings
**So that** I can tune security for my environment

**Acceptance Criteria:**

```gherkin
Scenario: Configure statement timeout
  Given DB_STATEMENT_TIMEOUT=30000
  When a query runs for 35 seconds
  Then the query should be cancelled
  And an error should be logged

Scenario: Configure max connections
  Given DB_MAX_CONNECTIONS=100
  When 101 connections are attempted
  Then the 101st should be rejected
  And connection pool should remain stable

Scenario: Enable query logging in debug mode
  Given LOG_LEVEL=DEBUG
  And DB_ENABLE_SQL_LOG=true
  When a query executes
  Then the SQL should be logged
```

**Technical Requirements:**
- DB_STATEMENT_TIMEOUT setting
- DB_MAX_CONNECTIONS setting
- DB_ENABLE_SQL_LOG setting
- Connection timeout configuration

</details>

<details>
<summary><strong>US-4: Security Admin - Model-Level Validation</strong></summary>

**As a** security administrator
**I want** SQLAlchemy models to validate data before insert
**So that** invalid data is caught at the application layer

**Acceptance Criteria:**

```gherkin
Scenario: Validate tool name format
  Given a Tool model instance
  When setting name to "invalid name with spaces!"
  Then a ValueError should be raised
  And the message should indicate valid format

Scenario: Validate URL scheme
  Given a Tool model instance
  When setting url to "file:///etc/passwd"
  Then a ValueError should be raised
  And only http/https/ws/wss should be allowed

Scenario: Validate name length
  Given a Tool model instance
  When setting name longer than 255 characters
  Then a ValueError should be raised
```

**Technical Requirements:**
- @validates decorators on models
- Name format regex validation
- URL scheme validation
- Length validation

</details>

---

## 🏗 Architecture

### Validation Layers

```mermaid
flowchart TD
    A[API Request] --> B[Pydantic Validation]
    B --> C[SQLAlchemy @validates]
    C --> D[Database CHECK Constraints]
    D --> E[Data Stored]

    B -->|Invalid| F[HTTP 422]
    C -->|Invalid| G[ValueError]
    D -->|Invalid| H[IntegrityError]
```

### CHECK Constraints by Table

```mermaid
erDiagram
    PROMPTS {
        string name "CHECK: no HTML, length <= 255"
        string template "CHECK: no script tags"
    }
    TOOLS {
        string name "CHECK: alphanumeric_underscore"
        string url "CHECK: http(s)/ws(s) scheme"
    }
    RESOURCES {
        string uri "CHECK: no path traversal"
        string mime_type "CHECK: valid MIME format"
    }
    GATEWAYS {
        string transport "CHECK: IN (SSE, STREAMABLEHTTP, STDIO)"
        string url "CHECK: http(s)/ws(s) scheme"
    }
    SERVERS {
        string name "CHECK: safe characters only"
        string icon "CHECK: https or data:image"
    }
```

---

## 📋 Implementation Tasks

### Phase 1: CHECK Constraints Migration
- [ ] Create alembic migration for CHECK constraints
- [ ] Add prompts table constraints (name format, template safety)
- [ ] Add tools table constraints (name format, URL scheme)
- [ ] Add resources table constraints (URI format, no traversal)
- [ ] Add gateways table constraints (transport enum, URL scheme)
- [ ] Add servers table constraints (name format, icon URL)
- [ ] Make migration idempotent (check before adding)

### Phase 2: Model-Level Validation
- [ ] Add @validates to Tool model (name, url)
- [ ] Add @validates to Prompt model (name)
- [ ] Add @validates to Resource model (uri)
- [ ] Add @validates to Gateway model (transport, url)
- [ ] Add @validates to Server model (name, icon)
- [ ] Add table-level CheckConstraint definitions

### Phase 3: Query Sanitization Utilities
- [ ] Create QuerySanitizer class
- [ ] Implement sanitize_like_pattern()
- [ ] Implement validate_column_name()
- [ ] Implement validate_sort_order()
- [ ] Implement build_safe_in_clause()

### Phase 4: Database Connection Security
- [ ] Add connection security settings to config
- [ ] Implement statement timeout
- [ ] Implement connection pooling limits
- [ ] Add SQLite PRAGMA settings
- [ ] Add application_name for monitoring

### Phase 5: Testing
- [ ] Unit tests for CHECK constraint violations
- [ ] Unit tests for model validation
- [ ] Unit tests for query sanitization
- [ ] Integration tests for SQL injection attempts
- [ ] Performance tests for constraint overhead

### Phase 6: Data Cleanup Script
- [ ] Create scripts/sanitize_database.py
- [ ] Fix existing HTML in names
- [ ] Fix invalid URLs
- [ ] Log all changes made

---

## ⚙️ Configuration Example

```bash
# Database Security Settings
DB_STATEMENT_TIMEOUT=30000       # 30 second query timeout
DB_MAX_CONNECTIONS=100           # Maximum connection pool size
DB_CONNECTION_TIMEOUT=10         # Connection acquisition timeout
DB_ENABLE_SQL_LOG=false          # Only enable in debug mode

# Query Limits
MAX_QUERY_RESULTS=1000           # Maximum results per query
MAX_BATCH_SIZE=100               # Maximum batch operation size

# Security Flags
ENABLE_QUERY_SANITIZATION=true   # Enable query sanitization utilities
```

---

## ✅ Success Criteria

- [ ] All tables have appropriate CHECK constraints
- [ ] Model-level validation catches invalid data
- [ ] SQL injection attempts are blocked
- [ ] Path traversal attempts are blocked
- [ ] XSS vectors in data are rejected
- [ ] Query sanitization utilities working
- [ ] Database connection security configured
- [ ] Performance impact < 5% overhead

---

## 🏁 Definition of Done

- [ ] Alembic migration created and tested
- [ ] CHECK constraints on all relevant tables
- [ ] @validates decorators on models
- [ ] QuerySanitizer class implemented
- [ ] Database security settings in config
- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Data cleanup script created
- [ ] Code passes `make verify`
- [ ] Documentation complete

---

## 🔗 Related Issues

- Related: #535 (Audit Logging - logs rejected attempts)
- Related: #534 (Security Config Validation)
- Reference: OWASP SQL Injection Prevention Cheat Sheet



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE][SECURITY]: Implement database-level security constraints and SQL injection prevention #342

[FEATURE][SECURITY]: Database-Level Security Constraints and SQL Injection Prevention

Goal

Why Now?

📖 User Stories

🏗 Architecture

Validation Layers

CHECK Constraints by Table

📋 Implementation Tasks

Phase 1: CHECK Constraints Migration

Phase 2: Model-Level Validation

Phase 3: Query Sanitization Utilities

Phase 4: Database Connection Security

Phase 5: Testing

Phase 6: Data Cleanup Script

⚙️ Configuration Example

✅ Success Criteria

🏁 Definition of Done

🔗 Related Issues

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[FEATURE][SECURITY]: Implement database-level security constraints and SQL injection prevention #342

Description

[FEATURE][SECURITY]: Database-Level Security Constraints and SQL Injection Prevention

Goal

Why Now?

📖 User Stories

🏗 Architecture

Validation Layers

CHECK Constraints by Table

📋 Implementation Tasks

Phase 1: CHECK Constraints Migration

Phase 2: Model-Level Validation

Phase 3: Query Sanitization Utilities

Phase 4: Database Connection Security

Phase 5: Testing

Phase 6: Data Cleanup Script

⚙️ Configuration Example

✅ Success Criteria

🏁 Definition of Done

🔗 Related Issues

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions