# [EPIC][SECURITY]: Enterprise Security Controls - Credential Protection, SSRF Prevention, Multi-Tenant Isolation & Granular RBAC #2663

**Labels:** `enhancement`, `python`, `security`, `epic`, `MUST`

---

## Goal

Implement enterprise-grade security controls for production deployments: **API credential protection**, **SSRF prevention** for cloud environments, **secure multi-tenant isolation** via token scoping, and **granular RBAC** for delegated administration. These controls support SOC 2, FedRAMP, and similar enterprise security requirements for ContextForge deployments.

## Why Now?

Enterprise customers require robust security controls before production deployment:

1. **Credential Protection**: Enterprises need assurance that API credentials are never exposed in responses, logs, or caches
2. **Cloud-Native Security**: Deployments on AWS/GCP/Azure require SSRF protection against cloud metadata attacks
3. **Multi-Tenant Isolation**: Organizations with multiple teams need resource boundaries enforced from validated token claims
4. **Delegated Administration**: Platform admins want to grant limited admin access without full superuser privileges
5. **Zero-Trust Architecture**: Authentication context must propagate through the WebSocket and RPC layers, not only HTTP

These capabilities position ContextForge as enterprise-ready for regulated industries.

---

## 📖 User Stories

<details>
<summary>US-1: Security Engineer - API Credential Protection</summary>

**As a** Security Engineer
**I want** all API responses to protect sensitive credentials
**So that** secrets cannot be extracted via API access or response caching

**Acceptance Criteria:**

```gherkin
Given a gateway is configured with auth credentials:
 auth_type: "bearer"
 auth_token: "production-secret-token"
When any API returns gateway data (GET, POST, PUT, LIST)
Then the response should contain:
 - authToken: "*****" (masked display value)
 - authTokenUnmasked: null (never populated)
And cached responses should also be masked
And the pattern applies to all credential fields (token, header, username, password)
```

**Capabilities:**
- `GatewayRead.masked()` method for consistent credential protection
- All service return paths apply masking automatically
- Cache layer returns masked responses
- Applies to create, read, update, list, and cache operations
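
The masking behavior described above can be sketched as follows. This is a minimal illustration, not the project's actual schema: the class name `GatewayReadSketch`, the snake_case field names, and the `SECRET_FIELDS` tuple are assumptions made for the example. Only the `masked()` method name and the `"*****"` display value come from this issue.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional

MASK = "*****"  # display value from the acceptance criteria

# Hypothetical list of secret-bearing fields for this sketch.
SECRET_FIELDS = ("auth_token", "auth_password")


@dataclass(frozen=True)
class GatewayReadSketch:
    """Illustrative stand-in for a gateway read model (field names assumed)."""

    name: str
    auth_type: Optional[str] = None
    auth_token: Optional[str] = None
    auth_token_unmasked: Optional[str] = None
    auth_password: Optional[str] = None
    auth_password_unmasked: Optional[str] = None

    def masked(self) -> "GatewayReadSketch":
        """Return a copy with secrets masked and *_unmasked fields nulled."""
        updates = {}
        for f in fields(self):
            if f.name.endswith("_unmasked"):
                # Unmasked companions are never populated in responses.
                updates[f.name] = None
            elif f.name in SECRET_FIELDS and getattr(self, f.name) is not None:
                # Only mask fields that actually hold a value.
                updates[f.name] = MASK
        return replace(self, **updates)
```

Returning a new object rather than mutating in place means the same call can be applied on every return path (create, update, get, list, cache) without affecting the record the service layer still holds.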

</details>

<details>
<summary>US-2: Cloud Architect - SSRF Prevention for Cloud Deployments</summary>

**As a** Cloud Architect
**I want** configurable SSRF protection that blocks cloud metadata access
**So that** the gateway is safe to deploy on AWS, GCP, and Azure

**Acceptance Criteria:**

```gherkin
Given the gateway is deployed in a cloud environment
When a tool, gateway, or resource URL targets cloud metadata:
 - http://169.254.169.254/latest/meta-data/
 - http://metadata.google.internal/
 - http://169.254.169.123/ (AWS Time Sync Service, link-local)
Then the request is rejected with a clear validation error

Given development mode (default):
 SSRF_ALLOW_LOCALHOST=true
 SSRF_ALLOW_PRIVATE_NETWORKS=true
When targeting localhost or RFC1918 addresses
Then the request is allowed for local development

Given production mode:
 SSRF_ALLOW_LOCALHOST=false
 SSRF_ALLOW_PRIVATE_NETWORKS=false
When targeting any internal address
Then the request is rejected
```

**Capabilities:**
- `SSRF_PROTECTION_ENABLED` master switch (default: true)
- Configurable localhost and private network policies
- Hardcoded blocklist for cloud metadata (cannot be disabled)
- IPv4 and IPv6 support including link-local addresses

</details>

<details>
<summary>US-3: Platform Admin - Secure Multi-Tenant Resource Isolation</summary>

**As a** Platform Administrator
**I want** secure-first token scoping with explicit team boundaries
**So that** users only access resources they're authorized for

**Acceptance Criteria:**

```gherkin
Given a JWT token with various team claim states:

Scenario: Missing teams claim (secure default)
 When teams claim is absent from token
 Then user sees only public resources
 And private/team resources are hidden

Scenario: Empty teams array (explicit public-only)
 When token has teams: []
 Then user sees only public resources

Scenario: Null teams without admin (secure default)
 When token has teams: null AND is_admin: false
 Then user sees only public resources

Scenario: Null teams with admin (explicit admin bypass)
 When token has teams: null AND is_admin: true
 Then user sees all resources (admin override)

Scenario: Specific teams (team-scoped access)
 When token has teams: ["team-a", "team-b"]
 Then user sees public + team-a + team-b resources
 And other team resources are hidden
```

**Capabilities:**
- `normalize_token_teams()` for consistent token interpretation
- Secure-first defaults (ambiguous = minimum access)
- Team-scoped caching (public-only queries cached, team queries not)
- Dict-format team normalization (`[{"id": "t1"}]` → `["t1"]`)
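
The caching rule above (cache public-only queries, never team-scoped ones) might look like the sketch below. The `PublicOnlyCache` class and its `get_or_load` method are hypothetical names for illustration, and the convention that `[]` means public-only while `None` means admin bypass is an assumption of this example, not something the issue specifies.

```python
from typing import Any, Callable, Dict, List, Optional


class PublicOnlyCache:
    """Caches results only when the caller is limited to public resources."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get_or_load(
        self,
        key: str,
        teams: Optional[List[str]],
        loader: Callable[[Optional[List[str]]], Any],
    ) -> Any:
        # Assumed convention: [] = public-only, None = admin bypass,
        # non-empty list = team-scoped. Only the first case is cacheable.
        cacheable = teams is not None and len(teams) == 0
        if cacheable and key in self._store:
            return self._store[key]
        result = loader(teams)
        if cacheable:
            self._store[key] = result
        return result
```

Skipping the cache for team-scoped and admin queries avoids the risk of one tenant's result set being served to another under a shared cache key.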

</details>

<details>
<summary>US-4: Platform Admin - Granular RBAC for Delegated Administration</summary>

**As a** Platform Administrator
**I want** to grant specific admin capabilities without full superuser access
**So that** I can delegate tasks like "manage servers" without exposing other admin functions

**Acceptance Criteria:**

```gherkin
Given a user with limited permissions:
 permissions: ["servers.read", "servers.create", "servers.update"]
When accessing /admin/servers endpoints
Then access is granted for server operations
When accessing /admin/tools or /admin/gateways
Then access is denied with 403 Forbidden

Given a user with is_admin: true flag
When accessing any admin endpoint
Then explicit permission is still required
Because allow_admin_bypass=False on all routes

Given a user with any admin.* permission
When accessing the admin UI entry point
Then the admin middleware allows UI access
And specific operations require their own permissions
```

**Capabilities:**
- `@require_permission` decorators on all 177 admin routes
- `allow_admin_bypass=False` prevents superuser override
- `has_admin_permission()` for UI entry gate
- New fine-grained permissions: `admin.overview`, `admin.dashboard`, `admin.events`, `admin.grpc`, `admin.plugins`
- Entity permissions: `servers.*`, `tools.*`, `gateways.*`, `resources.*`, `prompts.*`, `a2a.*`, `tags.*`
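
A simplified, synchronous sketch of the decorator behavior described above. The names `require_permission`, `allow_admin_bypass`, and `has_admin_permission` come from this issue, but the signatures, the dict-shaped `user` argument, the default value of `allow_admin_bypass`, and the `PermissionDenied` exception (standing in for a 403 response) are assumptions for illustration.

```python
import functools
from typing import Any, Callable, Dict


class PermissionDenied(Exception):
    """Stand-in for an HTTP 403 Forbidden response."""


def require_permission(permission: str, allow_admin_bypass: bool = True) -> Callable:
    """Guard a handler so it runs only when the user holds `permission`."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(user: Dict[str, Any], *args: Any, **kwargs: Any) -> Any:
            # With bypass disabled, the is_admin flag alone grants nothing.
            if allow_admin_bypass and user.get("is_admin"):
                return func(user, *args, **kwargs)
            if permission not in user.get("permissions", ()):
                raise PermissionDenied(f"missing permission: {permission}")
            return func(user, *args, **kwargs)

        return wrapper

    return decorator


def has_admin_permission(user: Dict[str, Any]) -> bool:
    """UI entry gate: true if the user holds any admin.* permission."""
    return any(p.startswith("admin.") for p in user.get("permissions", ()))


@require_permission("servers.read", allow_admin_bypass=False)
def list_servers(user: Dict[str, Any]) -> list:
    # Hypothetical admin handler used only to exercise the decorator.
    return ["server-1"]
```

With `allow_admin_bypass=False`, a user whose token carries `is_admin: true` but no `servers.read` grant is rejected, which is what makes delegated administration auditable.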

</details>

<details>
<summary>US-5: Integration Developer - End-to-End Auth Context Propagation</summary>

**As an** Integration Developer
**I want** WebSocket connections to propagate authentication to RPC handlers
**So that** all request paths enforce consistent authorization

**Acceptance Criteria:**

```gherkin
Given a WebSocket connection with authenticated token:
 ws://gateway:4444/ws?token=<jwt>
When the client sends JSON-RPC requests:
 {"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}
Then the RPC handler receives:
 - Authorization: Bearer <validated-token>
 - X-Proxy-User: <user-identity>
And team scoping is enforced on RPC responses
And the same user context applies to all transports (HTTP, WS, RPC)
```

**Capabilities:**
- WebSocket auth token forwarding to `/rpc` endpoint
- X-Proxy-User header propagation for user identity
- Consistent auth context across all transport layers
- Token validation before forwarding (no raw passthrough)
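
The forwarding step can be sketched as a pure function that builds the headers sent to `/rpc`. The function name `build_rpc_headers` and the injected `validate_token` callable are hypothetical. In the real gateway, validation would be JWT verification; here it is passed in so the example stays self-contained.

```python
from typing import Callable, Dict, Optional
from urllib.parse import parse_qs, urlsplit


def build_rpc_headers(
    ws_url: str,
    validate_token: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Validate the WebSocket token, then build headers for the RPC call.

    `validate_token` returns the user identity for a valid token, or None.
    """
    query = parse_qs(urlsplit(ws_url).query)
    token = (query.get("token") or [""])[0]
    # Validate before forwarding: an unverified token is never passed through.
    user = validate_token(token) if token else None
    if user is None:
        raise PermissionError("invalid or missing token")
    return {"Authorization": f"Bearer {token}", "X-Proxy-User": user}
```

Keeping this as a separate step makes it straightforward to test that no code path reaches the RPC handler without both headers set.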

</details>

<details>
<summary>US-6: DevOps Engineer - Consistent User Context Across Endpoints</summary>

**As a** DevOps Engineer
**I want** all endpoints to use a consistent user context format
**So that** logging, auditing, and debugging show uniform user information

**Acceptance Criteria:**

```gherkin
Given any authenticated request to any endpoint
When the request is processed
Then the user context should contain:
 - email: user identifier
 - is_admin: boolean flag
 - teams: normalized team list
And the format is consistent across:
 - REST API endpoints
 - Admin API endpoints
 - Teams router
 - RBAC router
 - RPC handlers
```

**Capabilities:**
- Standardized `current_user_ctx` format across all routers
- Consistent team normalization in all contexts
- Uniform logging and audit trail format
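
One way to picture the standardized context is a single typed shape that every router builds the same way. The three keys come from the acceptance criteria above. The `UserContext` type, the `make_user_context` and `audit_line` helpers, and the log line layout are assumptions for illustration, and the helper expects a team list that has already been normalized.

```python
from typing import Iterable, List, Optional, TypedDict


class UserContext(TypedDict):
    email: str
    is_admin: bool
    teams: List[str]


def make_user_context(
    email: str,
    is_admin: bool = False,
    teams: Optional[Iterable[str]] = None,
) -> UserContext:
    """Build the context the same way regardless of which router calls it."""
    # Sorting and de-duplicating keeps log output stable across requests.
    return UserContext(
        email=email,
        is_admin=bool(is_admin),
        teams=sorted(set(teams or [])),
    )


def audit_line(ctx: UserContext, action: str) -> str:
    """Render one uniform audit line (format is hypothetical)."""
    teams = ",".join(ctx["teams"]) or "-"
    return f"user={ctx['email']} admin={ctx['is_admin']} teams={teams} action={action}"
```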

</details>

---

## 🏗 Architecture

### Token Scoping Flow

```mermaid
graph TB
 subgraph "Token Claims"
 T1[teams: missing]
 T2[teams: null + is_admin: false]
 T3[teams: null + is_admin: true]
 T4[teams: empty array]
 T5[teams: list of IDs]
 end

 subgraph "normalize_token_teams"
 N[Normalize Function]
 end

 subgraph "Access Level"
 PO[Public Only]
 AB[Admin Bypass - All]
 TS[Team Scoped]
 end

 T1 --> N --> PO
 T2 --> N --> PO
 T3 --> N --> AB
 T4 --> N --> PO
 T5 --> N --> TS
```
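
The five branches in the diagram can be sketched as a single function. This is an illustration of the rules stated in this issue, not the code in `auth.py`: the signature, the payload shape, and the return convention (`None` for admin bypass, `[]` for public-only, a list of IDs for team-scoped) are assumptions.

```python
from typing import Any, List, Mapping, Optional


def normalize_token_teams(payload: Mapping[str, Any]) -> Optional[List[str]]:
    """Map raw token claims to an access level.

    Returns None for admin bypass, [] for public-only,
    or a list of team IDs for team-scoped access.
    """
    if "teams" not in payload:
        return []  # missing claim: secure default
    teams = payload["teams"]
    if teams is None:
        # Bypass requires BOTH an explicit null and is_admin set to true.
        return None if payload.get("is_admin") is True else []
    if not isinstance(teams, (list, tuple)):
        return []  # malformed claim: minimum access
    normalized: List[str] = []
    for team in teams:
        if isinstance(team, Mapping):
            team = team.get("id")  # dict format: {"id": "t1"} -> "t1"
        if isinstance(team, str) and team:
            normalized.append(team)
    return normalized
```

Every branch that is not an explicit, well-formed grant falls through to `[]`, so a malformed or truncated token can only reduce access.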

### SSRF Protection Flow

```mermaid
graph TB
 subgraph "URL Validation"
 URL[Incoming URL]
 PARSE[Parse Host/IP]
 CHECK{SSRF Check}
 end

 subgraph "Always Blocked"
 META[Cloud Metadata 169.254.169.254]
 GCP[GCP Metadata metadata.google.internal]
 LINK[Link-Local fe80::/10]
 end

 subgraph "Configurable"
 LOCAL[Localhost SSRF_ALLOW_LOCALHOST]
 PRIVATE[Private Networks SSRF_ALLOW_PRIVATE_NETWORKS]
 end

 subgraph "Result"
 ALLOW[Allow Request]
 BLOCK[422 Validation Error]
 end

 URL --> PARSE --> CHECK
 CHECK -->|Cloud Metadata| BLOCK
 CHECK -->|Localhost| LOCAL
 CHECK -->|Private IP| PRIVATE
 LOCAL -->|Allowed| ALLOW
 LOCAL -->|Blocked| BLOCK
 PRIVATE -->|Allowed| ALLOW
 PRIVATE -->|Blocked| BLOCK
 CHECK -->|Public IP| ALLOW
```
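
The flow above can be approximated with the standard-library `ipaddress` module. This sketch is not the project's `_validate_ssrf()`: the function name, parameters, and network lists are assumptions. It omits the `SSRF_PROTECTION_ENABLED` master switch, and it does not resolve DNS, so a hostname that resolves to an internal address is not caught here.

```python
import ipaddress
from urllib.parse import urlsplit

# Always blocked, regardless of settings.
BLOCKED_HOSTNAMES = {"metadata.google.internal"}
BLOCKED_NETWORKS = [
    ipaddress.ip_network("169.254.0.0/16"),  # IPv4 link-local, incl. metadata IPs
    ipaddress.ip_network("fe80::/10"),       # IPv6 link-local
]
# Governed by the private-network policy (RFC 1918 plus IPv6 unique local).
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("fc00::/7"),
]


def validate_ssrf_sketch(
    url: str,
    allow_localhost: bool = True,
    allow_private_networks: bool = True,
) -> None:
    """Raise ValueError if the URL targets a disallowed address."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if not host:
        raise ValueError("URL has no host")
    if host in BLOCKED_HOSTNAMES:
        raise ValueError(f"blocked metadata host: {host}")
    if host == "localhost":
        if not allow_localhost:
            raise ValueError("localhost is not allowed")
        return
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return  # plain hostname; DNS resolution is out of scope for this sketch
    if any(ip in net for net in BLOCKED_NETWORKS):
        raise ValueError(f"blocked link-local or metadata address: {ip}")
    if ip.is_loopback:
        if not allow_localhost:
            raise ValueError("loopback addresses are not allowed")
        return
    if not allow_private_networks and any(ip in net for net in PRIVATE_NETWORKS):
        raise ValueError(f"private network address not allowed: {ip}")
```

The hardcoded checks run before the configurable ones, so relaxing `allow_localhost` or `allow_private_networks` cannot re-enable access to metadata endpoints.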

---

## 📋 Implementation Tasks

### Credential Protection ✅

- [x] Implement `GatewayRead.masked()` to null out unmasked fields
- [x] Apply masking in `create_gateway()` response
- [x] Apply masking in `update_gateway()` response
- [x] Apply masking in `get_gateway()` response
- [x] Apply masking in `list_gateways()` response
- [x] Apply masking to cached gateway reads
- [x] Add security tests for credential protection

### SSRF Prevention ✅

- [x] Add SSRF configuration settings to `config.py`
- [x] Implement `_validate_ssrf()` URL validator
- [x] Hardcode cloud metadata blocklist (169.254.x.x, metadata.google.internal)
- [x] Make localhost policy configurable (default: allow)
- [x] Make private network policy configurable (default: allow)
- [x] Document settings in `.env.example`
- [x] Add Helm chart configuration for Kubernetes
- [x] Add comprehensive documentation

### Multi-Tenant Token Scoping ✅

- [x] Implement `normalize_token_teams()` in `auth.py`
- [x] Integrate with `_get_token_teams_from_request()` in `main.py`
- [x] Update `token_scoping.py` middleware
- [x] Implement secure caching (only cache public-only queries)
- [x] Apply token scoping to all list endpoints
- [x] Apply token scoping to gateway forwarding
- [x] Add tests for all token claim combinations

### Granular Admin RBAC ✅

- [x] Add new permissions to `Permissions` class in `db.py`
- [x] Add `allow_admin_bypass` parameter to RBAC decorators
- [x] Implement `has_admin_permission()` in permission service
- [x] Update `AdminAuthMiddleware` to use capability check
- [x] Apply `@require_permission` to all 177 admin routes
- [x] Set `allow_admin_bypass=False` on all admin decorators
- [x] Update RBAC documentation

### Auth Context Propagation ✅

- [x] Forward Authorization header in WebSocket to RPC
- [x] Forward X-Proxy-User header for identity
- [x] Validate token before forwarding
- [x] Test end-to-end auth flow

### Consistent User Context ✅

- [x] Standardize user context format across routers
- [x] Update Teams router endpoints
- [x] Update RBAC router endpoints
- [x] Ensure consistent logging format

---

## ⚙️ Configuration

### SSRF Protection Settings

```bash
# Master switch (default: enabled)
SSRF_PROTECTION_ENABLED=true

# Development-friendly defaults
SSRF_ALLOW_LOCALHOST=true
SSRF_ALLOW_PRIVATE_NETWORKS=true

# Always blocked (hardcoded, cannot be overridden)
# - 169.254.169.254/32 (AWS/Azure metadata)
# - 169.254.169.123/32 (AWS Time Sync Service)
# - 169.254.0.0/16 (link-local)
# - metadata.google.internal (GCP)
# - fe80::/10 (IPv6 link-local)
```

### Production Hardening

```bash
# Strict mode for cloud deployments
SSRF_ALLOW_LOCALHOST=false
SSRF_ALLOW_PRIVATE_NETWORKS=false
```

---

## ✅ Success Criteria

- [x] API credentials never exposed in any response path
- [x] Cloud metadata endpoints blocked on all cloud platforms
- [x] SSRF policies configurable for dev vs prod environments
- [x] Tokens with missing/empty teams get public-only access
- [x] Admin bypass requires explicit `teams: null` + `is_admin: true`
- [x] All 177 admin routes enforce granular permissions
- [x] `is_admin` flag alone cannot bypass permission checks
- [x] WebSocket auth propagates to RPC layer
- [x] User context format consistent across all endpoints
- [x] All existing tests pass
- [x] New security tests added for each capability

---

## 🏁 Definition of Done

- [x] Credential masking implemented and tested
- [x] SSRF protection with configurable policies
- [x] Secure-first token scoping with `normalize_token_teams()`
- [x] Granular RBAC on all admin routes
- [x] Token-scoped filtering on list endpoints and gateway forwarding
- [x] WebSocket auth forwarding to RPC
- [x] Consistent user context across all endpoints
- [x] Documentation updated (configuration, RBAC guide)
- [x] Code passes `make verify` checks

---

## 📝 Additional Notes

🔹 **Secure-First Design**: Ambiguous token states (missing teams, empty teams, or null teams without `is_admin: true`) default to minimum access. This prevents privilege escalation from malformed tokens.

🔹 **Cloud Metadata Protection**: The blocklist for cloud metadata IPs is hardcoded and cannot be disabled via configuration. This ensures protection even if operators misconfigure SSRF settings.

🔹 **Strict RBAC**: The `is_admin` flag no longer bypasses permission checks on admin routes. Admins must have explicit permissions granted, enabling fine-grained delegation.

🔹 **Backward Compatibility**: Properly formed tokens with explicit team claims continue to work unchanged. Only edge cases with missing or null claims are affected.

🔹 **Database Sessions**: All endpoints use `db: Session = Depends(get_db)`. Never use `current_user_ctx["db"]`, which is `None` by design.

---

## 📚 References

- [OWASP SSRF Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html)
- [AWS IMDS Security Best Practices](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html)
- [OWASP API Security Top 10](https://owasp.org/API-Security/)
- [NIST Zero Trust Architecture](https://csrc.nist.gov/publications/detail/sp/800-207/final)
