[EPIC][SECURITY]: Enterprise Security Controls - Credential Protection, SSRF Prevention, Multi-Tenant Isolation & Granular RBAC #2663

@crivetimihai

Labels: enhancement, python, security, epic, MUST


Goal

Implement enterprise-grade security controls for production deployments including API credential protection, SSRF prevention for cloud environments, secure multi-tenant isolation via token scoping, and granular RBAC for delegated administration. These capabilities enable ContextForge to meet SOC2, FedRAMP, and enterprise security requirements.

Why Now?

Enterprise customers require robust security controls before production deployment:

  1. Credential Protection: Enterprises need assurance that API credentials are never exposed in responses, logs, or caches
  2. Cloud-Native Security: Deployments on AWS/GCP/Azure require SSRF protection against cloud metadata attacks
  3. Multi-Tenant Isolation: Organizations with multiple teams need cryptographically-enforced resource boundaries
  4. Delegated Administration: Platform admins want to grant limited admin access without full superuser privileges
  5. Zero-Trust Architecture: Authentication context must propagate through the WebSocket and RPC layers so that no transport skips authorization

These capabilities position ContextForge as enterprise-ready for regulated industries.


📖 User Stories

US-1: Security Engineer - API Credential Protection

As a Security Engineer
I want all API responses to protect sensitive credentials
So that secrets cannot be extracted via API access or response caching

Acceptance Criteria:

Given a gateway is configured with auth credentials:
  auth_type: "bearer"
  auth_token: "production-secret-token"
When any API returns gateway data (GET, POST, PUT, LIST)
Then the response should contain:
  - authToken: "*****" (masked display value)
  - authTokenUnmasked: null (never populated)
And cached responses should also be masked
And the pattern applies to all credential fields (token, header, username, password)

Capabilities:

  • GatewayRead.masked() method for consistent credential protection
  • All service return paths apply masking automatically
  • Cache layer returns masked responses
  • Applies to create, read, update, list, and cache operations
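
The masking behavior described above can be sketched as follows. This is a minimal illustration, not the project's actual schema: the `GatewayRead` dataclass here is a hypothetical stand-in with only the token fields, and the field names `auth_token` / `auth_token_unmasked` are assumed snake_case equivalents of the `authToken` / `authTokenUnmasked` response keys.

```python
from dataclasses import dataclass, replace
from typing import Optional

MASK = "*****"  # masked display value from the acceptance criteria


@dataclass(frozen=True)
class GatewayRead:
    """Hypothetical stand-in for the real response schema."""

    name: str
    auth_type: Optional[str] = None
    auth_token: Optional[str] = None
    auth_token_unmasked: Optional[str] = None

    def masked(self) -> "GatewayRead":
        """Return a copy that is safe to serialize, log, or cache."""
        return replace(
            self,
            # Show the mask only when a credential is actually configured.
            auth_token=MASK if self.auth_token else None,
            # The unmasked field is always nulled out.
            auth_token_unmasked=None,
        )


gateway = GatewayRead(
    name="prod",
    auth_type="bearer",
    auth_token="production-secret-token",
    auth_token_unmasked="production-secret-token",
)
safe = gateway.masked()
```

Returning a new object instead of mutating in place keeps the service layer free to hold the real credential while every return path and cache write receives only the masked copy.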

US-2: Cloud Architect - SSRF Prevention for Cloud Deployments

As a Cloud Architect
I want configurable SSRF protection that blocks cloud metadata access
So that the gateway is safe to deploy on AWS, GCP, and Azure

Acceptance Criteria:

Given the gateway is deployed in a cloud environment
When a tool, gateway, or resource URL targets cloud metadata:
  - http://169.254.169.254/latest/meta-data/
  - http://metadata.google.internal/
  - http://169.254.169.123/ (AWS Time Sync Service, link-local)
Then the request is rejected with a clear validation error

Given development mode (default):
  SSRF_ALLOW_LOCALHOST=true
  SSRF_ALLOW_PRIVATE_NETWORKS=true
When targeting localhost or RFC1918 addresses
Then the request is allowed for local development

Given production mode:
  SSRF_ALLOW_LOCALHOST=false
  SSRF_ALLOW_PRIVATE_NETWORKS=false
When targeting any internal address
Then the request is rejected

Capabilities:

  • SSRF_PROTECTION_ENABLED master switch (default: true)
  • Configurable localhost and private network policies
  • Hardcoded blocklist for cloud metadata (cannot be disabled)
  • IPv4 and IPv6 support including link-local addresses
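
A rough sketch of the validation order described above, using only the standard library. The function name `validate_url` and its keyword arguments are hypothetical; they mirror `SSRF_ALLOW_LOCALHOST` and `SSRF_ALLOW_PRIVATE_NETWORKS` but are not the project's `_validate_ssrf()` signature. The sketch omits the master switch and does not resolve DNS, so a hostname that resolves to an internal address would pass here; a real validator would likely need to check resolved addresses as well.

```python
import ipaddress
from urllib.parse import urlsplit

# Always blocked, regardless of configuration (values taken from this issue).
BLOCKED_HOSTS = {"metadata.google.internal"}
BLOCKED_NETWORKS = [
    ipaddress.ip_network("169.254.0.0/16"),  # IPv4 link-local, incl. cloud metadata
    ipaddress.ip_network("fe80::/10"),       # IPv6 link-local
]


def validate_url(url: str, *, allow_localhost: bool = True,
                 allow_private: bool = True) -> str:
    """Return the URL unchanged, or raise ValueError if it is disallowed."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        raise ValueError("URL has no host")
    if host in BLOCKED_HOSTS:
        raise ValueError(f"blocked metadata host: {host}")

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal, so treat it as a hostname

    if ip is None:
        if host == "localhost" and not allow_localhost:
            raise ValueError("localhost is not allowed")
        return url

    # Hardcoded blocklist is checked before any configurable policy.
    if any(ip in net for net in BLOCKED_NETWORKS):
        raise ValueError(f"blocked link-local/metadata address: {ip}")
    if ip.is_loopback:
        if not allow_localhost:
            raise ValueError("loopback addresses are not allowed")
        return url
    if ip.is_private and not allow_private:
        raise ValueError("private network addresses are not allowed")
    return url
```

Checking the hardcoded blocklist first means the development-friendly defaults cannot re-enable access to metadata endpoints.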

US-3: Platform Admin - Secure Multi-Tenant Resource Isolation

As a Platform Administrator
I want secure-first token scoping with explicit team boundaries
So that users only access resources they're authorized for

Acceptance Criteria:

Given a JWT token with various team claim states:

Scenario: Missing teams claim (secure default)
  When teams claim is absent from token
  Then user sees only public resources
  And private/team resources are hidden

Scenario: Empty teams array (explicit public-only)
  When token has teams: []
  Then user sees only public resources

Scenario: Null teams without admin (secure default)
  When token has teams: null AND is_admin: false
  Then user sees only public resources

Scenario: Null teams with admin (explicit admin bypass)
  When token has teams: null AND is_admin: true
  Then user sees all resources (admin override)

Scenario: Specific teams (team-scoped access)
  When token has teams: ["team-a", "team-b"]
  Then user sees public + team-a + team-b resources
  And other team resources are hidden

Capabilities:

  • normalize_token_teams() for consistent token interpretation
  • Secure-first defaults (ambiguous = minimum access)
  • Team-scoped caching (public-only queries cached, team queries not)
  • Dict-format team normalization ([{"id": "t1"}] → ["t1"])
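
The five scenarios above can be sketched as a single function. This is an illustration under assumptions, not the code in auth.py: the return convention (`None` for admin bypass, `[]` for public-only, a list of IDs for team scope) is assumed, as is the handling of malformed values such as a non-list `teams` claim.

```python
from typing import Any, Optional


def normalize_token_teams(payload: dict[str, Any]) -> Optional[list[str]]:
    """Map a JWT payload to a team scope.

    Assumed convention: None = admin bypass, [] = public-only,
    non-empty list = team-scoped.
    """
    if "teams" not in payload:
        return []  # missing claim: secure default
    teams = payload["teams"]
    if teams is None:
        # Bypass only when is_admin is explicitly true.
        return None if payload.get("is_admin") is True else []
    if not isinstance(teams, list):
        return []  # malformed claim: minimum access (assumption)

    ids: list[str] = []
    for team in teams:
        if isinstance(team, dict):
            team = team.get("id")  # dict format: {"id": "t1"} -> "t1"
        if isinstance(team, str) and team:
            ids.append(team)
    return ids
```

Every branch that cannot positively establish a scope falls through to `[]`, so an unexpected token shape can only reduce access.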

US-4: Platform Admin - Granular RBAC for Delegated Administration

As a Platform Administrator
I want to grant specific admin capabilities without full superuser access
So that I can delegate tasks like "manage servers" without exposing other admin functions

Acceptance Criteria:

Given a user with limited permissions:
  permissions: ["servers.read", "servers.create", "servers.update"]
When accessing /admin/servers endpoints
Then access is granted for server operations
When accessing /admin/tools or /admin/gateways
Then access is denied with 403 Forbidden

Given a user with is_admin: true flag
When accessing any admin endpoint
Then explicit permission is still required
Because allow_admin_bypass=False on all routes

Given a user with any admin.* permission
When accessing the admin UI entry point
Then the admin middleware allows UI access
And specific operations require their own permissions

Capabilities:

  • @require_permission decorators on all 177 admin routes
  • allow_admin_bypass=False prevents superuser override
  • has_admin_permission() for UI entry gate
  • New fine-grained permissions: admin.overview, admin.dashboard, admin.events, admin.grpc, admin.plugins
  • Entity permissions: servers.*, tools.*, gateways.*, resources.*, prompts.*, a2a.*, tags.*
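
As a framework-free sketch of how `allow_admin_bypass=False` changes the check: the decorator below is a simplified, synchronous stand-in, not the project's FastAPI decorator. The user dict shape, the `PermissionDenied` exception, the exact-match permission test, and the default value of `allow_admin_bypass` are all assumptions made for illustration.

```python
from functools import wraps


class PermissionDenied(Exception):
    """Hypothetical error; an HTTP layer would map it to 403 Forbidden."""


def require_permission(permission: str, allow_admin_bypass: bool = True):
    """Guard a handler whose first argument is the user context dict."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            # The superuser shortcut applies only when the route opts in.
            if allow_admin_bypass and user.get("is_admin"):
                return func(user, *args, **kwargs)
            if permission in user.get("permissions", []):
                return func(user, *args, **kwargs)
            raise PermissionDenied(permission)
        return wrapper
    return decorator


def has_admin_permission(user) -> bool:
    """UI entry gate: true if the user holds any admin.* permission."""
    return any(p.startswith("admin.") for p in user.get("permissions", []))


@require_permission("servers.read", allow_admin_bypass=False)
def list_servers(user):
    return ["server-1"]


delegate = {"is_admin": False, "permissions": ["servers.read", "admin.dashboard"]}
superuser = {"is_admin": True, "permissions": []}
```

With this shape, `list_servers(delegate)` succeeds while `list_servers(superuser)` raises, which matches the criterion that the `is_admin` flag alone does not grant access.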

US-5: Integration Developer - End-to-End Auth Context Propagation

As an Integration Developer
I want WebSocket connections to propagate authentication to RPC handlers
So that all request paths enforce consistent authorization

Acceptance Criteria:

Given a WebSocket connection with authenticated token:
  ws://gateway:4444/ws?token=<jwt>
When the client sends JSON-RPC requests:
  {"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}
Then the RPC handler receives:
  - Authorization: Bearer <validated-token>
  - X-Proxy-User: <user-identity>
And team scoping is enforced on RPC responses
And the same user context applies to all transports (HTTP, WS, RPC)

Capabilities:

  • WebSocket auth token forwarding to /rpc endpoint
  • X-Proxy-User header propagation for user identity
  • Consistent auth context across all transport layers
  • Token validation before forwarding (no raw passthrough)
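
The "validate, then forward" rule can be illustrated with a small helper that builds the headers for the internal /rpc call. The helper name `build_rpc_headers`, the injected `validate` callable, and the use of an `email` claim as the user identity are assumptions for this sketch; `fake_validate` stands in for real JWT verification.

```python
from typing import Callable, Optional


def build_rpc_headers(token: Optional[str],
                      validate: Callable[[str], dict]) -> dict[str, str]:
    """Build headers for the RPC call made on behalf of a WebSocket client.

    `validate` must raise on an invalid token and return its claims otherwise.
    """
    if not token:
        raise PermissionError("missing token")
    claims = validate(token)  # raises before anything is forwarded
    return {
        "Authorization": f"Bearer {token}",
        "X-Proxy-User": claims["email"],
    }


def fake_validate(token: str) -> dict:
    """Stand-in for real JWT verification (signature, expiry, audience)."""
    if token != "good-token":
        raise PermissionError("invalid token")
    return {"email": "dev@example.com"}


headers = build_rpc_headers("good-token", fake_validate)
```

Because the identity header is derived from validated claims and not from anything the client sends, a client cannot set `X-Proxy-User` for itself.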

US-6: DevOps Engineer - Consistent User Context Across Endpoints

As a DevOps Engineer
I want all endpoints to use a consistent user context format
So that logging, auditing, and debugging show uniform user information

Acceptance Criteria:

Given any authenticated request to any endpoint
When the request is processed
Then the user context should contain:
  - email: user identifier
  - is_admin: boolean flag
  - teams: normalized team list
And the format is consistent across:
  - REST API endpoints
  - Admin API endpoints
  - Teams router
  - RBAC router
  - RPC handlers

Capabilities:

  • Standardized current_user_ctx format across all routers
  • Consistent team normalization in all contexts
  • Uniform logging and audit trail format
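
One way to picture the shared format is a typed dictionary plus a single log formatter that every router uses. The `UserContext` type, `make_user_ctx`, `audit_line`, and the `*` marker for an unrestricted admin scope are hypothetical names and conventions chosen for this sketch.

```python
from typing import Optional, TypedDict


class UserContext(TypedDict):
    email: str                   # user identifier
    is_admin: bool               # boolean flag
    teams: Optional[list[str]]   # normalized team list (None = unrestricted, assumed)


def make_user_ctx(email: str, is_admin: bool,
                  teams: Optional[list[str]]) -> UserContext:
    """Build the context the same way for every router."""
    return {"email": email, "is_admin": bool(is_admin), "teams": teams}


def audit_line(ctx: UserContext, action: str) -> str:
    """Render one uniform audit entry."""
    teams = "*" if ctx["teams"] is None else ",".join(ctx["teams"])
    return f"user={ctx['email']} admin={ctx['is_admin']} teams={teams} action={action}"


ctx = make_user_ctx("dev@example.com", False, ["team-a", "team-b"])
line = audit_line(ctx, "tools.list")
```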

🏗 Architecture

Token Scoping Flow

graph TB
    subgraph "Token Claims"
        T1[teams: missing]
        T2[teams: null + is_admin: false]
        T3[teams: null + is_admin: true]
        T4[teams: empty array]
        T5[teams: list of IDs]
    end

    subgraph "normalize_token_teams"
        N[Normalize Function]
    end

    subgraph "Access Level"
        PO[Public Only]
        AB[Admin Bypass - All]
        TS[Team Scoped]
    end

    T1 --> N --> PO
    T2 --> N --> PO
    T3 --> N --> AB
    T4 --> N --> PO
    T5 --> N --> TS

SSRF Protection Flow

graph TB
    subgraph "URL Validation"
        URL[Incoming URL]
        PARSE[Parse Host/IP]
        CHECK{SSRF Check}
    end

    subgraph "Always Blocked"
        META[Cloud Metadata<br/>169.254.169.254]
        GCP[GCP Metadata<br/>metadata.google.internal]
        LINK[Link-Local<br/>fe80::/10]
    end

    subgraph "Configurable"
        LOCAL[Localhost<br/>SSRF_ALLOW_LOCALHOST]
        PRIVATE[Private Networks<br/>SSRF_ALLOW_PRIVATE_NETWORKS]
    end

    subgraph "Result"
        ALLOW[Allow Request]
        BLOCK[422 Validation Error]
    end

    URL --> PARSE --> CHECK
    CHECK -->|Cloud Metadata| BLOCK
    CHECK -->|Localhost| LOCAL
    CHECK -->|Private IP| PRIVATE
    LOCAL -->|Allowed| ALLOW
    LOCAL -->|Blocked| BLOCK
    PRIVATE -->|Allowed| ALLOW
    PRIVATE -->|Blocked| BLOCK
    CHECK -->|Public IP| ALLOW

📋 Implementation Tasks

Credential Protection ✅

  • Implement GatewayRead.masked() to null out unmasked fields
  • Apply masking in create_gateway() response
  • Apply masking in update_gateway() response
  • Apply masking in get_gateway() response
  • Apply masking in list_gateways() response
  • Apply masking to cached gateway reads
  • Add security tests for credential protection

SSRF Prevention ✅

  • Add SSRF configuration settings to config.py
  • Implement _validate_ssrf() URL validator
  • Hardcode cloud metadata blocklist (169.254.x.x, metadata.google.internal)
  • Make localhost policy configurable (default: allow)
  • Make private network policy configurable (default: allow)
  • Document settings in .env.example
  • Add Helm chart configuration for Kubernetes
  • Add comprehensive documentation

Multi-Tenant Token Scoping ✅

  • Implement normalize_token_teams() in auth.py
  • Integrate with _get_token_teams_from_request() in main.py
  • Update token_scoping.py middleware
  • Implement secure caching (only cache public-only queries)
  • Apply token scoping to all list endpoints
  • Apply token scoping to gateway forwarding
  • Add tests for all token claim combinations

Granular Admin RBAC ✅

  • Add new permissions to Permissions class in db.py
  • Add allow_admin_bypass parameter to RBAC decorators
  • Implement has_admin_permission() in permission service
  • Update AdminAuthMiddleware to use capability check
  • Apply @require_permission to all 177 admin routes
  • Set allow_admin_bypass=False on all admin decorators
  • Update RBAC documentation

Auth Context Propagation ✅

  • Forward Authorization header in WebSocket to RPC
  • Forward X-Proxy-User header for identity
  • Validate token before forwarding
  • Test end-to-end auth flow

Consistent User Context ✅

  • Standardize user context format across routers
  • Update Teams router endpoints
  • Update RBAC router endpoints
  • Ensure consistent logging format

⚙️ Configuration

SSRF Protection Settings

# Master switch (default: enabled)
SSRF_PROTECTION_ENABLED=true

# Development-friendly defaults
SSRF_ALLOW_LOCALHOST=true
SSRF_ALLOW_PRIVATE_NETWORKS=true

# Always blocked (hardcoded, cannot be overridden)
# - 169.254.169.254/32 (AWS/Azure metadata)
# - 169.254.169.123/32 (AWS Time Sync Service)
# - 169.254.0.0/16 (link-local)
# - metadata.google.internal (GCP)
# - fe80::/10 (IPv6 link-local)

Production Hardening

# Strict mode for cloud deployments
SSRF_ALLOW_LOCALHOST=false
SSRF_ALLOW_PRIVATE_NETWORKS=false
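
For illustration, the three settings above could be read from the environment as shown here. The `SsrfSettings` class, `load_ssrf_settings` function, and the set of strings accepted as true are assumptions; the project's config.py may parse these differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed accepted spellings


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class SsrfSettings:
    protection_enabled: bool
    allow_localhost: bool
    allow_private_networks: bool


def load_ssrf_settings(env: Optional[Mapping[str, str]] = None) -> SsrfSettings:
    """Read SSRF settings, falling back to the development-friendly defaults."""
    env = os.environ if env is None else env
    return SsrfSettings(
        protection_enabled=_env_bool(env, "SSRF_PROTECTION_ENABLED", True),
        allow_localhost=_env_bool(env, "SSRF_ALLOW_LOCALHOST", True),
        allow_private_networks=_env_bool(env, "SSRF_ALLOW_PRIVATE_NETWORKS", True),
    )


production = load_ssrf_settings({
    "SSRF_ALLOW_LOCALHOST": "false",
    "SSRF_ALLOW_PRIVATE_NETWORKS": "false",
})
```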

✅ Success Criteria

  • API credentials never exposed in any response path
  • Cloud metadata endpoints blocked on all cloud platforms
  • SSRF policies configurable for dev vs prod environments
  • Tokens with missing/empty teams get public-only access
  • Admin bypass requires explicit teams: null + is_admin: true
  • All 177 admin routes enforce granular permissions
  • is_admin flag alone cannot bypass permission checks
  • WebSocket auth propagates to RPC layer
  • User context format consistent across all endpoints
  • All existing tests pass
  • New security tests added for each capability

🏁 Definition of Done

  • Credential masking implemented and tested
  • SSRF protection with configurable policies
  • Secure-first token scoping with normalize_token_teams()
  • Granular RBAC on all admin routes
  • Token-scoped filtering on list endpoints and gateway forwarding
  • WebSocket auth forwarding to RPC
  • Consistent user context across all endpoints
  • Documentation updated (configuration, RBAC guide)
  • Code passes make verify checks

📝 Additional Notes

🔹 Secure-First Design: Ambiguous token states (a missing teams claim, or null teams without is_admin) default to minimum access, the same public-only scope as an explicit empty teams array. This prevents privilege escalation from malformed tokens.

🔹 Cloud Metadata Protection: The blocklist for cloud metadata IPs is hardcoded and cannot be disabled via configuration. This ensures protection even if operators misconfigure SSRF settings.

🔹 Strict RBAC: The is_admin flag no longer bypasses permission checks on admin routes. Admins must have explicit permissions granted, enabling fine-grained delegation.

🔹 Backward Compatibility: Properly-formed tokens with explicit team claims continue to work unchanged. Only edge cases with missing/null claims are affected.

🔹 Database Sessions: All endpoints obtain their session via db: Session = Depends(get_db). Never use current_user_ctx["db"], which is None by design.
