[CHORE][PLUGIN]: Architecture decisions for AI middleware framework #313

@crivetimihai


🎯 Purpose

This design document outlines the key architectural decisions for implementing the AI Middleware Integration / Plugin Framework (#319). We're seeking community input on these decisions before implementation begins to ensure we build the right foundation for extensible gateway capabilities.

📋 Related Issues

🏗️ Current Architecture Context

The MCP Gateway currently has:

  • FastAPI application with middleware pipeline
  • SQLAlchemy 2.x async for persistence
  • Service-based architecture (tool_service, resource_service, etc.)
  • HTMX-based Admin UI for management
  • Authentication middleware (JWT + Basic Auth)
  • Configuration-driven approach with Pydantic settings

🔥 Key Architectural Decisions for Discussion

1. Plugin Architecture Pattern

2. Plugin Execution Models

3. Configuration & Discovery Strategy

4. Pipeline Integration Approach

5. Security & Isolation Model


ADR-014: Plugin Architecture and AI Middleware Support

  • Status: Proposed (Under Discussion)
  • Date: 2025-07-08
  • Deciders: Community Discussion Required

Context

The MCP Gateway needs a robust plugin framework to support:

  • AI Safety Middleware (LlamaGuard, OpenAI Moderation, custom filters)
  • Input/Output Processing (PII masking, content validation, sanitization)
  • Policy Enforcement (Rego-based rules, business logic, compliance)
  • Custom Authentication (Enterprise SSO, role-based access)
  • Observability Extensions (custom metrics, audit logging)

The current middleware pipeline is limited to FastAPI middleware and does not support:

  • Dynamic plugin registration
  • External service integration
  • Request/response transformation
  • Conditional execution based on context
  • Plugin-specific configuration management

Decision Points Requiring Community Input

🎯 Decision 1: Plugin Architecture Pattern

Options:

A) Self-Contained Plugins Only

```python
class BasePlugin(ABC):
    async def process(self, payload: Any, context: Dict) -> PluginResult:
        # All logic runs in-process
        pass
```

B) Hybrid: Self-Contained + External Service Integration

```python
class BasePlugin(ABC):
    execution_mode: PluginExecutionMode  # SELF_CONTAINED | EXTERNAL_SERVICE

class ExternalServicePlugin(BasePlugin):
    async def call_external_service(self, payload: Any) -> Any:
        # HTTP calls to microservices
        pass
```

C) Microservice-Only Architecture

```python
# All plugins are external services
class PluginConfig:
    service_url: str
    auth_config: Dict[str, Any]
```

```mermaid
flowchart TD
    A[Request] --> PM[Plugin Manager]

    subgraph "Option A: Self-Contained"
        PM --> P1[Plugin 1<br/>In-Process]
        P1 --> P2[Plugin 2<br/>In-Process]
    end

    subgraph "Option B: Hybrid"
        PM --> P3[Self-Contained<br/>Plugin]
        PM --> P4[External Service<br/>via HTTP]
        P4 --> EXT1[LlamaGuard API]
        P4 --> EXT2[OpenAI Moderation]
    end

    subgraph "Option C: Microservice-Only"
        PM --> MS1[Service 1]
        PM --> MS2[Service 2]
        MS1 --> EXT3[External API]
    end
```

🗳️ Community Question: Which approach best balances flexibility, performance, and operational complexity?

🎯 Decision 2: Plugin Execution Models

Options:

A) Sequential Execution

```python
# Plugins execute one after another
for plugin in sorted_plugins:
    result = await plugin.process(payload, context)
    if not result.continue_processing:
        break
    payload = result.modified_payload or payload
```
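For concreteness, a minimal runnable sketch of the `PluginResult` contract the sequential loop assumes — all class names here are hypothetical stand-ins, not the final interface:

```python
# Hypothetical sketch of the sequential model: each plugin may rewrite the
# payload or halt the pipeline via continue_processing.
import asyncio
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class PluginResult:
    continue_processing: bool = True
    modified_payload: Optional[Any] = None

class UppercasePlugin:
    async def process(self, payload, context):
        # Transform the payload and pass it along.
        return PluginResult(modified_payload=payload.upper())

class BlockPlugin:
    async def process(self, payload, context):
        # Halt the pipeline for disallowed content.
        if "forbidden" in payload:
            return PluginResult(continue_processing=False)
        return PluginResult()

async def run_pipeline(plugins, payload, context):
    for plugin in plugins:
        result = await plugin.process(payload, context)
        if not result.continue_processing:
            break
        payload = result.modified_payload or payload
    return payload

print(asyncio.run(run_pipeline([UppercasePlugin(), BlockPlugin()], "hello", {})))
```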

B) Parallel Execution with Dependency Resolution

```python
# Independent plugins run concurrently
async with asyncio.TaskGroup() as tg:
    tasks = [tg.create_task(plugin.process(payload, context))
             for plugin in independent_plugins]
```
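The `TaskGroup` fragment (Python 3.11+) leaves result merging open. A runnable sketch of the same idea using `asyncio.gather`, with hypothetical check plugins that each contribute one key to a merged verdict:

```python
# Hypothetical sketch of the parallel model: independent plugins run
# concurrently and their dict results are merged afterwards.
import asyncio

class LengthCheck:
    async def process(self, payload, context):
        return {"length_ok": len(payload) < 100}

class ProfanityCheck:
    async def process(self, payload, context):
        return {"profanity_ok": "badword" not in payload}

async def run_parallel(plugins, payload, context):
    # gather preserves plugin order in the results list.
    results = await asyncio.gather(*(p.process(payload, context) for p in plugins))
    merged = {}
    for result in results:
        merged.update(result)
    return merged

print(asyncio.run(run_parallel([LengthCheck(), ProfanityCheck()], "hello", {})))
```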

C) Pipeline with Branching Logic

```python
# Conditional execution based on context
if context.get("content_type") == "sensitive":
    await pii_scanner.process(payload, context)
if context.get("requires_moderation"):
    await moderation_plugin.process(payload, context)
```
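A self-contained sketch of the branching model, matching the flowchart's sensitive/public split — plugin classes and the SSN pattern are illustrative assumptions only:

```python
# Hypothetical sketch of the conditional model: the runner picks plugins
# based on request context rather than a fixed chain.
import asyncio

class PiiScanner:
    async def process(self, payload, context):
        # Mask a known-format SSN (illustrative, not a real PII detector).
        return payload.replace("123-45-6789", "***-**-****")

class BasicValidator:
    async def process(self, payload, context):
        return payload.strip()

async def run_conditional(payload, context):
    if context.get("content_type") == "sensitive":
        payload = await PiiScanner().process(payload, context)
    else:
        payload = await BasicValidator().process(payload, context)
    return payload

print(asyncio.run(run_conditional("SSN 123-45-6789", {"content_type": "sensitive"})))
```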
```mermaid
flowchart LR
    subgraph "Sequential (A)"
        A1[Plugin A] --> A2[Plugin B] --> A3[Plugin C]
    end

    subgraph "Parallel (B)"
        B1[Plugin A]
        B2[Plugin B]
        B3[Plugin C]
        B1 --> B4[Merge Results]
        B2 --> B4
        B3 --> B4
    end

    subgraph "Conditional (C)"
        C1{Content Type?}
        C1 -->|Sensitive| C2[PII Scanner]
        C1 -->|Public| C3[Basic Validation]
        C2 --> C4[Moderation Check]
        C3 --> C4
    end
```

🗳️ Community Question: Should we support all three models, or focus on one initially?

🎯 Decision 3: Configuration & Discovery Strategy

Options:

A) Database-Driven Configuration

```python
# Store plugin configs in SQLAlchemy models
class PluginConfiguration(Base):
    __tablename__ = "plugin_configurations"
    name: Mapped[str]
    config: Mapped[dict] = mapped_column(JSON)
    enabled: Mapped[bool]
```

B) File-Based Configuration with Hot Reload

```yaml
# plugins.yaml
plugins:
  - name: "llama-guard"
    type: "ai_middleware"
    service_url: "http://llama-guard:8080"
    enabled: true
```

C) Hybrid: Database + File Overrides

```python
# File-based defaults, database overrides
config = load_file_config()
config.update(load_database_config())
```
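A flat `dict.update()` would clobber nested per-plugin settings, so the hybrid option likely needs a per-plugin merge. A runnable sketch under that assumption (`merge_plugin_configs` and the sample configs are hypothetical):

```python
# Hypothetical sketch of option C: file-based defaults merged with
# database-level overrides, one plugin at a time so unrelated keys survive.
def merge_plugin_configs(file_defaults: dict, db_overrides: dict) -> dict:
    merged = {name: dict(cfg) for name, cfg in file_defaults.items()}
    for name, override in db_overrides.items():
        merged.setdefault(name, {}).update(override)
    return merged

file_defaults = {
    "llama-guard": {"service_url": "http://llama-guard:8080", "enabled": True},
}
db_overrides = {
    "llama-guard": {"enabled": False},  # e.g. an operator disabled it at runtime
}
print(merge_plugin_configs(file_defaults, db_overrides))
```

Note the database override flips `enabled` while the file-provided `service_url` is preserved.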

D) Discovery via Environment/Registry

```python
# Auto-discovery via service registry (Kubernetes, Consul, etc.)
plugins = discover_plugins_from_environment()
```
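One way option D could work without a registry dependency is an environment-variable naming convention. A sketch assuming a hypothetical `MCP_PLUGIN_<NAME>_URL` convention (not an existing gateway setting):

```python
# Hypothetical sketch of option D: discover plugin endpoints from environment
# variables of the form MCP_PLUGIN_<NAME>_URL (assumed convention).
import os

def discover_plugins_from_environment(environ=os.environ):
    prefix, suffix = "MCP_PLUGIN_", "_URL"
    plugins = {}
    for key, value in environ.items():
        if key.startswith(prefix) and key.endswith(suffix):
            # MCP_PLUGIN_LLAMA_GUARD_URL -> "llama-guard"
            name = key[len(prefix):-len(suffix)].lower().replace("_", "-")
            plugins[name] = {"service_url": value}
    return plugins

env = {"MCP_PLUGIN_LLAMA_GUARD_URL": "http://llama-guard:8080", "PATH": "/usr/bin"}
print(discover_plugins_from_environment(env))
```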

🗳️ Community Question: How should plugin configuration be managed for different deployment scenarios?

🎯 Decision 4: Pipeline Integration Approach

Options:

A) FastAPI Middleware Integration

```python
# Extend existing middleware pipeline
class PluginMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # Run input plugins
        response = await call_next(request)
        # Run output plugins
        return response
```

B) Service Layer Integration

```python
# Integrate at service level
class ToolService:
    async def execute_tool(self, tool_call):
        # Run input plugins
        result = await self._execute_core_logic(tool_call)
        # Run output plugins
        return result
```
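Fleshing out option B, a runnable sketch of what the input/output hooks around the service call could look like — `RedactPlugin` and the hook wiring are illustrative assumptions, not the existing `tool_service` API:

```python
# Hypothetical sketch of service-layer integration: input plugins transform
# the tool call before core logic, output plugins transform the result after.
import asyncio

class RedactPlugin:
    async def process(self, payload, context):
        return payload.replace("secret", "[REDACTED]")

class ToolService:
    def __init__(self, input_plugins=None, output_plugins=None):
        self.input_plugins = input_plugins or []
        self.output_plugins = output_plugins or []

    async def _execute_core_logic(self, tool_call):
        # Stand-in for the real tool invocation.
        return f"executed: {tool_call}"

    async def execute_tool(self, tool_call, context=None):
        context = context or {}
        for plugin in self.input_plugins:
            tool_call = await plugin.process(tool_call, context)
        result = await self._execute_core_logic(tool_call)
        for plugin in self.output_plugins:
            result = await plugin.process(result, context)
        return result

svc = ToolService(input_plugins=[RedactPlugin()])
print(asyncio.run(svc.execute_tool("call with secret token")))
```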

C) Dedicated Plugin Pipeline

```python
# Separate pipeline that FastAPI calls
class PluginPipeline:
    async def process_request(self, request) -> ProcessedRequest:
        pass

    async def process_response(self, response) -> ProcessedResponse:
        pass
```

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant Pipeline
    participant Core
    participant Plugin

    Client->>Gateway: Request

    rect rgb(240, 248, 255)
        note over Gateway,Pipeline: Option A: Middleware Integration
        Gateway->>Pipeline: Request
        Pipeline->>Plugin: Input Processing
        Plugin-->>Pipeline: Modified Request
        Pipeline->>Core: Core Logic
        Core-->>Pipeline: Response
        Pipeline->>Plugin: Output Processing
        Plugin-->>Pipeline: Modified Response
        Pipeline-->>Gateway: Final Response
    end

    Gateway-->>Client: Response
```

🗳️ Community Question: Where in the request/response cycle should plugins be integrated?

🎯 Decision 5: Security & Isolation Model

Options:

A) Process Isolation (Containers/Sandboxing)

```python
# Each plugin runs in isolated container
class IsolatedPlugin:
    container_image: str
    resource_limits: Dict[str, Any]
```

B) In-Process with Resource Limits

```python
# Plugins run in same process with limits
class PluginExecutor:
    async def execute_with_limits(self, plugin, timeout=30):
        # Memory/CPU/time limits
        pass
```
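Of the limits option B mentions, a wall-clock timeout is the piece expressible in pure asyncio (memory/CPU caps would need OS-level support such as `resource.setrlimit` or cgroups). A sketch with an adapted, hypothetical signature:

```python
# Hypothetical sketch of in-process limiting: cancel a plugin that exceeds
# its time budget and fail open by returning the payload unchanged.
import asyncio

class SlowPlugin:
    async def process(self, payload, context):
        await asyncio.sleep(5)  # simulates a hung plugin
        return payload

class EchoPlugin:
    async def process(self, payload, context):
        return payload.upper()

class PluginExecutor:
    async def execute_with_limits(self, plugin, payload, context, timeout=30):
        try:
            return await asyncio.wait_for(plugin.process(payload, context), timeout)
        except asyncio.TimeoutError:
            # Fail open: skip the plugin; the caller can log and alert.
            return payload

executor = PluginExecutor()
print(asyncio.run(executor.execute_with_limits(SlowPlugin(), "payload", {}, timeout=0.01)))
```

Whether a timed-out safety plugin should fail open (as here) or fail closed is itself a policy question for this decision.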

C) External Service Model (Network Isolation)

```python
# All plugins are external services
# Security handled by network policies
class ExternalPlugin:
    endpoint: str
    auth_method: str
    tls_verify: bool
```

🗳️ Community Question: What level of isolation is appropriate for different plugin types?


🔄 Implementation Phases for Discussion

Phase 1: Core Framework (v0.6.0)

  • Plugin interface definitions
  • Basic plugin manager
  • Configuration schema
  • Simple pipeline integration

Phase 2: Advanced Features (v0.7.0)

  • External service integration
  • Admin UI for plugin management
  • Health monitoring
  • Performance metrics

Phase 3: AI Middleware (v0.8.0)

  • LlamaGuard integration
  • OpenAI Moderation plugin
  • PII detection/masking
  • Policy-as-Code engine

📊 Trade-off Analysis

| Decision | Pros | Cons | Community Impact |
|----------|------|------|------------------|
| Hybrid Architecture | ✅ Flexibility<br>✅ Performance options<br>✅ Enterprise-ready | ❌ Complexity<br>❌ More testing needed | 🏢 Supports both simple and enterprise use cases |
| Sequential Execution | ✅ Simple<br>✅ Predictable<br>✅ Easy debugging | ❌ Slower<br>❌ Limited parallelism | 🚀 Good starting point, can evolve |
| Database Configuration | ✅ Dynamic updates<br>✅ Multi-tenant ready<br>✅ Audit trail | ❌ Migration complexity<br>❌ Runtime dependencies | 🔄 Aligns with current architecture |

🤔 Open Questions for Community

  1. Plugin Marketplace: Should we design for a future plugin marketplace/registry?

  2. Multi-Tenancy: How should plugins be scoped? Per-server? Per-user? Global?

  3. Plugin Dependencies: Should plugins be able to depend on other plugins?

  4. Versioning: How do we handle plugin versioning and compatibility?

  5. Testing: What testing framework should we provide for plugin developers?

  6. Documentation: Should we auto-generate plugin documentation from schemas?

🎯 Success Criteria

A successful plugin framework should:

💬 How to Participate

Please comment on this issue with:

  • 🗳️ Your preferred options for each decision point
  • 🤔 Additional considerations we might have missed
  • 📋 Use cases that would influence the design
  • 🔧 Implementation suggestions or concerns
  • 📚 Examples from other systems you've worked with

Timeline: We need community input by July 22, 2025 to start implementation for v0.6.0.

🔄 Next Steps

  1. Community discussion (July 8-22, 2025)
  2. Finalize ADR-014 based on feedback
  3. Create detailed implementation plan
  4. Begin development for v0.6.0 milestone

This design document will be updated based on community feedback and finalized as ADR-014 once consensus is reached.
