Issue 1960 Fix high-impact performance issues in llm-guard plugin #2638

Merged
crivetimihai merged 35 commits into IBM:main from tedhabeck:issue-1960
Feb 6, 2026

Conversation

Collaborator

@tedhabeck tedhabeck commented Feb 1, 2026

🔗 Related Issue

Closes #1960


📝 Summary

#1960

Summary of Commits on Branch issue-1960

The issue-1960 branch contains 35+ commits focused on improving the LLMGuard plugin's performance, code quality, and maintainability. Here's a summary organized by category:

Major Enhancements

Code Quality & Refactoring (Commit 08c3efed)

Reduced cyclomatic complexity by ~50%
Extracted methods for better separation of concerns
Improved testability with isolated components
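As a sketch of the kind of extraction described above (all names here are hypothetical, not the plugin's actual code): a single method that mixes guard checks, scanning, and response building can be split into small methods, so the top-level flow reads as a sequence of steps and each piece can be unit-tested in isolation.

```python
# Illustrative only: hypothetical names, not the LLMGuard plugin's real API.
class Scanner:
    def scan(self, message: str) -> dict:
        """Top-level flow reads as a sequence of extracted steps."""
        if not self._is_scannable(message):
            return {"valid": True, "skipped": True}
        result = self._run_scanners(message)
        return self._build_response(result)

    def _is_scannable(self, message: str) -> bool:
        # Guard clause: empty or whitespace-only input short-circuits early,
        # removing one branch from the main method.
        return bool(message and message.strip())

    def _run_scanners(self, message: str) -> bool:
        # Stand-in for the real scanning logic.
        return "forbidden" not in message

    def _build_response(self, valid: bool) -> dict:
        return {"valid": valid, "skipped": False}
```

Each extracted method has a single reason to change, which is what drives the cyclomatic-complexity number down.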

Performance Optimizations

Vault Processing: Moved vault retrieval outside message loop to eliminate redundant async cache lookups
Scan Caching (76452acc): Added caching mechanism for scan results
Policy Singleton (1edfeab1): Implemented singleton pattern for policy management
RapidFuzz Integration (7181504c): Replaced word-wise Levenshtein distance with rapidfuzz.distance for better performance
Background Cache Cleanup (3071cd4c): Moved cache cleanup to background thread instead of running on every scan
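A minimal sketch of the caching-plus-background-cleanup idea (class and method names are illustrative, not the plugin's actual API): expired entries are swept by a daemon thread on an interval, so the hot scan path only ever pays for a dictionary lookup, never a full-cache scan.

```python
import threading
import time

class ScanCache:
    """TTL cache whose expiry sweep runs in a daemon thread
    instead of on every scan (illustrative sketch)."""

    def __init__(self, ttl: float = 60.0, sweep_interval: float = 5.0):
        self._ttl = ttl
        self._store: dict[str, tuple[float, object]] = {}
        self._lock = threading.Lock()
        # Background sweeper: cleanup cost is off the request path.
        t = threading.Thread(target=self._sweep, args=(sweep_interval,), daemon=True)
        t.start()

    def get(self, key: str):
        with self._lock:
            entry = self._store.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if time.monotonic() >= expires_at:
                # Stale read between sweeps: drop it lazily.
                del self._store[key]
                return None
            return value

    def put(self, key: str, value) -> None:
        with self._lock:
            self._store[key] = (time.monotonic() + self._ttl, value)

    def _sweep(self, interval: float) -> None:
        while True:
            time.sleep(interval)
            now = time.monotonic()
            with self._lock:
                for k in [k for k, (exp, _) in self._store.items() if now >= exp]:
                    del self._store[k]
```

The lazy expiry check in `get` covers the window between sweeps, so correctness never depends on the sweeper's timing.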

Metrics & Observability

Added external plugin metrics endpoint (c72c6241)
Added metric for policy compile duration (047c2cc5)
Added metrics for scan duration seconds (7181504c)

Bug Fixes & Configuration

Fixed return type on __update_context API (22f1b788)
Pinned transformers to 4.55.1 to prevent TFPreTrainedModel error (ebf99f66)
Applied plugin to all prompts by default since prompt_ids are only known after creation (7b118fd8)
Fixed prompts type to Optional[set[str]] (786ef717)
Used lazy evaluation instead of f-strings for better performance (fc8c6d31)
Enabled sanitizers by default (d518bac7)
Added env var to disable TensorFlow in plugin startup (25047d79)
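The lazy-evaluation fix above is the standard `logging` idiom: pass format arguments to the logger instead of interpolating them with an f-string, so the (possibly expensive) formatting only happens when the record is actually emitted. A small demonstration:

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("llmguard.example")  # illustrative logger name

class Expensive:
    """Counts how many times it is actually formatted."""
    calls = 0
    def __str__(self) -> str:
        Expensive.calls += 1
        return "expensive repr"

obj = Expensive()

# f-string: the argument is formatted eagerly, even though DEBUG is disabled.
logger.debug(f"scan result: {obj}")

# Lazy %-style: formatting is deferred until the record is emitted,
# so at WARNING level __str__ is never called.
logger.debug("scan result: %s", obj)
```

Only the f-string call pays the formatting cost here; on a hot path with debug logging disabled, that cost is pure waste.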

Maintenance

Multiple lint fixes and code cleanup commits
Added documentation comments (b1eb59b5)
Multiple merges from main branch to keep up-to-date
Added rapidfuzz dependency (f0040482)

Key Improvements Summary

50% reduction in cyclomatic complexity through method extraction
Significant performance gains via caching, background processing, and optimized algorithms
Better observability with comprehensive metrics
Improved maintainability with clear separation of concerns
Enhanced reliability with bug fixes and dependency pinning


🏷️ Type of Change

  • Bug fix
  • Feature / Enhancement
  • Documentation
  • Refactor
  • Chore (deps, CI, tooling)
  • Other (describe below)

🧪 Verification

Check            Command
-----            -------
Lint suite       make lint
Unit tests       make test
Coverage ≥ 90%   make coverage

✅ Checklist

  • Code formatted (make black isort pre-commit)
  • Tests added/updated for changes
  • Documentation updated (if applicable)
  • No secrets or credentials committed

📓 Notes (optional)

Plugin                            P:post       P:pre      R:post       R:pre      T:post       T:pre
----------------------------------------------------------------------------------------------------
LLMGuardPlugin                   0.102ms     0.149ms           —           —           —           —

Testing the metrics/prometheus endpoint of the plugin before merge:

The plugin's metrics endpoint requires runtime.py to be manually injected into the container image until this change is merged into the main build. To test before merge, run make build && make start, then copy the updated runtime.py from this branch into the pod and restart it, e.g.:

From the project root folder:

podman stop llmguardplugin && \
podman cp mcpgateway/plugins/framework/external/mcp/server/runtime.py llmguardplugin:/opt/app-root/lib/python3.12/site-packages/mcpgateway/plugins/framework/external/mcp/server && \
podman start llmguardplugin

The llmguard plugin will then expose an endpoint /metrics/prometheus.

curl -X GET http://127.0.0.1:8001/metrics/prometheus

To scrape the endpoint periodically with Prometheus, create a prometheus.yml file. Replace 192.168.1.92 with the IP address of your local workstation in the example YAML below:

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    metrics_path: '/metrics/prometheus'
    static_configs:
      - targets: ['192.168.1.92:8001']
        labels:
          group: 'llmguard'
      - targets: ['192.168.1.92:8000']
        labels:
          group: 'context-forge'

Then start Prometheus using that configuration file, e.g.:

podman run --name prometheus -d -p 127.0.0.1:9090:9090 \
  -v ./prometheus.yml:/etc/prometheus/prometheus.yml \
  -v prometheus-data:/prometheus \
  <prometheus_image_id>

Example Grafana dashboard surfacing metrics


@crivetimihai crivetimihai changed the title from "Issue 1960" to "Issue 1960 Fix high-impact performance issues in llm-guard plugin" Feb 1, 2026
@crivetimihai crivetimihai added this to the Release 1.0.0-GA milestone Feb 1, 2026
@crivetimihai
Member

Thanks for tackling the llm-guard performance issues, @tedhabeck. The benchmark numbers look solid (sub-millisecond hook latency).

Since this is still in draft, a couple of notes for when it's ready:

  • PR description is minimal — please add a summary of the key changes (async conversion, batch evaluation, etc.)
  • No tests checked in the checklist — please confirm test coverage

Let us know when this is ready for a full review!

@crivetimihai crivetimihai self-assigned this Feb 4, 2026
@tedhabeck tedhabeck marked this pull request as ready for review February 5, 2026 01:10
@crivetimihai crivetimihai merged commit 3a53604 into IBM:main Feb 6, 2026
43 checks passed
kcostell06 pushed a commit to kcostell06/mcp-context-forge that referenced this pull request Feb 24, 2026
…M#2638)

* fix: prompts are an Optional[set[str]] - set of prompt names.

Signed-off-by: habeck <habeck@us.ibm.com>

* revert: llmguard plugins.conditions.prompts

Signed-off-by: habeck <habeck@us.ibm.com>

* feat: add external plugin metrics endpoint

Signed-off-by: habeck <habeck@us.ibm.com>

* perf: use rapidfuzz.distance instead of word-wise Levenshtein distance, add metrics for scan duration seconds

Signed-off-by: habeck <habeck@us.ibm.com>

* perf: add metric for policy compile duration seconds

Signed-off-by: habeck <habeck@us.ibm.com>

* perf: policy singleton

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: missed commit to add rapidfuzz dependency

Signed-off-by: habeck <habeck@us.ibm.com>

* perf: add scan caching

Signed-off-by: habeck <habeck@us.ibm.com>

* enh: make _create_new_vault_on_expiry async

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fixes

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fix

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fixes

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: add doc comments

Signed-off-by: habeck <habeck@us.ibm.com>

* fix: pin transformers to 4.55.1 to prevent TFPreTrainedModel error

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fix

Signed-off-by: habeck <habeck@us.ibm.com>

* fix: Since prompt_ids are only known after creation, apply to all so that the plugin works out of the box.

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: test fix

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: remove duplicate import

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fix

Signed-off-by: habeck <habeck@us.ibm.com>

* enh:
Key Improvements:
Code Quality: Reduced cyclomatic complexity by ~50%
Performance: Vault retrieval moved outside message loop (eliminates redundant async cache lookups)
Consistency: All processing methods follow same pattern as input methods
Maintainability: Clear separation of concerns, easier to test individual components
Zero Breaking Changes: Maintains exact functional behavior

Signed-off-by: habeck <habeck@us.ibm.com>

* fix: use lazy evaluation rather than f-strings

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: enable sanitizers by default

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: add env var to disable TensorFlow in plugin startup.

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: fix return type on __update_context api.

Signed-off-by: habeck <habeck@us.ibm.com>

* enh: run the cache cleanup in a background thread rather than on every scan.

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fix

Signed-off-by: habeck <habeck@us.ibm.com>

* fix: test case for Test _handle_vault_caching handles case when no vault exists.

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: add unit tests for new code

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: test coverage for llmguard.py to 94% from 80%

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: policy.py coverage to 100%

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: cache.py tests to 100%

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fixes

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: add missing class doc to test_llmguardplugin.py

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: update readme

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: clearer comment for plugin.conditions.prompts

Signed-off-by: habeck <habeck@us.ibm.com>

---------

Signed-off-by: habeck <habeck@us.ibm.com>


Development

Successfully merging this pull request may close these issues.

[BUG][PERFORMANCE]: Fix high-impact performance issues in llm-guard plugin
