Issue 1960 Fix high-impact performance issues in llm-guard plugin #2638

Merged
crivetimihai merged 35 commits into IBM:main from tedhabeck:issue-1960
Feb 6, 2026

Conversation

Collaborator

@tedhabeck tedhabeck commented Feb 1, 2026

🔗 Related Issue

Closes #1960


📝 Summary

#1960

Summary of Commits on Branch issue-1960

The issue-1960 branch contains 35+ commits focused on improving the LLMGuard plugin's performance, code quality, and maintainability. Here's a summary organized by category:

Major Enhancements

Code Quality & Refactoring (Commit 08c3efed)

Reduced cyclomatic complexity by ~50%
Extracted methods for better separation of concerns
Improved testability with isolated components
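As a sketch of the kind of extraction described above (all names here are hypothetical, not the plugin's actual code): a single method that mixes guard checks, scanning, and response building can be split into small methods, so the top-level flow reads as a sequence of steps and each piece can be unit-tested in isolation.

```python
# Illustrative only: hypothetical names, not the LLMGuard plugin's real API.
class Scanner:
    def scan(self, message: str) -> dict:
        """Top-level flow reads as a sequence of extracted steps."""
        if not self._is_scannable(message):
            return {"valid": True, "skipped": True}
        result = self._run_scanners(message)
        return self._build_response(result)

    def _is_scannable(self, message: str) -> bool:
        # Guard clause: empty or whitespace-only input short-circuits early,
        # removing one branch from the main method.
        return bool(message and message.strip())

    def _run_scanners(self, message: str) -> bool:
        # Stand-in for the real scanning logic.
        return "forbidden" not in message

    def _build_response(self, valid: bool) -> dict:
        return {"valid": valid, "skipped": False}
```

Each extracted method has a single reason to change, which is what drives the cyclomatic-complexity number down.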

Performance Optimizations

Vault Processing: Moved vault retrieval outside message loop to eliminate redundant async cache lookups
Scan Caching (76452acc): Added caching mechanism for scan results
Policy Singleton (1edfeab1): Implemented singleton pattern for policy management
RapidFuzz Integration (7181504c): Replaced word-wise Levenshtein distance with rapidfuzz.distance for better performance
Background Cache Cleanup (3071cd4c): Moved cache cleanup to background thread instead of running on every scan
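A minimal sketch of the caching-plus-background-cleanup idea (class and method names are illustrative, not the plugin's actual API): expired entries are swept by a daemon thread on an interval, so the hot scan path only ever pays for a dictionary lookup, never a full-cache scan.

```python
import threading
import time

class ScanCache:
    """TTL cache whose expiry sweep runs in a daemon thread
    instead of on every scan (illustrative sketch)."""

    def __init__(self, ttl: float = 60.0, sweep_interval: float = 5.0):
        self._ttl = ttl
        self._store: dict[str, tuple[float, object]] = {}
        self._lock = threading.Lock()
        # Background sweeper: cleanup cost is off the request path.
        t = threading.Thread(target=self._sweep, args=(sweep_interval,), daemon=True)
        t.start()

    def get(self, key: str):
        with self._lock:
            entry = self._store.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if time.monotonic() >= expires_at:
                # Stale read between sweeps: drop it lazily.
                del self._store[key]
                return None
            return value

    def put(self, key: str, value) -> None:
        with self._lock:
            self._store[key] = (time.monotonic() + self._ttl, value)

    def _sweep(self, interval: float) -> None:
        while True:
            time.sleep(interval)
            now = time.monotonic()
            with self._lock:
                for k in [k for k, (exp, _) in self._store.items() if now >= exp]:
                    del self._store[k]
```

The lazy expiry check in `get` covers the window between sweeps, so correctness never depends on the sweeper's timing.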

Metrics & Observability

Added external plugin metrics endpoint (c72c6241)
Added metric for policy compile duration (047c2cc5)
Added metrics for scan duration seconds (7181504c)

Bug Fixes & Configuration

Fixed return type on __update_context API (22f1b788)
Pinned transformers to 4.55.1 to prevent TFPreTrainedModel error (ebf99f66)
Applied plugin to all prompts by default since prompt_ids are only known after creation (7b118fd8)
Fixed prompts type to Optional[set[str]] (786ef717)
Used lazy evaluation instead of f-strings for better performance (fc8c6d31)
Enabled sanitizers by default (d518bac7)
Added env var to disable TensorFlow in plugin startup (25047d79)
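The lazy-evaluation fix above is the standard `logging` idiom: pass format arguments to the logger instead of interpolating them with an f-string, so the (possibly expensive) formatting only happens when the record is actually emitted. A small demonstration:

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("llmguard.example")  # illustrative logger name

class Expensive:
    """Counts how many times it is actually formatted."""
    calls = 0
    def __str__(self) -> str:
        Expensive.calls += 1
        return "expensive repr"

obj = Expensive()

# f-string: the argument is formatted eagerly, even though DEBUG is disabled.
logger.debug(f"scan result: {obj}")

# Lazy %-style: formatting is deferred until the record is emitted,
# so at WARNING level __str__ is never called.
logger.debug("scan result: %s", obj)
```

Only the f-string call pays the formatting cost here; on a hot path with debug logging disabled, that cost is pure waste.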

Maintenance

Multiple lint fixes and code cleanup commits
Added documentation comments (b1eb59b5)
Multiple merges from main branch to keep up-to-date
Added rapidfuzz dependency (f0040482)

Key Improvements Summary

50% reduction in cyclomatic complexity through method extraction
Significant performance gains via caching, background processing, and optimized algorithms
Better observability with comprehensive metrics
Improved maintainability with clear separation of concerns
Enhanced reliability with bug fixes and dependency pinning


🏷️ Type of Change

  • Bug fix
  • Feature / Enhancement
  • Documentation
  • Refactor
  • Chore (deps, CI, tooling)
  • Other (describe below)

🧪 Verification

Check            Command
-----            -------
Lint suite       make lint
Unit tests       make test
Coverage ≥ 90%   make coverage

✅ Checklist

  • Code formatted (make black isort pre-commit)
  • Tests added/updated for changes
  • Documentation updated (if applicable)
  • No secrets or credentials committed

📓 Notes (optional)

Plugin                            P:post       P:pre      R:post       R:pre      T:post       T:pre
----------------------------------------------------------------------------------------------------
LLMGuardPlugin                   0.102ms     0.149ms           —           —           —           —

Testing the metrics/prometheus endpoint of the plugin before merge:

The plugin's metrics endpoint requires runtime.py to be manually injected into the container image until this change is merged into the main build. To test before merge, run make build && make start, then copy the updated runtime.py from this branch into the pod and restart it, e.g.:

From the project root folder:

podman stop llmguardplugin && \
podman cp mcpgateway/plugins/framework/external/mcp/server/runtime.py llmguardplugin:/opt/app-root/lib/python3.12/site-packages/mcpgateway/plugins/framework/external/mcp/server && \
podman start llmguardplugin

The llmguard plugin will then expose an endpoint /metrics/prometheus.

curl -X GET http://127.0.0.1:8001/metrics/prometheus

To scrape the endpoint periodically with Prometheus, create a prometheus.yml file. Replace 192.168.1.92 with the IP address of your local workstation in the example YAML below:

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    metrics_path: '/metrics/prometheus'
    static_configs:
      - targets: ['192.168.1.92:8001']
        labels:
          group: 'llmguard'
      - targets: ['192.168.1.92:8000']
        labels:
          group: 'context-forge'

Then start Prometheus using that configuration file, e.g.:

podman run --name prometheus -d -p 127.0.0.1:9090:9090 \
  -v ./prometheus.yml:/etc/prometheus/prometheus.yml \
  -v prometheus-data:/prometheus \
  <prometheus_image_id>

Example Grafana dashboard surfacing metrics


@crivetimihai crivetimihai changed the title from "Issue 1960" to "Issue 1960 Fix high-impact performance issues in llm-guard plugin" Feb 1, 2026
@crivetimihai crivetimihai added this to the Release 1.0.0-GA milestone Feb 1, 2026
@crivetimihai
Member

Thanks for tackling the llm-guard performance issues, @tedhabeck. The benchmark numbers look solid (sub-millisecond hook latency).

Since this is still in draft, a couple of notes for when it's ready:

  • PR description is minimal — please add a summary of the key changes (async conversion, batch evaluation, etc.)
  • No tests checked in the checklist — please confirm test coverage

Let us know when this is ready for a full review!

@crivetimihai crivetimihai self-assigned this Feb 4, 2026
@tedhabeck tedhabeck marked this pull request as ready for review February 5, 2026 01:10
@crivetimihai crivetimihai merged commit 3a53604 into IBM:main Feb 6, 2026
43 checks passed
kcostell06 pushed a commit to kcostell06/mcp-context-forge that referenced this pull request Feb 24, 2026
…M#2638)

* fix: prompts are an Optional[set[str]] - set of prompt names.

Signed-off-by: habeck <habeck@us.ibm.com>

* revert: llmguard plugins.conditions.prompts

Signed-off-by: habeck <habeck@us.ibm.com>

* feat: add external plugin metrics endpoint

Signed-off-by: habeck <habeck@us.ibm.com>

* perf: use rapidfuzz.distance instead of word-wise Levenshtein distance, add metrics for scan duration seconds

Signed-off-by: habeck <habeck@us.ibm.com>

* perf: add metric for policy compile duration seconds

Signed-off-by: habeck <habeck@us.ibm.com>

* perf: policy singleton

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: missed commit to add rapidfuzz dependency

Signed-off-by: habeck <habeck@us.ibm.com>

* perf: add scan caching

Signed-off-by: habeck <habeck@us.ibm.com>

* enh: make _create_new_vault_on_expiry async

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fixes

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fix

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fixes

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: add doc comments

Signed-off-by: habeck <habeck@us.ibm.com>

* fix: pin transformers to 4.55.1 to prevent TFPreTrainedModel error

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fix

Signed-off-by: habeck <habeck@us.ibm.com>

* fix: Since prompt_ids are only known after creation, apply to all so that the plugin works out of the box.

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: test fix

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: remove duplicate import

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fix

Signed-off-by: habeck <habeck@us.ibm.com>

* enh:
Key Improvements:
Code Quality: Reduced cyclomatic complexity by ~50%
Performance: Vault retrieval moved outside message loop (eliminates redundant async cache lookups)
Consistency: All processing methods follow same pattern as input methods
Maintainability: Clear separation of concerns, easier to test individual components
Zero Breaking Changes: Maintains exact functional behavior

Signed-off-by: habeck <habeck@us.ibm.com>

* fix: use lazy evaluation rather than f-strings

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: enable sanitizers by default

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: add env var to disable TensorFlow in plugin startup.

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: fix return type on __update_context api.

Signed-off-by: habeck <habeck@us.ibm.com>

* enh: run the cache cleanup in a background thread rather than on every scan.

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fix

Signed-off-by: habeck <habeck@us.ibm.com>

* fix: test case for Test _handle_vault_caching handles case when no vault exists.

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: add unit tests for new code

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: test coverage for llmguard.py to 94% from 80%

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: policy.py coverage to 100%

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: cache.py tests to 100%

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: lint fixes

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: add missing class doc to test_llmguardplugin.py

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: update readme

Signed-off-by: habeck <habeck@us.ibm.com>

* chore: clearer comment for plugin.conditions.prompts

Signed-off-by: habeck <habeck@us.ibm.com>

---------

Signed-off-by: habeck <habeck@us.ibm.com>


Development

Successfully merging this pull request may close these issues.

[BUG][PERFORMANCE]: Fix high-impact performance issues in llm-guard plugin
