Issue 1960 Fix high-impact performance issues in llm-guard plugin #2638
Merged
crivetimihai merged 35 commits into IBM:main on Feb 6, 2026
Thanks for tackling the llm-guard performance issues, @tedhabeck. The benchmark numbers look solid (sub-millisecond hook latency). Since this is still in draft, a couple of notes for when it's ready:
Let us know when this is ready for a full review!
Branch updated from 9130512 to bf8445e.
kcostell06 pushed a commit to kcostell06/mcp-context-forge that referenced this pull request on Feb 24, 2026
…M#2638)
* fix: prompts are an Optional[set[str]] - set of prompt names.
* revert: llmguard plugins.conditions.prompts
* feat: add external plugin metrics endpoint
* perf: use rapidfuzz.distance instead of word-wise Levenshtein distance, add metrics for scan duration seconds
* perf: add metric for policy compile duration seconds
* perf: policy singleton
* chore: missed commit to add rapidfuzz dependency
* perf: add scan caching
* enh: make _create_new_vault_on_expiry async
* chore: lint fixes
* chore: lint fix
* chore: lint fixes
* chore: add doc comments
* fix: pin transformers to 4.55.1 to prevent TFPreTrainedModel error
* chore: lint fix
* fix: Since prompt_ids are only known after creation, apply to all so that the plugin works out of the box.
* chore: test fix
* chore: remove duplicate import
* chore: lint fix
* enh: Key Improvements:
  - Code Quality: Reduced cyclomatic complexity by ~50%
  - Performance: Vault retrieval moved outside message loop (eliminates redundant async cache lookups)
  - Consistency: All processing methods follow same pattern as input methods
  - Maintainability: Clear separation of concerns, easier to test individual components
  - Zero Breaking Changes: Maintains exact functional behavior
* fix: use lazy evaluation rather than f-strings
* chore: enable snatizers by default
* chore: add env var to disable TensorFlow in plugin startup.
* chore: fix return type on __update_context api.
* enh: run the cache cleanup in a background thread rather than on every scan.
* chore: lint fix
* fix: test case for Test _handle_vault_caching handles case when no vault exists.
* chore: add unit tests for new code
* chore: test coverage for llmguard.py to 94% from 80%
* chore: policy.py coverage to 100%
* chore: cache.py tests to 100%
* chore: lint fixes
* chore: add missing class doc to test_llmguardplugin.py
* chore: update readme
* chore: clearer comment for plugin.conditions.prompts
---------
Signed-off-by: habeck <habeck@us.ibm.com>
🔗 Related Issue
Closes #1960
📝 Summary
#1960
Summary of Commits on Branch issue-1960
The issue-1960 branch contains 35+ commits focused on improving the LLMGuard plugin's performance, code quality, and maintainability. Here's a summary organized by category:
Major Enhancements
Code Quality & Refactoring (Commit 08c3efed)
Reduced cyclomatic complexity by ~50%
Extracted methods for better separation of concerns
Improved testability with isolated components
Performance Optimizations
Vault Processing: Moved vault retrieval outside message loop to eliminate redundant async cache lookups
Scan Caching (76452acc): Added caching mechanism for scan results
Policy Singleton (1edfeab1): Implemented singleton pattern for policy management
RapidFuzz Integration (7181504c): Replaced word-wise Levenshtein distance with rapidfuzz.distance for better performance
Background Cache Cleanup (3071cd4c): Moved cache cleanup to background thread instead of running on every scan
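To make the scan caching and background cleanup items above concrete, here is a minimal standard-library sketch. It is not the plugin's actual `cache.py`: the class name `ScanCache`, the SHA-256 keying, and the TTL and interval parameters are all assumptions made for illustration.

```python
import hashlib
import threading
import time
from typing import Any, Optional


class ScanCache:
    """Hypothetical TTL cache for scan results, purged by a background thread."""

    def __init__(self, ttl_seconds: float = 300.0, cleanup_interval: float = 60.0) -> None:
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, Any]] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        # Daemon thread, so it never blocks interpreter shutdown.
        self._cleaner = threading.Thread(
            target=self._cleanup_loop, args=(cleanup_interval,), daemon=True
        )
        self._cleaner.start()

    @staticmethod
    def _key(text: str) -> str:
        # Hash the scanned text so large inputs are not kept as dictionary keys.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> Optional[Any]:
        with self._lock:
            entry = self._entries.get(self._key(text))
        if entry is None:
            return None
        expires_at, result = entry
        # Expired entries read as a miss; the cleaner thread removes them later.
        return result if time.monotonic() < expires_at else None

    def put(self, text: str, result: Any) -> None:
        with self._lock:
            self._entries[self._key(text)] = (time.monotonic() + self._ttl, result)

    def purge_expired(self) -> int:
        """Delete expired entries and return how many were removed."""
        now = time.monotonic()
        with self._lock:
            expired = [k for k, (exp, _) in self._entries.items() if exp <= now]
            for k in expired:
                del self._entries[k]
        return len(expired)

    def _cleanup_loop(self, interval: float) -> None:
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(interval):
            self.purge_expired()

    def stop(self) -> None:
        self._stop.set()
        self._cleaner.join()
```

With this shape, `get()` and `put()` on the scan path only do a dictionary lookup under a lock, and the cost of walking the whole cache is paid by the cleaner thread rather than by every scan.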
Metrics & Observability
Added external plugin metrics endpoint (c72c6241)
Added metric for policy compile duration (047c2cc5)
Added metrics for scan duration seconds (7181504c)
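As a rough picture of what a "scan duration seconds" metric involves, the sketch below times a block of code and renders a count and a sum in the Prometheus text exposition format. The plugin presumably uses a Prometheus client library for this; the `DurationMetric` class and the metric name used here are hypothetical and only the standard library is used.

```python
import time
from contextlib import contextmanager
from typing import Iterator


class DurationMetric:
    """Hypothetical count/sum recorder, similar in spirit to a Prometheus summary."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self.count = 0
        self.total_seconds = 0.0

    def observe(self, seconds: float) -> None:
        # Not thread-safe; a real implementation would guard these updates.
        self.count += 1
        self.total_seconds += seconds

    @contextmanager
    def timer(self) -> Iterator[None]:
        """Time the enclosed block and record it, even if the block raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.observe(time.perf_counter() - start)

    def render(self) -> str:
        """Render the metric in the Prometheus text exposition format."""
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} summary\n"
            f"{self.name}_sum {self.total_seconds}\n"
            f"{self.name}_count {self.count}\n"
        )


# Assumed metric name, chosen only for this example.
scan_duration = DurationMetric("llmguard_scan_duration_seconds", "Time spent scanning input")

with scan_duration.timer():
    sum(range(10_000))  # stand-in for a scanner call

print(scan_duration.render())
```

A metrics endpoint would return the rendered text for each registered metric, which is what a Prometheus scrape consumes.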
Bug Fixes & Configuration
Fixed return type on __update_context API (22f1b788)
Pinned transformers to 4.55.1 to prevent TFPreTrainedModel error (ebf99f66)
Applied plugin to all prompts by default since prompt_ids are only known after creation (7b118fd8)
Fixed prompts type to Optional[set[str]] (786ef717)
Used lazy evaluation instead of f-strings for better performance (fc8c6d31)
Enabled sanitizers by default (d518bac7)
Added env var to disable TensorFlow in plugin startup (25047d79)
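The "lazy evaluation instead of f-strings" item above most likely refers to logging calls, although the description does not say so explicitly. Under that assumption, the sketch below shows the difference: an f-string is formatted before the logger is even consulted, while `%s`-style arguments are formatted only if the record is emitted. The logger name and the `ExpensiveRepr` class are made up for the example.

```python
import logging

logger = logging.getLogger("llmguard_example")  # hypothetical logger name
logger.setLevel(logging.INFO)  # DEBUG records are disabled


class ExpensiveRepr:
    """Counts how often it is converted to a string."""

    def __init__(self) -> None:
        self.str_calls = 0

    def __str__(self) -> str:
        self.str_calls += 1
        return "<large payload>"


payload = ExpensiveRepr()

# Eager: the f-string is built before logger.debug() runs, even though DEBUG is off.
logger.debug(f"scan input: {payload}")
eager_calls = payload.str_calls

# Lazy: logging formats the message only when the record is actually emitted.
logger.debug("scan input: %s", payload)
lazy_calls = payload.str_calls - eager_calls

print(eager_calls, lazy_calls)
```

On a hot path with debug logging disabled, the lazy form skips the string conversion entirely, which matters when the logged object is a large prompt or scan result.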
Maintenance
Multiple lint fixes and code cleanup commits
Added documentation comments (b1eb59b5)
Multiple merges from main branch to keep up-to-date
Added rapidfuzz dependency (f0040482)
Key Improvements Summary
~50% reduction in cyclomatic complexity through method extraction
Significant performance gains via caching, background processing, and optimized algorithms
Better observability with comprehensive metrics
Improved maintainability with clear separation of concerns
Enhanced reliability with bug fixes and dependency pinning
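One of the gains summarized above comes from moving vault retrieval outside the message loop. The sketch below illustrates that pattern with a stand-in cache that counts lookups. `FakeVaultCache`, `deanonymize`, and the placeholder format are hypothetical, and the sketch assumes the vault is the same for every message in a request.

```python
import asyncio


class FakeVaultCache:
    """Stand-in for an async cache; counts lookups. Not the plugin's class."""

    def __init__(self) -> None:
        self.lookups = 0

    async def get_vault(self, session_id: str) -> dict[str, str]:
        self.lookups += 1
        await asyncio.sleep(0)  # simulate an awaitable cache round trip
        return {"[REDACTED_NAME_1]": "Alice"}


def deanonymize(text: str, vault: dict[str, str]) -> str:
    # Replace each placeholder with the original value stored in the vault.
    for placeholder, original in vault.items():
        text = text.replace(placeholder, original)
    return text


async def process_before(cache: FakeVaultCache, session_id: str, messages: list[str]) -> list[str]:
    # One cache lookup per message.
    out = []
    for message in messages:
        vault = await cache.get_vault(session_id)
        out.append(deanonymize(message, vault))
    return out


async def process_after(cache: FakeVaultCache, session_id: str, messages: list[str]) -> list[str]:
    # A single lookup hoisted out of the loop; same output, fewer awaits.
    vault = await cache.get_vault(session_id)
    return [deanonymize(message, vault) for message in messages]


messages = ["Hi [REDACTED_NAME_1]", "Bye [REDACTED_NAME_1]", "ok"]
before_cache, after_cache = FakeVaultCache(), FakeVaultCache()
before = asyncio.run(process_before(before_cache, "session-1", messages))
after = asyncio.run(process_after(after_cache, "session-1", messages))
print(before == after, before_cache.lookups, after_cache.lookups)
```

Both versions produce the same output; the second performs one lookup per request instead of one per message.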
🏷️ Type of Change
🧪 Verification
make lint
make test
make coverage
✅ Checklist
make black isort pre-commit
📓 Notes (optional)
Testing the metrics/prometheus endpoint of the plugin before merge:
The metrics endpoint of the plugin requires that runtime.py be manually injected into the container image until it is merged into the main build. To test before merge, run make build && make start, then inject the updated runtime.py from this branch into the pod and restart the pod, e.g.:
From the project root folder:
The llmguard plugin will then expose an endpoint /metrics/prometheus.
To periodically scrape the endpoint with Prometheus, create a prometheus.config file. Replace 192.168.1.92 with the IP address of your local workstation in the example YAML file below:
Then start Prometheus using that configuration file, e.g.:
Example Grafana dashboard surfacing metrics