llmguard: switch from pickle to orjson for cache serialization by RinZ27 · Pull Request #2179 · IBM/mcp-context-forge

RinZ27 · 2026-01-19T09:10:04Z

Swapping out pickle for orjson in the LLMGuard cache to address the deserialization risks mentioned in #2156. Since orjson is already in the project, I figured it's the cleanest way to harden this. I've also made sure it handles the tuple-to-array shift so the vault logic doesn't break.

Replaced pickle with orjson in cache.py.
Added recursive conversion for JSON arrays back into tuples.
No new dependencies added.

araujof · 2026-01-20T15:13:30Z

@tedhabeck FYI

tedhabeck · 2026-01-20T15:53:22Z

Pickle performance:

Plugin                            P:post       P:pre      R:post       R:pre      T:post       T:pre
----------------------------------------------------------------------------------------------------
LLMGuardPlugin                  26.069ms   128.718ms           —           —           —           —

orjson performance:

LLMGuardPlugin                  24.170ms   128.957ms           —           —           —           —

crivetimihai · 2026-01-20T19:19:15Z

Hi @RinZ27 - thank you for this contribution! I've rebased it, and made some modifications. I think this still needs a discussion around sanitization of data, but looking forward to your input on this.

Review Changes Applied

This PR has been rebased onto main and enhanced during review. Here's a summary of the changes made beyond the original commit:

Fixes Applied

Issue	Fix
Critical: `_convert_to_tuple` returned `Tuple[Tuple]` instead of `List[Tuple]`	Renamed to `_convert_to_list_of_tuples()`, returns `List[Tuple]` as expected by `Vault._tuples`
Medium: `orjson.dumps` TypeError not handled	Added `try/except TypeError` with graceful return `(False, False)`
Medium: `orjson.loads` JSONDecodeError not handled	Added `try/except orjson.JSONDecodeError` returning `None`
Medium: PII logged in info-level logs	Changed to debug level, logs only key + size (no vault contents)
Low: Corrupted cache entries not cleaned up	Delete corrupted/invalid entries on retrieval failure
Low: Type hints inaccurate	Fixed to `-> list
Low: Unexpected JSON shapes could crash downstream	Treat as cache miss, delete from cache, return `None`
Low: Cache miss logged as error	Changed to `logger.debug` (normal condition)

Final Diff Summary

Robust error handling for serialization and deserialization
PII-safe logging throughout
Type-correct return values

Trade-offs: orjson vs pickle

Why This Change is Important

Security (Primary Motivation):

pickle.loads() can execute arbitrary code during deserialization — while this would require write access to Redis (cache poisoning), it's better to use a different approach.
orjson only deserializes JSON data types — no code execution possible

Comparison

Aspect	pickle	orjson
Security	❌ Arbitrary code execution risk	✅ Safe — only JSON types
Type Support	✅ Any Python object	⚠️ JSON types only (strings, numbers, bools, lists, dicts, null)
Tuples	✅ Native support	⚠️ Become arrays, need conversion back
Performance	Good	✅ 3-10x faster for JSON-serializable data
Human Readable	❌ Binary blob	✅ JSON text (debuggable)
Cross-language	❌ Python only	✅ Universal format

Trade-offs Accepted

Type Restrictions: Vault data must be JSON-serializable. Verified that vault tuples are always List[Tuple[str, str, str]] containing only strings — no issue for this use case.
Tuple Conversion: JSON doesn't have a tuple type, so List[Tuple] becomes List[List] during serialization. The _convert_to_list_of_tuples() function handles the conversion back.
Behavioral Change: If non-JSON-serializable data is ever stored (e.g., bytes, UUID, custom objects), the cache write will now fail gracefully instead of working. This is intentional — it surfaces data shape issues early rather than silently storing potentially problematic data.

Alternatives Considered

json stdlib: Slower than orjson, same type restrictions
msgpack: Binary format, faster than JSON, but adds a new dependency
orjson with default handler: Could serialize more types, but adds complexity and may mask data issues

Conclusion

The security benefits of removing pickle outweigh the minor complexity of tuple conversion. The vault data structure is well-defined (string tuples only), making orjson a clean fit. Error handling ensures graceful degradation if assumptions are ever violated.

monshri · 2026-01-20T21:14:29Z

@crivetimihai One question that needs to be answered at the plugin level is do we want to throw error/PluginError when cache serialization fails or just log it? Right now, it's logging the serialization error and sending a False, False to the plugin.py.

Rest looks good to me.

RinZ27 · 2026-01-21T08:23:29Z

Just jumping in with my two cents here... I agree with the current approach (logging the error and returning False, False). Since caching is an optimization/auxiliary mechanism here, a serialization failure shouldn't block the main processing flow or crash the plugin execution. Graceful degradation seems safer for production.

RinZ27 · 2026-01-21T08:31:38Z

Appreciate the rebase and the extra polish on the function names and types, @crivetimihai. It's definitely looking much cleaner now. Also, thanks @tedhabeck for running those benchmarks—glad to see orjson giving us that slight performance edge too.

Signed-off-by: RinCodeForge927 <dangnhatrin90@gmail.com> Signed-off-by: RinZ27 <222222878+RinZ27@users.noreply.github.com>

) Signed-off-by: RinCodeForge927 <dangnhatrin90@gmail.com> Signed-off-by: RinZ27 <222222878+RinZ27@users.noreply.github.com>

RinZ27 requested a review from crivetimihai as a code owner January 19, 2026 09:10

RinZ27 force-pushed the refactor/llmguard-safe-cache branch 2 times, most recently from 2038de5 to 3883e1a Compare January 19, 2026 09:20

crivetimihai added this to the Release 1.0.0-RC1 milestone Jan 20, 2026

crivetimihai self-assigned this Jan 20, 2026

crivetimihai force-pushed the refactor/llmguard-safe-cache branch from 3883e1a to a423a69 Compare January 20, 2026 19:18

crivetimihai force-pushed the refactor/llmguard-safe-cache branch from a423a69 to aaecaea Compare January 20, 2026 19:42

crivetimihai requested a review from monshri January 20, 2026 19:45

llmguard: switch from pickle to orjson for cache serialization

07cdacd

Signed-off-by: RinCodeForge927 <dangnhatrin90@gmail.com> Signed-off-by: RinZ27 <222222878+RinZ27@users.noreply.github.com>

crivetimihai force-pushed the refactor/llmguard-safe-cache branch from aaecaea to 07cdacd Compare January 23, 2026 17:51

crivetimihai approved these changes Jan 23, 2026

View reviewed changes

crivetimihai merged commit 391fca8 into IBM:main Jan 23, 2026
52 checks passed

crivetimihai modified the milestones: Release 1.0.0-RC1, Release 1.0.0-BETA-2 Jan 23, 2026

RinZ27 deleted the refactor/llmguard-safe-cache branch January 24, 2026 09:43

crivetimihai mentioned this pull request Jan 24, 2026

Issue 1959 [PERFORMANCE]: Fix critical performance issues in llm-guard plugin #2200

Merged

2 tasks

crivetimihai mentioned this pull request Jan 31, 2026

[BUG][PERFORMANCE]: Fix high-impact performance issues in llm-guard plugin #1960

Closed

17 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llmguard: switch from pickle to orjson for cache serialization#2179

llmguard: switch from pickle to orjson for cache serialization#2179
crivetimihai merged 1 commit intoIBM:mainfrom
RinZ27:refactor/llmguard-safe-cache

RinZ27 commented Jan 19, 2026 •

edited

Loading

Uh oh!

araujof commented Jan 20, 2026

Uh oh!

tedhabeck commented Jan 20, 2026

Uh oh!

crivetimihai commented Jan 20, 2026 •

edited

Loading

Uh oh!

monshri commented Jan 20, 2026 •

edited

Loading

Uh oh!

RinZ27 commented Jan 21, 2026

Uh oh!

RinZ27 commented Jan 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

RinZ27 commented Jan 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

araujof commented Jan 20, 2026

Uh oh!

tedhabeck commented Jan 20, 2026

Uh oh!

crivetimihai commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review Changes Applied

Fixes Applied

Final Diff Summary

Trade-offs: orjson vs pickle

Why This Change is Important

Comparison

Trade-offs Accepted

Alternatives Considered

Conclusion

Uh oh!

monshri commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RinZ27 commented Jan 21, 2026

Uh oh!

RinZ27 commented Jan 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

RinZ27 commented Jan 19, 2026 •

edited

Loading

crivetimihai commented Jan 20, 2026 •

edited

Loading

monshri commented Jan 20, 2026 •

edited

Loading