[PERFORMANCE]: Response-cache-by-prompt algorithmic optimization #1835

## Summary
On every request, Response-cache-by-prompt vectorizes the input prompt and then scans all cached entries linearly. Lookup is therefore O(n) in the number of cached entries and becomes CPU-heavy as the cache grows.

## Evidence (current code)
- `plugins/response_cache_by_prompt/response_cache_by_prompt.py`: on each request, `_find_best` vectorizes the input and computes its cosine similarity against every cache entry.
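For reference, the lookup pattern described above looks roughly like the following minimal Python sketch. It is not the plugin's actual `_find_best`: the bag-of-words `vectorize` helper, the `find_best_linear` name, the `(vector, response)` entry layout, and the `0.9` threshold are all hypothetical assumptions. The point is the loop: every request pays one cosine computation per cached entry.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Hypothetical bag-of-words vectorizer: lowercase word -> count."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector for the dot product
    dot = sum(count * b[term] for term, count in a.items())  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_best_linear(prompt, entries, threshold=0.9):
    """Hypothetical linear-scan lookup over (vector, response) pairs.

    Returns (response, score) if the best score reaches `threshold`,
    else (None, best_score). Cost is one cosine per entry: O(n).
    """
    query = vectorize(prompt)
    best_response, best_score = None, 0.0
    for cached_vec, response in entries:  # full scan, every request
        score = cosine(query, cached_vec)
        if score > best_score:
            best_response, best_score = response, score
    if best_score >= threshold:
        return best_response, best_score
    return None, best_score
```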

## Impact
- CPU usage grows linearly with cache size.
- Can dominate request latency when the cache is large.
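One way to frame the per-request cost, assuming vectorization is linear in the prompt length $|p|$ and each cached entry is a $d$-dimensional vector $v_i$ compared against the prompt vector $q$ (these symbols are illustrative, not taken from the plugin):

$$
T_{\text{lookup}}(n) = T_{\text{vec}}(p) + \sum_{i=1}^{n} T_{\cos}(q, v_i) = O\big(|p| + n\,d\big)
$$

A candidate index that returns a set $C(q)$ replaces $n$ with $|C(q)|$, giving $O\big(|p| + |C(q)|\,d\big)$ plus the cost of the index probe; keeping $|C(q)|$ sublinear in $n$ is what the acceptance criteria ask for.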

## Proposed fix
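- Bound the cache with LRU eviction plus pruning so that n stays small, or index entries by token (an inverted index from token to entries) so that only entries sharing a token with the prompt are compared.
- For large caches, consider approximate nearest neighbor (ANN) search.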
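A minimal sketch combining both ideas from the first bullet, in Python to match the plugin. Everything here is hypothetical: `IndexedLRUCache`, `max_entries`, `threshold`, `last_comparisons`, and the bag-of-words `vectorize` are illustrative assumptions, not the plugin's API. An `OrderedDict` gives O(1) LRU bookkeeping, and an inverted index maps each token to the keys of entries containing it, so `get` only scores entries that share a token with the prompt.

```python
import math
import re
from collections import Counter, OrderedDict


def vectorize(text):
    """Hypothetical bag-of-words vectorizer: lowercase word -> count."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    if len(a) > len(b):
        a, b = b, a
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IndexedLRUCache:
    """Hypothetical prompt cache: LRU-bounded, with a token -> keys inverted index."""

    def __init__(self, max_entries=1000, threshold=0.9):
        self.max_entries = max_entries
        self.threshold = threshold
        self._entries = OrderedDict()  # key -> (vector, response), least recently used first
        self._index = {}               # token -> set of keys whose vector contains it
        self._next_key = 0
        self.last_comparisons = 0      # cosine computations done by the last get()

    def put(self, prompt, response):
        vec = vectorize(prompt)
        key = self._next_key
        self._next_key += 1
        self._entries[key] = (vec, response)
        for token in vec:
            self._index.setdefault(token, set()).add(key)
        # Evict least recently used entries and drop them from the index.
        while len(self._entries) > self.max_entries:
            old_key, (old_vec, _) = self._entries.popitem(last=False)
            for token in old_vec:
                keys = self._index[token]
                keys.discard(old_key)
                if not keys:
                    del self._index[token]

    def get(self, prompt):
        query = vectorize(prompt)
        # Candidates: only entries sharing at least one token with the prompt.
        candidates = set()
        for token in query:
            candidates |= self._index.get(token, set())
        self.last_comparisons = len(candidates)
        best_key, best_score = None, 0.0
        for key in candidates:
            score = cosine(query, self._entries[key][0])
            if score > best_score:
                best_key, best_score = key, score
        if best_key is None or best_score < self.threshold:
            return None
        self._entries.move_to_end(best_key)  # mark the hit as most recently used
        return self._entries[best_key][1]
```

With sparse term-count vectors like these, an entry that shares no token with the prompt has cosine similarity 0, so the index skips no possible match. That guarantee does not carry over to dense embeddings, where ANN search is the closer fit. Very common tokens (for example "the") produce large posting sets; filtering stop words or probing only the prompt's rarest tokens would keep candidate sets small, at the cost of possibly missing matches.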

## Acceptance criteria
- Cache lookup avoids full linear scan for common cases.
- CPU cost per request scales sublinearly with cache size.
