Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
Walkthrough: Adds a semantic response-cache layer with embedder and vector-store abstractions, propagates a guardrails hash into the request context, refactors exact-match caching into helpers, moves caching into the inference request path, provides in-memory and SQLite vector stores, updates config/validation/dependencies, and adds tests and an ADR.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Handler
    participant ResponseCache
    participant SimpleCache
    participant SemanticCache
    participant Embedder
    participant VecStore
    participant Downstream
    Client->>Handler: POST /v1/chat/completions
    Handler->>ResponseCache: HandleRequest(body, next)
    ResponseCache->>SimpleCache: TryHit(body)
    alt Exact match
        SimpleCache-->>ResponseCache: HIT (exact)
        ResponseCache-->>Handler: return cached response (X-Cache: HIT (exact))
        Handler-->>Client: 200 OK
    else No exact match
        SimpleCache-->>ResponseCache: miss
        ResponseCache->>SemanticCache: Handle(body, next)
        SemanticCache->>Embedder: Embed(extracted text)
        Embedder-->>SemanticCache: vector
        SemanticCache->>VecStore: Search(vector, paramsHash)
        VecStore-->>SemanticCache: results
        alt Similarity >= threshold
            SemanticCache-->>ResponseCache: HIT (semantic)
            ResponseCache-->>Handler: return cached response (X-Cache: HIT (semantic))
            Handler-->>Client: 200 OK
        else miss
            SemanticCache->>Downstream: invoke next()
            Downstream-->>SemanticCache: response
            SemanticCache->>VecStore: Insert(vector, response, paramsHash, ttl) (async)
            SemanticCache-->>ResponseCache: return response (X-Cache: MISS)
            ResponseCache-->>Handler: response
            Handler-->>Client: response
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 18
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@.gitignore`:
- Line 32: Update the gitignore entry for the macOS Finder metadata so it
matches the actual filename by replacing the current DS_Store pattern with
.DS_Store (i.e., change "DS_Store" to ".DS_Store") so .DS_Store files are
correctly ignored by git.
In `@config/config.go`:
- Around line 378-392: The current validation enforces vector_store type and
URLs whenever semanticCacheActive detects vector fields, but it doesn't require
the Semantic.Enabled flag; update the check so these validations only run when
the semantic cache is actually enabled (i.e., require sem.Enabled == true) or
change semanticCacheActive to include sem.Enabled in its logic; specifically
modify the block that starts with sem := &c.Response.Semantic and/or the
semanticCacheActive function so that the switch on sem.VectorStore.Type and the
qdrant/pgvector URL checks only execute when sem.Enabled is true (or when
semanticCacheActive returns true after including Enabled).
- Around line 396-405: The validation currently treats any set semantic cache
field as intent to enable, but runtime only checks semanticEnabled; align them
by updating semanticCacheActive to require the Enabled flag as well (i.e.,
return sem != nil && sem.Enabled && (...existing field checks...)) so validation
fails unless Enabled is true, or alternatively change semanticEnabled (in
internal/responsecache/responsecache.go) to call semanticCacheActive(sem) to use
the same detection logic; pick one approach and apply consistently so both
semanticCacheActive and semanticEnabled use the same Enabled+field criteria.
In `@go.mod`:
- Around line 139-140: Add a short comment above the replace directive
documenting why we pin github.com/sugarme/tokenizer to the fork
github.com/clems4ever/tokenizer v0.0.0-20250926133620-9ddc80533c43 (e.g.,
specific bugfix, compatibility with Go toolchain, or missing feature), including
the issue/PR or commit ID that necessitated the fork and any guidance on when it
can be reverted; update the replace line comment to reference that rationale so
future maintainers understand why the forked tokenizer is required.
In `@internal/app/app.go`:
- Around line 579-596: computeGuardrailsHashFromConfig currently builds
responsecache.GuardrailRuleDescriptor with only Name, Type and
SystemPrompt.Content; extend it to include the rule Order and SystemPrompt.Mode
so changes to ordering or prompt mode invalidate the semantic cache, and update
the responsecache.GuardrailRuleDescriptor type and
responsecache.ComputeGuardrailsHash usage to accept and incorporate these new
fields accordingly.
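As a rough illustration of the requested change, a hash that folds in `Order` and `Mode` might look like the following; the exact field names, the SHA-256 choice, and the NUL-delimited encoding are assumptions for this sketch, not the project's actual implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// GuardrailRuleDescriptor mirrors the reviewed type, extended with the
// hypothetical Order and Mode fields so that reordering rules or
// changing the prompt mode produces a different hash.
type GuardrailRuleDescriptor struct {
	Name         string
	Type         string
	Order        int
	Mode         string
	SystemPrompt string
}

// ComputeGuardrailsHash folds every field of every rule into one digest;
// any change, including Order or Mode, invalidates the semantic cache.
func ComputeGuardrailsHash(rules []GuardrailRuleDescriptor) string {
	h := sha256.New()
	for _, r := range rules {
		fmt.Fprintf(h, "%s\x00%s\x00%d\x00%s\x00%s\x00",
			r.Name, r.Type, r.Order, r.Mode, r.SystemPrompt)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := ComputeGuardrailsHash([]GuardrailRuleDescriptor{{Name: "pii", Order: 1}})
	b := ComputeGuardrailsHash([]GuardrailRuleDescriptor{{Name: "pii", Order: 2}})
	fmt.Println(a != b)
}
```

The NUL delimiters keep field boundaries unambiguous, so `{"ab", "c"}` and `{"a", "bc"}` cannot collide.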
In `@internal/embedding/embedding_test.go`:
- Around line 10-18: The test TestNewEmbedder_LocalDefault should not check
err.Error() == "" because that branch is redundant; instead, keep the early skip
when NewEmbedder returns nil error (ONNX present) and otherwise assert that an
error was returned — remove the unreachable/unused empty-error-string assertion.
Locate TestNewEmbedder_LocalDefault and the NewEmbedder call and delete the
err.Error() == "" check (and its t.Fatal), leaving a clear skip for err == nil
and a single assertion that err != nil when ONNX is absent.
In `@internal/embedding/embedding.go`:
- Around line 77-83: The Embed method for miniLMEmbedder ignores the provided
context, so long-running ONNX Compute calls cannot be cancelled; update Embed
(miniLMEmbedder.Embed) to respect ctx by either calling a context-aware variant
of the model compute if available (e.g., model.ComputeContext(ctx, ...)) or by
running e.model.Compute(text, true) in a goroutine and select on ctx.Done() to
abort and return ctx.Err() if cancelled; ensure any resources are cleaned up and
errors are wrapped consistently (keep the existing error message format).
- Around line 48-53: The apiEmbedder is constructed with &http.Client{} which
has no timeout; update the constructor that creates apiEmbedder in embedding.go
to set a reasonable Timeout on the http.Client (e.g., Timeout: defaultTimeout)
or accept a configurable timeout parameter and use that; add a defaultTimeout
constant (or wire through an existing config value) and ensure the apiEmbedder
struct uses &http.Client{Timeout: timeout} so embedding API calls cannot hang
indefinitely.
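A minimal sketch of the fix; `defaultTimeout` and `newAPIEmbedder` are illustrative names, and the value should ideally come from configuration rather than a hard-coded constant:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// defaultTimeout is an assumed value; wire it through the existing
// config if a suitable setting already exists.
const defaultTimeout = 30 * time.Second

type apiEmbedder struct {
	httpClient *http.Client
}

// newAPIEmbedder constructs the client with an explicit Timeout so a
// stalled embedding API call fails instead of hanging forever.
func newAPIEmbedder() *apiEmbedder {
	return &apiEmbedder{httpClient: &http.Client{Timeout: defaultTimeout}}
}

func main() {
	fmt.Println(newAPIEmbedder().httpClient.Timeout)
}
```

`http.Client.Timeout` bounds the entire exchange (connection, request, and body read), which is usually what you want for a one-shot embedding call.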
- Around line 127-146: The code currently parses the body then inspects
parsed.Error, but it should first validate HTTP status codes from
e.httpClient.Do(req) to catch non-2xx responses that may not populate
parsed.Error; after receiving resp, check resp.StatusCode (e.g., if
resp.StatusCode < 200 || resp.StatusCode >= 300) and read the rawBody and return
a formatted error including the status code and body content before attempting
json.Unmarshal; update the error path near the embeddingResponse/parsed handling
so failures return early with context (status code and body) rather than only
relying on parsed.Error or parsed.Data.
In `@internal/responsecache/responsecache.go`:
- Around line 113-118: The semantic path currently returns directly from
m.semantic.Handle when skipSemantic is false, so on a semantic miss the exact
Redis cache never gets populated; change the control flow in responsecache.go so
that after calling m.semantic.Handle (or when it indicates a miss), you still
call m.simple.StoreAfter when m.simple != nil and skipExact is false (i.e.,
perform exact-cache backfill after semantic handling instead of returning
early), and add a HandleRequest regression test that enables both caches and
verifies a semantic miss results in the Redis exact cache being populated for
subsequent exact requests.
In `@internal/responsecache/semantic.go`:
- Around line 165-209: The current semantic cache key is unsafe because
extractEmbedText only returns the last user text (via lastUserText) and
extractTextFromContent drops non-text parts, letting multimodal or multi-turn
requests collide; update caching so computeParamsHash (or whichever builds the
cache key) either (A) only enables semantic caching for single-turn, text-only
requests by checking extractEmbedText’s nonSystemCount==1 and verifying
extractTextFromContent found only plain string/text parts, or (B) include the
full relevant context in the key by passing the entire messages slice (not just
lastUserText) plus deterministic digests/hashes of any non-text content parts
produced by extractTextFromContent (e.g., embed binary/image ids or checksums)
so multimodal/multi-turn differences are represented; modify
extractEmbedText/extractTextFromContent to surface a full messages
representation or content digests to support this change.
- Around line 382-385: The current ShouldSkipAllCache only checks
X-Cache-Control and therefore disagrees with the semantic path that honors
standard Cache-Control: no-store; update ShouldSkipAllCache (used by
ResponseCacheMiddleware.HandleRequest) to use the same predicate as the semantic
path — either call the existing predicate used by the semantic route (if one
exists) or change ShouldSkipAllCache to return true when X-Cache-Control equals
"no-store" OR when the standard Cache-Control header contains the directive
"no-store" (case-insensitive, and handling multiple comma/space-separated
directives).
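A sketch of directive parsing that treats both headers consistently; header names follow the review comment, and the surrounding middleware wiring is assumed:

```go
package main

import (
	"fmt"
	"strings"
)

// hasNoStore reports whether a Cache-Control value carries the no-store
// directive, case-insensitively, across comma-separated directives.
func hasNoStore(headerValue string) bool {
	for _, directive := range strings.Split(headerValue, ",") {
		if strings.EqualFold(strings.TrimSpace(directive), "no-store") {
			return true
		}
	}
	return false
}

// shouldSkipAllCache combines the custom X-Cache-Control header with the
// standard Cache-Control header so both cache paths agree.
func shouldSkipAllCache(xCacheControl, cacheControl string) bool {
	return strings.EqualFold(strings.TrimSpace(xCacheControl), "no-store") ||
		hasNoStore(cacheControl)
}

func main() {
	fmt.Println(shouldSkipAllCache("", "max-age=0, No-Store"))
	fmt.Println(shouldSkipAllCache("", "no-cache"))
}
```

Splitting on commas before comparing avoids false positives from substring matches and handles directive lists like `no-cache, no-store`.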
- Around line 27-30: Remove the now-unused context key type and constant
declarations introduced for response-cache guardrails by deleting the
contextKeyType type and the contextKeyGuardrailsHash constant (and their
comment) from semantic.go; these symbols (contextKeyType,
contextKeyGuardrailsHash) are no longer referenced after switching to
core.WithGuardrailsHash so removing them will fix the lint error—after removal,
run the linter/build to confirm no remaining references.
In `@internal/responsecache/simple.go`:
- Around line 72-97: The X-Cache value set in simpleCacheMiddleware.TryHit is
using "HIT" but ADR 0006 requires the exact-match format "HIT exact" (semantic
layer uses "HIT (semantic)"); update the header set in TryHit so
c.Response().Header().Set("X-Cache", ...) uses "HIT exact" for exact-cache hits
(locate TryHit in simpleCacheMiddleware and the point after cache read where
hashRequest is used to determine a hit) and keep the rest of the response logic
unchanged.
In `@internal/responsecache/vecstore_map.go`:
- Around line 121-130: The custom Newton-Raphson sqrt64 function should be
replaced with the standard library math.Sqrt: remove or delete the sqrt64
function and change any callers to use math.Sqrt(x) instead, and add/import
"math" where needed; ensure you remove the unused sqrt64 symbol to avoid dead
code or update references consistently (function name: sqrt64).
In `@internal/responsecache/vecstore_sqlite.go`:
- Around line 170-176: The current sqliteVecStore.deleteByKeys loops and issues
one ExecContext per key, causing O(n) round-trips; change it to execute a single
batched DELETE against the vec_items table using a single DELETE ... WHERE key
IN (?, ?, ...) with dynamically generated placeholders from the keys slice (or,
if placeholder generation is undesirable, wrap a prepared statement in a
transaction and execute batch deletes), ensuring you still use the ctx with
timeout and handle/return errors from s.db.ExecContext rather than ignoring
them; update the function sqliteVecStore.deleteByKeys to build the placeholders,
call s.db.ExecContext once, and properly handle the result/error.
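The placeholder-generation approach can be sketched like this; the `vec_items`/`key` names follow the review comment, and query building is separated from execution for clarity:

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchDelete generates a single DELETE with an IN (...) list so
// all keys are removed in one round-trip instead of one per key.
// An empty slice yields an empty query, which callers should skip.
func buildBatchDelete(keys []string) (string, []any) {
	if len(keys) == 0 {
		return "", nil
	}
	placeholders := make([]string, len(keys))
	args := make([]any, len(keys))
	for i, k := range keys {
		placeholders[i] = "?"
		args[i] = k
	}
	query := "DELETE FROM vec_items WHERE key IN (" + strings.Join(placeholders, ", ") + ")"
	return query, args
}

func main() {
	query, args := buildBatchDelete([]string{"a", "b", "c"})
	fmt.Println(query, len(args))
}
```

The caller would then run `s.db.ExecContext(ctx, query, args...)` once with the existing timeout context and return the error instead of ignoring it. Note SQLite caps the number of bound parameters, so very large key slices may still need chunking.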
- Around line 70-119: The Search method (sqliteVecStore.Search) can return fewer
than the requested limit because expired rows are filtered after the SQL LIMIT;
fix by moving expiry filtering into the SQL query: add "AND (expires_at = 0 OR
expires_at >= ?)" to the WHERE clause and pass the current unix timestamp as the
extra parameter so the DB only returns non-expired rows, then remove/ignore the
application-level expiry check that drops results (you can still collect expired
keys separately via a separate query or keep the goroutine deleteByKeys call if
needed). Use the existing serializeFloat32(vec) parameter ordering when adding
the timestamp argument.
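A sketch of the amended query; the column names mirror those mentioned in the comment, and any vector-distance ordering is omitted since it depends on the store's implementation:

```sql
-- Pass time.Now().Unix() as the extra bound parameter, keeping the
-- existing serializeFloat32(vec) parameter ordering in front of it.
SELECT key, embedding, response
FROM vec_items
WHERE params_hash = ?
  AND (expires_at = 0 OR expires_at >= ?)
LIMIT ?;
```

With expiry pushed into the WHERE clause, the LIMIT applies only to live rows, so the method can no longer return fewer results than requested because of post-filtering.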
In `@internal/server/translated_inference_service.go`:
- Around line 55-61: When marshalRequestBody(req) fails it's silently bypassing
the cache; add a debug log for the error before falling back to
dispatchChatCompletion. In the block that checks s.responseCache != nil &&
!req.Stream, capture marshalErr and if marshalErr != nil call the server's
logger (e.g., s.logger.Debugf or similar) to log a message like
"marshalRequestBody failed: %v" with marshalErr, then continue to call
s.dispatchChatCompletion(c, req, plan) as before; keep the existing
s.responseCache.HandleRequest path when marshal succeeds.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a6410690-7aac-454d-806d-6f5c9a9cf86b
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (19)
`.gitignore`, `config/config.go`, `docs/adr/0006-semantic-response-cache.md`, `go.mod`, `internal/app/app.go`, `internal/core/context.go`, `internal/embedding/embedding.go`, `internal/embedding/embedding_test.go`, `internal/responsecache/middleware_test.go`, `internal/responsecache/responsecache.go`, `internal/responsecache/semantic.go`, `internal/responsecache/semantic_test.go`, `internal/responsecache/simple.go`, `internal/responsecache/vecstore.go`, `internal/responsecache/vecstore_map.go`, `internal/responsecache/vecstore_sqlite.go`, `internal/server/handlers.go`, `internal/server/http.go`, `internal/server/translated_inference_service.go`
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Line 3: go.mod currently pins "go 1.25.1" but CI and Docker use "1.25" (and
release uses "stable"), causing potential version drift; choose one approach and
make them consistent: either change go.mod to "go 1.25" (relax patch-level) or
update CI and Docker pins to the exact patch "1.25.1" — update the GO_VERSION in
.github/workflows/test.yml, the FROM image tag in Dockerfile (golang:1.25 ->
golang:1.25.1-alpine3.23 or equivalent), and the go-version in
.github/workflows/release.yml to match; ensure all three locations (go.mod,
GO_VERSION, Docker FROM / release go-version) use the same version string.
In `@internal/responsecache/simple.go`:
- Around line 81-84: When m.store.Get(ctx, key) returns an error the code
currently swallows it; update the handler around the m.store.Get call to log the
error (at debug or warn) including the key and error details before falling back
to a cache miss. Use the middleware's logger (e.g., m.logger or the existing
logger field) or the request context logger to emit a message like "cache read
error" with key and err, then keep returning false, nil to preserve behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d8c78c18-0a66-494f-8090-2de0c74f8a1a
📒 Files selected for processing (3)
`.gitignore`, `go.mod`, `internal/responsecache/simple.go`
guardrails_hash already implemented
on semantic-miss
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/config.go`:
- Around line 379-392: The semantic cache validation currently only checks
vector_store.Type and related URLs; add guards inside the
SemanticCacheActive(sem) block to reject invalid tuning values by validating
sem.Tuning.SimilarityThreshold (or similarity_threshold) must be > 0 and <= 1
and return a descriptive error if not, and validate sem.Tuning.TTL (or ttl) must
be >= 0 (reject negative values) with a clear error message; place these checks
alongside the existing vector_store validations (referencing
c.Response.Semantic, SemanticCacheActive, and sem.VectorStore.Type) so invalid
tuning values are caught during config validation.
In `@internal/responsecache/responsecache.go`:
- Around line 98-99: The header comparison for X-Cache-Type is case-sensitive
while ShouldSkipExactCache is case-insensitive, so update the check in the
handler to compare case-insensitively (e.g., use strings.EqualFold or normalize
strings.ToLower) when evaluating c.Request().Header.Get("X-Cache-Type") against
CacheTypeExact; modify the expression skipSemantic := m.semantic == nil ||
strings.EqualFold(c.Request().Header.Get("X-Cache-Type"), CacheTypeExact) (and
add/import "strings" if missing) so both paths handle header casing consistently
with ShouldSkipExactCache.
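The comparison itself reduces to a one-liner with `strings.EqualFold`; the `CacheTypeExact` literal below is assumed:

```go
package main

import (
	"fmt"
	"strings"
)

// CacheTypeExact mirrors the constant referenced in the review; the
// literal value here is an assumption.
const CacheTypeExact = "exact"

// isExactCacheType compares the X-Cache-Type header value without
// allocating a lowered copy, matching ShouldSkipExactCache's
// case-insensitive behavior.
func isExactCacheType(headerValue string) bool {
	return strings.EqualFold(headerValue, CacheTypeExact)
}

func main() {
	fmt.Println(isExactCacheType("EXACT"), isExactCacheType("semantic"))
}
```

`strings.EqualFold` is preferable to `strings.ToLower` on both sides because it avoids allocation and handles Unicode case folding correctly.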
In `@internal/responsecache/semantic.go`:
- Around line 81-86: The current paramsHash includes conversationFingerprint
which contains the full last user turn causing paraphrases to fall into
different namespaces; change the namespace construction so it only hashes
invariant context (e.g., prior turns excluding the final user text, non-text
attachments/metadata, guardrails hash from core.GetGuardrailsHash(ctx), model
and embedder identity) instead of the full message JSON inside
conversationFingerprint used when computing paramsHash; alternatively, detect
non-single-turn or non-text-only requests in the code paths (the same places
around conversationFingerprint/computeParamsHash and the later 194-235 block)
and explicitly skip semantic caching for those cases so store.Search (which is
filtered by params_hash) only compares across semantically equivalent
namespaces.
In `@internal/responsecache/vecstore_sqlite.go`:
- Around line 49-62: Add an index and a uniqueness constraint to the vec_items
schema and switch inserts to an upsert to match the lookup pattern: alter the
CREATE TABLE for vec_items to enforce uniqueness on (key, params_hash) (or add a
UNIQUE INDEX on those columns), and add an index on params_hash or a composite
index on (params_hash, expires_at) so queries using WHERE params_hash = ? AND
expires_at > ? are indexed; then change the code paths that insert into
vec_items (the INSERT statements in vecstore_sqlite.go that write new
embeddings/responses) to use INSERT ... ON CONFLICT(key, params_hash) DO UPDATE
(or DO NOTHING if you prefer no overwrite) to dedupe concurrent inserts; update
both the initial table/index creation block and the other index creation block
referenced in the file.
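Putting the pieces together, the schema and insert might look like the following; the column set here is hypothetical and should be matched to the real `vec_items` definition:

```sql
-- Hypothetical column set; align with the actual vec_items schema.
CREATE TABLE IF NOT EXISTS vec_items (
    key         TEXT    NOT NULL,
    params_hash TEXT    NOT NULL,
    embedding   BLOB    NOT NULL,
    response    BLOB    NOT NULL,
    expires_at  INTEGER NOT NULL DEFAULT 0,
    UNIQUE (key, params_hash)
);

-- Composite index matching the lookup pattern
-- WHERE params_hash = ? AND expires_at > ?
CREATE INDEX IF NOT EXISTS idx_vec_items_params_expiry
    ON vec_items (params_hash, expires_at);

-- Upsert dedupes concurrent inserts of the same (key, params_hash);
-- use DO NOTHING instead if existing entries should never be overwritten.
INSERT INTO vec_items (key, params_hash, embedding, response, expires_at)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (key, params_hash) DO UPDATE SET
    embedding  = excluded.embedding,
    response   = excluded.response,
    expires_at = excluded.expires_at;
```

SQLite has supported `ON CONFLICT ... DO UPDATE` (UPSERT) since 3.24.0, so this works with any reasonably recent driver.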
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b760722d-770a-4a10-a2a3-e7f1c5501b37
📒 Files selected for processing (15)
`config/cache_validation_test.go`, `config/config.go`, `docs/adr/0006-semantic-response-cache.md`, `go.mod`, `internal/app/app.go`, `internal/embedding/embedding.go`, `internal/embedding/embedding_test.go`, `internal/responsecache/handle_request_test.go`, `internal/responsecache/middleware_test.go`, `internal/responsecache/responsecache.go`, `internal/responsecache/semantic.go`, `internal/responsecache/semantic_test.go`, `internal/responsecache/vecstore_map.go`, `internal/responsecache/vecstore_sqlite.go`, `internal/server/translated_inference_service.go`
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/server/translated_inference_service.go (1)
192-216: 🧹 Nitpick | 🔵 Trivial: Embeddings handler not integrated with the new caching pipeline.
`Embeddings` bypasses the `handleTranslatedInference`/`handleWithCache` pattern used by `ChatCompletion` and `Responses`. If semantic caching should apply to embeddings in the future, this method will need to be refactored. If caching is intentionally not supported for embeddings, consider adding a comment documenting this decision.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/server/translated_inference_service.go` around lines 192 - 216, The Embeddings method currently bypasses the shared caching pipeline used by ChatCompletion/Responses (handleTranslatedInference and handleWithCache), so either integrate embeddings into that flow or explicitly document the decision not to cache; update the translatedInferenceService.Embeddings implementation to call the same helper (e.g., handleTranslatedInference or handleWithCache) with the embedding request and provider/model resolution (preserving requestID and usage logging via s.logUsage and usage.ExtractFromEmbeddingResponse), or add a clear comment above Embeddings stating caching is intentionally disabled for embeddings and why, including references to handleTranslatedInference, handleWithCache, ChatCompletion, and Responses so future maintainers know the intended behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/server/handlers.go`:
- Around line 68-81: The translatedInference() helper allocates a new
translatedInferenceService and calls initHandlers() on every request; change
Handler to hold a cached field (e.g. translatedSvc *translatedInferenceService)
and lazily initialize it inside translatedInference() using sync.Once or a mutex
to ensure thread-safety, construct the service reusing Handler's
provider/modelResolver/... fields, call initHandlers() only once during
initialization, and return the cached instance on subsequent calls so you avoid
per-request allocations.
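The lazy-initialization pattern the comment describes can be sketched with `sync.Once`; the service type below is a stand-in, since the real `translatedInferenceService` carries provider and resolver fields from `Handler`:

```go
package main

import (
	"fmt"
	"sync"
)

// translatedInferenceService is a stand-in; the real type lives in
// internal/server and reuses Handler's provider/modelResolver fields.
type translatedInferenceService struct{ inits int }

func (s *translatedInferenceService) initHandlers() { s.inits++ }

type Handler struct {
	once sync.Once
	svc  *translatedInferenceService
}

// translatedInference lazily builds the service exactly once; sync.Once
// makes the construction thread-safe, and subsequent calls return the
// cached instance with no per-request allocation.
func (h *Handler) translatedInference() *translatedInferenceService {
	h.once.Do(func() {
		svc := &translatedInferenceService{}
		svc.initHandlers()
		h.svc = svc
	})
	return h.svc
}

func main() {
	h := &Handler{}
	a, b := h.translatedInference(), h.translatedInference()
	fmt.Println(a == b, a.inits)
}
```

`sync.Once` also guarantees that concurrent first callers block until initialization completes, so no caller ever observes a half-initialized service.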
---
Outside diff comments:
In `@internal/server/translated_inference_service.go`:
- Around line 192-216: The Embeddings method currently bypasses the shared
caching pipeline used by ChatCompletion/Responses (handleTranslatedInference and
handleWithCache), so either integrate embeddings into that flow or explicitly
document the decision not to cache; update the
translatedInferenceService.Embeddings implementation to call the same helper
(e.g., handleTranslatedInference or handleWithCache) with the embedding request
and provider/model resolution (preserving requestID and usage logging via
s.logUsage and usage.ExtractFromEmbeddingResponse), or add a clear comment above
Embeddings stating caching is intentionally disabled for embeddings and why,
including references to handleTranslatedInference, handleWithCache,
ChatCompletion, and Responses so future maintainers know the intended behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 96ebe06b-5a54-4ba8-8075-0fcefe4ef364
📒 Files selected for processing (2)
`internal/server/handlers.go`, `internal/server/translated_inference_service.go`
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/responsecache/responsecache.go`:
- Around line 98-102: The import block containing "log/slog", "strings", "time"
was accidentally placed inside the HandleRequest function causing a syntax
error; remove that misplaced import block from inside HandleRequest and add
"strings" to the top-level import list at the top of the file (keeping existing
imports such as log/slog and time as needed), then run gofmt/goimports to ensure
imports are ordered and the file compiles.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 98061d51-5fc2-4e4f-9c7e-151cb8008470
📒 Files selected for processing (1)
internal/responsecache/responsecache.go
# Conflicts:
#	.gitignore
#	go.mod
#	go.sum
#	internal/responsecache/simple.go
Summary by CodeRabbit
- New Features
- Documentation
- Chores
- Tests