
fix(ai): silence embedding batch cap warnings#779

Open
alexandreroumieu-codeapprentice wants to merge 1 commit into garrytan:master from alexandreroumieu-codeapprentice:fix/clean-ai-batching

Conversation


@alexandreroumieu-codeapprentice alexandreroumieu-codeapprentice commented May 9, 2026

Summary

  • Add Google Gemini's documented embedding request token cap (max_batch_tokens: 20_000).
  • Mark Ollama and LiteLLM embedding recipes as explicit dynamic-limit opt-outs so startup warnings only catch accidental omissions.
  • Normalize embedding batch-cap handling in the gateway so only positive static caps trigger proactive pre-splitting.
  • Update regression coverage for warning hygiene across the shipped embedding recipes.
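
Roughly, the recipe-side shape this touches (a minimal sketch; the model names and surrounding fields are illustrative, not the repo's exact definitions, only `max_batch_tokens` and `no_batch_cap` come from this PR):

```ts
interface EmbeddingTouchpoint {
  model: string;
  max_batch_tokens?: number; // static per-request token cap, when the provider documents one
  no_batch_cap?: true;       // explicit opt-out for providers whose capacity is dynamic
}

// Google documents a fixed request cap, so it declares one.
const googleEmbedding: EmbeddingTouchpoint = {
  model: 'text-embedding-004',   // illustrative model name
  max_batch_tokens: 20_000,
};

// Ollama capacity depends on the locally loaded model, so it opts out explicitly.
const ollamaEmbedding: EmbeddingTouchpoint = {
  model: 'nomic-embed-text',     // illustrative model name
  no_batch_cap: true,
};
```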

Validation

  • bun test test/ai/adaptive-embed-batch.test.ts
    • 24 pass / 0 fail / 49 expect() calls

Notes



garrytan added a commit that referenced this pull request May 10, 2026
…#121)

Two small ergonomics fixes folded together (#765 deferred — see TODOS.md
follow-up; the CJK PGLite extraction was bigger than the plan estimated).

#779 reworked (alexandreroumieu-codeapprentice): silence the
missing-max_batch_tokens startup warning for recipes with genuinely
dynamic batch capacity. New `EmbeddingTouchpoint.no_batch_cap?: true`
field. Set on ollama (capacity depends on locally loaded model +
OLLAMA_NUM_PARALLEL), litellm-proxy (depends on backend), llama-server
(set by --ctx-size at server launch). Three fewer stderr warnings on
every gateway configure; google still warns (it's a real fixed-cap
provider that ought to ship a max_batch_tokens declaration).
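
A minimal sketch of the normalized gateway check, written as a standalone helper; the function and variable names are illustrative, only `max_batch_tokens` and `no_batch_cap` come from this change:

```ts
function checkEmbeddingBatchCap(recipe: {
  id: string;
  max_batch_tokens?: number;
  no_batch_cap?: true;
}): void {
  if (recipe.no_batch_cap) return;     // dynamic-capacity providers opt out explicitly
  const cap = recipe.max_batch_tokens ?? 0;
  if (cap > 0) return;                 // positive static cap: used for proactive pre-splitting
  // Accidental omissions are still surfaced at configure time.
  console.warn(
    `embedding recipe '${recipe.id}' declares neither max_batch_tokens nor no_batch_cap`,
  );
}
```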

Bonus: litellm-proxy now declares `user_provided_models: true`, removing
the last consumer of the legacy `recipe.id === 'litellm'` hardcode in
gateway.ts:223 (D8=A wire-through completion).
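
Hedged before/after sketch of the gateway.ts change; `acceptUserModels` and the `Recipe` type are hypothetical stand-ins, only `user_provided_models` and the old id check come from this change:

```ts
type Recipe = { id: string; user_provided_models?: true };
const acceptUserModels = (recipe: Recipe) => { /* gateway-specific handling */ };

// before: the last remaining hardcode keyed on the recipe id
const before = (recipe: Recipe) => {
  if (recipe.id === 'litellm') acceptUserModels(recipe);
};

// after: driven by the recipe's own declaration
const after = (recipe: Recipe) => {
  if (recipe.user_provided_models) acceptUserModels(recipe);
};
```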

#121 reworked (vinsew): self-contained API keys. Two parts:

  1. config.ts: ANTHROPIC_API_KEY env merge was silently missing.
     loadConfig() merged OPENAI_API_KEY but not ANTHROPIC_API_KEY into
     the file-config-shape result. One-line addition.

  2. cli.ts:buildGatewayConfig: when ~/.gbrain/config.json declares
     openai_api_key / anthropic_api_key but the process env doesn't
     have those env vars set (common for launchd-spawned daemons,
     agent subprocess tools, containers that don't propagate
     ~/.zshrc), fold the config-file values into the gateway env
     snapshot. Process env still wins (loaded last) so per-process
     overrides keep working.
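
A hedged sketch of both parts; the config field and env var names come from the description above, everything else (function shapes, exact merge order) is illustrative:

```ts
// 1. config.ts: merge ANTHROPIC_API_KEY alongside the existing OPENAI_API_KEY merge
function loadConfig(fileConfig: Record<string, string | undefined>) {
  return {
    ...fileConfig,
    openai_api_key: process.env.OPENAI_API_KEY ?? fileConfig.openai_api_key,
    anthropic_api_key: process.env.ANTHROPIC_API_KEY ?? fileConfig.anthropic_api_key, // previously missing
  };
}

// 2. cli.ts: fold config-file keys into the gateway env snapshot;
//    process env is spread last, so per-process overrides still win
function buildGatewayEnv(config: { openai_api_key?: string; anthropic_api_key?: string }) {
  return {
    ...(config.openai_api_key ? { OPENAI_API_KEY: config.openai_api_key } : {}),
    ...(config.anthropic_api_key ? { ANTHROPIC_API_KEY: config.anthropic_api_key } : {}),
    ...process.env,
  };
}
```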

Tests (4 cases in test/ai/no-batch-cap-suppression.test.ts):
- Ollama / LiteLLM / llama-server all declare no_batch_cap: true
- configureGateway does NOT warn for those three
- configureGateway STILL warns for google (regression guard)
- Cross-cutting invariant: empty-models recipes declare user_provided_models
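
Shape of the new suite, sketched; `getEmbeddingRecipe` and `configureGatewayCapturingWarnings` are hypothetical stand-ins for whatever helpers the suite actually uses:

```ts
import { describe, expect, test } from 'bun:test';

// Hypothetical accessors standing in for the suite's real helpers.
declare function getEmbeddingRecipe(id: string): { no_batch_cap?: true };
declare function configureGatewayCapturingWarnings(recipeIds: string[]): string[];

describe('no_batch_cap warning suppression', () => {
  test('dynamic-capacity recipes declare the opt-out', () => {
    for (const id of ['ollama', 'litellm-proxy', 'llama-server']) {
      expect(getEmbeddingRecipe(id).no_batch_cap).toBe(true);
    }
  });

  test('google still warns (regression guard)', () => {
    const warnings = configureGatewayCapturingWarnings(['google']);
    expect(warnings.some((w) => w.includes('max_batch_tokens'))).toBe(true);
  });
});
```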

Tests: bun test test/ai/ — 128/128 (4 new + 124 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 9 of 11).
#765 (Hunyuan PGLite + CJK keyword fallback) deferred to TODOS.md
follow-up; the CJK extraction (~150 lines + scoring logic + tests) is
larger than the wave's adjacent-fix lane should carry. Closes that PR
with a deferral note.

Co-Authored-By: alexandreroumieu-codeapprentice <noreply@github.com>
Co-Authored-By: vinsew <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>