Comparing changes

Six bugs that combined to break the Visualize capability on Gemini 2.5 Flash (and similar thinking-by-default models). Each fix is independently useful, but the user-visible symptom (Visualize "kind of works but output is randomly truncated at ~370 chars") needs all of them.

1. Gemini 2.5/3.x reasoning-tokens default (root cause)

Gemini 2.5+ models burn most of `max_tokens` on internal "thinking" tokens by default. With `max_tokens=4096`, ~3900 went to reasoning and only ~160 came out as actual content, causing finish_reason=length on every multi-step pipeline (Visualize codegen + review, Deep Solve, anything that asks for a structured output beyond a sentence).

Default `reasoning_effort="none"` for Gemini 2.5/3.x models when the caller doesn't specify, in all three execution paths:

- provider_core/openai_compat_provider.py:_build_kwargs (live path)
- executors.py:sdk_complete / sdk_stream (legacy SDK path)
- cloud_provider.py:_openai_complete / _openai_stream (aiohttp fallback)

2. visualize capability had no agents.yaml entry

`get_agent_params("visualize")` silently fell through to the 4096 default because there was no section_map entry and no DEFAULT_AGENTS_SETTINGS entry. Added both, with a 16384-token budget appropriate for full HTML pages.

3. Review stage crashed hard on JSON parse failure

`ReviewAgent.process` does `ReviewResult.model_validate(extract_json_object(response))`. When the model returned prose instead of JSON (common with large SVGs that the model can't escape into a JSON string), the parse raised and killed the entire turn. Wrapped pipeline.run_review() in try/except so review failure falls back to the unreviewed draft and the user still gets a rendered result.

4. Codegen output not trimmed to the root tag

Models often wrap SVG/HTML in prose ("Here you go: <svg>…</svg> Enjoy!") or emit a closing code fence on the same line as `</html>`, which `extract_code_block`'s regex (requiring a leading \n before the fence) doesn't strip. Added defensive root-tag trimming for render_type=="svg" and render_type=="html".

Verified end-to-end on Gemini 2.5 Flash via the CLI and headless Playwright: full 22 KB long-division HTML page, no truncation, all interactive elements present, multi-step walkthrough completes correctly (7852 ÷ 6 → 1308 R 4).
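The root-tag trim described in item 4 could look roughly like the sketch below. This is an illustration only: `trim_to_root_tag` and `_ROOT_TAGS` are hypothetical names, and the actual logic in `code_generator_agent.py` may differ (for example in how it treats a doctype).

```python
import re

# Hypothetical mapping from render_type to the root element to keep.
_ROOT_TAGS = {"svg": "svg", "html": "html"}


def trim_to_root_tag(text: str, render_type: str) -> str:
    """Return only the outermost root element, dropping prose and stray fences.

    Falls back to the unmodified text whenever a clean root span is not found.
    """
    tag = _ROOT_TAGS.get(render_type)
    if tag is None:
        return text

    # First opening tag, e.g. "<svg" followed by whitespace, ">" or "/".
    opening = re.search(rf"<{tag}(?=[\s>/])", text, flags=re.IGNORECASE)
    if opening is None:
        return text
    begin = opening.start()

    if render_type == "html":
        # Keep a doctype that sits directly before the root tag (assumed desirable).
        doctype = re.search(
            r"<!doctype\s+html[^>]*>\s*$", text[:begin], flags=re.IGNORECASE
        )
        if doctype is not None:
            begin = doctype.start()

    # Last closing tag, so nested elements of the same name stay inside the span.
    closings = list(re.finditer(rf"</{tag}\s*>", text, flags=re.IGNORECASE))
    if not closings or closings[-1].end() <= opening.start():
        return text
    return text[begin:closings[-1].end()]
```

Returning the input unchanged on any miss keeps the trim purely defensive: it can only remove wrapper text, never discard a draft.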

… endpoint (#485)

Fixes #481. Makes `require_auth` and `require_admin` `async def` so the user ContextVar set inside the dep is visible to async endpoints. Sync deps run in a worker thread via `anyio.to_thread.run_sync` under a `copy_context()` copy, so the value they set is discarded when the dep returns.
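The ContextVar behaviour behind this fix can be reproduced with the standard library alone. The sketch below is a simplified model, not the project's code: `current_user`, `sync_dep`, and `async_dep` are hypothetical names, and `copy_context().run(...)` stands in for the worker-thread dispatch without starting a thread.

```python
import asyncio
import contextvars
from typing import Optional

# Hypothetical stand-in for the per-request user ContextVar.
current_user: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "current_user", default=None
)


def sync_dep() -> None:
    """A sync dependency that sets the user."""
    current_user.set("alice")


async def async_dep() -> None:
    """The same dependency written as a coroutine."""
    current_user.set("alice")


async def endpoint_after_sync_dep() -> Optional[str]:
    # The dep runs inside a *copy* of the context, so its set() only
    # mutates the copy and is lost once the call returns.
    ctx = contextvars.copy_context()
    ctx.run(sync_dep)
    return current_user.get()


async def endpoint_after_async_dep() -> Optional[str]:
    # Awaited directly in the same task, so the set() lands in the
    # context the endpoint itself reads from.
    await async_dep()
    return current_user.get()


if __name__ == "__main__":
    print(asyncio.run(endpoint_after_sync_dep()))
    print(asyncio.run(endpoint_after_async_dep()))
```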

…en Visualize pipeline

Squash-merge of #490 (skinred78) with two non-trivial conflict resolutions:

* `openai_compat_provider.py`: PR inlined reasoning_effort logic that has since been refactored into `build_openai_compatible_reasoning_kwargs`. Keep the helper call; the Gemini default-down will be folded into the helper in a follow-up commit instead of duplicating it inline.
* `capabilities/visualize.py`: i18n was applied to the review-stage messages after the PR forked. Kept the new try/except review-fallback structure but routed all three messages through `i18n.t(...)`; added `review_skipped_error` key to en/zh visualize.yaml.

Other PR changes applied as-is:

* `executors.py` / `cloud_provider.py`: inline Gemini 2.5/3.x default-down
* `loader.py`: section_map entry for `visualize`
* `init.py`: `capabilities.visualize` default with max_tokens=16384
* `code_generator_agent.py`: defensive root-tag trim for svg/html

Closes #489.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
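The try/except review-fallback structure mentioned above might be shaped like this minimal sketch. Everything here is assumed for illustration: `ReviewOutcome`, `review_or_fallback`, and the `"code"` key are hypothetical, and the real pipeline validates with `ReviewResult.model_validate` and emits i18n messages rather than a plain note.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReviewOutcome:
    code: str           # code to render
    reviewed: bool      # False when the review step was skipped
    note: str = ""      # short reason, empty on success


def review_or_fallback(draft: str, run_review: Callable[[str], str]) -> ReviewOutcome:
    """Run the review step; on any failure, keep the unreviewed draft."""
    try:
        raw = run_review(draft)
        parsed = json.loads(raw)  # raises if the model answered in prose
        return ReviewOutcome(code=parsed["code"], reviewed=True)
    except Exception as exc:
        # Review is best-effort: a broad catch is deliberate so that the
        # user still gets a rendered result from the draft.
        return ReviewOutcome(
            code=draft,
            reviewed=False,
            note=f"review skipped: {type(exc).__name__}",
        )
```

The broad `except` is a design choice for this stage only; it trades precise error handling for a guaranteed rendered draft.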

Follow-up to #490. The PR inlined the "disable thinking for Gemini 2.5/3" gate in 5 places across 3 files. This commit collapses them to one registry and three thin call sites.

Changes:

* `services/llm/reasoning_params.py`: new `_PROVIDER_DEFAULT_OFF_PATTERNS` registry + `default_reasoning_effort_for(provider, model)` public helper. `build_openai_compatible_reasoning_kwargs` now consults the registry, so the openai-compat path (which lost its inline gate during the #490 merge conflict resolution) is restored via the helper.
* `services/llm/executors.py` (sdk_complete + sdk_stream) and `services/llm/cloud_provider.py` (_openai_complete + _openai_stream) now call `default_reasoning_effort_for(...)` instead of inlining the ('gemini-2.5', 'gemini-3') startswith check.
* Use substring (not startswith) match so `models/gemini-2.5-flash` is also covered, since some OpenAI-compat clients prefix model ids with `models/`.
* `services/config/loader.py:get_agent_params`: when a module's section is missing from the user's stale `agents.yaml`, fall back through `DEFAULT_AGENTS_SETTINGS` before the global `(0.5, 4096)` default. This lets the `capabilities.visualize` default (`max_tokens=16384` from #490) reach existing installs, not just fresh ones.
* `capabilities/visualize.py`: hoist the duplicated lazy `from deeptutor.agents.visualize.models import ReviewResult` import from two branches into the top of `run()`.

Tests:

* `tests/services/llm/test_reasoning_params.py`: 17 new cases covering Gemini 2.5/3 + `models/` prefix + case-insensitivity + the legacy Gemini 1.5/2.0 / other-provider untouched paths + the explicit-override takes-precedence rule.
* All 107 tests in `tests/services/llm/` still pass; 5 pre-existing `test_chat_params_config` failures (8192 vs 8000 drift) are unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
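A registry-plus-helper of the kind described here could be sketched as follows. The registry contents, the flat tuple shape, and `resolve_reasoning_effort` are assumptions made for illustration; only the names `_PROVIDER_DEFAULT_OFF_PATTERNS` and `default_reasoning_effort_for(provider, model)` come from the commit message.

```python
from typing import Optional

# Assumed contents: the commit names the registry but does not list entries.
_PROVIDER_DEFAULT_OFF_PATTERNS = ("gemini-2.5", "gemini-3")


def default_reasoning_effort_for(provider: str, model: str) -> Optional[str]:
    """Return "none" for thinking-by-default models, else None (no default).

    `provider` is accepted to mirror the documented signature but is unused
    in this sketch.
    """
    name = (model or "").lower()
    # Substring match, so prefixed ids such as "models/gemini-2.5-flash" hit too.
    if any(pattern in name for pattern in _PROVIDER_DEFAULT_OFF_PATTERNS):
        return "none"
    return None


def resolve_reasoning_effort(
    provider: str, model: str, explicit: Optional[str] = None
) -> Optional[str]:
    """Hypothetical call-site wrapper: an explicit caller value always wins."""
    if explicit is not None:
        return explicit
    return default_reasoning_effort_for(provider, model)
```

Keeping the patterns in one tuple means a new thinking-by-default model family needs a single added entry rather than edits at every call site.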

…streaming everywhere

- Centralize Gemini 2.5/3 reasoning_effort=none in reasoning_params.default_reasoning_effort_for so Visualize / Chat / Solve / agentic loop stop returning empty bodies on those models.
- Visualize: per-capability max_tokens default (16k) seeded from DEFAULT_AGENTS_SETTINGS, defensive root-tag trim on SVG/HTML output, graceful fallback when JSON-mode review step crashes.
- Fix #485: require_auth / require_admin are async so the set_current_user ContextVar reaches the endpoint instead of being discarded by anyio.to_thread.run_sync's worker-thread context copy. Adds _install_current_user helper shared by HTTP + WebSocket.
- Reasoning + native-tools chat protocol: formal content stream must still start with FINISH/TOOL/THINK/PAUSE; tool-call deltas no longer force-resolve labels and implicit_think_label is ignored, so protocol-repair catches missing labels instead of mis-routing turns.
- Smooth streaming on every chat surface: useSmoothStreamText through AssistantResponse, pin-to-bottom (useLayoutEffect) on book chat + quiz follow-up, data-chat-scroll-root opt-in for overflow-anchor:none.
- Sidebar: collapsible Recents region with own scroll viewport, deterministic Lucide icon per session via SessionAvatar, Docs link next to GitHub footer, "New Chat" button removed (nav handles it).
- Add Lemonade local provider (port 13305): registry entry, README Docker host-gateway row, providers.md docs.
- Context-window models-endpoint probe honors DISABLE_SSL_VERIFY via TCPConnector(ssl=False).
- README: insert v1.4.2 release row, push v1.3.10 inside "Past releases" fold, bump install opt-in note to v1.4.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


Commits on May 17, 2026

Commits on May 28, 2026
