Comparing changes
base repository: HKUDS/DeepTutor
base: v1.4.1
head repository: HKUDS/DeepTutor
compare: v1.4.2
- 7 commits
- 39 files changed
- 4 contributors
Commits on May 17, 2026
fix(visualize): unblock Gemini 2.5+, harden Visualize pipeline
Six bugs that combined to break the Visualize capability on Gemini 2.5 Flash (and similar thinking-by-default models). Each is independently useful, but the user-visible symptom (Visualize "kind of works but output is randomly truncated at ~370 chars") needs all of them.

1. Gemini 2.5/3.x reasoning-tokens default (root cause)
   Gemini 2.5+ models burn most of `max_tokens` on internal "thinking" tokens by default. With `max_tokens=4096`, ~3900 went to reasoning and only ~160 came out as actual content, causing finish_reason=length on every multi-step pipeline (Visualize codegen + review, Deep Solve, anything that asks for a structured output beyond a sentence). Default `reasoning_effort="none"` for Gemini 2.5/3.x models when the caller doesn't specify, in all three execution paths:
   - provider_core/openai_compat_provider.py:_build_kwargs (live path)
   - executors.py:sdk_complete / sdk_stream (legacy SDK path)
   - cloud_provider.py:_openai_complete / _openai_stream (aiohttp fallback)

2. visualize capability had no agents.yaml entry
   `get_agent_params("visualize")` silently fell through to the 4096 default because there was no section_map entry and no DEFAULT_AGENTS_SETTINGS entry. Added both, with a 16384-token budget appropriate for full HTML pages.

3. Review stage crashed hard on JSON parse failure
   `ReviewAgent.process` does `ReviewResult.model_validate(extract_json_object(response))`. When the model returned prose instead of JSON (common with large SVGs that the model can't escape into a JSON string), the parse raised and killed the entire turn. Wrapped pipeline.run_review() in try/except so review failure falls back to the unreviewed draft and the user still gets a rendered result.

4. Codegen output not trimmed to the root tag
   Models often wrap SVG/HTML in prose ("Here you go: <svg>…</svg> Enjoy!") or emit a closing code fence on the same line as `</html>`, which `extract_code_block`'s regex (requiring a leading \n before the fence) doesn't strip. Added defensive root-tag trimming for render_type=="svg" and render_type=="html".

Verified end-to-end on Gemini 2.5 Flash via the CLI and headless Playwright: full 22 KB long-division HTML page, no truncation, all interactive elements present, multi-step walkthrough completes correctly (7852 ÷ 6 → 1308 R 4).

Sam committed May 17, 2026
Commit 6c2615d
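The root-tag trimming described in item 4 can be sketched roughly as follows. This is a minimal illustration, not the code in `code_generator_agent.py`: the helper name `trim_to_root_tag` and the regular expressions are assumptions made for this example.

```python
import re

# Hypothetical patterns: keep everything from the first opening root tag to the
# last closing root tag, discarding prose or code fences around it.
_ROOT_PATTERNS = {
    "svg": re.compile(r"<svg\b.*</svg>", re.IGNORECASE | re.DOTALL),
    "html": re.compile(
        r"(?:<!DOCTYPE\s+html[^>]*>\s*)?<html\b.*</html>",
        re.IGNORECASE | re.DOTALL,
    ),
}


def trim_to_root_tag(text: str, render_type: str) -> str:
    """Return only the root element for svg/html output; otherwise return text unchanged."""
    pattern = _ROOT_PATTERNS.get(render_type)
    if pattern is None:
        return text
    match = pattern.search(text)
    # If no root element is found, fall back to the untouched text.
    return match.group(0) if match else text
```

With these assumed patterns, a closing fence emitted on the same line as `</html>` is dropped because the match ends at the closing tag.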
Commits on May 28, 2026
Commit 9329879
fix(auth): make require_auth async so the user ContextVar reaches the…
Commit 69ea8b3
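The v1.4.2 release notes attribute this auth bug to a synchronous dependency running in a worker thread that receives a copy of the context, so a ContextVar set there never reaches the endpoint. A minimal sketch of that behaviour, using the standard library's `asyncio.to_thread` (Python 3.9+) as a stand-in for `anyio.to_thread.run_sync`; every function name here is hypothetical:

```python
import asyncio
import contextvars

current_user = contextvars.ContextVar("current_user", default=None)


def sync_dependency(name):
    # Runs in a worker thread against a *copy* of the caller's context.
    current_user.set(name)


async def async_dependency(name):
    # Awaited directly, so it runs in the caller's own context.
    current_user.set(name)


async def endpoint_after_sync_dependency():
    await asyncio.to_thread(sync_dependency, "alice")
    return current_user.get()  # the set() happened in the copied context


async def endpoint_after_async_dependency():
    await async_dependency("alice")
    return current_user.get()  # the set() is visible here
```

Running each endpoint with `asyncio.run(...)` shows the difference: the value set by the threaded dependency is lost, while the value set by the awaited dependency is visible.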
Merge pull request #490: fix(visualize): unblock Gemini 2.5+ and harden Visualize pipeline

Squash-merge of #490 (skinred78) with two non-trivial conflict resolutions:
* `openai_compat_provider.py`: PR inlined reasoning_effort logic that has since been refactored into `build_openai_compatible_reasoning_kwargs`. Keep the helper call; the Gemini default-down will be folded into the helper in a follow-up commit instead of duplicating it inline.
* `capabilities/visualize.py`: i18n was applied to the review-stage messages after the PR forked. Kept the new try/except review-fallback structure but routed all three messages through `i18n.t(...)`; added `review_skipped_error` key to en/zh visualize.yaml.

Other PR changes applied as-is:
* `executors.py` / `cloud_provider.py`: inline Gemini 2.5/3.x default-down
* `loader.py`: section_map entry for `visualize`
* `init.py`: `capabilities.visualize` default with max_tokens=16384
* `code_generator_agent.py`: defensive root-tag trim for svg/html

Closes #489.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 44564b0
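The try/except review-fallback structure kept in this merge might look something like the sketch below. It is an assumption-laden illustration: `parse_review`, `ReviewOutcome`, and `run_review_with_fallback` are hypothetical stand-ins, and plain `json` replaces the project's `ReviewResult.model_validate(extract_json_object(...))` call.

```python
import json
from dataclasses import dataclass


@dataclass
class ReviewOutcome:
    code: str
    reviewed: bool
    note: str = ""


def parse_review(response: str) -> dict:
    """Hypothetical stand-in for the JSON-mode review parse; raises on prose."""
    data = json.loads(response)
    if not isinstance(data, dict) or "code" not in data:
        raise ValueError("review response is missing 'code'")
    return data


def run_review_with_fallback(draft: str, response: str) -> ReviewOutcome:
    """Use the reviewed code when parsing succeeds, else keep the unreviewed draft."""
    try:
        review = parse_review(response)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return ReviewOutcome(code=draft, reviewed=False, note=f"review skipped: {exc}")
    return ReviewOutcome(code=str(review["code"]), reviewed=True)
```

The point of the design is that a malformed review degrades to the draft instead of ending the turn; in the real capability the skip message is routed through `i18n.t(...)` rather than built inline.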
refactor: centralize Gemini default-off reasoning in reasoning_params
Follow-up to #490. The PR inlined the "disable thinking for Gemini 2.5/3" gate in 5 places across 3 files. This commit collapses them to one registry and three thin call sites.

Changes:
* `services/llm/reasoning_params.py`: new `_PROVIDER_DEFAULT_OFF_PATTERNS` registry + `default_reasoning_effort_for(provider, model)` public helper. `build_openai_compatible_reasoning_kwargs` now consults the registry, so the openai-compat path (which lost its inline gate during the #490 merge conflict resolution) is restored via the helper.
* `services/llm/executors.py` (sdk_complete + sdk_stream) and `services/llm/cloud_provider.py` (_openai_complete + _openai_stream) now call `default_reasoning_effort_for(...)` instead of inlining the ('gemini-2.5', 'gemini-3') startswith check.
* Use substring (not startswith) match so `models/gemini-2.5-flash` is also covered, because some OpenAI-compat clients prefix model ids with `models/`.
* `services/config/loader.py:get_agent_params`: when a module's section is missing from the user's stale `agents.yaml`, fall back through `DEFAULT_AGENTS_SETTINGS` before the global `(0.5, 4096)` default. This lets the `capabilities.visualize` default (`max_tokens=16384` from #490) reach existing installs, not just fresh ones.
* `capabilities/visualize.py`: hoist the duplicated lazy `from deeptutor.agents.visualize.models import ReviewResult` import from two branches into the top of `run()`.

Tests:
* `tests/services/llm/test_reasoning_params.py`: 17 new cases covering Gemini 2.5/3 + `models/` prefix + case-insensitivity + the legacy Gemini 1.5/2.0 / other-provider untouched paths + the explicit-override takes-precedence rule.
* All 107 tests in `tests/services/llm/` still pass; 5 pre-existing `test_chat_params_config` failures (8192 vs 8000 drift) are unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit db41c57
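A rough sketch of how a pattern registry with substring matching could behave, based only on the description in this commit message. The registry contents, the provider key `"gemini"`, the `explicit` parameter, and the function name `default_reasoning_effort` are assumptions for illustration; the project's `default_reasoning_effort_for(provider, model)` may be structured differently.

```python
from typing import Optional

# Hypothetical registry: provider key -> model-id substrings whose reasoning
# should default to "none" when the caller does not specify an effort.
_DEFAULT_OFF_PATTERNS = {
    "gemini": ("gemini-2.5", "gemini-3"),
}


def default_reasoning_effort(
    provider: str, model: str, explicit: Optional[str] = None
) -> Optional[str]:
    """Return the reasoning effort to send, or None to leave the field unset."""
    if explicit is not None:
        return explicit  # an explicit caller choice always takes precedence
    model_id = model.lower()  # case-insensitive comparison
    for pattern in _DEFAULT_OFF_PATTERNS.get(provider.lower(), ()):
        # Substring (not startswith) so "models/gemini-2.5-flash" also matches.
        if pattern in model_id:
            return "none"
    return None
```

Substring matching is what lets a `models/`-prefixed id hit the same rule, while older ids such as `gemini-1.5-pro` or `gemini-2.0-flash` contain neither pattern and are left untouched.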
Commit c54743b
release: v1.4.2 — Gemini 2.5+ unblocked, auth ContextVar fix, smooth …
…streaming everywhere

- Centralize Gemini 2.5/3 reasoning_effort=none in reasoning_params.default_reasoning_effort_for so Visualize / Chat / Solve / agentic loop stop returning empty bodies on those models.
- Visualize: per-capability max_tokens default (16k) seeded from DEFAULT_AGENTS_SETTINGS, defensive root-tag trim on SVG/HTML output, graceful fallback when JSON-mode review step crashes.
- Fix #485: require_auth / require_admin are async so the set_current_user ContextVar reaches the endpoint instead of being discarded by anyio.to_thread.run_sync's worker-thread context copy. Adds _install_current_user helper shared by HTTP + WebSocket.
- Reasoning + native-tools chat protocol: formal content stream must still start with FINISH/TOOL/THINK/PAUSE; tool-call deltas no longer force-resolve labels and implicit_think_label is ignored, so protocol-repair catches missing labels instead of mis-routing turns.
- Smooth streaming on every chat surface: useSmoothStreamText through AssistantResponse, pin-to-bottom (useLayoutEffect) on book chat + quiz follow-up, data-chat-scroll-root opt-in for overflow-anchor:none.
- Sidebar: collapsible Recents region with own scroll viewport, deterministic Lucide icon per session via SessionAvatar, Docs link next to GitHub footer, "New Chat" button removed (nav handles it).
- Add Lemonade local provider (port 13305): registry entry, README Docker host-gateway row, providers.md docs.
- Context-window models-endpoint probe honors DISABLE_SSL_VERIFY via TCPConnector(ssl=False).
- README: insert v1.4.2 release row, push v1.3.10 inside "Past releases" fold, bump install opt-in note to v1.4.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit c4d4766
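The per-capability default seeding mentioned in the release notes (a user's stale `agents.yaml` falling back through `DEFAULT_AGENTS_SETTINGS` before the global default) can be illustrated with a small sketch. The dictionary layout, the `SECTION_MAP` shape, the reading of `(0.5, 4096)` as (temperature, max_tokens), and the temperature value for `visualize` are all assumptions; only the 16384 and `(0.5, 4096)` figures come from the commit messages.

```python
from typing import Any, Dict, Optional, Tuple

GLOBAL_DEFAULT: Tuple[float, int] = (0.5, 4096)  # assumed (temperature, max_tokens)

# Hypothetical shapes for the built-in defaults and the module -> section map.
DEFAULT_AGENTS_SETTINGS: Dict[str, Any] = {
    "capabilities": {"visualize": {"temperature": 0.5, "max_tokens": 16384}},
}
SECTION_MAP = {"visualize": ("capabilities", "visualize")}


def _lookup(settings: Dict[str, Any], path: Tuple[str, ...]) -> Optional[Dict[str, Any]]:
    """Walk nested dicts along path; return None if any level is missing."""
    node: Any = settings
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, dict) else None


def get_agent_params(module: str, user_settings: Dict[str, Any]) -> Tuple[float, int]:
    """Prefer the user's section, then the built-in defaults, then the global default."""
    path = SECTION_MAP.get(module)
    if path is None:
        return GLOBAL_DEFAULT
    for source in (user_settings, DEFAULT_AGENTS_SETTINGS):
        section = _lookup(source, path)
        if section is not None:
            return (
                float(section.get("temperature", GLOBAL_DEFAULT[0])),
                int(section.get("max_tokens", GLOBAL_DEFAULT[1])),
            )
    return GLOBAL_DEFAULT
```

Under these assumptions, an existing install whose settings file predates the `visualize` section still receives the 16384-token budget rather than 4096.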
You can try running this command locally to see the comparison on your machine:
git diff v1.4.1...v1.4.2