
Comparing changes

base repository: HKUDS/DeepTutor
base: v1.4.1
head repository: HKUDS/DeepTutor
compare: v1.4.2
  • 7 commits
  • 39 files changed
  • 4 contributors

Commits on May 17, 2026

  1. fix(visualize): unblock Gemini 2.5+, harden Visualize pipeline

    Four bugs that combined to break the Visualize capability on Gemini 2.5
    Flash (and similar thinking-by-default models). Each is independently
    useful, but the user-visible symptom — Visualize "kind of works but
    output is randomly truncated at ~370 chars" — needs all of them.
    
    1. Gemini 2.5/3.x reasoning-tokens default (root cause)
    
       Gemini 2.5+ models burn most of `max_tokens` on internal "thinking"
       tokens by default. With `max_tokens=4096`, ~3900 went to reasoning
       and only ~160 came out as actual content, causing finish_reason=length
       on every multi-step pipeline (Visualize codegen + review, Deep Solve,
       anything that asks for a structured output beyond a sentence).
    
       Default `reasoning_effort="none"` for Gemini 2.5/3.x models when the
       caller doesn't specify, in all three execution paths:
       - provider_core/openai_compat_provider.py:_build_kwargs (live path)
       - executors.py:sdk_complete / sdk_stream (legacy SDK path)
       - cloud_provider.py:_openai_complete / _openai_stream (aiohttp fallback)
    
    2. visualize capability had no agents.yaml entry
    
       `get_agent_params("visualize")` silently fell through to the 4096
       default because there was no section_map entry and no
       DEFAULT_AGENTS_SETTINGS entry. Added both, with a 16384-token budget
       appropriate for full HTML pages.
    
    3. Review stage crashed hard on JSON parse failure
    
       `ReviewAgent.process` does `ReviewResult.model_validate(extract_json_object(response))`.
       When the model returned prose instead of JSON (common with large SVGs
       that the model can't escape into a JSON string), the parse raised and
       killed the entire turn. Wrapped pipeline.run_review() in try/except
       so review failure falls back to the unreviewed draft and the user
       still gets a rendered result.
    
    4. Codegen output not trimmed to the root tag
    
       Models often wrap SVG/HTML in prose ("Here you go: <svg>…</svg>
       Enjoy!") or emit a closing code fence on the same line as `</html>`,
       which `extract_code_block`'s regex (requiring a leading \n before
       the fence) doesn't strip. Added defensive root-tag trimming for
       render_type=="svg" and render_type=="html".
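
The root-tag trimming described in item 4 could look roughly like the sketch below. `trim_to_root_tag` and `_ROOT_TAGS` are hypothetical names chosen for illustration; the actual code in `code_generator_agent.py` is not shown in this log and may differ.

```python
import re

# Hypothetical mapping from render_type to the root element to keep.
_ROOT_TAGS = {"svg": "svg", "html": "html"}


def trim_to_root_tag(text: str, render_type: str) -> str:
    """Keep only the span from the first opening root tag to the last closing one.

    Returns the input unchanged when render_type is not handled or when no
    complete root element is found, so the trim can never make output worse.
    """
    tag = _ROOT_TAGS.get(render_type)
    if tag is None:
        return text

    # First opening tag, with or without attributes (case-insensitive).
    start = re.search(rf"<{tag}(?=[\s>])", text, flags=re.IGNORECASE)
    # All closing tags; the last one is taken as the end of the document.
    closes = list(re.finditer(rf"</{tag}\s*>", text, flags=re.IGNORECASE))
    if start is None or not closes or closes[-1].end() <= start.start():
        return text

    begin = start.start()
    if tag == "html":
        # Preserve a doctype that precedes the <html> tag, if there is one.
        doctype = re.search(r"<!doctype\s+html[^>]*>", text[:begin], flags=re.IGNORECASE)
        if doctype:
            begin = doctype.start()

    return text[begin:closes[-1].end()]
```

Slicing between the first opening and last closing tag drops both surrounding prose and a trailing code fence on the same line as `</html>`, without depending on where the fence sits.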
    
    Verified end-to-end on Gemini 2.5 Flash via the CLI and headless
    Playwright: full 22 KB long-division HTML page, no truncation, all
    interactive elements present, multi-step walkthrough completes
    correctly (7852 ÷ 6 → 1308 R 4).
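
The arithmetic in the walkthrough result can be checked independently; this snippet only confirms the quotient and remainder quoted above and says nothing about how the generated page computes them.

```python
dividend, divisor = 7852, 6

# divmod returns (quotient, remainder) for integer division.
quotient, remainder = divmod(dividend, divisor)

# Sanity check: the parts must recombine into the dividend.
assert quotient * divisor + remainder == dividend

print(f"{dividend} ÷ {divisor} → {quotient} R {remainder}")  # 7852 ÷ 6 → 1308 R 4
```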
    Sam committed May 17, 2026
    6c2615d

Commits on May 28, 2026

  1. 9329879
  2. fix(auth): make require_auth async so the user ContextVar reaches the endpoint (#485)
    
    Fixes #481. Makes `require_auth` and `require_admin` `async def` so that the user ContextVar set inside the dependency is visible to async endpoints. Sync dependencies run via `anyio.to_thread.run_sync` under `copy_context()`, so a ContextVar set inside them is discarded when the call returns.
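
The ContextVar behaviour behind this fix can be reproduced with the standard library alone. The sketch below imitates the worker-thread case with `contextvars.copy_context().run(...)` instead of calling anyio, and `current_user`, `sync_dep`, and `async_dep` are illustrative names, not the project's.

```python
import asyncio
import contextvars

# Illustrative stand-in for the per-request user ContextVar.
current_user = contextvars.ContextVar("current_user", default=None)


def sync_dep() -> None:
    """A sync dependency that records the authenticated user."""
    current_user.set("alice")


async def async_dep() -> None:
    """The same dependency written as a coroutine."""
    current_user.set("alice")


async def endpoint_after_sync_dep():
    # Running the callable inside a *copy* of the context mimics what happens
    # when a sync dependency is dispatched under copy_context(): the set()
    # lands in the copy and is lost once the call returns.
    contextvars.copy_context().run(sync_dep)
    return current_user.get()


async def endpoint_after_async_dep():
    # Awaiting a coroutine runs it in the caller's own context, so the
    # set() is still visible afterwards.
    await async_dep()
    return current_user.get()


def demo():
    return (
        asyncio.run(endpoint_after_sync_dep()),
        asyncio.run(endpoint_after_async_dep()),
    )
```

Calling `demo()` returns `(None, "alice")`: the value set by the sync variant never reaches the endpoint, while the async variant's value does.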
    truffle-dev authored May 28, 2026
    69ea8b3
  3. Merge pull request #490: fix(visualize): unblock Gemini 2.5+ and harden Visualize pipeline
    
    Squash-merge of #490 (skinred78) with two non-trivial conflict resolutions:
    
    * `openai_compat_provider.py`: PR inlined reasoning_effort logic that has since
      been refactored into `build_openai_compatible_reasoning_kwargs`. Keep the
      helper call; the Gemini default-down will be folded into the helper in a
      follow-up commit instead of duplicating it inline.
    
    * `capabilities/visualize.py`: i18n was applied to the review-stage messages
      after the PR forked. Kept the new try/except review-fallback structure but
      routed all three messages through `i18n.t(...)`; added `review_skipped_error`
      key to en/zh visualize.yaml.
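
The review-fallback structure mentioned in the second bullet might be sketched as follows. This is a minimal illustration under assumptions: `ReviewOutcome`, `parse_review`, and `review_or_fallback` are hypothetical names, `json.loads` stands in for the project's `extract_json_object` plus `ReviewResult.model_validate` step, and the `i18n.t(...)` message routing is not modelled.

```python
import json
from dataclasses import dataclass


@dataclass
class ReviewOutcome:
    code: str        # the code that will be rendered
    reviewed: bool   # False when the review step was skipped
    note: str = ""   # human-readable reason for skipping, if any


def parse_review(response: str) -> dict:
    """Stand-in for the JSON parse/validate step; raises when given prose."""
    data = json.loads(response)
    if not isinstance(data, dict) or "code" not in data:
        raise ValueError("review response is missing 'code'")
    return data


def review_or_fallback(draft: str, review_response: str) -> ReviewOutcome:
    """Use the reviewed code when it parses; otherwise keep the draft."""
    try:
        reviewed = parse_review(review_response)
    except Exception as exc:  # broad on purpose: any review failure is non-fatal
        return ReviewOutcome(code=draft, reviewed=False, note=f"review skipped: {exc}")
    return ReviewOutcome(code=reviewed["code"], reviewed=True)
```

Catching broadly here is a deliberate trade-off in the sketch: the review stage is treated as optional polish, so no failure inside it should cost the user the draft.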
    
    Other PR changes applied as-is:
    * `executors.py` / `cloud_provider.py`: inline Gemini 2.5/3.x default-down
    * `loader.py`: section_map entry for `visualize`
    * `init.py`: `capabilities.visualize` default with max_tokens=16384
    * `code_generator_agent.py`: defensive root-tag trim for svg/html
    
    Closes #489.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    pancacake and claude committed May 28, 2026
    44564b0
  4. refactor: centralize Gemini default-off reasoning in reasoning_params

    Follow-up to #490. The PR inlined the "disable thinking for Gemini 2.5/3"
    gate in 5 places across 3 files. This commit collapses them to one
    registry and three thin call sites.
    
    Changes:
    
    * `services/llm/reasoning_params.py`: new `_PROVIDER_DEFAULT_OFF_PATTERNS`
      registry + `default_reasoning_effort_for(provider, model)` public helper.
      `build_openai_compatible_reasoning_kwargs` now consults the registry, so
      the openai-compat path (which lost its inline gate during the #490 merge
      conflict resolution) is restored via the helper.
    
    * `services/llm/executors.py` (sdk_complete + sdk_stream) and
      `services/llm/cloud_provider.py` (_openai_complete + _openai_stream) now
      call `default_reasoning_effort_for(...)` instead of inlining the
      ('gemini-2.5', 'gemini-3') startswith check.
    
    * Use substring (not startswith) match so `models/gemini-2.5-flash` is
      also covered — some OpenAI-compat clients prefix model ids with `models/`.
    
    * `services/config/loader.py:get_agent_params`: when a module's section is
      missing from the user's stale `agents.yaml`, fall back through
      `DEFAULT_AGENTS_SETTINGS` before the global `(0.5, 4096)` default. This
      lets the `capabilities.visualize` default (`max_tokens=16384` from #490)
      reach existing installs, not just fresh ones.
    
    * `capabilities/visualize.py`: hoist the duplicated lazy
      `from deeptutor.agents.visualize.models import ReviewResult` import
      from two branches into the top of `run()`.
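
A registry-plus-helper arrangement like the one described in the first three bullets could be sketched as below. The names `_PROVIDER_DEFAULT_OFF_PATTERNS` and `default_reasoning_effort_for(provider, model)` come from the commit message; the registry's shape, the `"gemini"` provider key, and `build_reasoning_kwargs` are assumptions made for illustration.

```python
from typing import Optional

# Assumed shape: provider name -> model-id substrings whose reasoning
# should default to off when the caller does not say otherwise.
_PROVIDER_DEFAULT_OFF_PATTERNS = {
    "gemini": ("gemini-2.5", "gemini-3"),
}


def default_reasoning_effort_for(provider: str, model: str) -> Optional[str]:
    """Return "none" for models that think by default, else None."""
    patterns = _PROVIDER_DEFAULT_OFF_PATTERNS.get(provider.lower(), ())
    model_lc = model.lower()
    # Substring rather than startswith, so "models/gemini-2.5-flash" matches.
    if any(pattern in model_lc for pattern in patterns):
        return "none"
    return None


def build_reasoning_kwargs(
    provider: str, model: str, reasoning_effort: Optional[str] = None
) -> dict:
    """Hypothetical call site: an explicit value always beats the default."""
    effort = reasoning_effort
    if effort is None:
        effort = default_reasoning_effort_for(provider, model)
    return {"reasoning_effort": effort} if effort is not None else {}
```

With this layout each execution path needs only a single call to the helper, and adding another thinking-by-default model family means touching only the registry.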
    
    Tests:
    
    * `tests/services/llm/test_reasoning_params.py` — 17 new cases covering
      Gemini 2.5/3 + `models/` prefix + case-insensitivity + the legacy
      Gemini 1.5/2.0 / other-provider untouched paths + the explicit-override
      takes-precedence rule.
    * All 107 tests in `tests/services/llm/` still pass; 5 pre-existing
      `test_chat_params_config` failures (8192 vs 8000 drift) are unrelated.
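
The `get_agent_params` fallback order from the changes above (user config, then `DEFAULT_AGENTS_SETTINGS`, then the global default) can be illustrated with a small sketch. The dictionary layouts, `SECTION_MAP`, `_lookup`, and the reading of `(0.5, 4096)` as temperature and `max_tokens` are all assumptions; only the lookup order and the 16384 value are taken from the commit messages.

```python
from typing import Any, Mapping, Optional

# Assumed meaning of the global (0.5, 4096) default.
GLOBAL_DEFAULT = {"temperature": 0.5, "max_tokens": 4096}

# Assumed layout of the built-in defaults and the module -> section mapping.
DEFAULT_AGENTS_SETTINGS = {"capabilities": {"visualize": {"max_tokens": 16384}}}
SECTION_MAP = {"visualize": ("capabilities", "visualize")}


def _lookup(settings: Mapping[str, Any], path: tuple) -> Optional[Mapping[str, Any]]:
    """Walk a nested mapping; return None if any key along the path is missing."""
    node: Any = settings
    for key in path:
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, Mapping) else None


def get_agent_params(module: str, user_settings: Mapping[str, Any]) -> dict:
    """Resolve params: user config first, built-in defaults next, global last."""
    path = SECTION_MAP.get(module)
    if path is not None:
        for source in (user_settings, DEFAULT_AGENTS_SETTINGS):
            section = _lookup(source, path)
            if section is not None:
                # Fill any keys the section omits from the global default.
                return {**GLOBAL_DEFAULT, **section}
    return dict(GLOBAL_DEFAULT)
```

A stale user config with no `visualize` section then resolves to `max_tokens=16384` instead of silently dropping to 4096, while an explicit user value still wins.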
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    pancacake and claude committed May 28, 2026
    db41c57
  5. improve chat

    pancacake committed May 28, 2026
    c54743b
  6. release: v1.4.2 - Gemini 2.5+ unblocked, auth ContextVar fix, smooth streaming everywhere
    
    - Centralize Gemini 2.5/3 reasoning_effort=none in
      reasoning_params.default_reasoning_effort_for so Visualize / Chat /
      Solve / agentic loop stop returning empty bodies on those models.
    - Visualize: per-capability max_tokens default (16k) seeded from
      DEFAULT_AGENTS_SETTINGS, defensive root-tag trim on SVG/HTML output,
      graceful fallback when JSON-mode review step crashes.
    - Fix #485: require_auth / require_admin are async so the
      set_current_user ContextVar reaches the endpoint instead of being
      discarded by anyio.to_thread.run_sync's worker-thread context copy.
      Adds _install_current_user helper shared by HTTP + WebSocket.
    - Reasoning + native-tools chat protocol: formal content stream must
      still start with FINISH/TOOL/THINK/PAUSE; tool-call deltas no longer
      force-resolve labels and implicit_think_label is ignored, so
      protocol-repair catches missing labels instead of mis-routing turns.
    - Smooth streaming on every chat surface: useSmoothStreamText through
      AssistantResponse, pin-to-bottom (useLayoutEffect) on book chat +
      quiz follow-up, data-chat-scroll-root opt-in for overflow-anchor:none.
    - Sidebar: collapsible Recents region with own scroll viewport,
      deterministic Lucide icon per session via SessionAvatar, Docs link
      next to GitHub footer, "New Chat" button removed (nav handles it).
    - Add Lemonade local provider (port 13305) — registry entry, README
      Docker host-gateway row, providers.md docs.
    - Context-window models-endpoint probe honors DISABLE_SSL_VERIFY via
      TCPConnector(ssl=False).
    - README: insert v1.4.2 release row, push v1.3.10 inside "Past
      releases" fold, bump install opt-in note to v1.4.2.
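
The chat-protocol rule in the list above (content must start with one of FINISH/TOOL/THINK/PAUSE, otherwise protocol repair takes over) can be pictured with a small check. The wire format is not shown in this log, so this sketch assumes the label is the first whitespace-delimited token of the content; `LABELS` and `leading_label` are hypothetical names.

```python
from typing import Optional

# The four labels named in the release notes.
LABELS = ("FINISH", "TOOL", "THINK", "PAUSE")


def leading_label(content: str) -> Optional[str]:
    """Return the protocol label the content starts with, or None.

    None would signal "label missing", which a repair step could handle
    instead of guessing how to route the turn.
    """
    stripped = content.lstrip()
    for label in LABELS:
        if stripped.startswith(label):
            rest = stripped[len(label):]
            # Require a word boundary so "THINKING ..." is not read as THINK.
            if not rest or not (rest[0].isalnum() or rest[0] == "_"):
                return label
    return None
```

Returning `None` rather than inferring a label keeps the decision in one place, which matches the stated intent of letting protocol repair catch missing labels.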
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    pancacake and claude committed May 28, 2026
    c4d4766