[Bug]: Visualize (and other capabilities) silently truncate on Gemini 2.5 / 3.x due to default thinking tokens

## Summary

Gemini 2.5 / 3.x models ship with thinking enabled by default. When DeepTutor sends `max_tokens=4096` (the Visualize default) or even larger budgets, Gemini spends most of that budget on internal reasoning tokens and the model returns truncated output with `finish_reason=length` — often before any visible content has streamed. Pipelines that need structured output (Visualize codegen + review, Deep Solve writing, etc.) appear to half-produce results and either crash downstream parsers or render unusable artifacts.

## Repro

1. Configure DeepTutor with `LLM_BINDING=gemini`, `LLM_MODEL=gemini-2.5-flash` (any Gemini 2.5/3.x flash-tier model reproduces).
2. In the chat UI, switch the mode to **Visualize** and pick any non-trivial prompt — e.g.
   > Build an interactive long-division tutor for Year 7 that teaches the algorithm one digit at a time. Use the problem 7852 / 6. Walk me through divide / multiply / subtract / bring down for each digit with input boxes and a Check button.
3. Observe: stream halts mid-output (HTML cut off mid-CSS or mid-script, SVG cut off mid-tag). Codegen returns ~300-2000 chars and then stops.

Reproducible against the Gemini OpenAI-compat endpoint independently:

```bash
curl -s 'https://generativelanguage.googleapis.com/v1beta/openai/chat/completions' \
  -H \"Authorization: Bearer \$KEY\" -H 'Content-Type: application/json' \
  -d '{\"model\":\"gemini-2.5-flash\",\"stream\":false,\"max_tokens\":4096,
       \"messages\":[{\"role\":\"user\",
       \"content\":\"Write a 50-line SVG of cookies, just the SVG.\"}]}' \
  | jq '{finish:.choices[0].finish_reason, len:(.choices[0].message.content|length), usage}'
```

Result with thinking enabled (default):
\`\`\`
{ \"finish\": \"length\", \"len\": 360, \"usage\": {\"prompt_tokens\":24, \"completion_tokens\":164, \"total_tokens\":4116} }
\`\`\`

Result with `reasoning_effort: \"none\"` added to the request:
\`\`\`
{ \"finish\": \"stop\", \"len\": 2594, \"usage\": {\"prompt_tokens\":24, \"completion_tokens\":1426, \"total_tokens\":1450} }
\`\`\`

`total_tokens - prompt - completion` in the first call (~3900 tokens) is exactly the missing reasoning-token budget that ate the response.

## Why DeepTutor doesn't currently mitigate this

`OpenAICompatProvider._build_kwargs` (the live path) only auto-injects `reasoning_effort=\"high\"` when a model matches `spec.reasoning_model_patterns`. The Gemini `ProviderSpec` doesn't set those patterns. There's no default-down behavior for thinking-by-default models, so they silently consume the budget.

Also contributing (already filed as related smaller bugs but worth noting):

- The Visualize capability has no entry in `agents.yaml` / `DEFAULT_AGENTS_SETTINGS` / `loader.py:section_map`, so it silently uses the 4096-token default instead of a higher budget appropriate for HTML pages.
- `ReviewAgent.process` raises a JSONDecodeError that propagates up and kills the whole Visualize turn when the model returns prose instead of strict JSON (downstream consequence of the truncation above).

## Proposed fix

Disable thinking by default for Gemini 2.5/3.x in the three execution paths (`openai_compat_provider`, `executors`, `cloud_provider`) — caller can still opt in via explicit `reasoning_effort`. Plus the three smaller adjacent fixes (agents.yaml entry, review-stage graceful fallback, codegen tag-trim).

PR coming.

## Environment

- DeepTutor: current `dev` branch (cade789)
- LLM_BINDING=gemini, LLM_MODEL=gemini-2.5-flash
- Setup Tour install on macOS

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Visualize (and other capabilities) silently truncate on Gemini 2.5 / 3.x due to default thinking tokens #489

Summary

Repro

Why DeepTutor doesn't currently mitigate this

Proposed fix

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug]: Visualize (and other capabilities) silently truncate on Gemini 2.5 / 3.x due to default thinking tokens #489

Description

Summary

Repro

Why DeepTutor doesn't currently mitigate this

Proposed fix

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions