Skip to content

feat: image vision analysis, fix first-message 400, fix /model default reset#1642

Closed
imkingjh999 wants to merge 2 commits into
Hmbown:mainfrom
imkingjh999:feature/image-parsing
Closed

feat: image vision analysis, fix first-message 400, fix /model default reset#1642
imkingjh999 wants to merge 2 commits into
Hmbown:mainfrom
imkingjh999:feature/image-parsing

Conversation

@imkingjh999

@imkingjh999 imkingjh999 commented May 14, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add vision analysis with object bounding boxes (0–1000 normalised coordinates)
  • Restructure: move vision_bridge.rsvision/bridge.rs module with structured types (BBox, VisualPrimitive)
  • Slim down tools.rs to delegate to bridge module
  • Add primitives config flag for VisionModelConfig
  • Add image_analyze instruction to system prompt
  • Fix 400 error on first message after entering TUI (sync provider to config)
  • Fix /model re-selecting same provider resetting model to default

primitives = true vs false

primitives = true primitives = false
Bounding boxes Each object gets a normalised bbox [x1,y1,x2,y2] (0–1000) No coordinates returned
Distance comparison Can accurately compare distances between multiple points in the image Cannot — text-only description lacks spatial precision
Recommended model qwen3.5-omni-plus-2026-03-15 (free, 1M calls) Any vision model

With primitives = true + qwen3.5-omni-plus-2026-03-15 (free tier), multi-point distance accuracy exceeds ChatGPT's vision capability.

Test plan

  • Paste an image and verify vision analysis runs automatically
  • Verify bounding boxes are returned in the vision context
  • Start TUI, send first message — should not 400
  • Switch to deepseek provider, pick a non-default model, re-select deepseek — model should persist
  • Run cargo test -p deepseek-tui -- vision — 19 tests pass

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a standalone vision analysis bridge, adding the vision_bridge module to handle structured image descriptions and visual primitives with normalized bounding boxes. It includes logic for prompt construction, response parsing, and an HTTP client for vision API requests. Reviewers identified several improvement opportunities: removing the unused PerModelContextConfig struct, replacing eprintln! calls with a proper logging framework to prevent TUI display corruption, reusing the reqwest::Client instance for better connection pooling, and extending the request timeout to cover response body parsing.

Comment thread crates/tui/src/config.rs Outdated
Comment thread crates/tui/src/vision_bridge.rs Outdated
Comment thread crates/tui/src/vision_bridge.rs Outdated
Comment thread crates/tui/src/vision_bridge.rs Outdated
@imkingjh999 imkingjh999 changed the title feat: image parsing & vision analysis, fix first-message 400, fix /model default feat: image parsing and vision analysis with bounding boxes May 14, 2026
@imkingjh999 imkingjh999 changed the title feat: image parsing and vision analysis with bounding boxes feat: image vision analysis, fix first-message 400, fix /model default reset May 14, 2026
…0, fix /model reset

- Add vision analysis with 0-1000 normalised bounding boxes (bridge.rs)
- Slim down tools.rs to delegate to bridge module
- Add primitives config flag to VisionModelConfig
- Add image_analyze instruction to system prompt
- Fix 400 error on first message (sync provider to config)
- Fix /model re-selecting same provider resetting model to default
@imkingjh999 imkingjh999 force-pushed the feature/image-parsing branch from ff44868 to dbb75e9 Compare May 14, 2026 16:16
@imkingjh999 imkingjh999 marked this pull request as ready for review May 14, 2026 16:44
@Hmbown

Hmbown commented May 14, 2026

Copy link
Copy Markdown
Owner

Provider/model switching bug-fix portions were harvested into #1649 and merged to main with contributor credit. That covered the /model default reset/provider-selected model behavior for #1632 without taking the image vision feature into v0.8.38.

Leaving this PR open for maintainer judgment on the remaining vision feature and any other non-harvested scope.

@Hmbown

Hmbown commented May 17, 2026

Copy link
Copy Markdown
Owner

Thanks for the contribution. We harvested the narrow provider/model default-reset fix into #1649, but this PR mixes model selection fixes with a larger vision feature surface. For v0.8.40 we are keeping a high bar after the /model regression: small repro-backed fixes only, no mixed feature bundles. Please split any remaining work into one focused PR with tests and a single user-facing behavior.

@Hmbown Hmbown closed this May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants