
Conversation

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Oct 16, 2025

- Purely visual and diagnostic change, no effect on model context, prompt
  construction, or inference behavior

- Captured assistant tool call payloads during streaming and non-streaming
  completions, and persisted them in chat state and storage for downstream use

- Exposed parsed tool call labels beneath the assistant's model info line
  with graceful fallback when parsing fails

- Added tool call badges beneath assistant responses that expose JSON tooltips
  and copy their payloads when clicked, matching the existing model badge styling

- Added a user-facing setting to toggle tool call visibility in the Developer
  settings section, directly under the model selector option

Close #16597
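
For readers unfamiliar with the OAI-compat streaming shape: the capture described above boils down to accumulating tool_call deltas across SSE chunks. A minimal sketch of that accumulation pattern, assuming the standard chunk layout (illustrative only, not the PR's actual code):

// Illustrative sketch, not the PR's actual code: accumulate streamed
// OAI-compat tool_call deltas into complete payloads, keyed by delta.index.
interface StreamedToolCall {
  id?: string;
  function: { name: string; arguments: string };
}

function accumulateToolCalls(chunks: any[]): StreamedToolCall[] {
  const calls: StreamedToolCall[] = [];
  for (const chunk of chunks) {
    for (const delta of chunk.choices?.[0]?.delta?.tool_calls ?? []) {
      const slot = (calls[delta.index] ??= { function: { name: '', arguments: '' } });
      if (delta.id) slot.id = delta.id;
      if (delta.function?.name) slot.function.name += delta.function.name;
      // arguments arrive as JSON fragments; concatenate and parse only at the end
      if (delta.function?.arguments) slot.function.arguments += delta.function.arguments;
    }
  }
  return calls;
}

Parsing the concatenated arguments string can fail mid-stream, which is why the description above mentions a graceful fallback for the badge labels.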

@ServeurpersoCom
Collaborator Author

I have to do a little cleaning; the patch was not merged properly on my side. -> draft

ServeurpersoCom marked this pull request as ready for review on October 18, 2025 21:09
@ServeurpersoCom
Collaborator Author

This PR is now clean, but it was developed after this one: #16562

ServeurpersoCom marked this pull request as draft on October 18, 2025 21:10
ServeurpersoCom force-pushed the harmony-toolcall-debug-option branch 2 times, most recently from 0fe776d to 02df5a1 on October 18, 2025 21:21
ServeurpersoCom marked this pull request as ready for review on October 18, 2025 21:22
@allozaur
Collaborator

Alright, @ServeurpersoCom, let's move forward with this one after merging #16562 ;) Let me know when you've addressed the merge conflicts and I'll gladly review the code.

@ServeurpersoCom
Collaborator Author

For the tool call inspector, do you prefer having one spoiler block per tool call, or a single aggregated spoiler wrapping all tool calls in the message?

[screenshot]

It's rebased/reworked now. I'll push --force :)

ServeurpersoCom force-pushed the harmony-toolcall-debug-option branch from 02df5a1 to a5cff84 on October 22, 2025 17:00
@ServeurpersoCom
Collaborator Author

ServeurpersoCom commented Oct 22, 2025

Feel free to dissect the architecture as deeply as you want! Component boundaries, store coupling, service layering, anything that smells non-idiomatic.
Also, if we end up polishing this feature further, I'm thinking it could live in a dedicated module for cleaner boundaries?

lib/
 └─ toolcalls/
     ├─ toolcall-service.ts
     ├─ toolcall-store.ts
     ├─ ToolCallBlock.svelte
     └─ ToolCallItem.svelte
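
For the sake of discussion, a hypothetical sketch of what toolcall-store.ts could look like under that layout; every name and shape below is illustrative, not the PR's actual code:

// toolcall-store.ts (hypothetical): a thin Svelte store behind which the
// service layer can persist captured tool calls for the UI components.
import { writable } from 'svelte/store';

export interface ToolCallRecord {
  messageId: string;  // assistant message the call belongs to
  name: string;       // parsed tool/function name, used for the badge label
  arguments: string;  // raw JSON string as emitted by the model
}

export const toolCalls = writable<ToolCallRecord[]>([]);

export function recordToolCall(record: ToolCallRecord): void {
  toolCalls.update((all) => [...all, record]);
}

Keeping the store free of parsing and rendering logic is what would make the ToolCallBlock/ToolCallItem components easy to swap out later.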

@ServeurpersoCom
Collaborator Author

And we could even imagine the architecture being reusable later: like having a small JavaScript execution module decoupled from the UI, so the model could actually interact with a JS thread it coded itself.
That would also cover, in a more generic way, the proposal from PR #13501 by @samolego, but in this case the model would generate and run its own JS tools. Done properly, it's no more of a security risk than the HTML/JS preview you get in Hugging Face Chat or Claude!
[screenshot]
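
For what it's worth, a minimal sketch of that idea, assuming a browser context: model-generated code runs in a Web Worker built from a Blob URL, which keeps it off the main thread and away from the DOM. The helper below is hypothetical, not part of this PR:

// Hypothetical helper: run model-generated JS in a Web Worker so it cannot
// touch the DOM, cookies, or the chat UI's state; kill it on timeout.
export function runSandboxed(code: string, timeoutMs = 2000): Promise<unknown> {
  const source = `
    self.onmessage = () => {
      try {
        const value = (() => { ${code} })();
        self.postMessage({ ok: true, value });
      } catch (e) {
        self.postMessage({ ok: false, error: String(e) });
      }
    };`;
  const url = URL.createObjectURL(new Blob([source], { type: 'application/javascript' }));
  const worker = new Worker(url);
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      worker.terminate();
      URL.revokeObjectURL(url);
      reject(new Error('timeout'));
    }, timeoutMs);
    worker.onmessage = (ev) => {
      clearTimeout(timer);
      worker.terminate();
      URL.revokeObjectURL(url);
      if (ev.data.ok) resolve(ev.data.value);
      else reject(new Error(ev.data.error));
    };
    worker.postMessage(null); // kick off execution
  });
}

A worker has no DOM access by construction, so the residual risk is comparable to the sandboxed HTML/JS preview mentioned above.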

@ServeurpersoCom
Collaborator Author

Includes a very small optimization from the previous PR (scroll listener removal). It landed here intentionally :D

@ServeurpersoCom
Collaborator Author

Testing:

Add this

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "simple_addition_tool",
        "description": "A dummy calculator tool used for testing multi-argument tool call streaming.",
        "parameters": {
          "type": "object",
          "properties": {
            "a": {
              "type": "number",
              "description": "The first number to add."
            },
            "b": {
              "type": "number",
              "description": "The second number to add."
            }
          },
          "required": ["a", "b"]
        }
      }
    }
  ]
}

Here:

[screenshot]

And ask the model (e.g. the 2+2 calculation).

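For anyone who wants to exercise the same payload outside the WebUI, a hypothetical smoke test against llama-server's OAI-compat endpoint (adjust host and port to your setup):

// Hypothetical smoke test, not part of the PR: send the same "tools" payload
// straight to llama-server's OAI-compat endpoint and print any tool calls.
async function main() {
  const res = await fetch('http://localhost:8080/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [{ role: 'user', content: 'What is 2 + 2? Use the tool.' }],
      tools: [/* the simple_addition_tool definition above */],
    }),
  });
  const data = await res.json();
  console.log(data.choices[0].message.tool_calls);
}

main().catch(console.error);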

Collaborator

allozaur left a comment


Just a few cosmetics

ServeurpersoCom force-pushed the harmony-toolcall-debug-option branch from 7b1b1cc to 0ba18eb on November 13, 2025 19:10
@ServeurpersoCom
Collaborator Author

Rebase / Format / Build

@allozaur
Collaborator

@ServeurpersoCom please re-base & rebuild

ServeurpersoCom and others added 5 commits November 14, 2025 21:32
…and persistence in chat UI

- Purely visual and diagnostic change, no effect on model context, prompt
  construction, or inference behavior

- Captured assistant tool call payloads during streaming and non-streaming
  completions, and persisted them in chat state and storage for downstream use

- Exposed parsed tool call labels beneath the assistant's model info line
  with graceful fallback when parsing fails

- Added tool call badges beneath assistant responses that expose JSON tooltips
  and copy their payloads when clicked, matching the existing model badge styling

- Added a user-facing setting to toggle tool call visibility to the Developer
  settings section directly under the model selector option
…atMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
…atMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
ServeurpersoCom force-pushed the harmony-toolcall-debug-option branch from 0ba18eb to 73e4023 on November 14, 2025 20:33
@ServeurpersoCom
Collaborator Author

> @ServeurpersoCom please re-base & rebuild

rebased and rebuilt

allozaur merged commit 1411d92 into ggml-org:master on Nov 15, 2025
14 checks passed
@SadaleNet

Hello @ServeurpersoCom. Thank you for implementing tool-calling. I'm highly interested in it. I was trying to reproduce your 2+2 calculation example. Unfortunately, the tool-calling part wasn't working, as shown below.

I wonder how I could attach a script to perform the actual computation. Do I need to use a browser script to detect the presence of the tool-calling clipboard copy button and send the tool-calling response to an API? Or is there an API endpoint that my script/program needs to listen to for tool-calling events so that my script can perform the computation?

[screenshot]

@ServeurpersoCom
Collaborator Author

> Hello @ServeurpersoCom. Thank you for implementing tool-calling. I'm highly interested in it. I was trying to reproduce your 2+2 calculation example. Unfortunately, the tool-calling part wasn't working, as shown below.
>
> I wonder how I could attach a script to perform the actual computation. Do I need to use a browser script to detect the presence of the tool-calling clipboard copy button and send the tool-calling response to an API? Or is there an API endpoint that my script/program needs to listen to for tool-calling events so that my script can perform the computation?
>
> [screenshot]

It's a tool_calls debugger only, and it works. On your screenshot you get a "simple_addition_tool" tag under the (empty) assistant message. Hover or click it to read the function call written by your model!

@ServeurpersoCom
Collaborator Author

https://github.com/user-attachments/assets/ac188e22-9bbf-48a0-8e12-f655ec5a4ecd
We're working hard with Alek on the MCP client. Here's what it does in dev.

@SadaleNet

>> Hello @ServeurpersoCom. Thank you for implementing tool-calling. I'm highly interested in it. I was trying to reproduce your 2+2 calculation example. Unfortunately, the tool-calling part wasn't working, as shown below.
>>
>> I wonder how I could attach a script to perform the actual computation. Do I need to use a browser script to detect the presence of the tool-calling clipboard copy button and send the tool-calling response to an API? Or is there an API endpoint that my script/program needs to listen to for tool-calling events so that my script can perform the computation?
>>
>> [screenshot]
>
> It's a tool_calls debugger only, and it works. On your screenshot you get a "simple_addition_tool" tag under the (empty) assistant message. Hover or click it to read the function call written by your model!

Oh, OK. So, at this point, is there any way I could actually execute the calculation function with the UI provided by llama-server? I've got a CLI program working that can do the calculation work using the OpenAI-compatible API of llama-server. I don't want to reinvent the UI if it already exists.

@ServeurpersoCom
Collaborator Author

ServeurpersoCom commented Nov 17, 2025

>>> Hello @ServeurpersoCom. Thank you for implementing tool-calling. I'm highly interested in it. I was trying to reproduce your 2+2 calculation example. Unfortunately, the tool-calling part wasn't working, as shown below.
>>>
>>> I wonder how I could attach a script to perform the actual computation. Do I need to use a browser script to detect the presence of the tool-calling clipboard copy button and send the tool-calling response to an API? Or is there an API endpoint that my script/program needs to listen to for tool-calling events so that my script can perform the computation?
>>>
>>> [screenshot]
>>
>> It's a tool_calls debugger only, and it works. On your screenshot you get a "simple_addition_tool" tag under the (empty) assistant message. Hover or click it to read the function call written by your model!
>
> Oh, OK. So, at this point, is there any way I could actually execute the calculation function with the UI provided by llama-server? I've got a CLI program working that can do the calculation work using the OpenAI-compatible API of llama-server. I don't want to reinvent the UI if it already exists.

Yes, you don't need to reinvent the wheel. But the UI is still in development, and it's a heavy piece of work. That's exactly why MCP exists: the model emits tool calls, you wrap them, and you send them to an MCP server that returns the result into the context. If you're comfortable with sysadmin work, I can give you what you need.
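
To make the wrap-and-return cycle concrete: it is the standard OAI tool-call loop. A minimal sketch, assuming llama-server's OAI-compat endpoint and your own executeTool dispatcher (which could forward to an MCP server); both names are illustrative:

// Minimal agentic-loop sketch: ask for a completion, execute any tool calls,
// feed the results back as role "tool" messages, and repeat until done.
type Msg = { role: string; content: string | null; tool_calls?: any[]; tool_call_id?: string };

async function chat(messages: Msg[], tools: object[]): Promise<Msg> {
  const res = await fetch('http://localhost:8080/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages, tools }),
  });
  return (await res.json()).choices[0].message;
}

async function runLoop(
  messages: Msg[],
  tools: object[],
  executeTool: (name: string, args: unknown) => Promise<string>
): Promise<Msg> {
  for (;;) {
    const msg = await chat(messages, tools);
    messages.push(msg);
    if (!msg.tool_calls?.length) return msg; // no calls left: final answer
    for (const call of msg.tool_calls) {
      const result = await executeTool(call.function.name, JSON.parse(call.function.arguments));
      messages.push({ role: 'tool', tool_call_id: call.id, content: result });
    }
  }
}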

@SadaleNet

SadaleNet commented Nov 17, 2025

>>>> Hello @ServeurpersoCom. Thank you for implementing tool-calling. I'm highly interested in it. I was trying to reproduce your 2+2 calculation example. Unfortunately, the tool-calling part wasn't working, as shown below.
>>>>
>>>> I wonder how I could attach a script to perform the actual computation. Do I need to use a browser script to detect the presence of the tool-calling clipboard copy button and send the tool-calling response to an API? Or is there an API endpoint that my script/program needs to listen to for tool-calling events so that my script can perform the computation?
>>>>
>>>> [screenshot]
>>>
>>> It's a tool_calls debugger only, and it works. On your screenshot you get a "simple_addition_tool" tag under the (empty) assistant message. Hover or click it to read the function call written by your model!
>>
>> Oh, OK. So, at this point, is there any way I could actually execute the calculation function with the UI provided by llama-server? I've got a CLI program working that can do the calculation work using the OpenAI-compatible API of llama-server. I don't want to reinvent the UI if it already exists.
>
> Yes, you don't need to reinvent the wheel. But the UI is still in development, and it's a heavy piece of work. That's exactly why MCP exists: the model emits tool calls, you wrap them, and you send them to an MCP server that returns the result into the context. If you're comfortable with sysadmin work, I can give you what you need.

Oh sure, tell me more. I just need a starting point, particularly on how to intercept the model's tool-call signal and return the appropriate result to the model while using the llama-server UI. Just to confirm, would this mechanism work in the current master branch of llama.cpp?

As for MCP, if it's not absolutely required, I guess I can explore that on my own later. :P

Again, thanks a lot for working on this feature. People like me are highly thankful for your work.

EDIT: Oh wait. Did you actually mean that the UI isn't ready and I have to use other methods to get the tool-calling mechanism working for now?

@ServeurpersoCom
Collaborator Author

> EDIT: Oh wait. Did you actually mean that the UI isn't ready and I have to use other methods to get the tool-calling mechanism working for now?

Absolutely

@SadaleNet

I've figured out how to get this feature to work and created a repo for it: https://github.com/SadaleNet/llamacpp-tool-calling-python

Again, thanks for your groundwork. My script wouldn't be working without your prior work.

@ServeurpersoCom
Collaborator Author

ServeurpersoCom commented Nov 19, 2025

> I've figured out how to get this feature to work and created a repo for it: https://github.com/SadaleNet/llamacpp-tool-calling-python
>
> Again, thanks for your groundwork. My script wouldn't be working without your prior work.

Cool!

You can try this one; it's integrated in TypeScript. Just edit the config.js file to configure your MCP server. It's still under development; for the time being, I've put my MCP server's URL directly into it (easy to find with git grep). :) It supports WebSocket and Streamable-HTTP transports, plus a configurable agentic loop (chains of tool calls).

https://github.com/ServeurpersoCom/llama.cpp/tree/mcp-client-alpha

I'll add full UI settings soon.

I also have a full backend Node.js OAI-compat reverse-proxy version supporting stdio/ws/streamable-http transports (testing-branch18 and later). It's like llama-swap, but for adding tools to any basic OAI client (a bot, a "Jarvis"-style voice assistant, etc.).

Both handle the context well and allow for long chains of complex development. With an MCP server like the one I use as an example, it performs almost as well as Claude at computer use, depending on the model.
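
Purely as an illustration, such a config.js could look like the sketch below; the real format in the mcp-client-alpha branch may differ, so treat every field name as hypothetical:

// Illustrative config.js shape only; check the branch itself (git grep) for
// the real keys. The transports mirror the ones mentioned above.
export default {
  mcpServers: [
    { name: 'dev-tools', transport: 'streamable-http', url: 'https://example.com/mcp' },
    { name: 'local-ws', transport: 'websocket', url: 'ws://localhost:9090' },
  ],
  agenticLoop: { maxToolCallChain: 8 }, // cap on chained tool calls per turn
};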

SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request Dec 29, 2025
webui: add system message in export conversation, support upload conversation with system message
Webui: show upload only when in new conversation
Webui: Add model name
webui: increase height of chat message window when clicking edit
Webui: autoclose settings dialog dropdown and maximize screen width when zoomed in
webui: fix date issues and add more dates
webui: change error to toast.error.
server: add n_past and slot_id in props_simple
webui: add cache tokens, context and prompt speed in chat
webui: modernize ui
webui: change welcome message
webui: change speed display
webui: change run python icon
webui: add config to use server defaults for sampler
webui: put speed on left and context on right

webui: recognize AsciiDoc files as valid text files (ggml-org#16850)

* webui: recognize AsciiDoc files as valid text files

* webui: add an updated static webui build

* webui: add the updated dependency list

* webui: re-add an updated static webui build

Add a setting to display message generation statistics (ggml-org#16901)

* feat: Add setting to display message generation statistics

* chore: build static webui output

webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe (ggml-org#16757)

* webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe dialog

Extended MarkdownContent to flag previewable code languages,
add a preview button alongside copy controls, manage preview
dialog state, and share styling for the new button group

Introduced CodePreviewDialog.svelte, a sandboxed iframe modal
for rendering HTML/JS previews with consistent dialog controls

* webui: fullscreen HTML preview dialog using bits-ui

* Update tools/server/webui/src/lib/components/app/misc/CodePreviewDialog.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/misc/MarkdownContent.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: pedantic style tweak for CodePreviewDialog close button

* webui: remove overengineered preview language logic

* chore: update webui static build

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

webui: auto-refresh /props on inference start to resync model metadata (ggml-org#16784)

* webui: auto-refresh /props on inference start to resync model metadata

- Add no-cache headers to /props and /slots
- Throttle slot checks to 30s
- Prevent concurrent fetches with promise guard
- Trigger refresh from chat streaming for legacy and ModelSelector
- Show dynamic serverWarning when using cached data

* fix: restore proper legacy behavior in webui by using unified /props refresh

Updated assistant message bubbles to show each message's stored model when available,
falling back to the current server model only when the per-message value is missing

When the model selector is disabled, now fetches /props and prioritizes that model name
over chunk metadata, then persists it with the streamed message so legacy mode properly
reflects the backend configuration

* fix: detect first valid SSE chunk and refresh server props once

* fix: removed the slots availability throttle constant and state

* webui: purge ai-generated cruft

* chore: update webui static build

feat(webui): improve LaTeX rendering with currency detection (ggml-org#16508)

* webui : Revised LaTeX formula recognition

* webui : Further examples containg amounts

* webui : vitest for maskInlineLaTeX

* webui: Moved preprocessLaTeX to lib/utils

* webui: LaTeX in table-cells

* chore: update webui build output (use theirs)

* webui: backslash in LaTeX-preprocessing

* chore: update webui build output

* webui: look-behind backslash-check

* chore: update webui build output

* Apply suggestions from code review

Code maintenance (variable names, code formatting, string handling)

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: Moved constants to lib/constants.

* webui: package woff2 inside base64 data

* webui: LaTeX-line-break in display formula

* chore: update webui build output

* webui: Bugfix (font embedding)

* webui: Bugfix (font embedding)

* webui: vite embeds assets

* webui: don't suppress 404 (fonts)

* refactor: KaTeX integration with SCSS

Moves KaTeX styling to SCSS for better customization and font embedding.

This change includes:
- Adding `sass` as a dev dependency.
- Introducing a custom SCSS file to override KaTeX variables and disable TTF/WOFF fonts, relying solely on WOFF2 for embedding.
- Adjusting the Vite configuration to resolve `katex-fonts` alias and inject SCSS variables.

* fix: LaTeX processing within blockquotes

* webui: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

server : add props.model_alias (ggml-org#16943)

* server : add props.model_alias

webui: fix keyboard shortcuts for new chat & edit chat title (ggml-org#17007)

Better UX for handling multiple attachments in WebUI (ggml-org#17246)

webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (ggml-org#16618)

* webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI

- Purely visual and diagnostic change, no effect on model context, prompt
  construction, or inference behavior

- Captured assistant tool call payloads during streaming and non-streaming
  completions, and persisted them in chat state and storage for downstream use

- Exposed parsed tool call labels beneath the assistant's model info line
  with graceful fallback when parsing fails

- Added tool call badges beneath assistant responses that expose JSON tooltips
  and copy their payloads when clicked, matching the existing model badge styling

- Added a user-facing setting to toggle tool call visibility to the Developer
  settings section directly under the model selector option

* webui: remove scroll listener causing unnecessary layout updates (model selector)

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: npm run format & update webui build output

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

webui: Fix clickability around chat processing statistics UI (ggml-org#17278)

* fix: Better pointer events handling in chat processing info elements

* chore: update webui build output

Fix merge error

webui: Add a "Continue" Action for Assistant Message (ggml-org#16971)

* feat: Add "Continue" action for assistant messages

* feat: Continuation logic & prompt improvements

* chore: update webui build output

* feat: Improve logic for continuing the assistant message

* chore: update webui build output

* chore: Linting

* chore: update webui build output

* fix: Remove synthetic prompt logic, use the prefill feature by sending the conversation payload ending with assistant message

* chore: update webui build output

* feat: Enable "Continue" button based on config & non-reasoning model type

* chore: update webui build output

* chore: Update packages with `npm audit fix`

* fix: Remove redundant error

* chore: update webui build output

* chore: Update `.gitignore`

* fix: Add missing change

* feat: Add auto-resizing for Edit Assistant/User Message textareas

* chore: update webui build output

Improved file naming & structure for UI components (ggml-org#17405)

* refactor: Component files naming & structure

* chore: update webui build output

* refactor: Dialog titles + components naming

* chore: update webui build output

* refactor: Imports

* chore: update webui build output

webui: hide border of button

webui: update

webui: update

webui: update

add vision

webui: minor settings reorganization and add disable autoscroll option (ggml-org#17452)

* webui: added a dedicated 'Display' settings section that groups visualization options

* webui: added a Display setting to toggle automatic chat scrolling

* chore: update webui build output

Co-authored-by: firecoperana <firecoperana>
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
…ersistence in chat UI (ggml-org#16618)


Successfully merging this pull request may close these issues:

Feature Request: Add a debug option to display OpenAI-Compatible toolcall chunks in the WebUI
