feat: tool calling support for chat step#1591
Conversation
📝 WalkthroughWalkthroughThis PR introduces tool-calling support to the LLM system, enabling LLMs to invoke DAG-based tools during chat interactions. It adds tool definitions, tool call metadata, execution logic, provider integrations, and UI components to surface tool calls and definitions across the platform. Changes
Sequence DiagramsequenceDiagram
actor User
participant Chat Executor
participant LLM Provider
participant Tool Registry
participant Tool Executor
participant DAG Executor
User->>Chat Executor: Run chat with tools
Chat Executor->>Tool Registry: Load tool DAGs
Tool Registry-->>Chat Executor: Tool definitions
Chat Executor->>LLM Provider: Chat request with tools
LLM Provider-->>Chat Executor: Response with tool calls
alt Tool calls present
Chat Executor->>Tool Executor: Execute tool calls
loop For each tool call
Tool Executor->>Tool Registry: Get tool DAG
Tool Registry-->>Tool Executor: DAG
Tool Executor->>DAG Executor: Execute DAG
DAG Executor-->>Tool Executor: Result + SubDAGRun
end
Tool Executor-->>Chat Executor: Tool results
Chat Executor->>LLM Provider: Send tool results + conversation
LLM Provider-->>Chat Executor: Final response
end
Chat Executor-->>User: Final message + sub-runs + tool definitions
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes The changes span multiple architectural layers (API, core LLM, providers, runtime, UI) with new types, tool execution logic, provider integrations, and UI components. The tool executor and registry implementations contain density in parameter parsing and DAG-based execution orchestration. However, the changes follow consistent patterns and most file modifications are relatively focused within their domains. Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ui/src/features/dags/components/chat-history/StepMessagesTable.tsx (1)
272-276: Copy action should handle tool-call-only messages.
msg.contentcan be empty/undefined for tool-call messages, which makes the copy action fail or copy"undefined". Consider copying tool-call payloads when content is missing.🐛 Proposed fix
- handleCopy(msg.content, i); + const contentToCopy = + msg.content || + (msg.toolCalls?.length + ? JSON.stringify(msg.toolCalls) + : ''); + handleCopy(contentToCopy, i);
🤖 Fix all issues with AI agents
In `@internal/llm/providers/anthropic/anthropic.go`:
- Around line 77-110: In internal/llm/providers/anthropic/anthropic.go the
json.Marshal call that builds argsJSON for tool calls (when iterating
resp.Content and using block.Input to create llm.ToolCall) currently discards
the error; change this to check the error and return or wrap it (or log and
return a descriptive error) instead of ignoring it. Similarly, find the
json.Unmarshal usages (the one noted around the later parsing path that
currently does `if err := json.Unmarshal(...);` and ignores the result) and
handle errors consistently by returning or wrapping them into the surrounding
function's error flow (or logging and returning), ensuring any fallback only
runs after a failed unmarshal is detected and reported; update the functions
that construct the llm.ChatResponse and toolCalls to propagate these errors
rather than silently continuing.
In `@internal/llm/providers/gemini/gemini.go`:
- Around line 155-170: The switch handling ChatRequest.ToolChoice (used to set
geminiReq.ToolConfig) currently only supports "auto", "required", and "none" and
silently ignores named tools; update the functionCallingConfig struct (add
AllowedFunctionNames []string) and modify the switch in the code that sets
geminiReq.ToolConfig to include a default case that sets Mode to "ANY" and
populates AllowedFunctionNames with req.ToolChoice (so named tools are enforced
via allowedFunctionNames), ensuring you reference functionCallingConfig and
geminiReq.ToolConfig/ToolChoice when making the change.
In `@internal/llm/providers/openai/openai.go`:
- Around line 183-186: The current assignment sets chatReq.ToolChoice to a raw
string from req.ToolChoice but OpenAI expects either the standard strings
("auto", "required", "none") or an object of shape
{"type":"function","function":{"name":"..."}}, so update the logic around
req.ToolChoice/chatReq.ToolChoice to: if req.ToolChoice is one of the standard
strings, assign it unchanged; otherwise map the string to the required object
shape (i.e., create an object with "type":"function" and a nested "function" map
containing "name": req.ToolChoice) before assigning to chatReq.ToolChoice so
OpenAI receives the correct structure.
In `@internal/runtime/builtin/chat/tool_executor_test.go`:
- Around line 97-103: Replace use of assert with require in the table-driven
test that calls buildParamString: change assert.Equal(t, tc.expected, result) to
require.Equal(t, tc.expected, result) so failures stop the subtest immediately;
update the test imports to include "github.com/stretchr/testify/require" (and
remove or keep assert if used elsewhere), and update any other assertions in
this test function to use require for consistent failure semantics.
In `@internal/service/frontend/api/v2/transformer.go`:
- Around line 322-334: The code currently converts msg.ToolCalls into
apiMsg.ToolCalls unconditionally; change it so you only process and set
apiMsg.ToolCalls when the message is from the assistant role (e.g., check
msg.Author.Role or msg.Role equals the assistant role constant) and leave
apiMsg.ToolCalls nil otherwise; update the block around msg.ToolCalls to first
verify the message role is assistant, then build the toolCalls slice and assign
apiMsg.ToolCalls only in that branch (reference msg.ToolCalls and
apiMsg.ToolCalls to locate the code).
In `@ui/src/features/dags/components/DAGStatus.tsx`:
- Around line 148-161: The modal isn't receiving the fallback sub-DAG name
(derived as subDAGName from allSubRuns and n.step.call), so when the first
subRun provides dagName the modal title is blank; update the call to
setParallelExecutionModal in this block to include the computed subDAGName (not
just node: n) so the modal props can use that value (e.g., add a name or dagName
field containing subDAGName), and ensure the modal component uses that prop
instead of only reading n.step.call.
🧹 Nitpick comments (12)
internal/core/spec/step.go (1)
171-175: Consider adding validation forMaxToolIterations.The
MaxToolIterationsfield accepts any integer value via the pointer, but negative or zero values would likely cause issues in the tool execution loop. Consider adding validation inbuildStepLLMsimilar to howMaxTokensis validated.Example validation to add in buildStepLLM
// Validate maxTokens if specified if cfg.MaxTokens != nil { if *cfg.MaxTokens < 1 { return core.NewValidationError("llm.maxTokens", *cfg.MaxTokens, fmt.Errorf("maxTokens must be at least 1")) } } + // Validate maxToolIterations if specified + if cfg.MaxToolIterations != nil { + if *cfg.MaxToolIterations < 1 { + return core.NewValidationError("llm.maxToolIterations", *cfg.MaxToolIterations, + fmt.Errorf("maxToolIterations must be at least 1")) + } + }ui/src/features/dags/components/chat-history/StepMessagesTable.tsx (1)
169-186: Add ARIA state for the tool definitions toggle.This improves accessibility and screen-reader context for the collapsible section.
♿ Proposed tweak
- <button + <button onClick={() => setShowTools(!showTools)} className="w-full flex items-center gap-2 px-2 py-1 hover:bg-accent/50 text-left" type="button" + aria-expanded={showTools} + aria-controls="tool-definitions" > ... - {showTools && ( - <div className="px-4 pb-2 space-y-2"> + {showTools && ( + <div id="tool-definitions" className="px-4 pb-2 space-y-2">internal/runtime/builtin/chat/tool_executor_test.go (1)
6-9: Preferrequirefor test assertions.Most assertions here should fail fast to keep diagnostics clean and align with test guidelines.
As per coding guidelines, consider switching to
require.*for these assertions.ui/src/features/dags/components/chat-history/ToolDefinitionCard.tsx (1)
27-43: Wrap long tool text to prevent overflow.Tool names, descriptions, and default values can be long; add wrapping to keep the card compact and readable.
Based on learnings, addwhitespace-normal break-words.♻️ Proposed tweak
- <div className="text-xs border-l-2 border-l-purple-500 pl-2 py-0.5"> - <div className="font-mono font-medium text-purple-500">{tool.name}</div> + <div className="text-xs border-l-2 border-l-purple-500 pl-2 py-0.5 whitespace-normal break-words"> + <div className="font-mono font-medium text-purple-500 break-words"> + {tool.name} + </div> {tool.description && ( - <div className="text-muted-foreground">{tool.description}</div> + <div className="text-muted-foreground break-words">{tool.description}</div> )}internal/runtime/data.go (1)
515-526: Copy tool definitions in/out to avoid shared mutation.Returning internal slices allows external mutation and can break thread-safety guarantees. Mirror the defensive copy pattern used elsewhere.
♻️ Proposed fix
func (d *Data) SetToolDefinitions(tools []exec.ToolDefinition) { d.mu.Lock() defer d.mu.Unlock() - d.inner.State.ToolDefinitions = tools + if tools == nil { + d.inner.State.ToolDefinitions = nil + return + } + copied := make([]exec.ToolDefinition, len(tools)) + copy(copied, tools) + d.inner.State.ToolDefinitions = copied } func (d *Data) GetToolDefinitions() []exec.ToolDefinition { d.mu.RLock() defer d.mu.RUnlock() - return d.inner.State.ToolDefinitions + if d.inner.State.ToolDefinitions == nil { + return nil + } + copied := make([]exec.ToolDefinition, len(d.inner.State.ToolDefinitions)) + copy(copied, d.inner.State.ToolDefinitions) + return copied }api/v2/api.gen.go (1)
1257-1259: Consider usingDAGNamealias forDagNamefields.Keeps type consistency with the rest of the API models; update
api/v2/api.yamland regenerate if desired.Also applies to: 1269-1271
internal/runtime/builtin/chat/tool_executor.go (2)
182-198: Consider aggregating all errors instead of returning only the first.The
Killmethod collects errors but only returns the first one. This could hide failures when multiple DAGs fail to terminate.♻️ Option: Return aggregated errors
+import "errors" + func (e *ToolExecutor) Kill(sig os.Signal) error { e.mu.Lock() defer e.mu.Unlock() var errs []error for runID, subDAG := range e.runningDAGs { if err := subDAG.Kill(sig); err != nil { errs = append(errs, fmt.Errorf("failed to kill tool DAG %s: %w", runID, err)) } } - if len(errs) > 0 { - return errs[0] - } - return nil + return errors.Join(errs...) }
266-278: Outputs iteration order is non-deterministic.In
formatToolResult, iterating overresult.Outputsmap produces non-deterministic output order in the fallback path (lines 271-275). While this is a fallback for JSON marshal errors, consider sorting keys for consistent output.internal/runtime/builtin/chat/executor.go (1)
477-502: Tool calling uses non-streaming mode - document the trade-off.The comment on line 498 notes streaming is disabled for tool calling because it "complicates tool call detection." This is a reasonable design choice, but users expecting streaming with tools should understand this limitation.
Consider adding a debug log or noting in documentation that streaming is automatically disabled when tools are configured.
internal/runtime/builtin/chat/tools.go (3)
44-47: Consider returning empty registry instead of nil for empty dagNames.Returning
nilfor emptydagNamesmeans callers must always nil-check. An empty registry withHasTools() == falsewould be safer.♻️ Return empty registry instead of nil
func NewToolRegistry(ctx context.Context, dagNames []string) (*ToolRegistry, error) { if len(dagNames) == 0 { - return nil, nil + return &ToolRegistry{ + tools: make(map[string]*toolInfo), + dagNames: make(map[string]string), + }, nil }
152-198: Regex could silently skip malformed parameters.When
toolParamRegex.FindStringSubmatch(part)returns nil, the parameter is silently skipped (line 173-175). Consider logging a warning for malformed parameters to aid debugging.match := toolParamRegex.FindStringSubmatch(part) if match == nil { + // Log warning for debugging malformed params continue }
111-123: Tool ordering is non-deterministic.Iterating over
r.toolsmap produces tools in non-deterministic order. This could lead to inconsistent tool ordering in LLM requests, which might affect reproducibility.♻️ Sort tools by name for consistent ordering
+import "sort" + func (r *ToolRegistry) ToLLMTools() []llmpkg.Tool { if r == nil { return nil } + // Collect and sort tool names for deterministic ordering + names := make([]string, 0, len(r.tools)) + for name := range r.tools { + names = append(names, name) + } + sort.Strings(names) + tools := make([]llmpkg.Tool, 0, len(r.tools)) - for _, info := range r.tools { + for _, name := range names { + info := r.tools[name] tools = append(tools, llmpkg.Tool{ Type: "function", Function: llmpkg.ToolFunction{ Name: info.Name, Description: info.Description, Parameters: buildJSONSchema(info.Params), }, }) } return tools }
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1591 +/- ##
==========================================
- Coverage 62.13% 61.79% -0.35%
==========================================
Files 279 281 +2
Lines 30707 31173 +466
==========================================
+ Hits 19081 19264 +183
- Misses 9917 10188 +271
- Partials 1709 1721 +12
... and 1 file with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Summary by CodeRabbit
New Features
Enhancements
✏️ Tip: You can customize this high-level summary in your review settings.