Skip to content

feat: tool calling support for chat step#1591

Merged
yohamta0 merged 12 commits into
mainfrom
chat-enhnc
Jan 18, 2026
Merged

feat: tool calling support for chat step#1591
yohamta0 merged 12 commits into
mainfrom
chat-enhnc

Conversation

@yohamta0

@yohamta0 yohamta0 commented Jan 18, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • New Features

    • Added LLM tool calling capability, enabling assistants to invoke external tools and sub-DAGs during chat interactions
    • Tool definitions and call details now visible in chat history with expandable tool information cards
    • Sub-DAG execution runs from tool calls are now tracked and linked with DAG names for better traceability
    • Tools can be configured per chat step to define available functions for the LLM to use
  • Enhancements

    • Chat messages now display tool calls made by the assistant alongside regular content
    • UI improved to show tool parameters and execution details within chat history

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Jan 18, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR introduces tool-calling support to the LLM system, enabling LLMs to invoke DAG-based tools during chat interactions. It adds tool definitions, tool call metadata, execution logic, provider integrations, and UI components to surface tool calls and definitions across the platform.

Changes

Cohort / File(s) Summary
API Schema & Generated Types
api/v2/api.yaml, api/v2/api.gen.go, ui/src/api/v2/schema.ts
Added ChatToolCall, ToolDefinition types; extended ChatMessage with toolCalls, ChatMessagesResponse with toolDefinitions, SubDAGRun/SubDAGRunDetail with dagName fields. Updated Swagger spec.
Core LLM Types & Config
internal/core/llm.go, internal/llm/types.go, internal/core/exec/messages.go
Added Tool, ToolCall, ToolResult, ToolCallFunction types. Extended LLMConfig with Tools and MaxToolIterations fields; extended Message/ChatRequest/ChatResponse/StreamEvent with tool-related fields.
LLM Step Configuration
internal/core/spec/step.go, schemas/dag.schema.json
Added Tools and MaxToolIterations fields to llmConfig struct; propagated to buildStepLLM; updated schema.
LLM Provider Implementations
internal/llm/providers/openai/openai.go, internal/llm/providers/gemini/gemini.go, internal/llm/providers/anthropic/anthropic.go
Integrated tool handling: extract tool calls from responses, convert tools for provider format, process tool-related message blocks and results, populate ToolCalls in responses.
Chat Executor & Tool Execution
internal/runtime/builtin/chat/executor.go, internal/runtime/builtin/chat/tool_executor.go, internal/runtime/builtin/chat/tools.go
Implemented multi-turn tool-calling loop in Executor; added ToolRegistry for DAG-based tool discovery and parameter parsing; added ToolExecutor for sequential tool execution with sub-run tracking; added public GetSubRuns/GetToolDefinitions methods.
Chat Executor Tests
internal/runtime/builtin/chat/executor_test.go, internal/runtime/builtin/chat/tool_executor_test.go, internal/runtime/builtin/chat/tools_test.go
Comprehensive test coverage for tool registry, parameter parsing, tool execution, result formatting, and executor initialization.
Runtime Data & Execution
internal/core/exec/node.go, internal/runtime/data.go, internal/runtime/node.go, internal/runtime/executor/executor.go, internal/runtime/transform/node.go
Added ToolDefinitions field to NodeState, DAGName to SubDAGRun; added SubRunProvider and ToolDefinitionProvider interfaces; extended Node.Execute to capture sub-runs and tool definitions; propagated data in transform layer.
Logging
internal/cmn/logger/tag/tag.go
Added Tool and ToolCallID tag constructors for logging.
API Handler & Transformer
internal/service/frontend/api/v2/dagruns.go, internal/service/frontend/api/v2/transformer.go
Extended message and sub-DAG run responses with toolDefinitions; updated getSubDAGRunDetail signature to accept dagName; added toToolDefinitions and tool call conversion helpers.
UI Components
ui/src/features/dags/components/chat-history/{StepMessagesTable.tsx,ToolDefinitionCard.tsx}
Added ToolDefinitionCard component for rendering tool metadata; extended StepMessagesTable to display tool definitions and tool calls with getMessagePreview helper.
UI Navigation & Display
ui/src/features/dags/components/{DAGStatus.tsx,dag-details/{NodeStatusTableRow.tsx,SubDAGRunsList.tsx},dag-execution/DAGExecutionHistory.tsx}
Updated sub-DAG navigation to infer subDAGName from step.call or subRun.dagName; relaxed navigation guards to support tool-based sub-DAG detection.

Sequence Diagram

sequenceDiagram
    actor User
    participant Chat Executor
    participant LLM Provider
    participant Tool Registry
    participant Tool Executor
    participant DAG Executor

    User->>Chat Executor: Run chat with tools
    Chat Executor->>Tool Registry: Load tool DAGs
    Tool Registry-->>Chat Executor: Tool definitions
    
    Chat Executor->>LLM Provider: Chat request with tools
    LLM Provider-->>Chat Executor: Response with tool calls
    
    alt Tool calls present
        Chat Executor->>Tool Executor: Execute tool calls
        loop For each tool call
            Tool Executor->>Tool Registry: Get tool DAG
            Tool Registry-->>Tool Executor: DAG
            Tool Executor->>DAG Executor: Execute DAG
            DAG Executor-->>Tool Executor: Result + SubDAGRun
        end
        Tool Executor-->>Chat Executor: Tool results
        Chat Executor->>LLM Provider: Send tool results + conversation
        LLM Provider-->>Chat Executor: Final response
    end
    
    Chat Executor-->>User: Final message + sub-runs + tool definitions
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes span multiple architectural layers (API, core LLM, providers, runtime, UI) with new types, tool execution logic, provider integrations, and UI components. The tool executor and registry implementations contain density in parameter parsing and DAG-based execution orchestration. However, the changes follow consistent patterns and most file modifications are relatively focused within their domains.

Possibly related PRs

  • feat: display chat messages for dag-run #1578: Introduced the foundational chat-message feature that this PR extends by adding tool-calling metadata, tool definitions, and infrastructure for tool invocation to the same chat message models and API flows.
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 54.35% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: tool calling support for chat step' accurately and concisely describes the primary feature addition—tool calling capabilities for chat steps. It is specific, clear, and directly reflects the main objective of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ui/src/features/dags/components/chat-history/StepMessagesTable.tsx (1)

272-276: Copy action should handle tool-call-only messages.

msg.content can be empty/undefined for tool-call messages, which makes the copy action fail or copy "undefined". Consider copying tool-call payloads when content is missing.

🐛 Proposed fix
-                        handleCopy(msg.content, i);
+                        const contentToCopy =
+                          msg.content ||
+                          (msg.toolCalls?.length
+                            ? JSON.stringify(msg.toolCalls)
+                            : '');
+                        handleCopy(contentToCopy, i);
🤖 Fix all issues with AI agents
In `@internal/llm/providers/anthropic/anthropic.go`:
- Around line 77-110: In internal/llm/providers/anthropic/anthropic.go the
json.Marshal call that builds argsJSON for tool calls (when iterating
resp.Content and using block.Input to create llm.ToolCall) currently discards
the error; change this to check the error and return or wrap it (or log and
return a descriptive error) instead of ignoring it. Similarly, find the
json.Unmarshal usages (the one noted around the later parsing path that
currently does `if err := json.Unmarshal(...);` and ignores the result) and
handle errors consistently by returning or wrapping them into the surrounding
function's error flow (or logging and returning), ensuring any fallback only
runs after a failed unmarshal is detected and reported; update the functions
that construct the llm.ChatResponse and toolCalls to propagate these errors
rather than silently continuing.

In `@internal/llm/providers/gemini/gemini.go`:
- Around line 155-170: The switch handling ChatRequest.ToolChoice (used to set
geminiReq.ToolConfig) currently only supports "auto", "required", and "none" and
silently ignores named tools; update the functionCallingConfig struct (add
AllowedFunctionNames []string) and modify the switch in the code that sets
geminiReq.ToolConfig to include a default case that sets Mode to "ANY" and
populates AllowedFunctionNames with req.ToolChoice (so named tools are enforced
via allowedFunctionNames), ensuring you reference functionCallingConfig and
geminiReq.ToolConfig/ToolChoice when making the change.

In `@internal/llm/providers/openai/openai.go`:
- Around line 183-186: The current assignment sets chatReq.ToolChoice to a raw
string from req.ToolChoice but OpenAI expects either the standard strings
("auto", "required", "none") or an object of shape
{"type":"function","function":{"name":"..."}}, so update the logic around
req.ToolChoice/chatReq.ToolChoice to: if req.ToolChoice is one of the standard
strings, assign it unchanged; otherwise map the string to the required object
shape (i.e., create an object with "type":"function" and a nested "function" map
containing "name": req.ToolChoice) before assigning to chatReq.ToolChoice so
OpenAI receives the correct structure.

In `@internal/runtime/builtin/chat/tool_executor_test.go`:
- Around line 97-103: Replace use of assert with require in the table-driven
test that calls buildParamString: change assert.Equal(t, tc.expected, result) to
require.Equal(t, tc.expected, result) so failures stop the subtest immediately;
update the test imports to include "github.com/stretchr/testify/require" (and
remove or keep assert if used elsewhere), and update any other assertions in
this test function to use require for consistent failure semantics.

In `@internal/service/frontend/api/v2/transformer.go`:
- Around line 322-334: The code currently converts msg.ToolCalls into
apiMsg.ToolCalls unconditionally; change it so you only process and set
apiMsg.ToolCalls when the message is from the assistant role (e.g., check
msg.Author.Role or msg.Role equals the assistant role constant) and leave
apiMsg.ToolCalls nil otherwise; update the block around msg.ToolCalls to first
verify the message role is assistant, then build the toolCalls slice and assign
apiMsg.ToolCalls only in that branch (reference msg.ToolCalls and
apiMsg.ToolCalls to locate the code).

In `@ui/src/features/dags/components/DAGStatus.tsx`:
- Around line 148-161: The modal isn't receiving the fallback sub-DAG name
(derived as subDAGName from allSubRuns and n.step.call), so when the first
subRun provides dagName the modal title is blank; update the call to
setParallelExecutionModal in this block to include the computed subDAGName (not
just node: n) so the modal props can use that value (e.g., add a name or dagName
field containing subDAGName), and ensure the modal component uses that prop
instead of only reading n.step.call.
🧹 Nitpick comments (12)
internal/core/spec/step.go (1)

171-175: Consider adding validation for MaxToolIterations.

The MaxToolIterations field accepts any integer value via the pointer, but negative or zero values would likely cause issues in the tool execution loop. Consider adding validation in buildStepLLM similar to how MaxTokens is validated.

Example validation to add in buildStepLLM
 	// Validate maxTokens if specified
 	if cfg.MaxTokens != nil {
 		if *cfg.MaxTokens < 1 {
 			return core.NewValidationError("llm.maxTokens", *cfg.MaxTokens,
 				fmt.Errorf("maxTokens must be at least 1"))
 		}
 	}

+	// Validate maxToolIterations if specified
+	if cfg.MaxToolIterations != nil {
+		if *cfg.MaxToolIterations < 1 {
+			return core.NewValidationError("llm.maxToolIterations", *cfg.MaxToolIterations,
+				fmt.Errorf("maxToolIterations must be at least 1"))
+		}
+	}
ui/src/features/dags/components/chat-history/StepMessagesTable.tsx (1)

169-186: Add ARIA state for the tool definitions toggle.

This improves accessibility and screen-reader context for the collapsible section.

♿ Proposed tweak
-          <button
+          <button
             onClick={() => setShowTools(!showTools)}
             className="w-full flex items-center gap-2 px-2 py-1 hover:bg-accent/50 text-left"
             type="button"
+            aria-expanded={showTools}
+            aria-controls="tool-definitions"
           >
...
-          {showTools && (
-            <div className="px-4 pb-2 space-y-2">
+          {showTools && (
+            <div id="tool-definitions" className="px-4 pb-2 space-y-2">
internal/runtime/builtin/chat/tool_executor_test.go (1)

6-9: Prefer require for test assertions.

Most assertions here should fail fast to keep diagnostics clean and align with test guidelines.

As per coding guidelines, consider switching to require.* for these assertions.

ui/src/features/dags/components/chat-history/ToolDefinitionCard.tsx (1)

27-43: Wrap long tool text to prevent overflow.

Tool names, descriptions, and default values can be long; add wrapping to keep the card compact and readable.
Based on learnings, add whitespace-normal break-words.

♻️ Proposed tweak
-    <div className="text-xs border-l-2 border-l-purple-500 pl-2 py-0.5">
-      <div className="font-mono font-medium text-purple-500">{tool.name}</div>
+    <div className="text-xs border-l-2 border-l-purple-500 pl-2 py-0.5 whitespace-normal break-words">
+      <div className="font-mono font-medium text-purple-500 break-words">
+        {tool.name}
+      </div>
       {tool.description && (
-        <div className="text-muted-foreground">{tool.description}</div>
+        <div className="text-muted-foreground break-words">{tool.description}</div>
       )}
internal/runtime/data.go (1)

515-526: Copy tool definitions in/out to avoid shared mutation.

Returning internal slices allows external mutation and can break thread-safety guarantees. Mirror the defensive copy pattern used elsewhere.

♻️ Proposed fix
 func (d *Data) SetToolDefinitions(tools []exec.ToolDefinition) {
 	d.mu.Lock()
 	defer d.mu.Unlock()
-	d.inner.State.ToolDefinitions = tools
+	if tools == nil {
+		d.inner.State.ToolDefinitions = nil
+		return
+	}
+	copied := make([]exec.ToolDefinition, len(tools))
+	copy(copied, tools)
+	d.inner.State.ToolDefinitions = copied
 }

 func (d *Data) GetToolDefinitions() []exec.ToolDefinition {
 	d.mu.RLock()
 	defer d.mu.RUnlock()
-	return d.inner.State.ToolDefinitions
+	if d.inner.State.ToolDefinitions == nil {
+		return nil
+	}
+	copied := make([]exec.ToolDefinition, len(d.inner.State.ToolDefinitions))
+	copy(copied, d.inner.State.ToolDefinitions)
+	return copied
 }
api/v2/api.gen.go (1)

1257-1259: Consider using DAGName alias for DagName fields.

Keeps type consistency with the rest of the API models; update api/v2/api.yaml and regenerate if desired.

Also applies to: 1269-1271

internal/runtime/builtin/chat/tool_executor.go (2)

182-198: Consider aggregating all errors instead of returning only the first.

The Kill method collects errors but only returns the first one. This could hide failures when multiple DAGs fail to terminate.

♻️ Option: Return aggregated errors
+import "errors"
+
 func (e *ToolExecutor) Kill(sig os.Signal) error {
 	e.mu.Lock()
 	defer e.mu.Unlock()
 
 	var errs []error
 	for runID, subDAG := range e.runningDAGs {
 		if err := subDAG.Kill(sig); err != nil {
 			errs = append(errs, fmt.Errorf("failed to kill tool DAG %s: %w", runID, err))
 		}
 	}
 
-	if len(errs) > 0 {
-		return errs[0]
-	}
-	return nil
+	return errors.Join(errs...)
 }

266-278: Outputs iteration order is non-deterministic.

In formatToolResult, iterating over result.Outputs map produces non-deterministic output order in the fallback path (lines 271-275). While this is a fallback for JSON marshal errors, consider sorting keys for consistent output.

internal/runtime/builtin/chat/executor.go (1)

477-502: Tool calling uses non-streaming mode - document the trade-off.

The comment on line 498 notes streaming is disabled for tool calling because it "complicates tool call detection." This is a reasonable design choice, but users expecting streaming with tools should understand this limitation.

Consider adding a debug log or noting in documentation that streaming is automatically disabled when tools are configured.

internal/runtime/builtin/chat/tools.go (3)

44-47: Consider returning empty registry instead of nil for empty dagNames.

Returning nil for empty dagNames means callers must always nil-check. An empty registry with HasTools() == false would be safer.

♻️ Return empty registry instead of nil
 func NewToolRegistry(ctx context.Context, dagNames []string) (*ToolRegistry, error) {
 	if len(dagNames) == 0 {
-		return nil, nil
+		return &ToolRegistry{
+			tools:    make(map[string]*toolInfo),
+			dagNames: make(map[string]string),
+		}, nil
 	}

152-198: Regex could silently skip malformed parameters.

When toolParamRegex.FindStringSubmatch(part) returns nil, the parameter is silently skipped (line 173-175). Consider logging a warning for malformed parameters to aid debugging.

 		match := toolParamRegex.FindStringSubmatch(part)
 		if match == nil {
+			// Log warning for debugging malformed params
 			continue
 		}

111-123: Tool ordering is non-deterministic.

Iterating over r.tools map produces tools in non-deterministic order. This could lead to inconsistent tool ordering in LLM requests, which might affect reproducibility.

♻️ Sort tools by name for consistent ordering
+import "sort"
+
 func (r *ToolRegistry) ToLLMTools() []llmpkg.Tool {
 	if r == nil {
 		return nil
 	}
 
+	// Collect and sort tool names for deterministic ordering
+	names := make([]string, 0, len(r.tools))
+	for name := range r.tools {
+		names = append(names, name)
+	}
+	sort.Strings(names)
+
 	tools := make([]llmpkg.Tool, 0, len(r.tools))
-	for _, info := range r.tools {
+	for _, name := range names {
+		info := r.tools[name]
 		tools = append(tools, llmpkg.Tool{
 			Type: "function",
 			Function: llmpkg.ToolFunction{
 				Name:        info.Name,
 				Description: info.Description,
 				Parameters:  buildJSONSchema(info.Params),
 			},
 		})
 	}
 
 	return tools
 }

Comment thread internal/llm/providers/anthropic/anthropic.go
Comment thread internal/llm/providers/gemini/gemini.go
Comment thread internal/llm/providers/openai/openai.go
Comment thread internal/runtime/builtin/chat/tool_executor_test.go
Comment thread internal/service/frontend/api/v2/transformer.go
Comment thread ui/src/features/dags/components/DAGStatus.tsx
@yohamta0 yohamta0 merged commit 8852bc9 into main Jan 18, 2026
6 checks passed
@yohamta0 yohamta0 deleted the chat-enhnc branch January 18, 2026 17:03
@codecov

codecov Bot commented Jan 18, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 41.80328% with 284 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.79%. Comparing base (39e3f64) to head (4073fe0).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/runtime/builtin/chat/executor.go 8.05% 134 Missing and 3 partials ⚠️
internal/runtime/builtin/chat/tool_executor.go 30.13% 100 Missing and 2 partials ⚠️
internal/runtime/builtin/chat/tools.go 89.93% 10 Missing and 5 partials ⚠️
internal/runtime/node.go 0.00% 10 Missing and 2 partials ⚠️
internal/runtime/data.go 0.00% 8 Missing ⚠️
internal/core/llm.go 0.00% 6 Missing ⚠️
internal/cmn/logger/tag/tag.go 0.00% 4 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1591      +/-   ##
==========================================
- Coverage   62.13%   61.79%   -0.35%     
==========================================
  Files         279      281       +2     
  Lines       30707    31173     +466     
==========================================
+ Hits        19081    19264     +183     
- Misses       9917    10188     +271     
- Partials     1709     1721      +12     
Files with missing lines Coverage Δ
internal/core/exec/messages.go 100.00% <ø> (ø)
internal/core/exec/node.go 0.00% <ø> (ø)
internal/core/spec/step.go 85.51% <100.00%> (+0.03%) ⬆️
internal/llm/types.go 100.00% <ø> (ø)
internal/runtime/executor/executor.go 0.00% <ø> (ø)
internal/runtime/transform/node.go 100.00% <100.00%> (ø)
internal/cmn/logger/tag/tag.go 0.00% <0.00%> (ø)
internal/core/llm.go 75.00% <0.00%> (-25.00%) ⬇️
internal/runtime/data.go 73.57% <0.00%> (-2.48%) ⬇️
internal/runtime/node.go 74.33% <0.00%> (-1.74%) ⬇️
... and 3 more

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 39e3f64...4073fe0. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant