Reasoning Loop Guide¶
Complete guide to the Symbiont agentic reasoning loop: a typestate-enforced Observe-Reason-Gate-Act (ORGA) cycle for autonomous agent behavior.
Overview¶
The reasoning loop is the core execution engine for autonomous agents in Symbiont. It drives a multi-turn conversation between an LLM, a policy gate, and external tools through a structured cycle:
- Observe — Collect results from previous tool executions
- Reason — LLM produces proposed actions (tool calls or text responses)
- Gate — Policy engine evaluates each proposed action
- Act — Approved actions are dispatched to tool executors
The loop continues until the LLM produces a final text response, hits iteration/token limits, or times out.
Design Principles¶
- Compile-time safety: Invalid phase transitions are caught at compile time via Rust's type system
- Opt-in complexity: The loop works with just a provider and policy gate; knowledge bridge, Cedar policies, and human-in-the-loop are all optional
- Backward compatible: Adding new features (like the knowledge bridge) never breaks existing code
- Observable: Every phase emits journal events and tracing spans
Quick Start¶
Minimal Example¶
use std::sync::Arc;
use symbi_runtime::reasoning::circuit_breaker::CircuitBreakerRegistry;
use symbi_runtime::reasoning::context_manager::DefaultContextManager;
use symbi_runtime::reasoning::conversation::{Conversation, ConversationMessage};
use symbi_runtime::reasoning::executor::DefaultActionExecutor;
use symbi_runtime::reasoning::loop_types::{BufferedJournal, LoopConfig};
use symbi_runtime::reasoning::policy_bridge::DefaultPolicyGate;
use symbi_runtime::reasoning::reasoning_loop::ReasoningLoopRunner;
use symbi_runtime::types::AgentId;
// Set up the runner with default components
let runner = ReasoningLoopRunner {
provider: Arc::new(my_inference_provider),
policy_gate: Arc::new(DefaultPolicyGate::permissive()),
executor: Arc::new(DefaultActionExecutor::default()),
context_manager: Arc::new(DefaultContextManager::default()),
circuit_breakers: Arc::new(CircuitBreakerRegistry::default()),
journal: Arc::new(BufferedJournal::new(1000)),
knowledge_bridge: None,
};
// Build a conversation
let mut conv = Conversation::with_system("You are a helpful assistant.");
conv.push(ConversationMessage::user("What is 6 * 7?"));
// Run the loop
let result = runner.run(AgentId::new(), conv, LoopConfig::default()).await;
println!("Output: {}", result.output);
println!("Iterations: {}", result.iterations);
println!("Tokens used: {}", result.total_usage.total_tokens);
With Tool Definitions¶
use symbi_runtime::reasoning::inference::ToolDefinition;
let config = LoopConfig {
max_iterations: 10,
tool_definitions: vec![
ToolDefinition {
name: "web_search".into(),
description: "Search the web for information".into(),
parameters: serde_json::json!({
"type": "object",
"properties": {
"query": { "type": "string" }
},
"required": ["query"]
}),
},
],
..Default::default()
};
let result = runner.run(agent_id, conv, config).await;
Phase System¶
Typestate Pattern¶
The loop uses Rust's type system to enforce valid phase transitions at compile time. Each phase is a zero-sized type marker:
pub struct Reasoning; // LLM produces proposed actions
pub struct PolicyCheck; // Each action evaluated by the gate
pub struct ToolDispatching; // Approved actions executed
pub struct Observing; // Results collected for next iteration
The AgentLoop<Phase> struct carries the loop state and can only call methods appropriate to its current phase. For example, AgentLoop<Reasoning> only exposes produce_output(), which consumes self and returns AgentLoop<PolicyCheck>.
This means the following mistakes are compile errors, not runtime bugs: - Skipping the policy check - Dispatching tools without reasoning first - Observing results without dispatching
Phase Flow¶
graph TD
R["Reasoning Phase\nproduce_output"]
P["Policy Check Phase\ncheck_policy"]
T["Tool Dispatching Phase\ndispatch_tools"]
O["Observing Phase\nobserve_results"]
LR["LoopResult"]
R --> P
P --> T
T --> O
O -->|Continue| R
O -->|Complete| LR
Dynamic branching within the typestate: The LLM's output during the Reasoning phase may indicate continuation (more tool calls needed) or completion (final text response). The observe_results() method returns a LoopContinuation enum — either Continue(AgentLoop<Reasoning>) to re-enter the reasoning phase, or Complete(LoopResult) to terminate. This enum is the only point where the loop branches dynamically; all other phase transitions are strictly linear and enforced at compile time. No dynamic dispatch or trait objects are used — the branching is a standard Rust match on a concrete enum.
Inference Providers¶
The InferenceProvider trait abstracts over LLM backends:
#[async_trait]
pub trait InferenceProvider: Send + Sync {
async fn complete(
&self,
conversation: &Conversation,
options: &InferenceOptions,
) -> Result<InferenceResponse, InferenceError>;
fn provider_name(&self) -> &str;
fn default_model(&self) -> &str;
fn supports_native_tools(&self) -> bool;
fn supports_structured_output(&self) -> bool;
}
Cloud Provider (OpenRouter)¶
The CloudInferenceProvider connects to OpenRouter (or any OpenAI-compatible endpoint):
export OPENROUTER_API_KEY="sk-or-..."
export OPENROUTER_MODEL="google/gemini-2.0-flash-001" # optional
use symbi_runtime::reasoning::providers::cloud::CloudInferenceProvider;
let provider = CloudInferenceProvider::from_env()
.expect("OPENROUTER_API_KEY must be set");
Policy Gate¶
Every proposed action passes through the policy gate before execution:
#[async_trait]
pub trait ReasoningPolicyGate: Send + Sync {
async fn evaluate_action(
&self,
agent_id: &AgentId,
action: &ProposedAction,
state: &LoopState,
) -> LoopDecision;
}
pub enum LoopDecision {
Allow,
Deny { reason: String },
Modify { modified_action: Box<ProposedAction>, reason: String },
}
Built-in Gates¶
DefaultPolicyGate::permissive()— Allows all actions (development/testing)DefaultPolicyGate::new()— Default policy rulesOpaPolicyGateBridge— Bridges to the OPA-based policy engineCedarGate— Cedar policy language integration
Policy Denial Feedback¶
When an action is denied, the denial reason is fed back to the LLM as a policy feedback observation, allowing it to adjust its approach on the next iteration.
Action Execution¶
ActionExecutor Trait¶
#[async_trait]
pub trait ActionExecutor: Send + Sync {
async fn execute_actions(
&self,
actions: &[ProposedAction],
config: &LoopConfig,
circuit_breakers: &CircuitBreakerRegistry,
) -> Vec<Observation>;
}
Built-in Executors¶
| Executor | Description |
|---|---|
DefaultActionExecutor |
Parallel dispatch with per-tool timeouts |
EnforcedActionExecutor |
Delegates through ToolInvocationEnforcer → MCP pipeline |
KnowledgeAwareExecutor |
Intercepts knowledge tools, delegates rest to inner executor |
Circuit Breakers¶
Each tool has an associated circuit breaker that tracks failures:
- Closed (normal): Tool calls proceed normally
- Open (tripped): Too many consecutive failures; calls rejected immediately
- Half-open (probing): Limited calls allowed to test recovery
let circuit_breakers = CircuitBreakerRegistry::new(CircuitBreakerConfig {
failure_threshold: 3,
recovery_timeout: Duration::from_secs(60),
half_open_max_calls: 1,
});
Knowledge-Reasoning Bridge¶
The KnowledgeBridge connects the agent's knowledge store (hierarchical memory, knowledge base, vector search) to the reasoning loop.
Setup¶
use symbi_runtime::reasoning::knowledge_bridge::{KnowledgeBridge, KnowledgeConfig};
let bridge = Arc::new(KnowledgeBridge::new(
context_manager.clone(), // Arc<dyn context::ContextManager>
KnowledgeConfig {
max_context_items: 5,
relevance_threshold: 0.3,
auto_persist: true,
},
));
let runner = ReasoningLoopRunner {
// ... other fields ...
knowledge_bridge: Some(bridge),
};
How It Works¶
Before each reasoning step:
1. Search terms are extracted from recent user/tool messages
2. query_context() and search_knowledge() retrieve relevant items
3. Results are formatted and injected as a system message (replacing the previous injection)
During tool dispatch:
The KnowledgeAwareExecutor intercepts two special tools:
-
recall_knowledge— Searches the knowledge base and returns formatted results -
store_knowledge— Stores a new fact as a subject-predicate-object triple
All other tool calls are delegated to the inner executor unchanged. This includes ToolClad tools, which are validated against their manifest contracts before execution.
After loop completion:
If auto_persist is enabled, the bridge extracts assistant responses and stores them as working memory for future conversations.
Backward Compatibility¶
Setting knowledge_bridge: None makes the runner behave identically to before — no context injection, no knowledge tools, no persistence.
Conversation Management¶
Conversation Type¶
Conversation manages an ordered sequence of messages with serialization to both OpenAI and Anthropic API formats:
let mut conv = Conversation::with_system("You are a helpful assistant.");
conv.push(ConversationMessage::user("Hello"));
conv.push(ConversationMessage::assistant("Hi there!"));
// Serialize for API calls
let openai_msgs = conv.to_openai_messages();
let (system, anthropic_msgs) = conv.to_anthropic_messages();
Token Budget Enforcement¶
The in-loop ContextManager (not to be confused with the knowledge ContextManager) manages the conversation token budget:
- Sliding Window: Remove oldest messages first
- Observation Masking: Hide verbose tool results
- Anchored Summary: Keep system message + N recent messages
Durable Journal¶
Every phase transition emits a JournalEntry to the configured JournalWriter:
pub struct JournalEntry {
pub sequence: u64,
pub timestamp: DateTime<Utc>,
pub agent_id: AgentId,
pub iteration: u32,
pub event: LoopEvent,
}
pub enum LoopEvent {
Started { agent_id, config },
ReasoningComplete { iteration, actions, usage },
PolicyEvaluated { iteration, action_count, denied_count },
ToolsDispatched { iteration, tool_count, duration },
ObservationsCollected { iteration, observation_count },
Terminated { reason, iterations, total_usage, duration },
RecoveryTriggered { iteration, tool_name, strategy, error },
}
The default BufferedJournal stores entries in memory. Production deployments can implement JournalWriter for persistent storage.
Configuration¶
LoopConfig¶
pub struct LoopConfig {
pub max_iterations: u32, // Default: 25
pub max_total_tokens: u32, // Default: 100,000
pub timeout: Duration, // Default: 5 minutes
pub default_recovery: RecoveryStrategy,
pub tool_timeout: Duration, // Default: 30 seconds
pub max_concurrent_tools: usize, // Default: 5
pub context_token_budget: usize, // Default: 32,000
pub tool_definitions: Vec<ToolDefinition>,
}
Recovery Strategies¶
When tool execution fails, the loop can apply different recovery strategies:
| Strategy | Description |
|---|---|
Retry |
Retry with exponential backoff |
Fallback |
Try alternative tools |
CachedResult |
Use a cached result if fresh enough |
LlmRecovery |
Ask the LLM to find an alternative approach |
Escalate |
Route to a human operator queue |
DeadLetter |
Give up and log the failure |
Testing¶
Unit Tests (No API Key Required)¶
Integration Tests with Mock Provider¶
Live Tests with Real LLM¶
OPENROUTER_API_KEY="sk-or-..." OPENROUTER_MODEL="google/gemini-2.0-flash-001" \
cargo test -j2 -p symbi-runtime --features http-input --test reasoning_live_tests -- --nocapture
Implementation Phases¶
The reasoning loop was built in five phases, each adding capabilities:
| Phase | Focus | Key Components |
|---|---|---|
| 1 | Core loop | conversation, inference, phases, reasoning_loop |
| 2 | Resilience | circuit_breaker, executor, context_manager, policy_bridge |
| 3 | DSL integration | human_critic, pipeline_config, REPL builtins |
| 4 | Multi-agent | agent_registry, critic_audit, saga |
| 5 | Observability | cedar_gate, journal, metrics, scheduler, tracing_spans |
| Bridge | Knowledge | knowledge_bridge, knowledge_executor |
| orga-adaptive | Advanced | tool_profile, progress_tracker, pre_hydrate, extended knowledge_bridge |
Advanced Primitives (orga-adaptive)¶
The orga-adaptive feature gate adds four advanced capabilities. See the full guide for details.
| Primitive | Purpose |
|---|---|
| Tool Profile | Glob-based filtering of tools visible to the LLM |
| Progress Tracker | Per-step reattempt limits with stuck-loop detection |
| Pre-Hydration | Deterministic context pre-fetch from task input references |
| Scoped Conventions | Directory-aware convention retrieval via recall_knowledge |
let config = LoopConfig {
tool_profile: Some(ToolProfile::include_only(&["search_*", "file_*"])),
pre_hydration: Some(PreHydrationConfig::default()),
..Default::default()
};
Next Steps¶
- Runtime Architecture — Full system architecture overview
- Security Model — Policy enforcement and audit trails
- DSL Guide — Agent definition language
- API Reference — Complete API documentation