Large-scale systems generate massive, noisy logs. Traditional keyword search misses context, and brittle alert rules cause alert fatigue. AutoIR turns raw logs into actionable incidents by combining:
- TiDB Serverless with native VECTOR(384) indexing for fast semantic search
- Serverless embeddings via AWS SageMaker (HF BGE-small) for low-latency vectorization
- Kimi K2 LLM: an agentic orchestrator with safe function tools to query TiDB and synthesize incidents
- Optional AWS SNS notifications for human-in-the-loop routing
Built for the TiDB AgentX Hackathon, AutoIR showcases a real multi-step agent that ingests, embeds, searches, and explains—end to end.
AutoIR operates as a self-hosted continuous monitoring and analysis platform, delivering actionable intelligence through multiple channels:
The system generates comprehensive incident reports using Kimi K2's analytical capabilities:
Report Structure:
- Executive Summary: High-level incident overview with severity classification
- Timeline Analysis: Chronological event sequence with key timestamps
- Root Cause Investigation: Multi-step analysis using TiDB queries and log correlation
- Impact Assessment: Affected systems, user impact, and business metrics
- Remediation Steps: Prioritized action items with risk/effort estimates
- Evidence Artifacts: Supporting log samples, metrics, and query results
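As an illustration, the sections above map naturally onto a small rendering helper. This is a hypothetical sketch — the `IncidentReport` field names are ours, not AutoIR's actual types:

```typescript
// Hypothetical sketch: assemble the report sections listed above into
// markdown. Field names are illustrative, not AutoIR's real schema.
type IncidentReport = {
  title: string
  severity: string
  summary: string
  timeline: string[]
  rootCause: string
  impact: string
  remediation: string[]
  evidence: string[]
}

export function renderReport(r: IncidentReport): string {
  return [
    `# ${r.title} [${r.severity}]`,
    `## Executive Summary\n${r.summary}`,
    `## Timeline Analysis\n${r.timeline.map(t => `- ${t}`).join('\n')}`,
    `## Root Cause Investigation\n${r.rootCause}`,
    `## Impact Assessment\n${r.impact}`,
    `## Remediation Steps\n${r.remediation.map((s, i) => `${i + 1}. ${s}`).join('\n')}`,
    `## Evidence Artifacts\n${r.evidence.map(e => `- ${e}`).join('\n')}`,
  ].join('\n\n')
}
```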
24/7 Fargate Daemon runs continuously on AWS ECS, providing:
- Proactive Detection: Semantic anomaly detection across log streams
- SNS Integration: Instant notifications to on-call teams
- Escalation Policies: Severity-based routing and stakeholder notifications
- Alert Correlation: Groups related events to reduce noise
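The alert-correlation behavior above relies on collapsing repeats of the same incident. A minimal sketch (a hypothetical helper, mirroring the `dedupe_key` column used in the incidents schema shown later) might derive a stable key like this:

```typescript
import {createHash} from 'node:crypto'

// Hypothetical sketch: a stable key so repeated alerts for the same
// group/stream/title collapse into one incident row (the incidents table's
// UNIQUE KEY on dedupe_key rejects duplicates on upsert).
export function dedupeKey(group: string, stream: string, title: string): string {
  const raw = [group, stream, title.trim().toLowerCase()].join('|')
  return createHash('sha256').update(raw).digest('hex') // 64 hex chars, fits VARCHAR(128)
}
```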
Live Search & Investigation Interface:
- Semantic Log Search: Natural language queries across millions of events
- Real-time Streaming: Live token-by-token analysis from Kimi K2
- Tool-assisted Investigation: Guided exploration with safe SQL queries
- Conversation History: Persistent analysis sessions and findings
Production Telemetry:
- System Health Dashboards: ECS task status, endpoint availability
- Performance Metrics: Query latencies, embedding throughput, token generation rates
- Cost Tracking: AWS resource utilization and optimization recommendations
- Trend Analysis: Historical incident patterns and system behavior
Persistent Intelligence:
- Incident Database: Searchable repository of past incidents and resolutions
- Pattern Recognition: ML-driven identification of recurring issues
- Runbook Generation: Automated documentation of successful remediation procedures
- Team Knowledge Sharing: Collaborative incident post-mortems and lessons learned
Continuous Operation:
```mermaid
graph TD
    A[CloudWatch Logs] --> B[ECS Fargate Daemon]
    B --> C[SageMaker Embeddings]
    C --> D[TiDB Vector Storage]
    D --> E[Kimi K2 Analysis]
    E --> F[Incident Reports]
    E --> G[SNS Alerts]
    E --> H[Dashboard Updates]
    F --> I[Knowledge Base]
    G --> J[On-Call Teams]
    H --> K[Operations Center]
```
Deployment Architecture:
- ECS Fargate Tasks: Serverless, auto-scaling log processors
- Kimi K2 Instances: Dedicated EC2 instances for LLM inference
- TiDB Serverless: Elastic vector database with global replication
- SageMaker Endpoints: On-demand embedding generation
The system operates with 99.9% uptime SLA, processing 10K+ log events per minute and generating actionable incidents within 30 seconds of detection.
| Parameter | Development | Production |
|---|---|---|
| Region | us-east-1 | us-east-1 |
| Operating Hours | 8 hours/day (weekdays) | 24/7 |
| Kimi K2 Instance | g6.8xlarge | g6.16xlarge |
| EBS Storage | 250 GB gp3 | 500 GB gp3 |
| Fargate Tasks | 1x (0.25 vCPU, 1 GB) | 2x (0.5 vCPU, 2 GB) |
| Embedding Requests | 50K/month | 250K/month |
| CloudWatch Ingestion | 2 GB/day | 8 GB/day |
| TiDB Tier | Serverless | Dedicated (small) |
| Service Component | Development | Production | Notes |
|---|---|---|---|
| EC2 (Kimi K2) | $400-600 | $2,800-3,400 | g6.8xlarge vs g6.16xlarge, hours/usage |
| EBS Storage | $20 | $40 | 250GB vs 500GB gp3 volumes |
| ECS Fargate | $12-18 | $36-54 | Task count and resource allocation |
| SageMaker Serverless | $15-50 | $75-200 | BGE-small embeddings, usage-dependent |
| CloudWatch Logs | $35-60 | $140-240 | Ingestion + storage + queries |
| TiDB | $5-25 | $350-600 | Serverless vs Dedicated cluster |
| Data Transfer | $5-15 | $20-50 | Inter-service communication |
| SNS + Misc | $2-5 | $10-25 | Notifications and ancillary services |
| Deployment Type | Monthly Range | Annual Range | Key Characteristics |
|---|---|---|---|
| Development | $494-773 | $5,928-9,276 | 8h/day, smaller instances, serverless TiDB |
| Production | $3,471-4,609 | $41,652-55,308 | 24/7, optimized instances, dedicated TiDB |
| Production (Reserved) | $2,750-3,800 | $33,000-45,600 | 1-year RI savings (~20-25%) |
| Strategy | Savings Potential | Implementation | Use Case |
|---|---|---|---|
| Spot Instances | 50-70% on EC2 | Use for dev/test workloads | Non-critical environments |
| Reserved Instances | 20-42% on EC2 | 1-3 year commitments | Predictable production loads |
| Auto-scheduling | 60-70% total | Stop instances off-hours | Development environments |
| Right-sizing | 15-30% | Match instance to actual load | Over-provisioned resources |
| Embedding Caching | 20-50% on SageMaker | Cache frequent queries | Repetitive log patterns |
| Log Filtering | 30-60% on CloudWatch | Filter noisy/unnecessary logs | High-volume applications |
| Metric | Value | Calculation |
|---|---|---|
| Cost per Incident Analysis | $0.15-0.45 | Based on 8K-12K incidents/month |
| Cost per Log Event Processed | $0.0003-0.0008 | Based on 5M-15M events/month |
| Cost per User (10 operators) | $349-461/user/month | Production deployment |
| Break-even vs Traditional Tools | 6-12 months | Compared to enterprise SIEM/monitoring |
Note: These are directional estimates for us-east-1 region. Actual costs will vary based on your specific usage patterns, region, and AWS pricing changes. Always validate with AWS Cost Calculator and monitor actual usage.
AutoIR leverages Kimi K2, an open-source Mixture-of-Experts language model developed by Moonshot AI, as its self-hosted primary LLM for incident analysis and response generation. Kimi K2 provides:
- 32B activated parameters with 1 trillion total parameters (Mixture-of-Experts architecture)
- Competitive performance on knowledge tasks, mathematics, coding, and tool use
- Quantized deployment options for cost-effective inference on AWS EC2
- Native tool calling support for AutoIR's constrained function toolset
AutoIR includes comprehensive tooling for deploying Kimi K2 on AWS EC2 instances using quantized models via Unsloth's optimized runtime:
```bash
# Setup Kimi K2 instance with quantized model
autoir aws kimi-k2-setup \
--region us-east-1 \
--instance-name kimi-k2-prod \
--instance-type g6.16xlarge \
--quantization UD-TQ1_0 \
--storage-size 500
```

Supported Instance Types:
- `g6.16xlarge` (recommended) - 64 vCPUs, 256 GB RAM, 1x L4 GPU
- `g6.12xlarge` - 48 vCPUs, 192 GB RAM, 4x L4 GPUs
- `g6.8xlarge` - 32 vCPUs, 128 GB RAM, 1x L4 GPU
Quantization Options:
- `UD-TQ1_0` (1.8-bit) - Maximum compression, lower accuracy
- `UD-TQ2_K_XL` (2-bit) - Recommended balance of size and performance
- `UD-TQ4_0` (4-bit) - Higher accuracy, larger memory footprint
```bash
# List all Kimi K2 endpoints
autoir aws kimi-k2-list --region us-east-1

# Start/stop instances
autoir aws kimi-k2-manage --action start --endpoint kimi-k2-prod
autoir aws kimi-k2-manage --action stop --endpoint kimi-k2-prod

# Check instance status and health
autoir aws kimi-k2-manage --action status --endpoint kimi-k2-prod

# Terminate instance (with confirmation)
autoir aws kimi-k2-manage --action terminate --endpoint kimi-k2-prod --confirm
```

Kimi K2 integrates seamlessly with AutoIR's tool ecosystem for interactive incident analysis:
```bash
# Interactive chat with tool calling enabled
autoir llm chat \
--endpoint kimi-k2-prod \
--tools \
--stream

# Non-interactive analysis
autoir llm chat \
--endpoint kimi-k2-prod \
--message "Analyze recent 5xx errors in api-gateway logs" \
--tools \
--temperature 0.6
```

Available Tools in Chat:
- `tidb_query` - Execute safe SELECT queries against TiDB
- `analysis` - Perform calculations and data transformations
- `get_current_time` - Time/date utilities for temporal analysis
- `calculate` - Mathematical operations for metrics
- `read_file` / `write_file` - Limited file I/O for reports
AutoIR communicates with Kimi K2 instances via the llama.cpp server API:
```bash
# Direct API call example
curl -X POST http://your-instance-ip:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_system|>system<|im_middle|>You are a systems analyst<|im_end|><|im_user|>user<|im_middle|>Analyze this error<|im_end|><|im_assistant|>assistant<|im_middle|>",
    "temperature": 0.6,
    "min_p": 0.01,
    "n_predict": 500
  }'
```

Kimi K2 Chat Template Format:

```
<|im_system|>system<|im_middle|>SYSTEM_PROMPT<|im_end|>
<|im_user|>user<|im_middle|>USER_MESSAGE<|im_end|>
<|im_assistant|>assistant<|im_middle|>
```
Typical Performance (g6.16xlarge with UD-TQ1_0):
- Inference Speed: 6-9 tokens/second
- Context Window: 16,384 tokens (configurable)
- Memory Usage: ~8-12 GB VRAM
- Cold Start: ~10-15 seconds
- Response Latency: 50-200ms (first token)
Production Recommendations:
- Use `UD-TQ2_K_XL` quantization for better accuracy
- Scale to `p4d.24xlarge` for production workloads
- Enable auto-scaling groups for high availability
- Consider multiple regions for disaster recovery
EC2 Cost Examples (us-east-1):
- `g6.16xlarge` On-Demand: ~$4.61/hour
- `g6.16xlarge` Reserved (1yr): ~$2.70/hour
- `g6.12xlarge` Spot: ~$1.50-2.50/hour (varies)
Cost-Saving Strategies:
- Spot Instances: 50-70% savings with interruption handling
- Reserved Instances: Up to 42% savings for predictable workloads
- Auto-scheduling: Stop instances during off-hours
- Right-sizing: Use smaller instances for development/testing
AutoIR's Kimi K2 deployment includes security best practices:
- Security Groups: Restrict API access to specific IP ranges
- IAM Roles: Least-privilege access for instance management
- VPC Deployment: Network isolation with private subnets
- SSL/TLS: Encrypted communication (can be configured)
- Tool Sandboxing: Constrained function execution environment
```bash
# Setup with custom security configuration
autoir aws kimi-k2-setup \
--region us-east-1 \
--instance-name secure-kimi \
--allowed-ip 203.0.113.0/24 \
--vpc-id vpc-12345678 \
--subnet-id subnet-87654321
```

- Vector-native log analytics: logs are embedded as 384-d vectors and stored in TiDB with cosine distance for semantic retrieval at query time.
- Serverless embeddings: SageMaker Serverless Inference hosts the HF `BAAI/bge-small-en-v1.5` feature-extraction model; cold-start resistant and cost-efficient.
- Tool-based orchestration: the agent can call constrained tools (`tidb_query`, `analysis`, etc.) to gather evidence and compute metrics before drafting incidents.
- Production-ready ingestion: an ECS Fargate daemon tails CloudWatch log groups, then batches, embeds, and persists to TiDB.
- Safety by design: the TiDB tool only allows SELECTs (LIMIT enforced), and `analysis` runs as a pure expression evaluator.
| Component | Purpose | Key Details |
|---|---|---|
| CloudWatch Logs | Raw event source | Any AWS log group/stream |
| Fargate Daemon | Ingestion & batching | Pulls logs, optional SNS hooks |
| SageMaker Endpoint | Embeddings | HF DLC feature-extraction, serverless, JSON invoke |
| TiDB Serverless | Vector store & SQL | VECTOR(384), vec_cosine_distance, relational joins |
| Agent Orchestrator | LLM + tools | Tool cycle with SELECT-only TiDB queries and safe analysis |
Semantic search happens in-database:

```sql
SELECT id, log_group, log_stream, ts_ms, message,
       1 - VEC_COSINE_DISTANCE(embedding, CAST(? AS VECTOR(384))) AS score,
       VEC_COSINE_DISTANCE(embedding, CAST(? AS VECTOR(384))) AS distance
FROM `autoir_log_events`
ORDER BY distance ASC
LIMIT 20;
```

And the schema is enforced for vector semantics and incidents:
```sql
CREATE TABLE IF NOT EXISTS `autoir_log_events` (
  id VARCHAR(64) PRIMARY KEY,
  log_group VARCHAR(255),
  log_stream VARCHAR(255),
  ts_ms BIGINT,
  message TEXT,
  embedding VECTOR(384) NOT NULL COMMENT 'hnsw(distance=cosine)',
  KEY idx_group_ts (log_group, ts_ms)
);

CREATE TABLE IF NOT EXISTS autoir_incidents (
  id VARCHAR(64) PRIMARY KEY,
  created_ms BIGINT NOT NULL,
  updated_ms BIGINT NOT NULL,
  status ENUM('open','ack','resolved') NOT NULL DEFAULT 'open',
  severity ENUM('info','low','medium','high','critical') NOT NULL,
  title VARCHAR(255) NOT NULL,
  summary TEXT,
  affected_group VARCHAR(255),
  affected_stream VARCHAR(255),
  first_ts_ms BIGINT,
  last_ts_ms BIGINT,
  event_count INT,
  sample_ids JSON,
  vector_context JSON,
  dedupe_key VARCHAR(128),
  UNIQUE KEY uniq_dedupe (dedupe_key)
);
```

Example Query: "Investigate spikes in 5xx errors on api-gateway in us-east-1 over the last hour."
| Step | Tool Called | Result |
|---|---|---|
| 1 | `tidb_query` | Aggregate error-rate by group/stream/time window |
| 2 | `analysis` | Compute severity and confidence based on ratios/volume |
| 3 | `tidb_query` | Fetch representative samples for evidence |
| 4 | (optional) SNS | Notify on-call with concise incident summary |
The agent loops through tools up to 8 times to refine findings before producing a final incident write-up.
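The bounded loop can be sketched as follows (names are illustrative; AutoIR's actual orchestrator differs in detail):

```typescript
// Sketch of the bounded tool cycle: call the LLM, run any requested tools,
// feed results back, and stop after 8 cycles or a tool-free final answer.
type ToolCall = { name: string; arguments: Record<string, unknown> }
type LlmTurn = { text: string; toolCalls: ToolCall[] }

export async function runAgentLoop(
  callLlm: (history: string[]) => Promise<LlmTurn>,
  runTool: (call: ToolCall) => Promise<string>,
  maxCycles = 8
): Promise<string> {
  const history: string[] = []
  for (let cycle = 0; cycle < maxCycles; cycle++) {
    const turn = await callLlm(history)
    if (turn.toolCalls.length === 0) return turn.text // final incident write-up
    for (const call of turn.toolCalls) {
      history.push(`tool:${call.name} -> ${await runTool(call)}`) // evidence for next cycle
    }
  }
  return '(stopped: max tool cycles reached)'
}
```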
AutoIR requires the following self-hosted cloud infrastructure components for full operation:
Required AWS Services:
- ✅ EC2 Instances - For Kimi K2 LLM deployment (g6.16xlarge recommended)
- ✅ ECS Fargate - 24/7 log processing daemon
- ✅ SageMaker Serverless - BGE-small embedding generation
- ✅ CloudWatch Logs - Source log aggregation
- ✅ SNS - Real-time alerting and notifications
- ✅ IAM Roles - Service permissions and security
Required TiDB Features:
- ✅ VECTOR(384) Data Type - Native vector storage and indexing
- ✅ HNSW Indexing - High-performance similarity search
- ✅ Serverless Scaling - Automatic capacity adjustment
- ✅ Global Replication - Multi-region availability
- ✅ MySQL Compatibility - Standard SQL interface
Local Prerequisites:
- ✅ Node.js 18+ - Runtime environment
- ✅ AWS CLI v2 - Configured with credentials and default region
- ✅ Git - Source code management
```bash
# Install globally from npm (recommended)
npm install -g autoir

# Or clone and build from source
git clone https://github.com/youneslaaroussi/autoir.git
cd autoir
npm install
npm run build

# (Optional) Global install for CLI usage
npm install -g .
```

Configure TiDB and AWS:
```bash
# Save TiDB DSN profile (stores host/user/db locally)
autoir tidb dsn

# Bootstrap a serverless SageMaker embedding endpoint (BGE-small)
autoir aws sagemaker-bootstrap \
--region us-east-1 \
--endpoint autoir-embed-ep-srv
```

Ingest logs and store embeddings:
```bash
# Tail CloudWatch, embed lines, and persist vectors into TiDB
autoir logs tail \
"/aws/lambda/your-log-group" \
--region us-east-1 \
--sagemaker-endpoint autoir-embed-ep-srv \
--embed
```

Query semantically:
```bash
autoir logs query \
--query "timeout contacting DB" \
--sagemaker-endpoint autoir-embed-ep-srv
```

Run the agentic incident loop (optional alerts):
```bash
autoir daemon \
--alerts-enabled \
--region us-east-1 \
--sns-arn arn:aws:sns:us-east-1:123456789012:autoir-alerts
```

| Data Source | Integration Method | Data Stored |
|---|---|---|
| CloudWatch Logs | AWS SDK/CLI tailing | Raw messages + metadata |
| Embeddings | SageMaker Serverless (HF DLC) | 384-d vectors (JSON invoke) |
| Vector DB | TiDB Serverless | VECTOR(384) column + RDBMS fields |
The agent uses TiDB both for vector search and relational SQL (time filters, aggregations, joins).
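A client-side sketch of that hybrid pattern, assuming any MySQL-compatible client exposing `query(sql, params)` (e.g. a mysql2 pool); the helper names here are ours, not AutoIR's:

```typescript
// Format a JS embedding as the JSON-style literal TiDB casts to VECTOR(384).
export function vectorLiteral(v: number[]): string {
  return `[${v.join(',')}]`
}

// Any MySQL-compatible client exposing query(sql, params) will do here.
type SqlClient = { query(sql: string, params: unknown[]): Promise<unknown> }

// Hybrid retrieval: vector ranking plus a relational time filter in one query.
export function searchRecent(db: SqlClient, embedding: number[], sinceMs: number, k = 20) {
  const lit = vectorLiteral(embedding)
  return db.query(
    `SELECT id, message,
            1 - VEC_COSINE_DISTANCE(embedding, CAST(? AS VECTOR(384))) AS score
     FROM autoir_log_events
     WHERE ts_ms >= ?
     ORDER BY VEC_COSINE_DISTANCE(embedding, CAST(? AS VECTOR(384))) ASC
     LIMIT ?`,
    [lit, sinceMs, lit, k]
  )
}
```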
AutoIR implements a sophisticated tool orchestration system that enables safe, constrained function calling for LLM agents:
```typescript
interface Tool {
  name: string
  description: string
  parameters: {
    type: string
    properties: Record<string, any>
    required: string[]
  }
}

interface ToolCall {
  id: string
  name: string
  arguments: Record<string, any>
}
```

| Tool | Purpose | Safeguards | Example Usage |
|---|---|---|---|
| `tidb_query` | Read-only SQL to TiDB | SELECT-only, auto-LIMIT 1000, strips semicolons | `SELECT COUNT(*) FROM logs WHERE severity='error'` |
| `analysis` | Pure JS expression eval | Expression-only, no statements, limited stdlib (Math, JSON) | `logs.filter(l => l.timestamp > Date.now() - 3600000).length` |
| `calculate` | Mathematical operations | Blocks constructors/eval, sanitized input | `Math.sqrt(variance) * 2.5` |
| `get_current_time` | Time/date utilities | Timezone validation, ISO format | Returns current timestamp for temporal queries |
| `read_file` | Read local files | Relative paths only, size limits | Read configuration or report files |
| `write_file` | Write local files | Relative paths only, sandbox directory | Generate incident reports |
```mermaid
graph LR
    A[LLM Response] --> B{Tool Calls?}
    B -->|Yes| C[Validate Parameters]
    C --> D[Execute Tool]
    D --> E[Collect Results]
    E --> F[Update Conversation]
    F --> G[Send to LLM]
    G --> B
    B -->|No| H[Final Response]
```
The agent's tool cycle implements these safety measures:
- Parameter Validation: All tool calls are validated against JSON schemas
- Execution Sandboxing: Tools run in isolated contexts with limited capabilities
- Result Sanitization: Tool outputs are JSON-serialized and size-limited
- Iteration Limits: Maximum 8 tool cycles to prevent infinite loops
- Error Handling: Tool failures are captured and fed back to the LLM
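The parameter-validation step can be sketched like this (simplified; AutoIR's real validator may cover more of JSON Schema):

```typescript
// Minimal schema check: required keys present, primitive types match.
type ParamSchema = {
  properties: Record<string, { type: string }>
  required: string[]
}

export function validateArgs(schema: ParamSchema, args: Record<string, unknown>): string[] {
  const errors: string[] = []
  for (const key of schema.required) {
    if (!(key in args)) errors.push(`missing required parameter: ${key}`)
  }
  for (const [key, value] of Object.entries(args)) {
    const spec = schema.properties[key]
    if (!spec) { errors.push(`unknown parameter: ${key}`); continue }
    if (typeof value !== spec.type) errors.push(`${key}: expected ${spec.type}, got ${typeof value}`)
  }
  return errors // empty array means the call may proceed
}
```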
```typescript
// Enforces read-only access with automatic limits
private async execute(args: {sql: string, limit?: number}): Promise<string> {
  const sql = args.sql.replace(/;+\s*$/, '') // strip trailing semicolons
  if (!/^\s*select\s/i.test(sql)) {
    throw new Error('Only SELECT statements allowed')
  }
  const limit = args.limit
  const hasLimit = /\blimit\s+\d+\b/i.test(sql)
  const finalSql = hasLimit ? sql : `${sql} LIMIT ${Math.min(limit ?? 100, 1000)}`
  const [rows] = await pool.query(finalSql)
  return JSON.stringify({
    rows: Array.isArray(rows) ? rows : [],
    score: Array.isArray(rows) ? rows.length : 0,
    meta: {limit: limit ?? 100, appliedLimit: !hasLimit}
  })
}
```

```typescript
// Safe expression evaluation in controlled context
private async execute(args: {expression: string, context?: any}): Promise<string> {
  const wrapped = `return (${args.expression});` // Force expression context
  try {
    // Provide only safe globals: ctx, Math, JSON
    const fn = new Function('ctx', 'Math', 'JSON', wrapped)
    const result = fn(args.context ?? {}, Math, JSON)
    return JSON.stringify({result})
  } catch (err: any) {
    return JSON.stringify({error: err?.message || String(err)})
  }
}
```

AutoIR implements real-time streaming for both Kimi K2 and alternative LLM providers:
```typescript
// Streaming interface for real-time token delivery
interface StreamingObserver {
  onStreamStart?: () => void
  onStreamToken?: (token: string) => void
  onStreamEnd?: () => void
  onToolStart?: (call: ToolCall) => void
  onToolResult?: (call: ToolCall, result: string) => void
  onToolError?: (call: ToolCall, error: any) => void
}
```

Kimi K2 streaming uses Server-Sent Events (SSE) or raw JSON lines:
```
// SSE format
data: {"content": "Hello", "delta": "Hello"}
data: {"content": " world", "delta": " world"}
data: [DONE]

// Raw JSON format
{"content": "Hello"}
{"content": " world"}
```

```typescript
private async callKimiK2Stream(
  endpoint: string,
  prompt: string,
  options: LlmOptions,
  onToken: (token: string) => void
): Promise<string> {
  const payload = {
    prompt,
    temperature: options.temperature,
    min_p: 0.01,
    n_predict: options.maxTokens,
    stream: true
  }
  return new Promise<string>((resolve) => {
    const child = spawn('curl', [
      '-s', '-N', '-X', 'POST',
      '-H', 'Content-Type: application/json',
      '-d', JSON.stringify(payload),
      `${endpoint}/completion`
    ])
    let buffer = ''
    let final = ''
    child.stdout.on('data', (chunk: Buffer) => {
      buffer += chunk.toString()
      // Process complete lines
      let idx
      while ((idx = buffer.indexOf('\n')) >= 0) {
        const line = buffer.slice(0, idx).trim()
        buffer = buffer.slice(idx + 1)
        // Handle SSE format: "data: {json}"
        const jsonPart = line.startsWith('data:')
          ? line.slice(5).trim()
          : line
        if (jsonPart === '[DONE]') {
          child.kill()
          return
        }
        try {
          const obj = JSON.parse(jsonPart)
          const token = obj?.content || obj?.delta || ''
          if (token) {
            final += token
            onToken(token) // Real-time callback
          }
        } catch {
          // Fallback: treat as plain text
          final += jsonPart
          onToken(jsonPart)
        }
      }
    })
    child.on('close', () => resolve(final))
  })
}
```

AutoIR uses structured prompt templates with role-based formatting:
```typescript
export const SYSTEM_PROMPT_TEMPLATE = `You are a senior systems analyst AI agent.
Date: {{TODAY}}
Responsibilities:
1) Investigate system issues by querying TiDB and summarize findings with evidence
2) Propose remediation steps with risk, impact, and rollback strategies
3) Produce concise, actionable incident insights for SREs and on-call engineers
Guidelines:
- When you need data, call the tidb_query tool with safe SELECT statements
- Always include LIMIT in queries for performance
- Prefer targeted queries with filters and time ranges
- Be explicit about assumptions; update them after seeing data
- Use the analysis tool for calculations or reasoning on retrieved data
- Keep answers structured and succinct
`;
```

```typescript
interface LlmMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
  tool_calls?: ToolCall[]
  tool_results?: ToolResult[]
}

// Kimi K2 format conversion
private buildKimiPrompt(messages: LlmMessage[], tools?: Tool[]): string {
  let prompt = ''
  for (const message of messages) {
    switch (message.role) {
      case 'system':
        prompt += `<|im_system|>system<|im_middle|>${message.content}<|im_end|>`
        break
      case 'user':
        prompt += `<|im_user|>user<|im_middle|>${message.content}<|im_end|>`
        break
      case 'assistant':
        prompt += `<|im_assistant|>assistant<|im_middle|>${message.content}<|im_end|>`
        break
      case 'tool':
        prompt += `<|im_tool|>tool<|im_middle|>${message.content}<|im_end|>`
        break
    }
  }
  // Inject tool catalog for self-discovery
  if (tools && tools.length > 0) {
    const toolsPrompt = `\n\nAvailable tools:\n${tools.map(t =>
      `${t.name}: ${t.description}`
    ).join('\n')}\n\nYou can use tools by calling them with appropriate parameters.`
    // Insert before final assistant prompt
    prompt = prompt.replace(
      /(<\|im_user\|>.*?<\|im_end\|>)$/,
      toolsPrompt + '$1'
    )
  }
  prompt += `<|im_assistant|>assistant<|im_middle|>`
  return prompt
}
```

AutoIR tracks comprehensive performance metrics:
```typescript
interface PerformanceMetrics {
  // Embedding performance
  embedding_latency_ms: number
  embedding_tokens: number
  embedding_requests_per_minute: number

  // Vector search performance
  vector_search_latency_ms: number
  vector_search_results: number
  vector_search_score_avg: number

  // LLM performance
  llm_first_token_latency_ms: number
  llm_tokens_per_second: number
  llm_total_tokens: number
  llm_tool_calls: number

  // System performance
  memory_usage_mb: number
  cpu_usage_percent: number
  active_connections: number
}
```

Typical Performance Characteristics:
- SageMaker Embedding Latency: 50-400ms (serverless cold start)
- TiDB Vector Search: <100ms for 10K+ log entries
- Kimi K2 First Token: 50-200ms (warm instance)
- Kimi K2 Throughput: 6-9 tokens/second (g6.16xlarge)
- End-to-End Query: <2s (warm endpoints)
- Tool Execution: 10-500ms depending on complexity
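For instance, the `llm_tokens_per_second` figure can be derived from two timestamps captured during streaming (a trivial sketch; the function name is ours):

```typescript
// Derive streaming throughput from first/last token timestamps (ms).
export function tokensPerSecond(totalTokens: number, firstTokenMs: number, lastTokenMs: number): number {
  const seconds = Math.max((lastTokenMs - firstTokenMs) / 1000, 0.001) // guard zero-length runs
  return totalTokens / seconds
}
```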
- SageMaker Serverless Inference: provisions an execution role, registers the HF DLC, creates the endpoint config and endpoint; waits until `InService` and performs a sample invoke.
- ECS Fargate Daemon: an optional command deploys a log-ingestion task that pushes vectors into TiDB and emits metrics.
- CloudWatch + SNS: queries recent windows and publishes incident summaries to SNS for paging/triage.
| Component | Technology | Notes |
|---|---|---|
| Backend | Node.js + TypeScript (oclif) | Modular commands & libraries |
| Vector DB | TiDB Serverless | VECTOR(384), cosine distance, HNSW comment |
| Embeddings | SageMaker Serverless Inference | HF BAAI/bge-small-en-v1.5 feature-extraction |
| Agent Orchestration | Tool-calling loop | SELECT-only DB tool, safe analysis tool |
| AWS Integrations | CloudWatch Logs, ECS Fargate, SNS | Operational glue |
Performance profile (typical):
- Embedding latency (serverless): 50–400 ms per request body
- Vector search (TiDB): sub-100 ms for tens of thousands of rows
- End-to-end semantic query: < 2 s with warm endpoints
- TiDB Serverless: generous free tier; scales elastically
- SageMaker Serverless Inference: pay-per-invocation (memory × duration)
- ECS Fargate: per-task vCPU/GB-hr, often minimal for a single daemon
- CloudWatch + SNS: low cost unless tailing very high-volume groups
```
src/
├── commands/      # oclif commands (aws, logs, llm, tidb, daemon)
├── lib/           # shared libraries
│   ├── db.ts      # schema, incident upsert, cursor tracking
│   ├── llm/       # agent client (tool-calling, streaming)
│   ├── tools/     # tool registry + tool implementations
│   └── config.ts  # persisted app config (TiDB, LLM, Fargate)
└── prompts/       # system prompt template
```
- Create a class in `src/lib/tools/` extending `BaseTool`
- Register it in `ToolManager`
- Keep side effects out; prefer read-only operations
- Provide a concise JSON schema for parameters
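A hedged sketch of those steps follows; the real `BaseTool` contract lives in `src/lib/tools/` and may differ, so this only mirrors the `Tool`/`ToolCall` shapes shown earlier:

```typescript
// Illustrative BaseTool shape — check src/lib/tools/ for the actual contract.
abstract class BaseTool {
  abstract name: string
  abstract description: string
  abstract execute(args: Record<string, unknown>): Promise<string>
}

// Example read-only tool with no side effects, per the guidelines above.
export class WordCountTool extends BaseTool {
  name = 'word_count'
  description = 'Count words in a text snippet'
  async execute(args: Record<string, unknown>): Promise<string> {
    const text = String(args.text ?? '')
    const count = text.split(/\s+/).filter(Boolean).length
    return JSON.stringify({count}) // tool results are JSON-serialized strings
  }
}
```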
AutoIR provides a comprehensive CLI with commands organized by functional area:
```bash
# Launch main dashboard (default)
autoir

# Launch search-only interface
autoir --no-dashboard

# Specify custom endpoints and regions
autoir --sagemaker-endpoint my-embed-ep --region us-west-2
```

```bash
# Setup new Kimi K2 instance
autoir aws kimi-k2-setup \
--region us-east-1 \
--instance-name kimi-prod \
--instance-type g6.16xlarge \
--quantization UD-TQ2_K_XL

# List all Kimi K2 instances
autoir aws kimi-k2-list --region us-east-1

# Manage instance lifecycle
autoir aws kimi-k2-manage --action start --endpoint kimi-prod
autoir aws kimi-k2-manage --action stop --endpoint kimi-prod
autoir aws kimi-k2-manage --action status --endpoint kimi-prod
autoir aws kimi-k2-manage --action terminate --endpoint kimi-prod --confirm
```

```bash
# Interactive chat with tools
autoir llm chat --endpoint kimi-prod --tools --stream

# Single-shot analysis
autoir llm chat \
--endpoint kimi-prod \
--message "Analyze recent errors in api-gateway logs" \
--tools \
--temperature 0.6

# Conversation management
autoir llm chat --list-conversations
autoir llm chat --conversation incident-2024-01 --endpoint kimi-prod
autoir llm chat --delete-conversation old-conv-id
autoir llm chat --clear-history
```

```bash
# Tail and ingest CloudWatch logs
autoir logs tail "/aws/lambda/api-gateway" \
--region us-east-1 \
--sagemaker-endpoint embed-ep \
--embed \
--follow

# Semantic log search
autoir logs query "database connection timeout" \
--sagemaker-endpoint embed-ep \
--sagemaker-region us-east-1 \
--limit 50 \
--group "/aws/lambda/*" \
--since 2h

# Interactive search interface
autoir logs search \
--sagemaker-endpoint embed-ep \
--sagemaker-region us-east-1

# Get latest logs (non-semantic)
autoir logs latest "/aws/lambda/api-gateway" \
--region us-east-1 \
--limit 100 \
--since 1h
```

```bash
# Bootstrap SageMaker embedding endpoint
autoir aws sagemaker-bootstrap \
--region us-east-1 \
--endpoint autoir-embed-ep \
--memory 2048 \
--concurrency 5

# Deploy ECS Fargate daemon for log ingestion
autoir aws autoir-fargate deploy \
--cluster autoir \
--service autoir \
--region us-east-1 \
--cpu 512 \
--memory 1024

# Fargate lifecycle management
autoir aws autoir-fargate status --cluster autoir --service autoir
autoir aws autoir-fargate logs --cluster autoir --log-group /autoir/daemon
autoir aws autoir-fargate start --cluster autoir --service autoir
autoir aws autoir-fargate stop --cluster autoir --service autoir
autoir aws autoir-fargate destroy --cluster autoir --service autoir

# AWS infrastructure health check
autoir aws check --region us-east-1 --profile production
```

```bash
# Configure TiDB connection interactively
autoir tidb dsn

# Set DSN directly
autoir tidb dsn --dsn mysql://user:pass@host:4000/db

# Show current configuration
autoir tidb dsn --show

# Test connection with custom profile
autoir tidb dsn --name staging
```

```bash
# Run ingestion and alerting daemon (for containers)
autoir daemon \
--groups "/aws/lambda/api,/aws/ecs/app" \
--region us-east-1 \
--sagemaker-endpoint embed-ep \
--alerts-enabled \
--sns-topic-arn arn:aws:sns:us-east-1:123:alerts

# With custom alerting thresholds
autoir daemon \
--alerts-enabled \
--alerts-interval-sec 300 \
--alerts-window-min 15 \
--alerts-min-confidence 70 \
--alerts-min-severity medium
```

```bash
# Configure LLM provider interactively
autoir llm config

# Switch between providers
autoir llm config --provider aws --endpoint kimi-prod
```

Most commands support these common flags:
| Flag | Description | Example |
|---|---|---|
| `--region` | AWS region | `--region us-west-2` |
| `--profile` | AWS profile | `--profile production` |
| `--debug` | Enable verbose logging | `--debug` |
| `--dry-run` | Show actions without executing | `--dry-run` |
| `--json` | Output in JSON format | `--json` |
| `--help` | Show command help | `--help` |
AutoIR stores configuration in your home directory:
```
~/.autoir/
├── config.json              # Main configuration
├── tidb-profiles.json       # TiDB connection profiles
├── llm-config.json          # LLM provider settings
├── kimi-k2-endpoints.json   # Kimi K2 instance registry
└── conversations/           # Chat conversation history
    ├── conv-uuid-1.json
    └── conv-uuid-2.json
```
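Reading one of these files can be as simple as the sketch below; the JSON shapes inside them are undocumented, so treat the return type as an assumption:

```typescript
import {readFileSync} from 'node:fs'
import {join} from 'node:path'
import {homedir} from 'node:os'

// Load a config file from ~/.autoir, returning {} when it doesn't exist yet.
export function loadAutoirConfig(file: string, dir = join(homedir(), '.autoir')): Record<string, unknown> {
  try {
    return JSON.parse(readFileSync(join(dir, file), 'utf8'))
  } catch {
    return {} // missing or unreadable file: treat as unconfigured
  }
}
```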
```bash
# Interactive setup
autoir tidb dsn

# Or provide DSN directly
autoir tidb dsn --dsn mysql://username.root:password@gateway01.us-west-2.prod.aws.tidbcloud.com:4000/database_name
```

For dedicated clusters requiring CA certificates:
```bash
# Download CA certificate from TiDB Cloud console
# Then configure with CA path
autoir tidb dsn
# Enter CA certificate path when prompted
```

For a local TiDB instance:

```bash
autoir tidb dsn --dsn mysql://root:password@localhost:4000/test
```

```bash
# Install AWS CLI v2
# Configure default credentials
aws configure

# Or use environment variables
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=us-east-1
```

AutoIR requires these AWS permissions:
SageMaker:
- `sagemaker:CreateModel`
- `sagemaker:CreateEndpointConfig`
- `sagemaker:CreateEndpoint`
- `sagemaker:DescribeEndpoint`
- `sagemaker:InvokeEndpoint`

EC2 (for Kimi K2):
- `ec2:RunInstances`
- `ec2:DescribeInstances`
- `ec2:StartInstances`
- `ec2:StopInstances`
- `ec2:TerminateInstances`
- `ec2:CreateSecurityGroup`
- `ec2:AuthorizeSecurityGroupIngress`

CloudWatch Logs:
- `logs:DescribeLogGroups`
- `logs:DescribeLogStreams`
- `logs:GetLogEvents`
- `logs:FilterLogEvents`

ECS (for Fargate):
- `ecs:CreateCluster`
- `ecs:CreateService`
- `ecs:UpdateService`
- `ecs:DescribeServices`
- `ecs:RegisterTaskDefinition`

IAM:
- `iam:CreateRole`
- `iam:AttachRolePolicy`
- `iam:GetRole`
- `iam:PassRole`
```bash
# Use spot instances for development
autoir aws kimi-k2-setup \
--instance-type g6.12xlarge \
--spot-instance \
--max-spot-price 2.00

# Scheduled shutdown (via Lambda/EventBridge)
# Stop instances outside business hours
```

AutoIR supports these environment variables:
```bash
# TiDB Connection
export TIDB_DSN="mysql://user:pass@host:4000/db"
export TIDB_HOST="gateway01.us-west-2.prod.aws.tidbcloud.com"
export TIDB_PORT=4000
export TIDB_USER="username.root"
export TIDB_PASSWORD="your_password"
export TIDB_DATABASE="database_name"

# AWS Configuration
export AWS_REGION="us-east-1"
export AWS_PROFILE="production"

# SageMaker
export SAGEMAKER_ENDPOINT="autoir-embed-ep"
export SAGEMAKER_REGION="us-east-1"

# Logging
export LOG_LEVEL="info"
export DEBUG="true"
```
export DEBUG="true"npm install
npm run build
npm test
# Run a command locally
./bin/run.js logs query --help
# Development with auto-rebuild
npm run dev- Retrieval: TiDB vector search returns top matches and distances
- Ranking: the agent can further filter via SQL (e.g., time windows) and then synthesize evidence
- Token safety: iterative tool cycles restrict response sizes; only essential rows are returned via LIMIT
For detailed instructions on deploying Kimi K2 on AWS EC2, refer to the comprehensive guide: Deploying Kimi K2 on Amazon EC2
This guide covers:
- EC2 instance setup with Deep Learning AMI
- Unsloth quantized model deployment
- llama.cpp server configuration
- Security group and networking setup
- Performance optimization and cost considerations
| Component | Technology Stack | Key Features |
|---|---|---|
| Frontend CLI | Node.js + TypeScript + oclif | Interactive dashboards, streaming interfaces |
| Vector Database | TiDB Serverless | Native VECTOR(384), HNSW indexing, cosine distance |
| Embeddings | AWS SageMaker Serverless | HF BGE-small-en-v1.5, JSON invocation |
| LLM Engine | Kimi K2 (32B MoE) | Quantized deployment, tool calling, streaming |
| Infrastructure | AWS (EC2, ECS, CloudWatch) | Serverless and containerized components |
| Tool System | Custom orchestration | Safe, constrained function execution |
Production Scale Characteristics:
- Log Ingestion Rate: 1K-10K events/minute per Fargate task
- Vector Search Latency: <100ms for 100K+ log entries
- Incident Analysis Time: 2-10 seconds end-to-end
- Concurrent Users: 10-50 users per Kimi K2 instance
- Data Retention: Unlimited (TiDB scales elastically)
AutoIR is designed for extensibility:
- Adding New Tools: Extend
BaseToolclass with proper safety constraints - Custom Embeddings: Implement alternative embedding providers via SageMaker
- Additional LLMs: Add new providers to
LlmClientwith appropriate prompt formatting - Enhanced Analytics: Extend SQL queries and analysis functions for domain-specific metrics
For detailed contribution guidelines and development setup, see the repository.
MIT





