LLM Copilot for Marketing Mix Modeling

Python 3.9+ | License: MIT

An intelligent copilot that brings natural language querying, autonomous analysis, and optimization to Marketing Mix Modeling (MMM). Built with agentic AI capabilities to execute algorithms, generate insights, and visualize results—all from plain English questions.

Overview

LLM Copilot for MMM is a Python package that combines:

  • Agentic AI: Autonomous multi-step analysis workflows with tool calling
  • Response Curve Generation: Hill transformation using DeepCausalMMM
  • Budget Optimization: Multiple algorithms (SLSQP, trust-constr, hybrid)
  • Knowledge Base: RAG-powered semantic search over MMM concepts and benchmarks
  • Code Execution: Secure sandbox for dynamic data analysis
  • Natural Language Interface: Query your MMM results in plain English

Key Differentiator: This isn't just a conversational interface—it executes actual MMM algorithms (Hill transformation, trust-constr optimization, curve fitting) on your data.

Features

🤖 Agentic AI System

  • OpenAI Function Calling: Autonomous tool selection and execution
  • Code Execution Sandbox: Generate and run Python code dynamically
  • Chain-of-Thought Reasoning: Transparent decision-making process
  • Multi-Step Workflows: Analytics pipelines (inspired by DeepAnalyze)
  • Web Search Integration: Supplement analysis with external knowledge

📊 Core MMM Capabilities

  • Response Curves: Automatic Hill transformation for saturation analysis
  • Budget Optimization: SLSQP algorithm with business constraints
  • ROI Analysis: Calculate and compare channel performance
  • Trend Analysis: Time-series insights and seasonality
  • Allocation Strategies: Multi-channel budget recommendations
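
The saturation analysis above is built on the Hill transformation. As a rough standalone sketch (parameter names here are illustrative, not the package's API), the curve maps spend to the fraction of maximum response:

```python
import numpy as np

def hill(spend, half_saturation, slope):
    """Hill saturation curve: fraction of maximum response at a given spend."""
    spend = np.asarray(spend, dtype=float)
    return spend**slope / (spend**slope + half_saturation**slope)

# By construction, spend equal to the half-saturation point yields 50% of max response
print(hill(100_000, half_saturation=100_000, slope=1.5))  # -> 0.5
```

The `slope` parameter controls how sharply returns diminish; larger values produce a steeper S-shaped curve.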

📈 Visualization & Reporting

  • Plotly Charts: Interactive visualizations (bar, line, scatter, curves, donut, etc.)
  • Multiple Views: Automatically generates multiple charts when requested
  • Executive Summaries: Data-driven insights in plain language
  • Formatting: Proper spacing, currency, and markdown rendering

🧠 Knowledge & Learning

  • RAG Knowledge Base: Semantic search over MMM concepts, benchmarks, and best practices
  • MMM Glossary: 50+ terms (adstock, saturation, incrementality, etc.)
  • Channel Benchmarks: Industry standards for TV, Search, Social, Display, etc.
  • User Feedback: Learn from thumbs up/down and comments

🏭 Production Ready

  • REST API: FastAPI server with authentication and rate limiting
  • Caching: Redis/File backends for performance
  • Database Connectors: PostgreSQL, MySQL, Snowflake, Databricks, BigQuery
  • Monitoring: Latency, costs, errors, satisfaction tracking
  • Conversation Context: Multi-turn dialogue with entity tracking

Installation

Basic Installation

# From a clone of the repository
pip install -e .

With Production Features

# Full installation with all dependencies
pip install -e ".[all]"

# Or specific features
pip install -e ".[api]"        # REST API
pip install -e ".[cache]"      # Redis caching
pip install -e ".[database]"   # Database connectors
pip install -e ".[websearch]"  # Web search capability

Configuration

Create a .env file:

OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional: custom endpoint
REDIS_URL=redis://localhost:6379           # Optional: for caching
DATABASE_URL=your_database_url             # Optional: for data loading

Quick Start

1. Prepare Your Data

Required columns:

Column               Type      Description
------               ----      -----------
Date                 datetime  Any time period (daily/weekly/monthly)
Channel              string    Marketing channel identifier
Spend                float     Media spend
Impressions          float     Reach/impressions
Predicted            float     Model-predicted contribution
Segment (optional)   string    Region/DMA/product/taxonomy
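
For instance, a toy CSV matching this schema could be built like so (values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy dataset matching the required schema
data = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-08", "2024-01-08"]),
    "Channel": ["TV", "Search", "TV", "Search"],
    "Spend": [50_000.0, 30_000.0, 48_000.0, 31_000.0],
    "Impressions": [1_200_000.0, 450_000.0, 1_150_000.0, 470_000.0],
    "Predicted": [85_000.0, 60_000.0, 80_000.0, 62_000.0],
})
data.to_csv("mmm_results.csv", index=False)
```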

2. Initialize the Copilot

import pandas as pd
from llm_copilot import MMMCopilot

# Load your MMM data
data = pd.read_csv('mmm_results.csv')

# Initialize copilot (auto-detects column names)
copilot = MMMCopilot(data=data, api_key="your_openai_key")

# OR: Specify custom column mapping
copilot = MMMCopilot(
    data=data, 
    api_key="your_openai_key",
    date_col='date',              # Your date column name
    channel_col='media_channel',  # Your channel column name
    segment_col='region'          # Optional: for regional analysis
)

3. Ask Questions

# ROI Analysis
response = copilot.query("What is TV's ROI?")
response = copilot.query("Compare all channels by ROI")
response = copilot.query("Show me ROI trends over time")

# Saturation Analysis
response = copilot.query("Is TV saturated?")
response = copilot.query("Visualize TV response curve")
response = copilot.query("Which channels have diminishing returns?")

# Budget Optimization
response = copilot.query("What's the optimal budget allocation for $1M?")
response = copilot.query("How should I allocate my budget to maximize ROI?")

# Time-Based Comparisons
response = copilot.query("Compare TV performance in Q3 vs Q4 2024")
response = copilot.query("Show me contributions over time")

# Multi-Visualization
response = copilot.query("Show me TV and Search response curves")
# ↳ Automatically generates multiple charts

# MMM Knowledge
response = copilot.query("What is adstock?")
response = copilot.query("What's a good ROI for TV channels?")

# Conversational Context
response = copilot.query("What about Radio?", session_id="user123")
# ↳ Remembers previous conversation

4. Working with Results

result = copilot.query("What is TV's ROI?")

# Access answer
print(result['answer'])  # Natural language response

# Access visualizations (if any)
if result.get('figures'):
    for i, fig in enumerate(result['figures']):
        fig.show()  # Display Plotly chart
        # Or save: fig.write_html(f'chart_{i}.html')

# Access sources
print(result['sources'])  # Data sources used

5. Business Constraints (Optional)

# Set min/max spend per channel for optimization
copilot.set_constraints({
    'TV': {'lower': 50000, 'upper': 500000},
    'Search': {'lower': 100000, 'upper': 400000}
})

# Now optimization queries respect these constraints
response = copilot.query("Optimize my budget allocation")

Advanced Usage

REST API Server

# Start the API server
python -m llm_copilot.api.server

# Or with Docker
docker-compose up

# Or with custom config
uvicorn llm_copilot.api.server:app --host 0.0.0.0 --port 8000

API Endpoints:

  • POST /query - Query the copilot
  • POST /upload_data - Upload MMM data
  • GET /channels - List available channels
  • GET /benchmarks/{channel} - Get industry benchmarks
  • POST /feedback - Submit user feedback
  • GET /stats - System statistics

Example API usage:

import requests

# Upload data
response = requests.post("http://localhost:8000/upload_data", json={
    "data": [{"date": "2024-01-01", "channel": "TV", "spend": 50000, ...}]
})

# Query
response = requests.post("http://localhost:8000/query", json={
    "query": "What is TV's ROI?",
    "session_id": "user123"
})
print(response.json()["answer"])

Data Connectors

Load data from databases or DeepCausalMMM models:

from llm_copilot.connectors import DatabaseConnector, DeepCausalMMMConnector

# From PostgreSQL
db = DatabaseConnector.from_url("postgresql://user:pass@host/db")
data = db.load_mmm_data(schema="mmm", table="results")

# From Snowflake
db = DatabaseConnector.from_url("snowflake://...")
data = db.load_mmm_data(schema="mmm", table="results")

# From DeepCausalMMM
mmm_connector = DeepCausalMMMConnector("models/mmm_model.pkl")
data = mmm_connector.load_predictions()

# Initialize copilot with loaded data
copilot = MMMCopilot(data=data, api_key="...")

Caching

Improve performance with caching:

from llm_copilot.core.cache import CacheManager

# Redis cache (production)
cache = CacheManager(
    backend="redis", 
    redis_url="redis://localhost:6379", 
    default_ttl=3600
)

# File cache (development)
cache = CacheManager(
    backend="file", 
    cache_dir=".cache", 
    default_ttl=3600
)

# Use as decorator
@cache.cached(ttl=3600, key_prefix="curves")
def fit_expensive_curve(channel, data):
    # Expensive curve fitting
    return result

Conversation Context

Multi-turn dialogues with memory:

from llm_copilot.core.conversation import ConversationContext

context = ConversationContext(session_id="user123")

# Track conversation
context.add_turn("user", "What's TV's ROI?")
context.add_turn("assistant", "TV has ROI of 1.32x")

# Resolve pronouns
query = context.resolve_pronouns("How does it compare to Radio?")  # "it" -> "TV"

# Get conversation history
history = context.get_context(include_last_n=5)

Monitoring & Observability

Track system performance:

from llm_copilot.monitoring import get_metrics_collector

metrics = get_metrics_collector()

# Get statistics
stats = metrics.get_stats(window='24h')
print(f"Error rate: {stats['queries']['error_rate_pct']}%")
print(f"P95 latency: {stats['latency']['p95_ms']}ms")
print(f"Total cost: ${stats['costs']['total_usd']}")
print(f"User satisfaction: {stats['satisfaction']['avg_rating']}/5")

# Export metrics
metrics.export_metrics()

Architecture

Agentic System Flow

User Query ("What's the optimal budget allocation?")
    ↓
AgenticAnalyzer (Chain-of-Thought reasoning)
    ↓
    ├── Step 1: Understand query intent
    │   └── "User wants budget optimization"
    │
    ├── Step 2: Check data availability
    │   └── "Data has 5 channels, 52 weeks"
    │
    ├── Step 3: Fit response curves
    │   └── Hill transformation for all channels
    │
    ├── Step 4: Tool Selection (OpenAI Function Calling)
    │   ├── execute_python    → Data analysis, visualizations
    │   ├── optimize_budget   → Budget allocation
    │   ├── query_knowledge   → MMM concepts, benchmarks
    │   └── web_search        → Industry trends
    │
    ├── Step 5: Execute tool
    │   └── optimize_budget(budget=1000000, num_weeks=52)
    │       ├── Calls BudgetOptimizer (SLSQP algorithm)
    │       ├── Generates 4 visualizations
    │       └── Returns allocation: {TV: 300k, Search: 400k, ...}
    │
    └── Step 6: Generate answer
        └── LLM summarizes results with context
    ↓
Natural Language Response + Visualizations
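
With OpenAI function calling, each tool in Step 4 is declared as a JSON schema the model can choose to invoke. A sketch of what the optimize_budget declaration might look like (the parameter names are illustrative, not the package's actual schema):

```python
optimize_budget_tool = {
    "type": "function",
    "function": {
        "name": "optimize_budget",
        "description": "Allocate a total budget across channels to maximize predicted response.",
        "parameters": {
            "type": "object",
            "properties": {
                "budget": {"type": "number", "description": "Total budget in dollars"},
                "num_weeks": {"type": "integer", "description": "Planning horizon in weeks"},
            },
            "required": ["budget"],
        },
    },
}
# Passed as tools=[optimize_budget_tool, ...] in the chat completions call;
# the model returns a tool call with JSON arguments, which the agent executes.
```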

Tools Available

Tool             Purpose                         Example Trigger
----             -------                         ---------------
execute_python   Analyze data, generate charts   "Show me ROI trends"
optimize_budget  Allocate budgets optimally      "Optimize $1M budget"
query_knowledge  Explain MMM concepts            "What is adstock?"
web_search       Find industry insights          "What are emerging channels?"

Supported Metrics

Ask about any MMM metric—the copilot explains and visualizes automatically:

  • Performance: ROI, ROAS, Contribution, CPA, CPM
  • Saturation: Saturation levels, diminishing returns
  • Dynamics: Adstock, carryover effects, elasticity
  • Attribution: Incrementality, baseline, share of voice

Example: "What is TV's saturation level?" or "Show me adstock effects"
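
As a reference point, period-level ROI from the data schema above reduces to predicted contribution over spend per channel. A sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({
    "channel": ["TV", "TV", "Search", "Search"],
    "spend": [50_000, 48_000, 30_000, 31_000],
    "predicted": [85_000, 80_000, 60_000, 62_000],
})

# ROI per channel = total predicted contribution / total spend
totals = df.groupby("channel")[["spend", "predicted"]].sum()
roi = (totals["predicted"] / totals["spend"]).round(2)
print(roi)
```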

Configuration

Edit config/config.yaml:

openai:
  model: "gpt-4o"
  temperature: 0.1
  max_tokens: 500
  embedding_model: "text-embedding-3-small"

optimization:
  method: "SLSQP"
  max_iter: 400
  jac: "3-point"

response_curves:
  bottom_param: false
  model_level: "Overall"

Testing

# Run all tests
pytest tests/

# With coverage
pytest --cov=src/llm_copilot tests/

# Specific test file
pytest tests/unit/test_copilot.py

Performance Benchmarks

  • Response curve fitting: <1s per channel (Hill transformation)
  • Budget optimization: 1-2s for 10 channels (SLSQP)
  • Query latency: ~1.7s median (includes LLM call + tool execution)
  • Cost per query: ~$0.03-0.05 (varies by complexity)
  • Optimization success rate: 92%+ (SLSQP), 98%+ (trust-constr)

Project Structure

llm-copilot/
├── src/llm_copilot/
│   ├── api/                       # REST API
│   ├── connectors/                # Data connectors
│   ├── core/                      # Core logic
│   │   ├── agentic_system.py     # Agentic AI (Function Calling)
│   │   ├── copilot.py            # Main orchestrator
│   │   ├── response_curves.py    # Hill transformation
│   │   ├── optimization.py       # Budget optimizer
│   │   ├── knowledge_base.py     # RAG retrieval
│   │   ├── conversation.py       # Conversation context
│   │   └── cache.py              # Caching layer
│   ├── knowledge/                 # MMM expertise
│   │   └── mmm_expertise.py      # Glossary, benchmarks
│   ├── monitoring/                # Monitoring
│   ├── visualization/             # Semantic visualization
│   ├── utils/                     # Utilities
│   └── config.py                  # Configuration
├── tests/                         # Test suite
├── examples/                      # Usage examples
├── Dockerfile                     # Docker image
├── docker-compose.yml             # Docker orchestration
└── pyproject.toml                 # Package metadata

Examples

See the examples/ directory for complete examples:

  • quickstart.py - Basic usage
  • optimization_example.py - Budget optimization
  • curve_analysis.py - Response curve generation
  • api_example.py - REST API usage
  • advanced_queries.py - Complex multi-step analysis

Data Requirements

Minimum Requirements:

  • Date column (any granularity: daily, weekly, monthly)
  • Channel column (marketing channel names)
  • Spend column (media spend in dollars)
  • Predicted column (model-predicted contribution/sales)

Optional but Recommended:

  • Impressions/Reach
  • Segment columns (region, DMA, product)
  • Baseline contribution
  • Actual sales (for validation)

Notes:

  • Works with any time granularity
  • Auto-detects common column names
  • Validates inputs with helpful error messages
  • Supports multi-dimensional analysis
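
Auto-detection of column names typically works by matching against known aliases. An illustrative sketch (the alias lists and function are hypothetical; the package's actual detection logic may differ):

```python
# Alias sets are illustrative, not the package's real lists
ALIASES = {
    "date": {"date", "week", "period"},
    "channel": {"channel", "media_channel", "tactic"},
    "spend": {"spend", "cost", "investment"},
    "predicted": {"predicted", "contribution"},
}

def detect_columns(columns):
    """Map schema roles to the first matching column name (case-insensitive)."""
    lowered = {c.lower(): c for c in columns}
    return {
        role: lowered[name]
        for role, names in ALIASES.items()
        for name in names
        if name in lowered
    }

mapping = detect_columns(["Date", "media_channel", "Spend", "Predicted"])
```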

Troubleshooting

Common Issues

1. IndentationError or ImportError

# Reinstall the package
pip install -e . --force-reinstall

2. No visualizations generated

  • Ensure query explicitly asks for visualization: "Show me..." or "Visualize..."
  • Check that data has required columns for the requested chart

3. Optimization fails to converge

  • Try different algorithm: method="trust-constr" (more robust)
  • Adjust constraints: ensure min/max are feasible
  • Increase max_iter: max_iter=1000

4. "Data not available" errors

  • Check date range: print(df['date'].min(), df['date'].max())
  • Ensure requested channels exist: print(df['channel'].unique())

5. JSON serialization errors (Period objects)

  • Automatic handling is built-in for Plotly figures
  • For custom exports, use: df['quarter'] = df['quarter'].astype(str)

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup:

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Format code
black src/ tests/
ruff check src/ tests/

# Type checking
mypy src/

Acknowledgments

  • DeepCausalMMM: Response curve generation (Hill transformation)
  • DeepAnalyze: Inspiration for agentic planning architecture
  • OpenAI: Function Calling and GPT-4o model

Citation

If you use this package in research, please cite:

@software{llm_copilot_mmm,
  title={LLM Copilot for Marketing Mix Modeling},
  author={Aditya Puttaparthi Tirumala},
  year={2025},
  url={https://github.com/adityapt/llm-copilot},
  version={1.0.0}
}

License

MIT License - see LICENSE file for details.

Built with ❤️ for MMM practitioners
