An intelligent copilot that brings natural language querying, autonomous analysis, and optimization to Marketing Mix Modeling (MMM). Built with agentic AI capabilities to execute algorithms, generate insights, and visualize results—all from plain English questions.
LLM Copilot for MMM is a Python package that combines:
- Agentic AI: Autonomous multi-step analysis workflows with tool calling
- Response Curve Generation: Hill transformation using DeepCausalMMM
- Budget Optimization: Multiple algorithms (SLSQP, trust-constr, hybrid)
- Knowledge Base: RAG-powered semantic search over MMM concepts and benchmarks
- Code Execution: Secure sandbox for dynamic data analysis
- Natural Language Interface: Query your MMM results in plain English
Key Differentiator: This isn't just a conversational interface—it executes actual MMM algorithms (Hill transformation, trust-constr optimization, curve fitting) on your data.
- OpenAI Function Calling: Autonomous tool selection and execution
- Code Execution Sandbox: Generate and run Python code dynamically
- Chain-of-Thought Reasoning: Transparent decision-making process
- Multi-Step Workflows: Analytics pipelines (inspired by DeepAnalyze)
- Web Search Integration: Supplement analysis with external knowledge
- Response Curves: Automatic Hill transformation for saturation analysis
- Budget Optimization: SLSQP algorithm with business constraints
- ROI Analysis: Calculate and compare channel performance
- Trend Analysis: Time-series insights and seasonality
- Allocation Strategies: Multi-channel budget recommendations
- Plotly Charts: Interactive visualizations (bar, line, scatter, curves, donut, etc.)
- Multiple Views: Automatically generates multiple charts when requested
- Executive Summaries: Data-driven insights in plain language
- Formatting: Proper spacing, currency, and markdown rendering
- RAG Knowledge Base: Semantic search over MMM concepts, benchmarks, and best practices
- MMM Glossary: 50+ terms (adstock, saturation, incrementality, etc.)
- Channel Benchmarks: Industry standards for TV, Search, Social, Display, etc.
- User Feedback: Learn from thumbs up/down and comments
- REST API: FastAPI server with authentication and rate limiting
- Caching: Redis/File backends for performance
- Database Connectors: PostgreSQL, MySQL, Snowflake, Databricks, BigQuery
- Monitoring: Latency, costs, errors, satisfaction tracking
- Conversation Context: Multi-turn dialogue with entity tracking
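The RAG knowledge base above boils down to embedding entries and ranking them by similarity to an embedded query. A toy sketch of that ranking step, with hand-made three-dimensional vectors standing in for real embedding-model output (illustrative only, not the package's retrieval code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for knowledge-base entries (a real system embeds the text)
kb = {
    "adstock: carryover of media impact across periods": [0.9, 0.1, 0.2],
    "saturation: diminishing returns at high spend":     [0.1, 0.9, 0.3],
    "incrementality: lift beyond baseline":              [0.2, 0.3, 0.9],
}

def retrieve(query_vec, kb, top_k=1):
    """Return the top_k entries most similar to the query vector."""
    ranked = sorted(kb, key=lambda doc: cosine(query_vec, kb[doc]), reverse=True)
    return ranked[:top_k]

# A query vector close to the "adstock" entry retrieves it first
best = retrieve([0.85, 0.15, 0.1], kb)[0]
```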
```bash
pip install -e .

# Full installation with all dependencies
pip install -e ".[all]"

# Or specific features
pip install -e ".[api]"        # REST API
pip install -e ".[cache]"      # Redis caching
pip install -e ".[database]"   # Database connectors
pip install -e ".[websearch]"  # Web search capability
```

Create a `.env` file:

```bash
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional: custom endpoint
REDIS_URL=redis://localhost:6379           # Optional: for caching
DATABASE_URL=your_database_url             # Optional: for data loading
```

Required columns:
| Column | Type | Description |
|---|---|---|
| Date | datetime | Any time period (daily/weekly/monthly) |
| Channel | string | Marketing channel identifier |
| Spend | float | Media spend |
| Impressions | float | Reach/impressions |
| Predicted | float | Model-predicted contribution |
| Segment (optional) | string | Region/DMA/product/taxonomy |
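As an illustration of what this schema enables, per-channel ROI falls out of a simple aggregation. A hedged sketch with made-up numbers, not the copilot's internal code:

```python
import pandas as pd

# Toy MMM results in the required schema
df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-08", "2024-01-08"]),
    "Channel": ["TV", "Search", "TV", "Search"],
    "Spend": [50_000.0, 30_000.0, 40_000.0, 30_000.0],
    "Impressions": [1e6, 4e5, 8e5, 4e5],
    "Predicted": [60_000.0, 45_000.0, 50_000.0, 42_000.0],
})

# ROI per channel = total predicted contribution / total spend
totals = df.groupby("Channel")[["Spend", "Predicted"]].sum()
totals["ROI"] = totals["Predicted"] / totals["Spend"]
print(totals["ROI"].round(2))
```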
```python
import pandas as pd
from llm_copilot import MMMCopilot

# Load your MMM data
data = pd.read_csv('mmm_results.csv')

# Initialize copilot (auto-detects column names)
copilot = MMMCopilot(data=data, api_key="your_openai_key")

# OR: Specify custom column mapping
copilot = MMMCopilot(
    data=data,
    api_key="your_openai_key",
    date_col='date',              # Your date column name
    channel_col='media_channel',  # Your channel column name
    segment_col='region'          # Optional: for regional analysis
)
```

```python
# ROI Analysis
response = copilot.query("What is TV's ROI?")
response = copilot.query("Compare all channels by ROI")
response = copilot.query("Show me ROI trends over time")

# Saturation Analysis
response = copilot.query("Is TV saturated?")
response = copilot.query("Visualize TV response curve")
response = copilot.query("Which channels have diminishing returns?")

# Budget Optimization
response = copilot.query("What's the optimal budget allocation for $1M?")
response = copilot.query("How should I allocate my budget to maximize ROI?")

# Time-Based Comparisons
response = copilot.query("Compare TV performance in Q3 vs Q4 2024")
response = copilot.query("Show me contributions over time")

# Multi-Visualization
response = copilot.query("Show me TV and Search response curves")
# ↳ Automatically generates multiple charts

# MMM Knowledge
response = copilot.query("What is adstock?")
response = copilot.query("What's a good ROI for TV channels?")

# Conversational Context
response = copilot.query("What about Radio?", session_id="user123")
# ↳ Remembers previous conversation
```

```python
result = copilot.query("What is TV's ROI?")

# Access answer
print(result['answer'])  # Natural language response

# Access visualizations (if any)
if result.get('figures'):
    for i, fig in enumerate(result['figures']):
        fig.show()  # Display Plotly chart
        # Or save: fig.write_html(f'chart_{i}.html')

# Access sources
print(result['sources'])  # Data sources used
```

```python
# Set min/max spend per channel for optimization
copilot.set_constraints({
    'TV': {'lower': 50000, 'upper': 500000},
    'Search': {'lower': 100000, 'upper': 400000}
})

# Now optimization queries respect these constraints
response = copilot.query("Optimize my budget allocation")
```

```bash
# Start the API server
python -m llm_copilot.api.server

# Or with Docker
docker-compose up

# Or with custom config
uvicorn llm_copilot.api.server:app --host 0.0.0.0 --port 8000
```

API Endpoints:
- `POST /query` - Query the copilot
- `POST /upload_data` - Upload MMM data
- `GET /channels` - List available channels
- `GET /benchmarks/{channel}` - Get industry benchmarks
- `POST /feedback` - Submit user feedback
- `GET /stats` - System statistics
Example API usage:
```python
import requests

# Upload data
response = requests.post("http://localhost:8000/upload_data", json={
    "data": [{"date": "2024-01-01", "channel": "TV", "spend": 50000, ...}]
})

# Query
response = requests.post("http://localhost:8000/query", json={
    "query": "What is TV's ROI?",
    "session_id": "user123"
})
print(response.json()["answer"])
```

Load data from databases or DeepCausalMMM models:
```python
from llm_copilot.connectors import DatabaseConnector, DeepCausalMMMConnector

# From PostgreSQL
db = DatabaseConnector.from_url("postgresql://user:pass@host/db")
data = db.load_mmm_data(schema="mmm", table="results")

# From Snowflake
db = DatabaseConnector.from_url("snowflake://...")
data = db.load_mmm_data(schema="mmm", table="results")

# From DeepCausalMMM
mmm_connector = DeepCausalMMMConnector("models/mmm_model.pkl")
data = mmm_connector.load_predictions()

# Initialize copilot with loaded data
copilot = MMMCopilot(data=data, api_key="...")
```

Improve performance with caching:
```python
from llm_copilot.core.cache import CacheManager

# Redis cache (production)
cache = CacheManager(
    backend="redis",
    redis_url="redis://localhost:6379",
    default_ttl=3600
)

# File cache (development)
cache = CacheManager(
    backend="file",
    cache_dir=".cache",
    default_ttl=3600
)

# Use as decorator
@cache.cached(ttl=3600, key_prefix="curves")
def fit_expensive_curve(channel, data):
    # Expensive curve fitting
    return result
```

Multi-turn dialogues with memory:
```python
from llm_copilot.core.conversation import ConversationContext

context = ConversationContext(session_id="user123")

# Track conversation
context.add_turn("user", "What's TV's ROI?")
context.add_turn("assistant", "TV has ROI of 1.32x")

# Resolve pronouns
query = context.resolve_pronouns("How does it compare to Radio?")  # "it" -> "TV"

# Get conversation history
history = context.get_context(include_last_n=5)
```

Track system performance:
```python
from llm_copilot.monitoring import get_metrics_collector

metrics = get_metrics_collector()

# Get statistics
stats = metrics.get_stats(window='24h')
print(f"Error rate: {stats['queries']['error_rate_pct']}%")
print(f"P95 latency: {stats['latency']['p95_ms']}ms")
print(f"Total cost: ${stats['costs']['total_usd']}")
print(f"User satisfaction: {stats['satisfaction']['avg_rating']}/5")

# Export metrics
metrics.export_metrics()
```

```text
User Query ("What's the optimal budget allocation?")
    ↓
AgenticAnalyzer (Chain-of-Thought reasoning)
    ↓
├── Step 1: Understand query intent
│   └── "User wants budget optimization"
│
├── Step 2: Check data availability
│   └── "Data has 5 channels, 52 weeks"
│
├── Step 3: Fit response curves
│   └── Hill transformation for all channels
│
├── Step 4: Tool Selection (OpenAI Function Calling)
│   ├── execute_python → Data analysis, visualizations
│   ├── optimize_budget → Budget allocation
│   ├── query_knowledge → MMM concepts, benchmarks
│   └── web_search → Industry trends
│
├── Step 5: Execute tool
│   └── optimize_budget(budget=1000000, num_weeks=52)
│       ├── Calls BudgetOptimizer (SLSQP algorithm)
│       ├── Generates 4 visualizations
│       └── Returns allocation: {TV: 300k, Search: 400k, ...}
│
└── Step 6: Generate answer
    └── LLM summarizes results with context
    ↓
Natural Language Response + Visualizations
```
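Step 3 above fits a Hill curve per channel. For reference, the Hill saturation function itself can be sketched as follows (parameter names here are illustrative, not DeepCausalMMM's exact API):

```python
import numpy as np

def hill(spend, top, half_saturation, slope):
    """Hill saturation curve: response rises with spend and flattens out.
    At spend == half_saturation the response is exactly top / 2."""
    spend = np.asarray(spend, dtype=float)
    return top * spend**slope / (spend**slope + half_saturation**slope)

# Example: a channel whose response tops out near 120k units,
# reaching half of that at 50k spend
spend_grid = np.linspace(0, 200_000, 5)
response = hill(spend_grid, top=120_000, half_saturation=50_000, slope=1.5)
```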
| Tool | Purpose | Example Trigger |
|---|---|---|
| `execute_python` | Analyze data, generate charts | "Show me ROI trends" |
| `optimize_budget` | Allocate budgets optimally | "Optimize $1M budget" |
| `query_knowledge` | Explain MMM concepts | "What is adstock?" |
| `web_search` | Find industry insights | "What are emerging channels?" |
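The `optimize_budget` tool combines fitted response curves with the SLSQP solver. A minimal sketch of that pattern using `scipy.optimize` and made-up Hill parameters (an illustration, not the package's `BudgetOptimizer`):

```python
import numpy as np
from scipy.optimize import minimize

def hill(spend, top, half_sat, slope):
    """Hill saturation curve used as the per-channel response model."""
    return top * spend**slope / (spend**slope + half_sat**slope)

# Illustrative (made-up) per-channel curve parameters: (top, half_sat, slope)
params = {
    "TV":     (120_000, 60_000, 1.4),
    "Search": (150_000, 80_000, 1.2),
    "Social": (70_000,  40_000, 1.6),
}
channels = list(params)
budget = 1_000_000

def neg_total_response(x):
    # Minimize the negative to maximize total predicted response
    return -sum(hill(x[i], *params[ch]) for i, ch in enumerate(channels))

result = minimize(
    neg_total_response,
    x0=np.full(len(channels), budget / len(channels)),  # even split start
    method="SLSQP",
    bounds=[(0, budget)] * len(channels),
    constraints=[{"type": "eq", "fun": lambda x: x.sum() - budget}],
)
allocation = dict(zip(channels, result.x))
```

Channel-level min/max constraints (as set via `set_constraints`) would map onto the `bounds` argument in this pattern.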
Ask about any MMM metric—the copilot explains and visualizes automatically:
- Performance: ROI, ROAS, Contribution, CPA, CPM
- Saturation: Saturation levels, diminishing returns
- Dynamics: Adstock, carryover effects, elasticity
- Attribution: Incrementality, baseline, share of voice
Example: "What is TV's saturation level?" or "Show me adstock effects"
Edit `config/config.yaml`:

```yaml
openai:
  model: "gpt-4o"
  temperature: 0.1
  max_tokens: 500
  embedding_model: "text-embedding-3-small"

optimization:
  method: "SLSQP"
  max_iter: 400
  jac: "3-point"

response_curves:
  bottom_param: false
  model_level: "Overall"
```

```bash
# Run all tests
pytest tests/

# With coverage
pytest --cov=src/llm_copilot tests/

# Specific test file
pytest tests/unit/test_copilot.py
```

Performance:

- Response curve fitting: <1s per channel (Hill transformation)
- Budget optimization: 1-2s for 10 channels (SLSQP)
- Query latency: ~1.7s median (includes LLM call + tool execution)
- Cost per query: ~$0.03-0.05 (varies by complexity)
- Optimization success rate: 92%+ (SLSQP), 98%+ (trust-constr)
```text
llm-copilot/
├── src/llm_copilot/
│   ├── api/                    # REST API
│   ├── connectors/             # Data connectors
│   ├── core/                   # Core logic
│   │   ├── agentic_system.py   # Agentic AI (Function Calling)
│   │   ├── copilot.py          # Main orchestrator
│   │   ├── response_curves.py  # Hill transformation
│   │   ├── optimization.py     # Budget optimizer
│   │   ├── knowledge_base.py   # RAG retrieval
│   │   ├── conversation.py     # Conversation context
│   │   └── cache.py            # Caching layer
│   ├── knowledge/              # MMM expertise
│   │   └── mmm_expertise.py    # Glossary, benchmarks
│   ├── monitoring/             # Monitoring
│   ├── visualization/          # Semantic visualization
│   ├── utils/                  # Utilities
│   └── config.py               # Configuration
├── tests/                      # Test suite
├── examples/                   # Usage examples
├── Dockerfile                  # Docker image
├── docker-compose.yml          # Docker orchestration
└── pyproject.toml              # Package metadata
```
See the `examples/` directory for complete examples:

- `quickstart.py` - Basic usage
- `optimization_example.py` - Budget optimization
- `curve_analysis.py` - Response curve generation
- `api_example.py` - REST API usage
- `advanced_queries.py` - Complex multi-step analysis
Minimum Requirements:
- Date column (any granularity: daily, weekly, monthly)
- Channel column (marketing channel names)
- Spend column (media spend in dollars)
- Predicted column (model-predicted contribution/sales)
Optional but Recommended:
- Impressions/Reach
- Segment columns (region, DMA, product)
- Baseline contribution
- Actual sales (for validation)
Notes:
- Works with any time granularity
- Auto-detects common column names
- Validates inputs with helpful error messages
- Supports multi-dimensional analysis
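The column auto-detection noted above can be pictured as a case-insensitive lookup against candidate names. A hypothetical sketch (`detect_column` and the candidate lists are illustrations, not the package's internals):

```python
def detect_column(columns, candidates):
    """Return the first column whose lowercase name matches a candidate,
    or None if nothing matches."""
    lowered = {c.lower(): c for c in columns}
    for cand in candidates:
        if cand in lowered:
            return lowered[cand]
    return None

columns = ["Week_Ending", "media_channel", "Spend_USD", "Predicted"]
date_col = detect_column(columns, ["date", "week", "week_ending", "period"])
channel_col = detect_column(columns, ["channel", "media_channel", "media"])
# date_col == "Week_Ending", channel_col == "media_channel"
```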
1. `IndentationError` or `ImportError`

```bash
# Reinstall the package
pip install -e . --force-reinstall
```

2. No visualizations generated
- Ensure query explicitly asks for visualization: "Show me..." or "Visualize..."
- Check that data has required columns for the requested chart
3. Optimization fails to converge
- Try a different algorithm: `method="trust-constr"` (more robust)
- Adjust constraints: ensure min/max are feasible
- Increase iterations: `max_iter=1000`
4. "Data not available" errors
- Check date range: `print(df['date'].min(), df['date'].max())`
- Ensure requested channels exist: `print(df['channel'].unique())`
5. JSON serialization errors (Period objects)
- Automatic handling is built in for Plotly figures
- For custom exports, use `df['quarter'] = df['quarter'].astype(str)`
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
Development Setup:
```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Format code
black src/ tests/
ruff check src/ tests/

# Type checking
mypy src/
```

Acknowledgments:

- DeepCausalMMM: Response curve generation (Hill transformation)
- DeepAnalyze: Inspiration for agentic planning architecture
- OpenAI: Function Calling and GPT-4o model
If you use this package in research, please cite:
```bibtex
@software{llm_copilot_mmm,
  title={LLM Copilot for Marketing Mix Modeling},
  author={Aditya Puttaparthi Tirumala},
  year={2025},
  url={https://github.com/adityapt/llm-copilot},
  version={1.0.0}
}
```

MIT License - see LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: puttaparthy.aditya@gmail.com
Built with ❤️ for MMM practitioners