Alxandria: AI-Powered ML Research Assistant
https://github.com/Tyronita/Alxandria.git
Inspiration
As ML researchers and practitioners, we constantly face the same frustrating bottleneck: translating curiosity into action. Reading dozens of papers, finding the right datasets, understanding SOTA benchmarks, and setting up experiments takes hours or days. We envisioned Alxandria—named after the ancient library of knowledge—as a tool that compresses weeks of research literature review into minutes, automatically generating executable Kaggle notebooks with battle-tested code, proper citations, and real datasets ready to run.
The inspiration came from watching researchers spend 80% of their time on setup and only 20% on actual experimentation. We wanted to flip that ratio.
What it does
Alxandria transforms a simple research query (like "medical image classification" or "transformer-based time series forecasting") into a comprehensive, executable ML research package in under 5 minutes:
- Intelligent Literature Review: Searches academic sources (arXiv, Papers with Code, GitHub) and synthesizes the top 3 most relevant papers with working links, contributions, and code repositories
- Gap Analysis: Identifies 2-3 specific research gaps with difficulty ratings, expected impact, and proposed solution approaches
- Dataset Discovery: Finds 3-4 relevant datasets from Kaggle/HuggingFace with size, format, SOTA performance metrics, and direct access links
- Implementation Roadmap: Generates a technical checklist with environment setup, data pipeline architecture, baseline models, evaluation metrics, and current SOTA benchmarks
- One-Click Kaggle Deployment: Automatically pushes a pre-populated Jupyter notebook to your Kaggle account with:
- Full research background with citations
- Identified gaps and opportunities
- Executable dataset loading code (not comments!) that downloads, extracts, and loads data
- PyTorch model templates with training loops
- Evaluation functions and submission helpers
The entire workflow is conversational and guided—users simply enter a topic and click through 4 research steps, then get a shareable Kaggle link instantly.
How we built it
Architecture
Frontend: React with Tailwind CSS provides a minimal, conversational UI. Users progress through a 5-step wizard (Papers → Gaps → Datasets → Implementation → Ship), with real-time loading indicators since each Perplexity API call takes 10-30 seconds.
Backend: FastAPI serves a RESTful API with MongoDB for session persistence. The core /api/research/step endpoint handles the multi-turn research flow, while /api/ship/push-to-kaggle orchestrates notebook generation and deployment.
Perplexity API Integration (The Brain)
The system leverages Perplexity's Chat Completions API (https://api.perplexity.ai/chat/completions) using the sonar-pro model, which combines real-time web search with LLM reasoning:
from openai import OpenAI

perplexity_client = OpenAI(
    api_key=PERPLEXITY_API_KEY,
    base_url="https://api.perplexity.ai"
)

response = perplexity_client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "system", "content": structured_prompt},
        {"role": "user", "content": user_query}
    ],
    extra_body={
        "search_domain_filter": [
            "arxiv.org",
            "github.com",
            "paperswithcode.com",
            "kaggle.com"
        ]
    }
)
Each research step uses carefully crafted system prompts that enforce output structure (markdown tables for papers, detailed gap analysis with difficulty ratings, dataset comparisons with SOTA metrics). The search_domain_filter ensures high-quality academic and technical sources rather than general web content.
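The production prompts aren't reproduced here, but the per-step pattern can be sketched as a small prompt builder (the step rules and wording below are illustrative stand-ins, not the actual prompts):

```python
# Illustrative per-step structured prompts (not the production text).
STEP_RULES = {
    1: "Return a markdown table of the top 3 papers with columns: #, Paper, Authors, Contribution, Links.",
    2: "Identify 2-3 research gaps. For each, give a difficulty rating, expected impact, and a proposed approach.",
    3: "List 3-4 datasets with size, format, SOTA metric, and a direct Kaggle/HuggingFace link.",
    4: "Produce an implementation checklist: environment setup, data pipeline, baselines, metrics, SOTA benchmarks.",
}

def build_system_prompt(step: int) -> str:
    """Compose the structured system prompt for a research step."""
    base = (
        "You are an ML research assistant. Cite every claim with a working "
        "link to arxiv.org, github.com, paperswithcode.com, or kaggle.com."
    )
    return f"{base}\n\n{STEP_RULES[step]}"
```

Keeping the citation rule in a shared base prompt and varying only the output-structure rule per step is what makes the four responses easy to parse back into the notebook later.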
Critical Design Decision: We chose sonar-pro over sonar-deep-research after discovering the latter caused 60+ second timeouts. The Pro model balances depth with response time (10-30s per step), maintaining user engagement while delivering comprehensive results.
MongoDB Session Management
Each research session is stored in a single document that accumulates all four steps:
- Step 1 content → research papers and analysis (initial insert)
- Step 2 content → gaps (stored as an update to the step 1 doc)
- Step 3 content → datasets (stored as an update)
- Step 4 content → implementation plan (stored as an update)
This allows the notebook generator to reconstruct the entire research journey from a single session_id.
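In plain Python, that accumulating-document scheme behaves like this sketch (a stand-in for the MongoDB `insert_one`/`$set` calls; the dict-backed store and field names are illustrative):

```python
# Sketch of the session store: one doc per session, later steps merge in.
STEP_FIELDS = {1: "content", 2: "gaps", 3: "datasets", 4: "implementation"}

def record_step(store: dict, session_id: str, step: int, text: str) -> dict:
    """Insert the step-1 doc on first write, or merge later steps into it."""
    doc = store.setdefault(session_id, {"session_id": session_id, "step": 1})
    doc[STEP_FIELDS[step]] = text
    return doc

store = {}
record_step(store, "abc123", 1, "papers...")
record_step(store, "abc123", 2, "gaps...")
# The notebook generator can now rebuild the whole journey from session_id:
doc = store["abc123"]
```

A single lookup by `session_id` returns every field the generator needs, which is why the reconstruction step stays one query.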
Notebook Generation Engine
The generate_notebook_from_research() function is the culmination of all research steps:
async def generate_notebook_from_research(
    session_id: str,
    topic: str,
    dataset: str
) -> dict:
    # Fetch ALL research data (steps 2-4 live on the step-1 document)
    research_doc = await db.research.find_one({
        "session_id": session_id,
        "step": 1
    })

    # Extract content from each step
    research_content = research_doc.get('content', '')
    gaps_content = research_doc.get('gaps', '')
    dataset_content = research_doc.get('datasets', '')
    implementation_content = research_doc.get('implementation', '')

    # Build Jupyter notebook JSON structure
    notebook = {
        "cells": [
            # Markdown cells with research background
            {
                "cell_type": "markdown",
                "source": [research_content]  # Full Perplexity response
            },
            # CODE cell with ACTUAL executable dataset loading
            # (zip_file and csv_files are defined in generated lines omitted here)
            {
                "cell_type": "code",
                "source": [
                    f"!kaggle datasets download -d {dataset} --force\n",
                    "with zipfile.ZipFile(zip_file, 'r') as zip_ref:\n",
                    "    zip_ref.extractall('./data')\n",
                    "df = pd.read_csv(csv_files[0])\n",
                    "print(df.head())"
                ]
            },
            # Model training, evaluation, submission cells...
        ]
    }
    return notebook
Key Innovation: Unlike most notebook generators that use placeholder comments (# TODO: Load your data), Alxandria generates fully executable Python code that actually downloads the Kaggle dataset, extracts it, lists files, and auto-loads CSV data—all without user intervention.
Kaggle CLI Integration
Pushing to Kaggle required deep understanding of their API constraints:
- Metadata Requirements: Kaggle's `kernel-metadata.json` needs specific fields (`id`, `title`, `code_file`, `kernel_type`, `language`, `is_private`, etc.), and the title must resolve to the same slug as the ID
- Slug Generation: A topic like "Using Transformers to Detect Illegal Deforestation" must convert to `alxandria-using-transformers-to-detect-i-{timestamp}` (shortened to stay under the 50-char limit)
- Authentication: Uses environment variables (`KAGGLE_USERNAME`, `KAGGLE_KEY`) and adds `/root/.venv/bin` to PATH for CLI access
The push workflow:
# Generate safe kernel slug
safe_topic = ''.join(c if c.isalnum() else '-' for c in topic.lower())
safe_topic = '-'.join(filter(None, safe_topic.split('-')))[:30]
kernel_slug = f"alxandria-{safe_topic}-{timestamp}"

# Create metadata that matches Kaggle's requirements
metadata = {
    "id": f"{username}/{kernel_slug}",
    "title": f"Alxandria {safe_topic.replace('-', ' ').title()} {timestamp}",
    "code_file": "notebook.ipynb",
    "language": "python",
    "kernel_type": "notebook",
    "is_private": False,
    "enable_gpu": True,
    "enable_internet": True
}

# Push via CLI
run_kaggle_command(['kaggle', 'kernels', 'push', '-p', temp_dir])

# Return shareable link
return f"https://www.kaggle.com/code/{username}/{kernel_slug}"
Challenges we ran into
1. The 403 Forbidden Kaggle Mystery
Problem: Backend returned success messages, generated perfect Kaggle links, but every link led to 404 errors. The notebooks didn't actually exist.
Investigation:
- First attempt: "Maybe the metadata format is wrong?" → Fixed `language` field issues
- Second attempt: "Maybe the slug doesn't match?" → Fixed title/slug alignment
- Third attempt: "Maybe it's the dataset format?" → Added validation
- Root cause (after 3 hours): the Kaggle API requires phone verification on the account before allowing programmatic kernel creation
Solution: User verified their phone number on Kaggle settings, and immediately the 403 errors became 200 success responses. Notebooks started appearing on Kaggle within seconds.
Lesson: Always check API service-level requirements (account verification, rate limits, permissions) before debugging code.
2. Perplexity API Timeouts
Problem: Initial implementation used sonar-deep-research model, which took 60-90 seconds per request. Frontend axios calls timed out at 30 seconds, causing "Failed to load step" errors despite successful backend responses.
Solution:
- Switched to the `sonar-pro` model (10-30s response time)
- Added an explicit 60-second timeout to the frontend axios call:

const response = await axios.post(`${API}/research/step`, payload, {
  timeout: 60000  // Critical for long-running Perplexity calls
});

- Added distinct error messages for timeout vs network failures
3. Kaggle Slug Title Mismatch (400 Bad Request)
Problem: Kaggle API rejected notebooks with "Your kernel title does not resolve to the specified id" errors. Titles like "Alxandria: Detecting Illegal Deforestation with Transformers" didn't convert to the slug alxandria-detecting-illegal-deforestation-with-transformers-.
Root Cause:
- Trailing hyphens in slugs
- The title contained special characters (`:`) that Kaggle strips
- The slug exceeded the 50-character limit
Solution: Implemented proper slug sanitization:
# Remove special chars, collapse hyphens, trim length
safe_topic = ''.join(c if c.isalnum() else '-' for c in topic.lower())
safe_topic = '-'.join(filter(None, safe_topic.split('-')))[:30]
# Add timestamp for uniqueness
kernel_slug = f"alxandria-{safe_topic}-{timestamp}"
# Match title to slug format
kernel_title = f"Alxandria {safe_topic.replace('-', ' ').title()} {timestamp}"
4. Empty Notebooks on Kaggle
Problem: Early versions pushed notebooks successfully but they contained placeholder text: "Research data not found. Using baseline template."
Root Cause: The notebook generation function was fetching from MongoDB but the research data wasn't being stored properly during the Perplexity API calls.
Solution: Added explicit logging and verified that each research step's content field was being stored:
await db.research.insert_one({
    "session_id": session_id,
    "step": 1,
    "topic": topic,
    "content": response.choices[0].message.content,  # Perplexity response
    "timestamp": datetime.now(timezone.utc).isoformat()
})

# For subsequent steps, update the same document
await db.research.update_one(
    {"session_id": session_id, "step": 1},
    {"$set": {"gaps": content}}  # Likewise for datasets and implementation
)
This ensured notebooks contained full research data (6000+ characters per section).
Accomplishments that we're proud of
1. True End-to-End Automation
Most "notebook generators" create skeleton code with TODOs. Alxandria generates fully executable code that:
- Downloads real Kaggle datasets: `!kaggle datasets download -d {dataset} --force`
- Automatically extracts zip files: `zip_ref.extractall('./data')`
- Detects and loads CSV files: `pd.read_csv(csv_files[0])`
- Lists all extracted files for user reference
- Includes proper error handling and progress messages
Users can literally click "Run All" in Kaggle and watch their entire experiment execute without writing a single line of code.
2. Research Quality with Citations
Unlike generic LLM responses, Alxandria provides:
- Verifiable sources: Every claim links back to arXiv papers, GitHub repos, or Papers with Code
- Working links: We validate that paper URLs resolve (no broken links)
- Structured analysis: Tables comparing papers, gap analysis with difficulty ratings, dataset comparisons with SOTA benchmarks
- Up-to-date information: Perplexity's real-time search ensures recent papers (2023-2024) appear in results
Example output quality:
## Top 3 Research Papers
| # | Paper | Authors | Contribution | Links |
|---|-------|---------|--------------|-------|
| 1 | Vision Transformer for Small-Size Datasets | Lee et al. | Shifted Patch Tokenization for low-data regimes | [Paper](arxiv.org/...) · [Code](github.com/...) |
3. 5-Minute Research → Production Pipeline
Traditional ML research workflow:
- Day 1-2: Literature review (reading 10-20 papers)
- Day 3: Finding datasets and understanding formats
- Day 4-5: Setting up environment, writing boilerplate code
- Day 6+: Actual experimentation
Alxandria workflow:
- Minute 1: Enter research topic
- Minute 2-3: Review papers, gaps, datasets (4 guided steps)
- Minute 4: Click "Push to Kaggle"
- Minute 5: Open notebook, click "Run All", start experimenting
95% time reduction on research setup.
4. Robust Error Handling with User Guidance
When things fail, Alxandria provides actionable feedback:
- 403 Forbidden → "Your account needs phone verification. Go to kaggle.com/settings"
- Timeout errors → "Request timed out. Perplexity API takes 20-30 seconds, please wait"
- Invalid dataset format → "Dataset must be in format: username/dataset-name"
- Slug mismatch → Automatically fixes by adding timestamps and sanitizing titles
This turns frustrating debugging sessions into guided fixes.
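A minimal version of that error-to-guidance mapping could look like the following (messages paraphrased from the list above; the helper name and status-code keys are illustrative, not the actual backend code):

```python
# Map known failure modes to actionable user guidance; anything unlisted
# falls through to a generic retry hint rather than a raw stack trace.
GUIDANCE = {
    403: "Your account needs phone verification. Go to kaggle.com/settings.",
    408: "Request timed out. Perplexity calls take 20-30 seconds, please wait.",
    400: "Check the dataset format: it must be username/dataset-name.",
}

def explain_failure(status_code: int) -> str:
    """Translate an HTTP status code into a user-facing fix suggestion."""
    return GUIDANCE.get(status_code, "Unexpected error; please retry or check the logs.")
```

The key design choice is that every branch tells the user what to do next, not just what went wrong.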
What we learned
1. Real-time LLM APIs Require Patience
Perplexity's search-augmented generation takes 10-30 seconds per request because it's:
- Searching the web for relevant sources
- Filtering by domain (academic, technical)
- Analyzing and synthesizing information
- Generating structured markdown with citations
Lesson: Always set frontend timeouts 2-3x longer than expected backend response time. Show progress indicators with realistic time estimates ("This may take 20-30 seconds") to manage user expectations.
2. API Documentation ≠ API Reality
Kaggle's official docs list `language` as a valid metadata field. Reality: it can trigger `Invalid field 'language'` errors in 2024. The phone-verification requirement isn't mentioned anywhere in the API docs.
Lesson: When facing mysterious API failures:
- Check GitHub issues for the API library
- Search recent Stack Overflow questions (last 3-6 months)
- Test with minimal examples before complex implementations
- Use web search tools to find recent changes/requirements
3. Notebook Code Must Be Executable, Not Educational
Early versions had "educational" code with comments explaining concepts:
# First, we need to load the data
# You can use pandas for CSV files:
# df = pd.read_csv('your_data.csv')
Users wanted runnable code:
!kaggle datasets download -d {dataset} --force
with zipfile.ZipFile(f"{dataset.split('/')[-1]}.zip", 'r') as zip_ref:
    zip_ref.extractall('./data')
csv_files = list(Path('./data').glob('**/*.csv'))
df = pd.read_csv(csv_files[0])
print(f"Loaded {len(df)} rows")
Lesson: Code generation tools should prioritize "copy-paste-run" over "read-and-understand". Documentation can be in markdown cells, but code cells must execute.
4. Domain Filtering Dramatically Improves LLM Output Quality
Generic web search returns blog posts, tutorials, and outdated content. Filtering to arxiv.org, github.com, paperswithcode.com ensures:
- Peer-reviewed papers (not Medium articles)
- Working code repositories (not broken links)
- SOTA benchmarks (not inflated marketing claims)
- Recent research (2023-2024 papers)
Lesson: When using search-augmented LLMs, always constrain the search space to authoritative sources for your domain.
What's next for Alxandria
1. Multi-Dataset Experiments
Currently, Alxandria generates notebooks for single datasets. Next version will support:
- Comparative analysis: Automatically test the same model on 3-4 related datasets
- Cross-dataset validation: Train on Dataset A, test on Dataset B to check generalization
- Ensemble strategies: Combine predictions from multiple dataset-specific models
2. Automated Hyperparameter Tuning
Generate notebooks with Optuna/Ray Tune integration:
def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical('batch_size', [16, 32, 64])
    # Model training with these hyperparameters
    return validation_accuracy

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)
3. GitHub Integration for Version Control
Instead of just Kaggle:
- Create GitHub repo with notebook, README, and requirements.txt
- Set up GitHub Actions for automated training on commits
- Generate reproducible experiment tracking with MLflow/Weights & Biases
4. Smart Dataset Recommendations
Use embeddings to match research topics to datasets:
- Embed user query: "medical image classification with limited labels"
- Embed dataset descriptions from Kaggle/HuggingFace
- Rank by semantic similarity + metadata (size, license, recent updates)
- Prioritize datasets with active competitions or high engagement
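The ranking step above could be as simple as cosine similarity over embedding vectors. A pure-Python sketch (the embedding model itself, e.g. a sentence encoder producing the vectors, is assumed upstream and not shown):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_datasets(query_vec: list[float],
                  datasets: list[tuple[str, list[float]]]) -> list[str]:
    """Return dataset names ordered by semantic similarity to the query."""
    ranked = sorted(datasets, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [name for name, _vec in ranked]
```

Metadata signals (size, license, recency, engagement) could then be blended into the similarity score as a weighted sum.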
5. Live Experiment Monitoring
After pushing to Kaggle:
- Poll notebook execution status
- Display training metrics in real-time (loss curves, accuracy)
- Send alerts when training completes or fails
- Auto-generate comparison tables if user runs multiple experiments
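The polling step above might be sketched like this. In production the status fetcher would wrap the existing `kaggle kernels status <user>/<slug>` CLI command; here it is injected as a callable so the loop itself is testable (the function names are ours, not an existing API):

```python
import time

def poll_until_done(fetch_status, interval_s: float = 30, max_polls: int = 120) -> str:
    """Poll a kernel-status callable until the run finishes or we give up.

    fetch_status should return a status string such as "running",
    "complete", or "error" (mirroring Kaggle CLI status output).
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("complete", "error"):
            return status
        time.sleep(interval_s)
    return "timeout"

# Usage with a stub fetcher (a real one would parse the CLI output):
statuses = iter(["running", "running", "complete"])
result = poll_until_done(lambda: next(statuses), interval_s=0)
```

On "complete", the monitor could pull notebook output to plot loss curves or fill the comparison table.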
6. Research Paper Upload
Allow users to upload specific papers (PDFs):
- Extract methodology sections with PyPDF2/LlamaParse
- Generate notebooks that replicate the paper's approach
- Include proper citations and attribution
- Highlight differences between original paper code and our implementation
7. Multi-Modal Research Support
Extend beyond computer vision:
- NLP: Tokenization, transformer fine-tuning, evaluation metrics (BLEU, ROUGE)
- Time Series: ARIMA, Prophet, LSTM templates with proper train/val/test splits
- Reinforcement Learning: Environment setup, agent training loops, reward visualization
- Audio/Speech: Librosa for feature extraction, wav2vec models
Built With
- fastapi
- kaggle
- openai
- perplexity
- react

