¡alacard!: The open source instruction layer for AI
Imagine a new model drops hours before your livestream or client call. You need working code, not slides. Today that means a dozen tabs, broken examples, and guesswork.
What we built
¡alacard! turns model cards and tool docs into runnable, traceable, self-improving notebooks. You pick a Recipe. The agent composes the notebook, runs it safely in Daytona, and records every step in Weights & Biases Weave. When a run fails, it detects the failure, proposes a minimal patch, and reruns until the notebook passes. You can then share the verified notebook, remix it, or deploy the same Recipe on Vertex AI Agent Engine.
Dual innovation
- Cookbook Hub (value today): an open hub for cross-provider Recipes that combine modular Cards like Weave, Tavily, Daytona, and Vertex into tested workflows that generate pinned notebooks.
- RL/RLHF Data Foundation (value tomorrow): each execution logs a structured state-action-reward tuple. We record pass or fail, retries, latency, error class, and patch choice to build a dataset for future policy learning.
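A minimal sketch of what one logged tuple could look like, assuming a flat record serialized as one JSON line. The class name `RunTuple`, its field names, and the `to_jsonl` helper are illustrative assumptions, not the project's actual schema; only the recorded signals (pass or fail, retries, latency, error class, patch choice) come from the description above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class RunTuple:
    """One execution record (hypothetical schema)."""
    # State: what was observed before acting
    recipe_id: str
    error_class: Optional[str]  # None when the run raised no error
    # Action: the patch that was chosen, if any
    patch: Optional[str]
    # Reward signals
    passed: bool
    retries: int
    latency_s: float


def to_jsonl(record: RunTuple) -> str:
    """Serialize a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = RunTuple(
    recipe_id="demo-recipe",
    error_class="timeout",
    patch="timeout",
    passed=False,
    retries=1,
    latency_s=12.5,
)
line = to_jsonl(record)
```

Keeping the record flat and append-only makes it easy to load the history later as a table for policy learning.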
How it works
- Choose a Recipe: search → browse → execute → score.
- Generate a pinned notebook with labeled ops.
- Run in Daytona. Open the Weave trace so you see every call.
- Fail on purpose. Show the error class.
- Apply a policy patch: selector and wait, timeout, dependency pin, or temperature.
- Rerun to green.
- Report two metrics out loud: time to green and retries.
- Save and Remix. The improvement policy carries over.
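The fail → patch → rerun steps above can be sketched as a small greedy loop. This is an illustration under stated assumptions, not the project's Improver agent: the error class names, the `PATCH_FOR_ERROR` mapping, and the `run` callback signature are hypothetical. Only the four patch kinds come from the list above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical mapping from a normalized error class to one of the four patches.
PATCH_FOR_ERROR = {
    "selector_not_found": "selector_and_wait",
    "timeout": "timeout",
    "import_error": "dependency_pin",
    "unstable_output": "temperature",
}


def rerun_to_green(
    run: Callable[[List[str]], Optional[str]],
    max_retries: int = 3,
) -> Tuple[bool, int, List[str]]:
    """Run, patch on failure, and rerun until the run passes or retries run out.

    `run` takes the patches applied so far and returns an error class,
    or None when the notebook passes.
    """
    patches: List[str] = []
    retries = 0
    error = run(patches)
    while error is not None and retries < max_retries:
        patch = PATCH_FOR_ERROR.get(error)
        if patch is None or patch in patches:
            break  # no known patch, or the patch was already tried
        patches.append(patch)
        retries += 1
        error = run(patches)
    return error is None, retries, patches


def fake_run(patches: List[str]) -> Optional[str]:
    """Stand-in executor: fails with a timeout until the timeout patch is applied."""
    return None if "timeout" in patches else "timeout"


passed, retries, applied = rerun_to_green(fake_run)
```

With the stand-in executor, the first run fails, one patch is applied, and the second run goes green, which matches the live flow described above.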
Why now
Models and tools evolve faster than adoption. Teams waste time wiring brittle demos. Vendors lose users when examples fail to run. ¡alacard! converts documentation into execution and makes integrations fast, observable, and reusable.
What inspired us
We have felt the panic of a last-minute demo. We wanted a system that ships a working notebook on demand and proves its behavior with a trace.
Challenges
- Pinning dependencies across Cards without bloating install time
- Normalizing error classes so one policy can choose the right patch
- Keeping the live flow under two minutes with clear Weave spans
- Making the first run fail in a safe and repeatable way
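As a sketch of the error normalization challenge, one simple approach is an ordered list of regex rules over stderr, so that a single policy sees a small, fixed set of classes. The class names and patterns below are assumptions for illustration, not the rules the project uses.

```python
import re
from typing import List, Pattern, Tuple

# Ordered rules: the first matching pattern wins. Names and patterns are hypothetical.
RULES: List[Tuple[str, Pattern[str]]] = [
    ("dependency", re.compile(r"ModuleNotFoundError|ImportError")),
    ("timeout", re.compile(r"TimeoutError|timed out", re.IGNORECASE)),
    ("selector", re.compile(r"selector|element not found", re.IGNORECASE)),
]


def classify(stderr: str) -> str:
    """Map raw stderr text to a normalized error class, or 'unknown'."""
    for name, pattern in RULES:
        if pattern.search(stderr):
            return name
    return "unknown"
```

For example, `classify("ModuleNotFoundError: No module named 'weave'")` falls into the `dependency` class, which a policy could map to a dependency pin.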
How we built it
- Google ADK coordinates Composer, Executor, and Improver agents.
- Daytona executes notebooks in an isolated sandbox and returns exit codes, stdout, and stderr.
- Weights & Biases Weave traces each Card op with @weave.op and logs structured signals.
- Vertex AI serves as the optional deployment target from the same Recipe.
- Tavily powers the research Card.
- Local Postgres stores Recipes, runs, reward history, and Remix lineage.
- Google Colab delivers notebooks with one-click run links.
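To illustrate the shape of the result the Executor works with (exit code, stdout, stderr), here is a local stand-in built on Python's `subprocess` module. It does not use Daytona's SDK and provides no sandbox isolation; `ExecResult` and `run_snippet` are hypothetical names introduced only for this sketch.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ExecResult:
    """The three signals an executor returns (hypothetical container)."""
    exit_code: int
    stdout: str
    stderr: str


def run_snippet(code: str, timeout_s: float = 30.0) -> ExecResult:
    """Run Python code in a child process and capture its outcome.

    Local stand-in only: a real sandbox would add isolation.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return ExecResult(proc.returncode, proc.stdout, proc.stderr)


ok = run_snippet("print('green')")
bad = run_snippet("import sys; sys.exit(3)")
```

A nonzero `exit_code` together with `stderr` is what a downstream step would need in order to pick an error class and a patch.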
What we learned
- Judges trust numbers they can see. We keep the Weave trace open and report time to green and retries the same way every time.
- Small, deterministic policy patches beat complicated logic in a hackathon.
- A verified notebook beats a perfect slide every time.
Accomplishments
- Live fail → patch → rerun to green with visible metric delta
- Pinned, shareable notebook others can run and remix
- Complete state-action-reward log for future policy learning
- One QR to a Weave permalink so judges can inspect the trace
What is next
- Learn a policy from the collected tuples and compare to the greedy baseline
- Expand Card types and Recipe packs
- Add more evals and publish reward dashboards
- Grow contribution paths for Cards and Recipes
Reward function
We combine pass, latency, retries, and clarity:
$$ R = w_1 \cdot \text{pass} + w_2 \cdot (1 - \text{latency}_{norm}) + w_3 \cdot (1 - \text{retries}_{norm}) + w_4 \cdot \text{clarity} $$
We accept a patch only if:
$$ \Delta R > 0 $$
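A worked sketch of the reward and the acceptance rule, under assumptions the formula leaves open: the weights, the normalization caps (`max_latency_s`, `max_retries`), and a clarity score in [0, 1] are all illustrative values, not the project's settings.

```python
from typing import Tuple


def reward(
    passed: bool,
    latency_s: float,
    retries: int,
    clarity: float,
    weights: Tuple[float, float, float, float] = (0.5, 0.2, 0.2, 0.1),  # assumed w_1..w_4
    max_latency_s: float = 120.0,  # assumed cap for latency normalization
    max_retries: int = 3,          # assumed cap for retry normalization
) -> float:
    """R = w1*pass + w2*(1 - latency_norm) + w3*(1 - retries_norm) + w4*clarity."""
    w1, w2, w3, w4 = weights
    latency_norm = min(max(latency_s / max_latency_s, 0.0), 1.0)
    retries_norm = min(max(retries / max_retries, 0.0), 1.0)
    return (
        w1 * float(passed)
        + w2 * (1.0 - latency_norm)
        + w3 * (1.0 - retries_norm)
        + w4 * clarity
    )


def accept_patch(reward_before: float, reward_after: float) -> bool:
    """Accept a patch only if it strictly improves the reward (delta R > 0)."""
    return reward_after - reward_before > 0


before = reward(passed=False, latency_s=60.0, retries=0, clarity=0.5)
after = reward(passed=True, latency_s=60.0, retries=1, clarity=0.5)
accepted = accept_patch(before, after)
```

With these assumed weights, a patch that turns a failing run green is accepted even though it costs one retry, because the pass term outweighs the retry penalty.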
Built with
- Google’s Agent Development Kit (orchestration)
- Weights & Biases Weave (observability)
- Daytona (sandboxed execution)
- Vertex AI (deployment and scaling)
- Tavily (research and documentation)
The result
When a new model drops, ¡alacard! gives you a working notebook before the hype even starts.
Built With
- ag-ui
- browserbase
- copilotkit
- daytona
- google-adk
- google-colab
- jupyter
- mastra
- papermill
- postgresql
- python
- stagehand
- tavily
- typescript
- vertex-ai
- weave
- weights-and-biases

