AutoResearch Lab: The AI Analyst That Never Makes the Same Mistake Twice

💡 Inspiration

In the high-velocity world of DeFi and crypto, information asymmetry is the enemy. To understand a protocol's risk, a human analyst has to manually triangulate data: check TRM Labs for wallet risk, look at Finster AI for market volatility, and read whitepapers for sentiment. By the time the report is written, the market has moved.

We tried using standard AI chatbots to speed this up, but we hit a wall: AI Agents have amnesia. If an agent fails to check a specific risk factor today, it will make the exact same mistake tomorrow.

We asked: What if we built a research system that actually has a memory? What if it could critique its own work and get smarter with every query?

🏗️ How we built it

AutoResearch Lab is not a single chatbot; it is a Multi-Agent System (MAS) orchestrated by a central Node.js brain. We moved beyond simple RAG and implemented a Reflexion Loop architecture.

The Core Stack

  • Intelligence: Anthropic (Claude Sonnet 4.5) serves as the reasoning engine for our five specialized agents.
  • Memory & Caching: Redis acts as the system's hippocampus, caching API responses and storing long-term "Learnings."
  • Data Layer: We orchestrate calls via Postman Collections to fetch real-time intel from TRM Labs (On-chain Risk), Finster AI (Market Data), and Senso (Sentiment).
  • Security: Skyflow ensures that sensitive API keys and wallet addresses are tokenized and never exposed in plain text.
  • Knowledge Base: Final reports are structured and published to Sanity, creating a permanent, queryable library of research.

The Architecture: The Reflexion Loop

Most agents operate linearly ($\text{Input} \rightarrow \text{Process} \rightarrow \text{Output}$). We introduced a feedback variable, $L$ (Learnings), stored in Redis.

Mathematically, our research process can be defined as:

$$ L_{t+1} = L_t \cup C(R_t), \qquad R_{t+1} = f(Q, \mathcal{D}, L_{t+1}) $$

Where:

  • $Q$ is the User Question.
  • $\mathcal{D}$ is the Data fetched from APIs (TRM/Finster).
  • $R_t$ is the Report generated at time $t$.
  • $C$ is the Critic Function (our specialized Critic Agent), which maps a report to new learnings.
  • $L_t$ is the cumulative set of "Learnings" stored in Redis at step $t$.

Because learnings only accumulate ($L_t \subseteq L_{t+1}$), every report is generated with at least as much context as the last: each constraint the Critic commits to memory permanently shapes all future research.
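
Here is a minimal TypeScript sketch of one turn of that loop. The agent helpers (`generateReport`, `critiqueReport`) and the Redis key `learnings` are illustrative stand-ins for our Reporter and Critic agents, not the production code:

```typescript
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });

// Hypothetical stand-in for the Reporter agent: in the real system this is a
// Claude call with the learnings injected as hard constraints.
async function generateReport(question: string, learnings: string[]): Promise<string> {
  return `Report for "${question}" (generated under ${learnings.length} learned constraints)`;
}

// Hypothetical stand-in for the Critic agent: returns a new learning,
// or null if the report has no gaps.
async function critiqueReport(report: string): Promise<string | null> {
  return report.includes("bridge volume") ? null : "Always check bridge volume.";
}

// One turn of the Reflexion loop: L_{t+1} = L_t ∪ C(R_t), R_{t+1} = f(Q, D, L_{t+1}).
async function researchTurn(question: string): Promise<string> {
  const learnings = await redis.lRange("learnings", 0, -1); // L_t
  const report = await generateReport(question, learnings); // R_t
  const critique = await critiqueReport(report);            // C(R_t)
  if (critique) await redis.rPush("learnings", critique);   // commit L_{t+1}
  return report;
}

// Usage: connect once; every subsequent call benefits from all prior critiques.
await redis.connect();
console.log(await researchTurn("Is protocol X safe to bridge into?"));
await redis.quit();
```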

💻 The Agent Workflow

  1. The Planner: Breaks the user's prompt into sub-questions and checks Redis for past lessons (e.g., "Last time we forgot to check Bridge volume, so add that to the plan"); a sketch of this step follows the list.
  2. The Executor: Securely retrieves credentials via Skyflow, hits the APIs, and caches raw JSON in Redis.
  3. The Analyst: Runs statistical anomaly detection on the raw data.
  4. The Reporter: Synthesizes findings into a clean Sanity document.
  5. The Critic: Reviews the final output. If it finds gaps, it commits a new "Learning" to the database, permanently upgrading the Planner's future logic.
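
A hedged sketch of the Planner step, using the official Anthropic Node SDK. The Redis key layout, prompt wording, and model alias are illustrative assumptions, not our exact implementation:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { createClient } from "redis";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// The Planner: break a user question into sub-questions while honoring every
// constraint the Critic has committed to memory in past sessions.
async function plan(question: string): Promise<string> {
  const learnings = await redis.lRange("learnings", 0, -1); // assumed key layout

  const msg = await anthropic.messages.create({
    model: "claude-sonnet-4-5", // pin an exact dated version in production (see Challenges)
    max_tokens: 1024,
    system: [
      "You are a research planner. Break the question into numbered sub-questions.",
      "Hard constraints learned from past critiques:",
      ...learnings.map((l) => `- ${l}`),
    ].join("\n"),
    messages: [{ role: "user", content: question }],
  });

  const block = msg.content[0];
  return block.type === "text" ? block.text : "";
}
```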

🚧 Challenges we ran into

  • The "JSON" Struggle: Convincing a creative LLM to output strict, parse-able JSON for downstream code execution was difficult. We had to refine our System Prompts significantly to handle edge cases and API failures gracefully.
  • Model Versioning: We hit a critical 404 error mid-hackathon because we were targeting a bleeding-edge model version that wasn't fully propagated to our API keys yet. Debugging this taught us the value of stable model pinning in production.
  • Orchestration Latency: Chaining 5 agents + 3 external APIs can be slow. Implementing aggressive caching with Redis (sketched below) reduced our repeat-query time from ~15 seconds to ~200ms.
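
That caching layer, reduced to its essence, is a cache-aside wrapper around any external fetch. The one-hour TTL, key scheme, and `fetchWalletRisk` helper are illustrative, not our production values:

```typescript
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Cache-aside wrapper: repeat queries hit Redis (~200ms path) instead of
// re-running the full agent + API chain (~15s path).
async function cached<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit) as T;

  const fresh = await fetcher();
  await redis.set(key, JSON.stringify(fresh), { EX: 3600 }); // 1h TTL, illustrative
  return fresh;
}

// Usage with a hypothetical TRM Labs fetcher:
// const risk = await cached(`trm:${wallet}`, () => fetchWalletRisk(wallet));
```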

🏅 Accomplishments that we're proud of

  • True Self-Evolution: We successfully demonstrated that the agent "remembers" a critique from a previous session and applies it to a new, unrelated session.
  • Secure by Design: Integrating Skyflow meant we didn't have to compromise on security to move fast.
  • Structured Knowledge: Instead of dumping text into a chat window, the system generates rich documents in Sanity, building a lasting knowledge base rather than ephemeral conversations.

🧠 What we learned

We learned that Specialization > Generalization. A single "do-it-all" prompt failed constantly. Breaking the system into distinct roles (Planner, Executor, Critic) made it robust, debuggable, and significantly more intelligent. We also learned that Redis is the unsung hero of AI agents: without state, there is no intelligence, only processing.

🚀 What's next for AutoResearch Lab

  • Sanity Canvas Integration: Visualizing the "Idea to Report" flow on an infinite canvas.
  • Human-in-the-loop: Allowing users to manually upvote/downvote the Critic's suggestions to fine-tune the learning process.
  • More Integrations: Adding generic web search to supplement the specialized APIs.

Here's a short video of the backend running in the terminal, showing the agentic flow: https://youtu.be/KZ-jAQs0bT8
