The race to AGI is being run on a broken speedometer. The industry's most trusted AI benchmarks are now fundamentally compromised. With recent studies showing that up to 42% of AI evaluations are tainted by leaked test data, the performance scores we see are becoming a dangerous illusion. This isn't a minor flaw; it's an existential threat to safely navigating our path toward superintelligence.
Training a single frontier model now costs over $100 million, a figure projected to exceed $1 billion by 2027. Yet, companies are forced into a "Transparency Trap": they either release their model weights and lose their most valuable IP, or release their test questions and contaminate the very benchmark they used to prove their worth.
We were inspired to build a solution that breaks this cycle. We envisioned a system where trust is not based on promises, but on mathematical proof.

What It Does: Chintu - The Proof of Intelligence
Chintu is a decentralized protocol for the verifiable, contamination-proof evaluation of advanced AI. We've built the world's first on-chain "secure testing chamber" powered by Midnight and Trusted Execution Environments (TEEs).
Instead of public datasets, benchmarks on Chintu are encrypted on-chain assets. AI models are challenged to solve these hidden problems inside a secure TEE provided by services like Phala Network. The TEE generates a cryptographic attestation, an unforgeable signature proving that a specific model ran a specific, private test and produced a specific result.
This proof is the only thing that ever becomes public.
The result is a clean, immutable, and fully auditable ledger of a model's true capabilities on unseen data. We are replacing inflated scores with irrefutable proof.
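
To make the privacy split concrete, here is a minimal TypeScript sketch of the public proof record and how anyone could audit one. Every name in it (Attestation, auditScore, enclaveKey) is an illustrative assumption rather than the real Midnight or Phala API; the point is that the signed digest is public while the test and the answer never are.

```typescript
// Illustrative sketch of the public proof record and how anyone can
// audit it. All names (Attestation, auditScore, enclaveKey) are
// hypothetical; only the privacy split matches the protocol above:
// the proof is public, the test and the answer stay private.
import { createHash, verify, KeyObject } from "node:crypto";

// The only artifact that ever becomes public.
interface Attestation {
  modelId: string;             // which model was tested
  benchmarkCommitment: string; // hash of the still-private test
  resultHash: string;          // hash of the model's (private) answer
  signature: Buffer;           // signed by the TEE's enclave key
}

const digestOf = (a: Omit<Attestation, "signature">): Buffer =>
  createHash("sha256")
    .update(`${a.modelId}|${a.benchmarkCommitment}|${a.resultHash}`)
    .digest();

// Recompute the digest and check the enclave signature against the
// TEE vendor's published attestation key (an Ed25519-style check).
function auditScore(a: Attestation, enclaveKey: KeyObject): boolean {
  return verify(null, digestOf(a), enclaveKey, a.signature);
}
```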

How We Built It
We built Chintu on a robust, multi-layered stack designed for verifiable computation and privacy.

- Midnight Network: The core of our protocol. We used a Compact smart contract to serve as the immutable source of truth. It manages the state of the benchmark (VACANT/TESTING), registers cryptographic commitments to the private tests, and logs the verifiable submissions from models. The bulletin board's "one-at-a-time" mechanic was the perfect foundation for our "secure testing chamber."
- TEE-Hosted LLMs (Phala Network & RedPill API): We integrated with the RedPill API to send prompts to real, powerful LLMs like Google's Gemma and OpenAI's GPT-OSS running inside a GPU Trusted Execution Environment. This allowed us to get a cryptographically signed attestation (teeAttestation) of the model's inference, which is the core of our verification process.
- Verifier Agent & CLI (bboard-cli): We transformed the example CLI into a powerful, non-interactive "Verifier Agent." This backend tool, written in TypeScript, orchestrates the entire flow: posting new benchmarks, calling the TEE model API, retrieving the signature, and submitting the results to our Midnight smart contract (see the sketch after this list).
- Lace Wallet Integration: All on-chain actions that require ownership, such as posting a new benchmark or finalizing the results, are authorized through real transactions signed via the Lace Wallet. This demonstrates the ZK proof of ownership that Midnight enables.
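
Below is a condensed sketch of that Verifier Agent pipeline. It assumes an OpenAI-compatible RedPill endpoint and a hypothetical BboardContract wrapper around our Compact contract's entry points; the teeAttestation field name comes from the integration described above, but the exact request and response shapes are simplified.

```typescript
// Condensed sketch of the Verifier Agent pipeline. The RedPill endpoint
// shape (OpenAI-compatible) and the BboardContract wrapper are
// assumptions for illustration, not the verbatim APIs.
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// Hypothetical wrapper around our Compact contract's entry points.
interface BboardContract {
  post(commitment: string): Promise<void>;                // register benchmark commitment
  submit(resultHash: string, sig: string): Promise<void>; // log the attested result
}

type TeeResponse = { answer: string; teeAttestation: string };

// Send a hidden benchmark prompt to a model running in a GPU TEE.
async function queryTeeModel(model: string, prompt: string): Promise<TeeResponse> {
  const res = await fetch("https://api.redpill.ai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.REDPILL_API_KEY}`,
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();
  return {
    answer: data.choices[0].message.content,
    // Field name from our integration; how the signature is surfaced
    // may differ per provider.
    teeAttestation: data.teeAttestation ?? "<attestation>",
  };
}

// Full flow: commit the benchmark on-chain, run it in the TEE,
// then publish only the hashed result plus the enclave signature.
async function runBenchmark(contract: BboardContract, model: string, prompt: string) {
  await contract.post(sha256(prompt));
  const { answer, teeAttestation } = await queryTeeModel(model, prompt);
  await contract.submit(sha256(answer), teeAttestation);
}
```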
Challenges We Ran Into
Our biggest challenge was bridging the three distinct worlds of a TEE API, the Midnight blockchain, and our user interface. Getting the TEE attestation signature, formatting it correctly, and then submitting it to our Compact contract required a deep understanding of data types and asynchronous flows. We also learned a great deal about the Midnight development environment, working through the setup process to create a stable foundation based on the official example-bboard repository.
What We Learned
This hackathon was an incredible deep-dive into the future of verifiable computing. We learned that the "witness" model in Compact is a powerful way to bring off-chain data (like a TEE signature) into a ZK-proof context. Most importantly, we learned that building a system with multiple layers of cryptographic proof (TEEs for computation, ZK for ownership) is not only possible but is the clear future for building high-trust applications.
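
As a rough illustration of that witness pattern, this is how a TEE signature might be exposed to the circuit from the TypeScript side, following the structure of the witness implementations in example-bboard. The type and field names here are our assumptions, not the exact generated bindings.

```typescript
// Rough sketch of a Compact witness implementation in TypeScript,
// modeled on example-bboard. Names (ChintuPrivateState, tee_attestation)
// are illustrative assumptions, not the exact generated bindings.

type ChintuPrivateState = {
  teeAttestation: Uint8Array; // signature fetched from the TEE API, kept off-chain
};

// Compact invokes witnesses by name; each returns the (possibly updated)
// private state together with the value made available to the ZK circuit.
export const witnesses = {
  tee_attestation: ({
    privateState,
  }: {
    privateState: ChintuPrivateState;
  }): [ChintuPrivateState, Uint8Array] => [privateState, privateState.teeAttestation],
};
```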
What's Next for Chintu
This is just the beginning. The AGI evaluation crisis is a massive opportunity, and our protocol is the foundation for a new economy of computational trust.

Our roadmap includes:
- Building a full SDK for AI companies to easily integrate their models.
- Implementing a DAO to allow the community to fund and curate high-quality, unbiased benchmarks for AGI safety and alignment.
- Expanding to more TEE providers to create a more decentralized and robust network.
- Launching a protocol token to reward benchmark creators, verifiers, and model contributors, creating a self-sustaining ecosystem for proving the future of AI.