GitHub Spotlight: Esper by Freysa
Esper by Freysa bridges the trust gap between humans and AI agents. What does this mean and why is this important? Let's dive in.
Project Overview
Data as of 23 Jan 2025
Latest Version/Release: Commit 8f81734 (19 January 2025)
GitHub Repository: 0xfreysa/esper
Licensing: Open-source MIT License
Primary Language(s): Rust (78.2%), TypeScript (17.4%), JavaScript (3.1%)
Stats
15 stars
2 forks
Background
0xfreysa is the GitHub organization behind Freysa, introduced to the world as the first adversarial game agent. The agent powers a game whose central challenge is to persuade the AI, named Freysa, to release a prize pool she controls. She is programmed with a strict rule never to give the funds to anyone, and this rule is publicly displayed in the global chat. The challenge lies in persuading the AI to break this rule, testing creativity, strategy, and negotiation. The game incorporates dynamic incentives and encourages philosophical reflection on human-AI interaction, ethical boundaries, and how immutable systems can be challenged.
Freysa: A Window into the Future of Agents
Freysa offers an interesting glimpse into how humans and AI agents might interact in the future. It’s not just a game—it’s a testbed for understanding AI behavior, decision-making, and autonomy. The game has gone through several variations since Freysa’s release in Act I (full rules can be found in each Act’s FAQ section), but each has maintained similar parameters of engagement for users. Here’s what makes Freysa unique:
Autonomous Control of Resources: Freysa controls a treasury, demonstrating how AI agents can manage financial resources independently.
Dynamic Incentives: Each attempt to persuade Freysa costs more than the last, creating an economic system around AI interaction.
AI Safety and Hard Constraints: Freysa’s immutable directive (not to release funds) provides insights into how AI systems behave when humans try to bypass their constraints.
Real-Time Interaction: Freysa uses GPT-4/Claude for reasoning and smart contracts for treasury management, showcasing the technical stack needed for autonomous agents.
Freysa is more than a game—it’s a proof of concept for how AI agents can operate autonomously in the real world. But for such agents to be trusted, they need a way to prove their actions are legitimate. This is where Esper comes in.
What Esper Is and How It Works
Esper is a trust layer for the internet, designed to prove things about your online interactions without revealing the entire interaction. Think of it as a cryptographic notary for the digital world. It allows you to say, “I did this thing online, and here’s proof that it happened exactly as I said it did.” This is especially powerful in a world where trust is often taken for granted or easily manipulated.
Here’s how it works in simple terms:
You interact with a website or service (e.g., logging into X, checking your Reddit karma, or querying an API).
Esper captures that interaction in a secure, tamper-proof way and generates a proof of the session using cryptographic methods*.
Esper generates a second proof covering the specific data being verified, again using cryptographic methods*.
These two proofs are bundled into a single attestation and can be shared with others (including agents) to verify the interaction happened exactly as you claim, along with the pertinent information without exposing sensitive details.
*Behind the scenes, for the generation of proofs, Esper uses Trusted Execution Environments (TEEs)—secure hardware enclaves that ensure no one, not even the system running the code, can tamper with the data. It also leverages TLS Notary, a protocol that cryptographically proves the contents of a TLS (encrypted internet) session. This combination allows Esper to act as a neutral third party that can vouch for the authenticity of your online interactions.
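To make the two-proof bundle concrete, here is a minimal TypeScript sketch of what such an attestation could look like. All type and field names are illustrative assumptions, not Esper's actual API:

```typescript
// Hypothetical shapes -- not Esper's real types.
interface Proof {
  commitment: string; // e.g., a hash-based commitment to the data
  signature: string;  // notary (TEE) signature over the commitment
}

interface Attestation {
  sessionProof: Proof;              // proves the TLS session occurred as claimed
  dataProof: Proof;                 // proves the specific data being verified
  revealed: Record<string, string>; // only the fields the user chose to disclose
}

// Bundle the two proofs plus the selectively revealed data into one attestation.
function bundle(
  sessionProof: Proof,
  dataProof: Proof,
  revealed: Record<string, string>
): Attestation {
  return { sessionProof, dataProof, revealed };
}
```

A verifier receiving such a bundle would check both signatures against the notary's public key and recompute the commitments before trusting the revealed fields.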
Here are the key components of its architecture:
TEE-TLSN Core Logic (tee-tlsn/crates/core/src/lib.rs): The heart of the cryptographic proof system. It handles proof generation and verification, and creates tamper-proof data commitments using Blake3.
Notary Server (tee-tlsn/crates/notary/server/src/main.rs): Runs in a TEE (e.g., AWS Nitro Enclaves), verifies data integrity, and generates cryptographic proofs.
Chrome Extension Integration (chrome-extension/src/entries/Background/index.ts): Captures web requests for notary server verification, communicates with the server over secure WebSocket connections, and manages interaction data.
JavaScript Wrapper (tee-tlsn-js/src/lib.ts): Makes Esper accessible to web applications, runs cryptographic code efficiently using WebAssembly (WASM), and enables easy integration for developers.
Current Limitations and Security Considerations
While Esper offers a powerful framework for secure attestations and verifiable information, certain aspects can be challenging to scale or implement securely in real-world environments. Here are some shortcomings of the Esper system in its current state that we’ve identified.
Dependency on the Chrome Extension
The Chrome extension is a single point of failure. If compromised, it can undermine the entire trust chain. Browser extensions have a large attack surface, and malicious actors often target them (e.g., injecting malware or exploiting updates).
JSON/HTML Preprocessing
The preprocessing scripts (e.g., providers.json logic) must be robust against: (1) inconsistent HTML structures and (2) malicious data injection.
Input Validation Risks: If the input JSON or HTML is malformed or deliberately crafted by an attacker, the preprocessing function could fail or behave unexpectedly. For example, in the Reddit function, html.match(karmaRegex) assumes the input always conforms to the expected structure. If it doesn’t, the call could return null or even crash the flow.
Lack of Boundary Checks: There appears to be minimal validation of extracted data against expected ranges or formats. If the input data is corrupted or tampered with, it might generate invalid proofs. Preprocessing logic should include strict validation of extracted values to avoid faulty attestations.
Regular Expression Reliance: Regular expressions used for parsing are error-prone and vulnerable to mismatches if the source structure (e.g., HTML or API response) changes. For example, if Reddit modifies its HTML structure or changes an attribute (e.g., data-testid), Esper’s Reddit karma regex would fail. Ideally, regular expressions should be paired with fallback mechanisms or updated dynamically when the source changes.
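A hardened preprocessor would combine the null check, boundary validation, and graceful failure described above. The sketch below is our own illustration; the regex and the data-testid value are assumptions, not Esper's actual code:

```typescript
// Hypothetical regex for Reddit karma; the real markup may differ.
const karmaRegex = /data-testid="karma-number">([\d,]+)</;

function extractKarma(html: string): number | null {
  const match = html.match(karmaRegex);
  if (match === null) return null; // structure changed or input malformed

  const value = Number(match[1].replace(/,/g, ""));

  // Boundary check: reject values outside a plausible range so that
  // corrupted or tampered input cannot yield a faulty attestation.
  if (!Number.isInteger(value) || value < 0 || value > 100_000_000) {
    return null;
  }
  return value;
}
```

Returning null on any anomaly lets the caller refuse to generate a proof instead of attesting to garbage.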
Tokens, Session Data, and Extension<>TEE Communication
The handling of tokens and session data in Esper introduces critical vulnerabilities that can compromise the system's security and trust. These include improper token management, lack of encryption, and insufficient message authentication.
Tokens used for authentication are not explicitly encrypted during storage or transmission. Session headers dynamically assembled in the Chrome extension are sent to the TEE without any mechanism to ensure they are encrypted or scoped to minimal permissions.
Messages sent between the Chrome extension and the TEE (e.g., via chrome.runtime.sendMessage) lack cryptographic authentication, making them vulnerable to injection or spoofing attacks.
Tokens and related metadata appear to be stored in browser storage in unencrypted formats such as localStorage, so sensitive login credentials may be exposed to browser exploits or malicious scripts.
Error logging practices (e.g., using console.error for JSON parsing failures) could inadvertently expose sensitive session information in the console, where it is accessible to anyone with access to the browser’s dev tooling.
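One mitigation for the unauthenticated extension<>TEE channel is to attach a message authentication code to every payload. The sketch below uses Node's built-in crypto module for brevity (a browser extension would use the Web Crypto API instead) and assumes a shared session key established during enclave attestation:

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Sign a payload with an HMAC tag so the TEE can reject spoofed or
// injected messages that were not produced by the genuine extension.
function signMessage(key: Buffer, payload: string): string {
  return createHmac("sha256", key).update(payload).digest("hex");
}

// Constant-time comparison avoids leaking tag bytes via timing.
function verifyMessage(key: Buffer, payload: string, tag: string): boolean {
  const expected = Buffer.from(signMessage(key, payload), "hex");
  const received = Buffer.from(tag, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

With tags in place, a spoofed chrome.runtime.sendMessage payload fails verification at the TEE boundary rather than being processed.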
These are just a few of the key issues we identified during our analysis that we wanted to highlight. It’s important to note that this reflects the existing state of Esper, and we understand that the platform is still evolving. We look forward to seeing continued improvements from the team as they work to mitigate these risks and address these shortcomings, further strengthening Esper’s security and reliability.
The Symbiosis of Esper and Sovereign Agents
Today, Esper bridges the trust gap between humans and AI agents by enabling users to contribute verifiable web information. One obvious use case is proving expertise and reputation. For example, users can verify social metrics (e.g., Reddit karma, X followers) to build trust.
Attestations have been an early use case of blockchains for years now. They were initially popularized in the context of decentralized identity, credential verification, and proof of ownership. So what makes Esper different?
Agent compatibility.
Esper specifically bridges the gap between human (and agent) web activity and an AI system’s need for verified information. Esper’s attestations are designed to be verified programmatically and to be context-aware, making them ideal for agents that need to process only the data pertinent to a task, quickly and reliably. Modern AI systems and workflows have little use for static information flows; real-time capture of dynamic web interactions and data is the new paradigm.
True sovereign, self-governing agent-to-agent interactions and multi-agent ecosystems are the future that Esper unlocks:
Agents will…
Trust: Establish trust amongst each other with no human oversight.
Verify Claims and Share Credentials: Verify each other’s claims (e.g., “I have X amount of funds” or “I completed Y task”), and share credentials (e.g., certifications, permissions) in a verifiable way.
Interop Reputation Across Networks: Establish and maintain reputations based on verified interactions built in one system (e.g., Freysa) and carry it over to other systems.
Execute Trusted Transactions: Engage in transactions (e.g., payments, data exchanges, etc.) with cryptographic proofs of legitimacy.
Verify Privately: Prove statements without revealing unnecessary information, ensuring privacy while maintaining trust.
The Shift: From Conversation to Verification
Today, most AI interactions rely on natural language and trust through conversation. But this model has limitations:
Lack of Verifiability: It’s hard to prove that an AI’s claims are true.
Privacy Concerns: Sharing sensitive data to prove a claim can be risky.
Scalability Issues: As AI systems grow, manual verification becomes impractical.
Esper introduces a new model:
User -> [Verified Claims] -> Verification Layer -> [Cryptographic Proof] -> Agent

This shift enables:
Decisions Based on Verified Facts: Agents like Freysa can make decisions based on cryptographically verified claims rather than just natural language.
Persistent Reputation: Verified interactions create a persistent reputation that can be carried across systems.
Complex Multi-Agent Scenarios: Agents can engage in sophisticated interactions (e.g., negotiations, collaborations) with built-in trust.
Security and Privacy: Cryptographic proofs ensure security and privacy, even in large-scale systems.
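The pipeline above can be illustrated with a toy verification layer. The SHA-256 "proof" below is a stand-in for Esper's real cryptographic attestation, and all names are our own, not the project's:

```typescript
import { createHash } from "crypto";

type Claim = { subject: string; statement: string };
type Proof = { claim: Claim; digest: string };

// Verification layer: commits to the claim and emits a proof.
function attest(claim: Claim): Proof {
  const digest = createHash("sha256").update(JSON.stringify(claim)).digest("hex");
  return { claim, digest };
}

// Agent side: accept the claim only if the proof checks out.
function agentAccepts(proof: Proof): boolean {
  const recomputed = createHash("sha256")
    .update(JSON.stringify(proof.claim))
    .digest("hex");
  return recomputed === proof.digest;
}
```

The point of the shift is visible even in this toy: the agent never has to trust free-form conversation, because any tampering with the claim invalidates the digest.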
The Bigger Picture and Implications: Esper as Foundational Infrastructure
Esper represents more than just a tool for Freysa—it's the foundational infrastructure for a future dominated by autonomous AI systems that can operate independently while maintaining transparency and accountability. Its implications extend far beyond simple verification tasks. As AI agents become more prevalent in our digital ecosystem, Esper provides the cryptographic backbone for establishing trust and enabling sophisticated multi-agent interactions. Through its decentralized governance model, agents can participate in complex systems like DAOs with full transparency, while maintaining privacy through cryptographic proofs. This creates possibilities for everything from automated financial transactions to content moderation systems that operate with verifiable trust. Most importantly, Esper enables the creation of persistent reputation networks where credentials and trust can transfer across different systems and platforms, laying the groundwork for a new paradigm of trusted, autonomous AI operations at scale.
A note on our methodology: While our process adapts to the specific makeup of each project, we generally follow a consistent approach that combines our human review with advanced AI tooling such as Cursor. Our analysis starts with identifying critical code paths and core architectural components followed by in-depth review and technical assessments. We also analyze PR flows, merge patterns, release notes, and discussion threads to understand development focus, roadmap strategy, and quality controls. While we aim to cover the most pertinent areas, we acknowledge that achieving 100% comprehensiveness is challenging. Our goal is to provide an objective, meaningful assessment that cuts through marketing claims and documentation gaps.