Building an A2A Skill Retrieval System: Our Journey

The Inspiration

Our journey began with a vision of what could be possible with Google's newly released Agent-to-Agent (A2A) protocol. While the protocol is still in its early stages, we recognized its transformative potential for AI agent ecosystems. The protocol establishes a foundation for how agents can publish their capabilities through "agent cards" — public metadata files describing what they can do. We saw an opportunity to help accelerate adoption of this promising standard.

We realized that as more developers begin creating A2A-compatible agents, finding the right agent with the right skill for a specific task would become increasingly challenging. It would be like having a phonebook of specialists but no efficient way to search it. By developing a skill retrieval system now, we could demonstrate the protocol's importance and potential impact while solving a problem that would only grow more significant as the ecosystem matures.

The A2A protocol provides a solid foundation for agent communication, but we saw an opportunity to enhance discovery from the beginning. What if we could create a system that understood natural language requests and matched them to the most relevant agent skills? This would unlock the true potential of a multi-agent ecosystem by creating a frictionless discovery layer between human needs and agent capabilities.

What We Built

We developed a skill retrieval system that leverages semantic search to connect natural language queries with the most appropriate agent skills. At its core, our system:

  1. Ingests agent cards from the A2A ecosystem and extracts structured skill information
  2. Transforms skills into dense vector embeddings using a sentence transformer model
  3. Indexes these embeddings using FAISS (Facebook AI Similarity Search) with the HNSW algorithm
  4. Matches user queries by converting them to the same vector space and finding the nearest neighbors
  5. Returns ranked results with relevant metadata about the agent and skill
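
In miniature, the pipeline above looks like this. This is a toy sketch: a bag-of-words "embedding" and brute-force cosine similarity stand in for the sentence transformer and FAISS, and the skill descriptions are made up for illustration.

```python
import math
from collections import Counter

# Toy stand-in for the pipeline: a bag-of-words "embedding" and
# brute-force cosine similarity replace the sentence transformer and FAISS.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    na, nb = norm(a), norm(b)
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-3: ingest skill descriptions and index their embeddings.
skills = {
    "summarize-docs": "summarize long documents into short briefs",
    "translate-text": "translate text between natural languages",
    "book-travel": "search flights and book travel itineraries",
}
index = {skill_id: embed(desc) for skill_id, desc in skills.items()}

# Steps 4-5: embed the query and return skills ranked by similarity.
def search(query: str, k: int = 2) -> list[tuple[str, float]]:
    q = embed(query)
    scored = [(skill_id, cosine(q, vec)) for skill_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

top = search("summarize a long report")  # "summarize-docs" ranks first
```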

The architecture consists of four main components:

  • A PostgreSQL database storing agent card data and skill information
  • A FastAPI application handling HTTP requests and orchestrating the retrieval process
  • A sentence transformer encoding natural language into dense vector embeddings
  • A FAISS index providing high-performance approximate nearest neighbor search

Leveraging Claude

Claude played a pivotal role in our project, beyond simply being integrated as an agent:

  • Synthetic Data Generation: We used Claude to generate a diverse set of realistic agent cards. This allowed us to:

    • Create hundreds of synthetic agent cards with varied capabilities to stress-test our system
    • Generate skill descriptions with different wording styles to improve our retrieval robustness
    • Simulate domain-specific agents across industries like healthcare, finance, and education
    • Produce edge cases to identify weaknesses in our matching algorithm
  • Creating Functional Agents: We leveraged Claude to power agents that could be discovered and utilized through our matching system:

    • Built specialized Claude-powered agents with distinct skill sets that published A2A-compatible agent cards
    • Created agents that could demonstrate "skill chaining" by discovering and delegating to other agents

How We Built It

Our development process followed these key steps:

  1. Data Collection & Analysis: We examined the A2A protocol specification to understand the structure of agent cards and skill descriptions. This helped us design a database schema that could effectively capture all relevant information.
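
For illustration, here is roughly what ingesting a card looks like. The card below is hand-written, and the field names follow our reading of the A2A agent card spec (name, url, skills with id/name/description/tags); the exact shape may differ as the spec evolves.

```python
import json

# A hand-written example card; field names approximate the A2A agent
# card structure, which was still evolving when we built this.
card_json = """
{
  "name": "FinanceAnalyst",
  "description": "Analyzes financial documents and market data",
  "url": "https://agents.example.com/finance",
  "version": "1.0.0",
  "skills": [
    {
      "id": "earnings-summary",
      "name": "Earnings call summarization",
      "description": "Summarize quarterly earnings calls into key takeaways",
      "tags": ["finance", "summarization"]
    }
  ]
}
"""

def extract_skills(raw: str) -> list[dict]:
    """Flatten a card into one row per skill, ready for embedding and storage."""
    card = json.loads(raw)
    return [
        {
            "agent_name": card["name"],
            "agent_url": card["url"],
            "skill_id": skill["id"],
            # The text we actually embed: name + description + tags.
            "skill_text": " ".join(
                [skill["name"], skill["description"], *skill.get("tags", [])]
            ),
        }
        for skill in card.get("skills", [])
    ]

rows = extract_skills(card_json)
```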

  2. Database Design: We chose PostgreSQL with the pgvector extension to store both structured data and vector embeddings. This allowed for efficient querying of agent metadata alongside semantic search capabilities.
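
A sketch of the kind of schema this implies. The table and column names here are our own illustration; vector(384) matches the embedding dimension we chose, and <=> is pgvector's cosine-distance operator.

```python
# Illustrative DDL for the pgvector-backed schema; table and column
# names are our own, not the exact production schema.
SCHEMA_DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agents (
    id        SERIAL PRIMARY KEY,
    name      TEXT NOT NULL,
    card_url  TEXT NOT NULL,
    card_json JSONB NOT NULL
);

CREATE TABLE skills (
    id          SERIAL PRIMARY KEY,
    agent_id    INTEGER REFERENCES agents(id),
    skill_id    TEXT NOT NULL,
    description TEXT NOT NULL,
    embedding   vector(384)
);
"""

# Cosine-distance search against a query embedding (the <=> operator
# comes from pgvector); metadata joins stay in ordinary SQL.
SEARCH_SQL = """
SELECT a.name, s.skill_id, 1 - (s.embedding <=> %(query)s) AS score
FROM skills s JOIN agents a ON a.id = s.agent_id
ORDER BY s.embedding <=> %(query)s
LIMIT %(k)s;
"""
```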

  3. Embedding Selection: After experimenting with several options, we settled on a 384-dimensional sentence transformer model that balanced accuracy with performance. The smaller dimension size (compared to 768-D alternatives) improved search speed without significantly sacrificing accuracy.

  4. Index Optimization: We implemented FAISS with the Hierarchical Navigable Small World (HNSW) algorithm, which provided excellent retrieval accuracy at low latency. We tuned the efConstruction and efSearch parameters to balance build time, memory usage, and search performance.

  5. API Development: We built a clean REST API using FastAPI to expose our search functionality, making it easy for other applications to leverage our skill retrieval system.

  6. Evaluation Framework: We created a comprehensive evaluation suite to measure top-k accuracy, latency, and other important metrics across a diverse set of test queries.
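
A minimal version of the idea: each test case pairs a query with the skill we expect the system to surface, and we track top-k accuracy and latency. The search function below is a stub ranking by keyword overlap; in practice it is the real retrieval path.

```python
import time

# Stub retrieval function under test (keyword overlap over a tiny corpus).
def search(query: str, k: int = 3) -> list[str]:
    corpus = {
        "summarize-docs": "summarize long documents",
        "translate-text": "translate text between languages",
    }
    overlap = lambda desc: len(set(query.split()) & set(desc.split()))
    return sorted(corpus, key=lambda s: overlap(corpus[s]), reverse=True)[:k]

# Each case: (query, skill id we expect in the top-k results).
CASES = [
    ("summarize long report", "summarize-docs"),
    ("translate text to french", "translate-text"),
]

def evaluate(k: int = 3) -> dict:
    hits, latencies = 0, []
    for query, expected in CASES:
        start = time.perf_counter()
        results = search(query, k)
        latencies.append(time.perf_counter() - start)
        hits += expected in results
    return {
        "top_k_accuracy": hits / len(CASES),
        "mean_latency_ms": 1000 * sum(latencies) / len(latencies),
    }

metrics = evaluate(k=1)
```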

Challenges We Faced

Some of the key challenges we encountered include:

1. Integration with Google's A2A Protocol

Working with Google's A2A protocol presented an initial learning curve. The specification was comprehensive but still evolving, which meant we needed to stay flexible as the protocol matured. We had to carefully parse agent card JSON structures and ensure our system could handle variations in how different agents expressed their capabilities.

2. Claude Integration Complexity

One of our biggest challenges was integrating Claude with Google's A2A protocol. Anthropic's Claude and Google's systems have different architectures and communication patterns. We used Google's Agent Development Kit (ADK), which provided a standardized way to wrap Claude's capabilities in an A2A-compatible format.

3. Semantic Search Quality

Achieving high-quality semantic matches between natural language queries and skill descriptions proved challenging. Skill descriptions varied widely in length, specificity, and language use.

4. Performance Optimization

As our index grew to accommodate more agents, search latency became a concern. We implemented several optimizations:

  • Fine-tuning FAISS HNSW parameters for our specific use case
  • Adding a query cache for frequently requested terms
  • Optimizing database queries to reduce overhead
  • Implementing batch processing for embedding generation
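
The query cache, for example, can be as simple as an in-process LRU in front of the embedding step. In the sketch below, embed_query is a stub for the real model call, and the counter only exists to show that repeated queries skip the expensive work.

```python
from functools import lru_cache

CALLS = {"count": 0}  # tracks how often the "model" actually runs

@lru_cache(maxsize=4096)
def embed_query(text: str) -> tuple[float, ...]:
    CALLS["count"] += 1                        # real embedding work happens here
    return tuple(float(ord(c)) for c in text)  # stand-in for the model output

embed_query("summarize this report")
embed_query("summarize this report")  # cache hit: the model is not called again
```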

5. Handling Ambiguous Queries

Some user queries could reasonably match several different skills. As a next step, we want to develop a scoring system that considers not just vector similarity but also factors like skill popularity, agent reliability, and query-specific heuristics.
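
One possible shape for that scoring system, with placeholder weights we would still need to tune against evaluation data:

```python
# Hypothetical re-ranking score: blend vector similarity with agent-level
# signals. The weights are placeholders, not values we have validated.
def rerank_score(similarity: float, popularity: float, reliability: float,
                 w_sim: float = 0.7, w_pop: float = 0.15,
                 w_rel: float = 0.15) -> float:
    # All inputs are assumed normalized to [0, 1].
    return w_sim * similarity + w_pop * popularity + w_rel * reliability

candidates = [
    {"skill": "a", "sim": 0.82, "pop": 0.2, "rel": 0.9},
    {"skill": "b", "sim": 0.80, "pop": 0.9, "rel": 0.9},
]
ranked = sorted(
    candidates,
    key=lambda c: rerank_score(c["sim"], c["pop"], c["rel"]),
    reverse=True,
)
# A popular, reliable agent can outrank a slightly more similar one.
```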

What We Learned

This project taught us valuable lessons about building AI-powered systems:

  1. Vector Search is Powerful: It enables fast, flexible matching, but small changes in embedding models or index parameters can significantly affect both accuracy and performance.

  2. Protocol Standards Matter: Google's A2A protocol provided a strong foundation that enabled interoperability. Standards reduce integration friction and enable ecosystems to thrive.

  3. Hybrid Approaches Win: While pure vector search performed well, we believe combining it with traditional filtering and metadata-based ranking would further improve result quality.

  4. Evaluation is Critical: Building an evaluation framework early helped us make decisions about architecture and algorithm choices.

  5. AI Systems Need Careful Integration: Integrating different AI systems (like Claude) into new protocols requires thoughtful adaptation of capabilities and careful handling of different interaction patterns.

Future Directions

Looking ahead, we see several opportunities to enhance our skill retrieval system:

  1. Fine-tuning our embedding model on domain-specific data to better capture the nuances of agent capabilities
  2. Implementing hybrid retrieval that combines dense vector search with sparse methods like BM25
  3. Adding metadata pre-filtering to improve search efficiency and accuracy
  4. Scaling our system to support thousands more agent cards and skills
  5. Exploring multi-lingual skill retrieval to make agent capabilities accessible across language barriers
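
For the hybrid retrieval idea, reciprocal rank fusion (RRF) is one simple way to merge dense and sparse result lists; the ranked lists below are illustrative.

```python
# Sketch of hybrid retrieval via reciprocal rank fusion (RRF): merge the
# ranked lists from dense vector search and a sparse method like BM25.
def rrf(dense: list[str], sparse: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (dense, sparse):
        for rank, skill_id in enumerate(ranking, start=1):
            scores[skill_id] = scores.get(skill_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["summarize-docs", "translate-text", "book-travel"]
sparse_hits = ["summarize-docs", "extract-tables", "translate-text"]
fused = rrf(dense_hits, sparse_hits)
```

RRF needs no score normalization across the two retrievers, which is why it is a common first choice for fusion.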

By building this bridge between human intent and agent capabilities, we hope to accelerate the adoption of multi-agent systems and help realize the potential of Google's A2A protocol.
