Inspiration

As the AI ecosystem expands, developers and businesses are increasingly overwhelmed by fragmented AI providers and rising token costs. We noticed that many users waste significant resources on verbose, inefficient prompts, so we set out to build a "smart gateway" that doesn't just connect you to LLMs but actively works to make your interactions more affordable and effective.

What it does

Apollo Token is a unified chat gateway that centralizes access to multiple AI providers. Its core feature is a Prompt Optimization Engine that intercepts user prompts and refines them to reduce token counts and costs. The platform provides real-time insights into token savings and cost deltas, allowing users to see the tangible impact of optimization before sending their request to the final model.
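As a rough sketch of the kind of savings report the gateway surfaces before dispatch (the `estimateTokens` helper, the ~4-characters-per-token heuristic, and the price constant are illustrative assumptions, not the actual implementation):

```typescript
// Illustrative sketch: how a gateway might report token and cost deltas
// for an optimized prompt before sending it to the final model.

interface OptimizationReport {
  originalTokens: number;
  optimizedTokens: number;
  estimatedTokenSaving: number;
  estimatedCostSavingUSD: number;
}

// Crude token estimate (~4 characters per token for English text) —
// a real gateway would use the provider's tokenizer instead.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function buildReport(
  original: string,
  optimized: string,
  pricePer1kTokensUSD = 0.005 // hypothetical input price
): OptimizationReport {
  const originalTokens = estimateTokens(original);
  const optimizedTokens = estimateTokens(optimized);
  const estimatedTokenSaving = Math.max(0, originalTokens - optimizedTokens);
  return {
    originalTokens,
    optimizedTokens,
    estimatedTokenSaving,
    estimatedCostSavingUSD: (estimatedTokenSaving / 1000) * pricePer1kTokensUSD,
  };
}
```

A report like this is what lets the user see the cost delta before committing the request.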

How we built it

The project is built on a modern, modular stack designed for scalability:

- Frontend: React and Next.js, with Tailwind CSS for a clean, professional UI.
- Backend: Node.js and Express, featuring a centralized /chat endpoint that orchestrates routing and normalization.
- Database: MongoDB manages prompt collections, storing original and optimized data for detailed analytics.
- Optimization: The Gemini API serves as the primary engine for prompt refinement and improvement logic.
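The routing step inside a centralized /chat endpoint can be sketched as a pure function that maps the requested model name to a provider key, which then selects the upstream client. The model-name prefixes and provider keys below are assumptions for illustration, not Apollo Token's actual routing table:

```typescript
// Illustrative sketch of model-to-provider routing inside a /chat endpoint.
// Prefix conventions are examples; a real table would be configuration-driven.

type Provider = "openai" | "gemini" | "anthropic" | "unknown";

function routeModel(model: string): Provider {
  if (model.startsWith("gpt-")) return "openai";
  if (model.startsWith("gemini-")) return "gemini";
  if (model.startsWith("claude-")) return "anthropic";
  return "unknown";
}
```

In the real endpoint, the returned key would pick the upstream API client, after which the response is normalized before being sent back to the frontend.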

Challenges we ran into

One of the primary hurdles was Response Normalization. Different AI providers return data in various structures; we had to design a consistent "envelope" so the frontend could handle responses from diverse providers interchangeably without breaking the UI. Additionally, fine-tuning the optimizer to reduce tokens without losing the user's original intent required rigorous testing.
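The envelope idea can be sketched as a normalizer that maps two hypothetical upstream shapes (an OpenAI-style `choices[]` array and a Gemini-style `candidates[]` array) onto one structure the frontend renders uniformly. The field paths mirror those providers' public response formats, but the `Envelope` type itself is an assumption for this example:

```typescript
// Illustrative sketch of a consistent response "envelope" across providers.

interface Envelope {
  provider: string;
  model: string;
  text: string;
}

function normalize(provider: string, raw: any, model: string): Envelope {
  let text = "";
  if (provider === "openai") {
    // OpenAI-style: { choices: [{ message: { content } }] }
    text = raw.choices?.[0]?.message?.content ?? "";
  } else if (provider === "gemini") {
    // Gemini-style: { candidates: [{ content: { parts: [{ text }] } }] }
    text = raw.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
  }
  return { provider, model, text };
}
```

Because every provider's payload collapses to the same shape, the frontend never branches on where the answer came from.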

Accomplishments

We successfully created a fully functional Prompt Optimization Engine that provides a clear improvement_reason and estimated_token_saving. We are also proud of the system's modularity, which allows switching between models through a single interface while maintaining a complete audit trail in MongoDB.

What we learned

Building Apollo Token deepened our understanding of API orchestration and the economic side of LLMs. We learned how to effectively use compound indexing and TTL in MongoDB to handle high-frequency chat data. More importantly, we discovered that small, automated refinements in prompt engineering lead to significant cumulative cost savings.
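The indexing pattern we mean can be sketched in mongosh as a compound index for per-user, time-ordered queries plus a TTL index that expires raw chat documents; the collection and field names and the 30-day window are examples, not our actual schema:

```javascript
// Hypothetical index setup for high-frequency chat data (mongosh).
// Compound index: efficient "latest prompts for this user" queries.
db.prompts.createIndex({ userId: 1, createdAt: -1 });
// TTL index: automatically expire raw chat documents after 30 days.
db.chats.createIndex({ createdAt: 1 }, { expireAfterSeconds: 60 * 60 * 24 * 30 });
```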

What's next for Apollo Token

Our vision is to evolve Apollo Token into a universal AI bridge. We are working toward a future where our gateway integrates with every major AI platform on the market, creating a truly boundaryless ecosystem. A key part of this roadmap is deepening our integration with OpenRouter and Ollama, allowing users to effortlessly toggle between massive cloud providers and local, private models. By expanding this multi-model support, we aim to give developers total control over their AI infrastructure: choosing between cost, speed, or privacy with a single click.
