Inspiration
We noticed coworkers submitting huge prompts to a single AI model, with each request often taking tens of minutes to finish. That bottleneck highlighted a gap in our workflow and showed us there was a real need for a faster way to handle large problems.
What it does
Our solution uses a Kubernetes-style approach. It splits a large prompt into smaller sub-prompts, then distributes them across worker pods in proportion to each pod's efficiency. It also supports models from common LLM providers such as OpenAI, Anthropic, and Google (Gemini).
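The splitting step might look like the sketch below: a greedy splitter that packs paragraphs into chunks under a size limit. The function name, the paragraph-boundary strategy, and the `max_chars` parameter are illustrative assumptions, not the project's actual splitting logic.

```python
def split_prompt(prompt: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack paragraphs into sub-prompts of at most max_chars."""
    paragraphs = [p.strip() for p in prompt.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks
```

A real splitter would also need to keep shared context (instructions, system prompt) with each sub-prompt so every worker sees a self-contained task.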
How we built it
For our tech stack, we used Kubernetes and Docker for worker orchestration, a Python/FastAPI backend with API keys for OpenAI, Gemini, and Anthropic models, and React for the front-end. To distribute the sub-prompts among the pods, we measured the speed of each pod and divided it by the sum of all the speeds, giving the fraction of sub-prompts assigned to that pod.
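The speed-proportional distribution described above can be sketched as follows. Pod speeds are assumed to be measured throughput values (e.g. sub-prompts per second); the remainder handling is our own illustrative choice.

```python
def allocate(num_subprompts: int, speeds: list[float]) -> list[int]:
    """Assign each pod a sub-prompt count proportional to its speed."""
    total = sum(speeds)
    counts = [int(num_subprompts * s / total) for s in speeds]
    # Rounding down leaves a few sub-prompts unassigned; give them
    # to the fastest pods first so no work is dropped.
    remainder = num_subprompts - sum(counts)
    for i in sorted(range(len(speeds)), key=lambda i: -speeds[i])[:remainder]:
        counts[i] += 1
    return counts
```

For example, with 10 sub-prompts and pods running at speeds 3.0 and 1.0, the fast pod receives 8 and the slow pod 2.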
Challenges we ran into
We ran into challenges obtaining the correct number of sub-prompts to distribute to each pod. We eventually realized that we weren't clearing the nodes between sessions, which left stale state behind and produced an incorrect sub-prompt count. We also hit rate limits because we sent too many API requests at once during testing.
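A common way to soften the rate-limit problem is to retry with exponential backoff. This is a hedged sketch, not the project's actual fix: `RateLimitError` stands in for whichever exception the provider's client library raises, and `call_model` is any function that makes one API request.

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for the provider client's rate-limit exception."""

def call_with_backoff(call_model, *args, max_retries=5, base_delay=1.0):
    """Retry call_model with exponential backoff plus random jitter."""
    for attempt in range(max_retries):
        try:
            return call_model(*args)
        except RateLimitError:
            # Wait roughly base_delay * 2**attempt seconds, jittered
            # so parallel workers don't all retry in lockstep.
            time.sleep(base_delay * (2 ** attempt + random.random()))
    raise RuntimeError("rate limit: retries exhausted")
```

With many pods firing requests in parallel, a shared client-side limiter (token bucket per provider key) would be the longer-term answer; backoff just keeps individual calls from failing outright.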
Accomplishments that we're proud of
We’re proud that we built a platform that significantly reduces the time to complete requests across multiple AI models. We pushed through debugging issues that could have derailed the project, and we proved that parallelizing LLM workloads can deliver real performance gains. Seeing our system scale across multiple models and pods felt like a strong validation of our idea.
What we learned
We learned how important orchestration and cleanup are when working with distributed systems. Even small oversights, like not resetting nodes between sessions, can create large downstream problems. We also learned how rate limits shape real-world performance, and how valuable efficient load distribution is when dealing with expensive model calls. Beyond the technical lessons, we learned how to divide work, debug as a team, and keep moving when a problem doesn’t have an obvious fix.
What's next for Squarenetes
Next, we want to expand automated load balancing so pods can scale dynamically based on workload. We plan to add support for more model providers and improve the front-end to give users clearer visibility into task distribution and performance. We also want to integrate caching and model-selection logic so Squarenetes can route each subprompt to the model that handles it best. Our long-term goal is to turn Squarenetes into a fully managed platform for fast, parallel LLM computation.
Built With
- claude
- docker
- fastapi
- gemini
- kubernetes
- openai
- openrouter
- python
