Inspiration
We were inspired by the MapReduce paradigm in its ability to break complex, high-volume tasks into many simpler subtasks and aggregating results. Combined with the high scalability of Modal's serverless GPU architecture, we believed this strategy could be applied for abstracted orchestration of AI swarms.
What it does
Split orchestrates complex and/or expensive LLM tasks, such as large document summary, among smaller, parallel, specialized agents. Split accomplishes this in three major steps:
- Initialization: Users provide data and prompts. Split designates a single master agent for identifying potential subtasks and their relationships. This information is summarized with a dependency tree, which helps determine how many agents are needed and in what order.
- Mapping: The master agent quickly spawns children agents, each equipped with their own tooling to accomplish their respective subtasks. The results from each child agent is aggregated into a shared memory layer for context-awareness.
- Reduction: The final step, the master agent polls from the shared memory layer to create the inference.
How we built it
Infrastructure*: We used Modal for the GPU compute and ephemeral container scheduling. We used Supermemory for the shared memory layer.
Backend: We used the native FastAPI support in Modal to exposed endpoints.
Frontend: The web client was building using pure HTML/CSS and Javascript.
Challenges we ran into
The main challenge was learning to use Modal for optimizing LLMs. We quickly found deploying LLMs per child agent was creating massive overhead. To fix this, we split the infrastructure on modal between the orchestration planner and centralized vLLM server. We also kept at least 3 containers in the centralized vLLM server warm to avoid cold start overhead.
Accomplishments that we're proud of
We are proud it works.
What we learned
We learned a lot about agentic systems. Before, we weren't really familiar with serverless architecture, but after playing around with Modal, we were able to understand the benefits of highly scalable and ephemeral containers.
What's next for Split
We're planning on expanding agent tooling for even more categories of complex tasks!
Built With
- cloudflare
- css
- fastapi
- html
- huggingface
- javascript
- modal
- python
- qwen
- smolagents
- supermemory
- vllm

Log in or sign up for Devpost to join the conversation.