split_

Our logo, very cool
Our distinguished leader James Yang
This sick monkey mug

Inspiration

We were inspired by the MapReduce paradigm and its ability to break complex, high-volume tasks into many simpler subtasks and then aggregate the results. Combined with the high scalability of Modal's serverless GPU architecture, we believed this strategy could be applied to the abstracted orchestration of AI swarms.

What it does

Split orchestrates complex and/or expensive LLM tasks, such as summarizing large documents, across smaller, parallel, specialized agents. Split accomplishes this in three major steps:

Initialization: Users provide data and prompts. Split designates a single master agent to identify potential subtasks and their relationships. This information is summarized in a dependency tree, which helps determine how many agents are needed and in what order they should run.
Mapping: The master agent quickly spawns child agents, each equipped with its own tooling to accomplish its subtask. The results from each child agent are aggregated into a shared memory layer for context-awareness.
Reduction: In the final step, the master agent polls the shared memory layer to produce the final inference.
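
The three steps above can be sketched in plain Python. This is only an illustration of the pattern under stated assumptions, not Split's actual code: the function name `run_split`, the dictionary standing in for the shared memory layer, and the word-counting subtasks are all hypothetical, and threads stand in for the child agents.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_split(subtasks, dependencies, reduce_fn, max_workers=4):
    """Run subtasks in dependency order, then reduce the shared results.

    subtasks:     name -> callable(memory_snapshot) -> result (hypothetical shape)
    dependencies: name -> set of subtask names that must finish first
    reduce_fn:    callable(memory) -> final answer
    """
    memory = {}  # stand-in for the shared memory layer
    sorter = TopologicalSorter(
        {name: dependencies.get(name, ()) for name in subtasks}
    )
    sorter.prepare()  # raises CycleError if the dependencies contain a cycle

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # subtasks whose dependencies are done
            snapshot = dict(memory)     # each child sees the results so far
            results = pool.map(lambda name: subtasks[name](snapshot), ready)
            for name, result in zip(ready, results):
                memory[name] = result   # aggregate into shared memory
                sorter.done(name)

    return reduce_fn(memory)  # the "master agent" reads shared memory


# Toy workload: count words per chunk in parallel, then total them.
chunks = ["alpha beta", "gamma", "delta epsilon zeta"]
subtasks = {
    f"count_{i}": (lambda mem, c=c: len(c.split()))
    for i, c in enumerate(chunks)
}
subtasks["total"] = lambda mem: sum(
    mem[f"count_{i}"] for i in range(len(chunks))
)
dependencies = {"total": {f"count_{i}" for i in range(len(chunks))}}

answer = run_split(subtasks, dependencies, lambda mem: f"{mem['total']} words")
print(answer)
```

Independent subtasks run together in one wave, and a dependent subtask only starts once everything it relies on has written its result to the shared dictionary.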

How we built it

Infrastructure: We used Modal for the GPU compute and ephemeral container scheduling. We used Supermemory for the shared memory layer.

Backend: We used the native FastAPI support in Modal to expose endpoints.

Frontend: The web client was built using pure HTML/CSS and JavaScript.

Challenges we ran into

The main challenge was learning to use Modal for optimizing LLMs. We quickly found that deploying an LLM per child agent created massive overhead. To fix this, we split the infrastructure on Modal between the orchestration planner and a centralized vLLM server. We also kept at least 3 containers in the centralized vLLM server warm to avoid cold-start overhead.
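
The trade-off can be illustrated with a toy simulation. It does not use Modal or vLLM: `FakeModel`, `SharedServer`, and the round-robin routing are hypothetical stand-ins, and the load counter simply represents the expensive step of bringing a model up.

```python
from itertools import cycle


class FakeModel:
    """Hypothetical stand-in for an LLM; counts how often it is 'loaded'."""

    loads = 0

    def __init__(self):
        FakeModel.loads += 1  # stands in for the expensive model load

    def generate(self, prompt):
        return f"echo: {prompt}"


def run_per_agent(prompts):
    # Every child agent loads its own model before answering.
    return [FakeModel().generate(p) for p in prompts]


class SharedServer:
    """A fixed pool of warm replicas shared by all child agents."""

    def __init__(self, warm_replicas=3):
        self._replicas = [FakeModel() for _ in range(warm_replicas)]
        self._next = cycle(self._replicas)  # simple round-robin routing

    def generate(self, prompt):
        return next(self._next).generate(prompt)


prompts = [f"subtask {i}" for i in range(10)]

FakeModel.loads = 0
per_agent_out = run_per_agent(prompts)
per_agent_loads = FakeModel.loads  # one load per child agent

FakeModel.loads = 0
server = SharedServer(warm_replicas=3)
shared_out = [server.generate(p) for p in prompts]
shared_loads = FakeModel.loads  # one load per warm replica

print(per_agent_loads, shared_loads)
```

In this sketch the number of loads grows with the number of child agents in the first design, but stays fixed at the size of the warm pool in the second, while both produce the same outputs.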

Accomplishments that we're proud of

We are proud it works.

What we learned

We learned a lot about agentic systems. We weren't really familiar with serverless architecture beforehand, but after playing around with Modal, we came to understand the benefits of highly scalable, ephemeral containers.