Refactor runner for implementing batching (#1632)
Conversation
Force-pushed 32fd885 to cdf721e
Force-pushed cdf721e to 76260fb
JakeHillion
left a comment
This looks good to me, there's a lot of it but I like the direction. Would be great to see what continuous batching looks like on top of this before merging.
Evanev7
left a comment
nice first pass, i like the direction this is going in
was this intended to be included in this PR?
(the batch generation code that is)
This isn't batch generation. We just queue up a batch of tasks and process sequentially. The resulting behaviour is a no-op, but this gives it the structure of batch generation that we can replace with a true implementation.
it is a batch generator, it's just a bad one.
Since we're going to have both implementations in the next PR, I've renamed this to a SequentialGenerator that implements an Inference Generator ABC. (Not sure about that naming but I couldn't come up with a better one)
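The split described above could be sketched as follows. This is a minimal illustration, not the PR's actual code: the `InferenceGenerator` and `SequentialGenerator` names come from the comment, while `run_task`, `submit`, and `step` are hypothetical method names chosen for the sketch.

```python
from abc import ABC, abstractmethod
from collections import deque


class InferenceGenerator(ABC):
    """Hypothetical ABC: the common interface a true batch generator
    and the sequential fallback would both implement."""

    @abstractmethod
    def submit(self, task) -> None:
        """Queue a task for generation."""

    @abstractmethod
    def step(self):
        """Produce the next available result, or None if nothing is ready."""


class SequentialGenerator(InferenceGenerator):
    """Queues up tasks but processes them one at a time: behaviourally a
    no-op batcher, structurally shaped like a real one."""

    def __init__(self, run_task):
        self._queue = deque()
        self._run_task = run_task  # callable that executes a single task

    def submit(self, task) -> None:
        self._queue.append(task)

    def step(self):
        if not self._queue:
            return None
        return self._run_task(self._queue.popleft())
```

A true batching implementation could then replace `step` with a call that advances all queued tasks at once, without changing callers.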
src/exo/utils/channels.py
Outdated
```python
return d


class NonBlockingGenerator[T](Generator[T | None, None, None]):
```
ahh monads my old friend. this is (i reckon) not quite the right abstraction here. in its current iteration, i'd suggest using just the receiver and letting the WouldBlock exception bubble up so we don't need to do this T | None dance all the way through the pipeline.
Have replied similarly in a different comment, but I want to be able to use this like mlx generate that can be composed with the model output parsers with the option for None when a result isn't available.
which other comment? and, i suppose if you're set on that api we should make the split explicit; have one class throw WouldBlock and have an outer wrapper that catches WouldBlock and converts it to None if you don't want to adjust the current generator mapping functions.
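The explicit split being suggested might look roughly like this. The `Receiver` here is a toy stand-in (the real receiver lives in `exo.utils.channels`); `WouldBlock`, `receive_nowait`, and `optional_receive` are assumed names for the sketch.

```python
import queue


class WouldBlock(Exception):
    """Raised when no item is ready right now (stand-in for the channel
    library's own WouldBlock)."""


class Receiver:
    """Toy channel endpoint with an explicit non-blocking contract:
    receive_nowait raises WouldBlock rather than returning None."""

    def __init__(self):
        self._q = queue.SimpleQueue()

    def send(self, item):
        self._q.put(item)

    def receive_nowait(self):
        try:
            return self._q.get_nowait()
        except queue.Empty:
            raise WouldBlock from None


def optional_receive(receiver):
    """Outer wrapper: converts WouldBlock into a None yield, giving the
    mlx-generate-style `T | None` API only at the pipeline edge."""
    while True:
        try:
            yield receiver.receive_nowait()
        except WouldBlock:
            yield None
```

The inner pipeline then stays exception-based, while code that wants to compose with output parsers iterates the wrapper and checks for `None`.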
src/exo/worker/runner/bootstrap.py
Outdated
```python
if bound_instance.is_image_model:
    from exo.worker.runner.image_models.runner import main

    main(bound_instance, event_sender, task_receiver, cancel_receiver)
```
a change of this scale should be reflected in the image runner.
I think this would make the diff a bit unmanageable. I'll add another pr on top that does this.
playing devil's advocate a little, but:
- i'm not jake; i don't care about diff size
- i do not want the two runners to fall out of sync at the current moment; i want a clean break from old-style main to new-style class
- this is already out of scope of what a single pr could be (i.e. the batch generator interface)
I thought the diffs would be a lot worse, but that file isn't too big... Made it a separate commit so I can extract it if that ever feels necessary.
```diff
  self.status = event.runner_status
  if isinstance(event, TaskAcknowledged):
-     self.pending.pop(event.task_id).set()
+     self.pending[event.task_id].set()
```
this is a very significant change to the meaning of "pending" in the supervisor, that I'm not a huge fan of. if you want a union of pending and active there should be other ways to implement it.
Will think on this
Added an in progress set instead
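One shape that "in progress set" could take, keeping `pending` strictly meaning "awaiting acknowledgement". Everything here is a hypothetical sketch: `TaskTracker`, `submit`, `acknowledge`, and `complete` are illustrative names, not the supervisor's real API.

```python
import threading


class TaskTracker:
    """Sketch: track tasks by lifecycle stage instead of widening the
    meaning of 'pending'.

    pending:     submitted but not yet acknowledged by the runner.
    in_progress: acknowledged and currently running.
    """

    def __init__(self):
        self.pending: dict[str, threading.Event] = {}
        self.in_progress: set[str] = set()

    def submit(self, task_id: str) -> threading.Event:
        ev = threading.Event()
        self.pending[task_id] = ev
        return ev

    def acknowledge(self, task_id: str) -> None:
        # Move the task from pending to in-progress; setting the event
        # wakes anyone waiting on the acknowledgement.
        self.pending.pop(task_id).set()
        self.in_progress.add(task_id)

    def complete(self, task_id: str) -> None:
        self.in_progress.discard(task_id)
```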
```python
self.batch_generator = BatchGenerator(
    model=self.inference_model,
    tokenizer=self.tokenizer,
    group=self.group,
    kv_prefix_cache=self.kv_prefix_cache,
    model_id=self.model_id,
    device_rank=self.device_rank,
    cancel_receiver=self.cancel_receiver,
    cancelled_tasks=self.cancelled_tasks,
    event_sender=self.event_sender,
    check_for_cancel_every=self.check_for_cancel_every,
)
```
if you are going to include a dedicated batch generator task, it should either take ownership of these values or be independent of them. really don't like this sharing.
Since warmup requires a lot of these fields, I'm not sure how I should go about doing this. I have gotten rid of `cancelled_tasks`, as that can be entirely local to the generator.
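One possible direction for the ownership concern: bundle the fields the generator needs into a context object handed over once, with cancellation state kept local. This is only a sketch; `GeneratorContext` and its fields are hypothetical names, and the real constructor takes more arguments.

```python
from dataclasses import dataclass


@dataclass
class GeneratorContext:
    """Hypothetical bundle of runner state the generator depends on,
    passed once instead of sharing individual attributes."""
    model: object
    tokenizer: object
    model_id: str
    device_rank: int


class BatchGenerator:
    def __init__(self, ctx: GeneratorContext):
        self.ctx = ctx
        # Cancellation bookkeeping is purely local to the generator,
        # as discussed above.
        self._cancelled: set[str] = set()

    def cancel(self, task_id: str) -> None:
        self._cancelled.add(task_id)
```

Warmup could then also take a `GeneratorContext`, so the shared fields have one well-defined home rather than living on both the runner and the generator.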
Force-pushed b771f67 to 8cb9bac
Force-pushed ca53647 to 0b00f1a
Force-pushed 593cd59 to b05ddff
Force-pushed b9c4199 to f6eccf1
Force-pushed f6eccf1 to 6962838
what da ya think!

Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>
thanks for implementing this as well
…actor, regressing from exo-explore#1262.
Motivation
Batching will require us to send tasks concurrently and queue them up. Our current infrastructure cannot handle that at all. This PR gets us closer by allowing multiple tasks to be sent in parallel and then queued up.
Changes
- Change Plan logic
- Make the runner main into a class
- Add a "BatchGenerator" to which tasks can be submitted (although tasks are handled sequentially), with results sent back through an MpSender
- Refactor the runner to accept tasks during generation
- Keep the generator threading
- Separate the runner into several files for better readability
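The "accept tasks during generation" change can be sketched as a loop that drains newly submitted tasks between generation steps, instead of blocking on one task until it finishes. The `Runner`, `poll_tasks`, and `step_fn` names here are assumptions for illustration, not the PR's actual API.

```python
from collections import deque


class Runner:
    """Sketch: interleave task intake with generation steps."""

    def __init__(self, poll_tasks, step_fn):
        self._poll_tasks = poll_tasks  # returns newly submitted tasks (possibly none)
        self._step_fn = step_fn        # runs one generation step for a task

    def run(self, max_iterations: int):
        results = []
        queue = deque()
        for _ in range(max_iterations):
            # Accept new tasks mid-generation rather than only between tasks.
            queue.extend(self._poll_tasks())
            if queue:
                results.append(self._step_fn(queue.popleft()))
        return results
```

With this shape, swapping the sequential per-task `step_fn` for a true batched step is a local change inside the loop body.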
Test Plan
Manual Testing
Tested manually; this needs a lot more automated testing. Cancellation still works on a single device but needs checking on multiple devices.
Automated Testing