Refactor runner for implementing batching#1632

Merged
rltakashige merged 6 commits intomainfrom
leo/prepare-batch-implementation
Mar 3, 2026
Conversation

@rltakashige
Collaborator

Motivation

Batching will require us to send tasks concurrently and queue them up. Our current infrastructure cannot handle that at all. This PR gets us closer by allowing multiple tasks to be sent in parallel and then queued.

Changes

  • Change Plan logic
  • Make the runner main into a class
  • Add a "BatchGenerator" to which tasks can be submitted (although tasks are still handled sequentially) and sent back through an MpSender
  • Refactor the runner to accept tasks during generation
  • Keep the generator threading
  • Separate the runner into several files for better readability

Test Plan

Manual Testing

Tested manually; this needs a lot more automated testing. Cancellation still works on a single device but needs checking on multiple devices.

Automated Testing

@rltakashige rltakashige force-pushed the leo/prepare-batch-implementation branch 3 times, most recently from 32fd885 to cdf721e on February 27, 2026 16:45
@rltakashige rltakashige force-pushed the leo/prepare-batch-implementation branch from cdf721e to 76260fb on February 27, 2026 17:51
Member

@JakeHillion JakeHillion left a comment

This looks good to me, there's a lot of it but I like the direction. Would be great to see what continuous batching looks like on top of this before merging.

Member

@Evanev7 Evanev7 left a comment

nice first pass, i like the direction this is going in

Member

was this intended to be included in this PR?

Member

(the batch generation code that is)

Collaborator Author

This isn't batch generation. We just queue up a batch of tasks and process sequentially. The resulting behaviour is a no-op, but this gives it the structure of batch generation that we can replace with a true implementation.
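The queue-then-process-sequentially structure described here can be sketched roughly as follows. This is a hypothetical illustration, not the actual exo code; all names are made up, and `send_result` stands in for the MpSender the results go back through.

```python
import queue
import threading


class SequentialBatchGenerator:
    """Sketch: tasks are queued up and processed one at a time, so the
    observable behaviour is a no-op relative to the old runner, but the
    submit/collect structure is already that of a batch generator."""

    def __init__(self, send_result):
        self.tasks: queue.Queue = queue.Queue()
        self.send_result = send_result
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def submit(self, task) -> None:
        # Tasks can be submitted concurrently from multiple callers...
        self.tasks.put(task)

    def _run(self) -> None:
        # ...but are still handled strictly sequentially, in FIFO order.
        while True:
            task = self.tasks.get()
            if task is None:  # shutdown sentinel
                return
            self.send_result(self._generate(task))

    def _generate(self, task):
        return f"result:{task}"  # placeholder for real inference

    def close(self) -> None:
        self.tasks.put(None)
        self.worker.join()
```

Swapping `_run` for a loop that batches everything currently in the queue is then a local change, which is what makes this a useful stepping stone.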

Member

it is a batch generator, it's just a bad one.

Collaborator Author

Since we're going to have both implementations in the next PR, I've renamed this to a SequentialGenerator that implements an Inference Generator ABC. (Not sure about that naming but I couldn't come up with a better one)
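The renaming might look something like this sketch: one ABC with a sequential implementation now and a true batching implementation to follow in the next PR. The method set here is an assumption for illustration, not the actual exo interface.

```python
from abc import ABC, abstractmethod


class InferenceGenerator(ABC):
    """Sketch of the proposed ABC. Both SequentialGenerator and a future
    batching generator would implement the same submit/step surface."""

    @abstractmethod
    def submit(self, task) -> None:
        """Accept a task for generation."""

    @abstractmethod
    def step(self):
        """Advance generation and return a finished output, or None."""


class SequentialGenerator(InferenceGenerator):
    """Processes exactly one task at a time, FIFO."""

    def __init__(self):
        self.queue: list = []

    def submit(self, task) -> None:
        self.queue.append(task)

    def step(self):
        if not self.queue:
            return None
        task = self.queue.pop(0)
        return f"done:{task}"  # placeholder for real inference
```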

```python
return d


class NonBlockingGenerator[T](Generator[T | None, None, None]):
```
Member

ahh monads my old friend. this is (i reckon) not quite the right abstraction here. in its current iteration, i'd suggest using just the receiver and letting the WouldBlock exception bubble up so we don't need to do this T | None dance all the way through the pipeline.

Collaborator Author

@rltakashige rltakashige Feb 27, 2026

Have replied similarly in a different comment, but I want to be able to use this like mlx generate, so it can be composed with the model output parsers, with the option of None when a result isn't available.

Member

which other comment? and, i suppose if you're set on that api we should make the split explicit; have one class throw WouldBlock and have an outer wrapper that catches WouldBlock and converts it to None if you don't want to adjust the current generator mapping functions.
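The split being suggested can be sketched like this: an inner receiver that raises WouldBlock when nothing is ready, and an outer wrapper that converts WouldBlock into None for consumers that want the T | None interface. WouldBlock here is a local stand-in (anyio/trio raise a similarly named exception from non-blocking receives); all other names are illustrative.

```python
from collections import deque


class WouldBlock(Exception):
    """Raised when no item is ready yet (stand-in for anyio.WouldBlock)."""


class StrictReceiver:
    """Inner iterator: returns an item or raises WouldBlock immediately."""

    def __init__(self, queue: deque):
        self.queue = queue

    def __iter__(self):
        return self

    def __next__(self):
        if not self.queue:
            raise WouldBlock
        return self.queue.popleft()


class NonBlockingWrapper:
    """Outer wrapper: catches WouldBlock and yields None instead, so the
    exception never has to thread through the generator pipeline."""

    def __init__(self, inner):
        self.inner = inner

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.inner)
        except WouldBlock:
            return None
```

Pipeline stages that don't care about back-pressure compose against `StrictReceiver`; only the outermost consumer pays the None tax.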

```python
if bound_instance.is_image_model:
    from exo.worker.runner.image_models.runner import main

    main(bound_instance, event_sender, task_receiver, cancel_receiver)
```
Member

a change of this scale should be reflected in the image runner.

Collaborator Author

I think this would make the diff a bit unmanageable. I'll add another PR on top that does this.

Member

playing devil's advocate a little, but:

  • i'm not jake; i don't care about diff size
  • i do not want the two runners to fall out of sync at the current moment; i want a clean break from the old-style main to the new-style class
  • this is already out of scope of what a single pr could be (i.e. the batch generator interface)

Collaborator Author

I thought the diffs would be a lot worse, but that file isn't too big... Made it a separate commit so I can extract it if that ever feels necessary.

```diff
     self.status = event.runner_status
 if isinstance(event, TaskAcknowledged):
-    self.pending.pop(event.task_id).set()
+    self.pending[event.task_id].set()
```
Member

this is a very significant change to the meaning of "pending" in the supervisor, that I'm not a huge fan of. if you want a union of pending and active there should be other ways to implement it.

Collaborator Author

Will think on this

Collaborator Author

Added an in-progress set instead.
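The resolution described here could look roughly like the following: "pending" keeps its original meaning (submitted, not yet acknowledged), and acknowledged tasks move into a separate in-progress set rather than lingering in pending. This is a hypothetical sketch; all names are illustrative, not the actual exo supervisor API.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Supervisor:
    # task_id -> event set when the runner acknowledges the task
    pending: dict[str, threading.Event] = field(default_factory=dict)
    # task_ids acknowledged by the runner but not yet finished
    in_progress: set[str] = field(default_factory=set)

    def submit(self, task_id: str) -> threading.Event:
        ev = threading.Event()
        self.pending[task_id] = ev
        return ev

    def on_acknowledged(self, task_id: str) -> None:
        # The task leaves "pending" and is tracked as in progress,
        # instead of overloading pending to mean both states.
        ev = self.pending.pop(task_id)
        self.in_progress.add(task_id)
        ev.set()

    def on_finished(self, task_id: str) -> None:
        self.in_progress.discard(task_id)
```

Code that previously wanted "pending or active" can take the union of the two collections explicitly, without changing what either one means.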

Comment on lines +276 to +287
```python
self.batch_generator = BatchGenerator(
    model=self.inference_model,
    tokenizer=self.tokenizer,
    group=self.group,
    kv_prefix_cache=self.kv_prefix_cache,
    model_id=self.model_id,
    device_rank=self.device_rank,
    cancel_receiver=self.cancel_receiver,
    cancelled_tasks=self.cancelled_tasks,
    event_sender=self.event_sender,
    check_for_cancel_every=self.check_for_cancel_every,
)
```
Member

if you are going to include a dedicated batch generator task, it should either take ownership of these values or be independent of them. really don't like this sharing.

Collaborator Author

Since warmup requires a lot of these fields, I'm not sure how I should go about doing this. I have gotten rid of cancelled tasks as that can be entirely local to the generator.
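One way to make the ownership explicit, in the spirit of the review comment above, is to hand the generator a single immutable context object at construction instead of a pile of individually shared runner fields. A sketch only; the field names are illustrative, not the actual exo fields.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class GeneratorContext:
    """Everything the batch generator needs, passed once at construction.
    The generator gets an explicit, read-only view of these values
    instead of reaching back into the runner's attributes."""

    model: Any
    tokenizer: Any
    model_id: str
    device_rank: int
    check_for_cancel_every: int = 8


@dataclass
class BatchGenerator:
    ctx: GeneratorContext
    event_sender: Any
    # cancelled-task bookkeeping stays entirely local to the generator,
    # as noted in the reply above
    cancelled_tasks: set = field(default_factory=set)
```

Warmup can then build one `GeneratorContext` and share that single object, which makes the dependency surface between runner and generator visible in one place.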

@rltakashige rltakashige force-pushed the leo/prepare-batch-implementation branch 6 times, most recently from b771f67 to 8cb9bac on March 2, 2026 03:09
@rltakashige rltakashige requested a review from Evanev7 March 2, 2026 03:33
@rltakashige rltakashige force-pushed the leo/prepare-batch-implementation branch 2 times, most recently from ca53647 to 0b00f1a on March 2, 2026 17:00
@Evanev7 Evanev7 force-pushed the leo/prepare-batch-implementation branch from 593cd59 to b05ddff on March 2, 2026 17:03
@rltakashige rltakashige force-pushed the leo/prepare-batch-implementation branch 2 times, most recently from b9c4199 to f6eccf1 on March 2, 2026 17:26
@Evanev7 Evanev7 force-pushed the leo/prepare-batch-implementation branch from f6eccf1 to 6962838 on March 3, 2026 10:49
Evanev7 and others added 4 commits March 3, 2026 14:06
what da ya think!

---------

Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>
@rltakashige rltakashige enabled auto-merge (squash) March 3, 2026 14:36
Member

thanks for implementing this as well

@rltakashige rltakashige merged commit 37296c8 into main Mar 3, 2026
6 checks passed
@rltakashige rltakashige deleted the leo/prepare-batch-implementation branch March 3, 2026 14:38
michaelharrigan added a commit to michaelharrigan/exo that referenced this pull request Mar 5, 2026