
Conversation

@tomerm-iguazio (Contributor) commented Jan 4, 2026

📝 Description

Added support for OpenAI batch processing, including asynchronous batch execution.
Added integration and unit tests.

Depends on #9117

🛠️ Changes Made

Added batch invocation support (sync via ThreadPoolExecutor, async via asyncio.gather) with two-level concurrency control:

  • Instance-level thread pool (_executor) for sync batches, with a per-batch semaphore limiting the number of concurrently running threads.
  • Class-level global async semaphore (_global_async_semaphore) shared across all instances, enforcing a total limit on in-flight async requests while the per-batch semaphores ensure fair distribution.

Together these prevent API rate-limit violations and keep any single batch from monopolizing resources. A minimal sketch of the pattern is shown below.
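The following is an illustrative sketch of the two-level concurrency pattern described above, not the actual MLRun implementation; the class name (BatchInvoker), the method names (invoke_sync, invoke_async, _call_model, _acall_model), and the limit values are assumptions made for the example.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class BatchInvoker:
    # Class-level semaphore shared by all instances: caps total in-flight async requests.
    _global_async_semaphore = asyncio.Semaphore(32)

    def __init__(self, max_workers: int = 8, per_batch_limit: int = 4):
        # Instance-level thread pool used for synchronous batches.
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._per_batch_limit = per_batch_limit

    def _call_model(self, item):
        # Placeholder for the real per-item model call (assumption).
        return item

    async def _acall_model(self, item):
        # Placeholder for the real async per-item model call (assumption).
        await asyncio.sleep(0)
        return item

    def invoke_sync(self, items):
        # Per-batch semaphore keeps a single batch from occupying every worker thread.
        batch_semaphore = threading.Semaphore(self._per_batch_limit)

        def _call(item):
            with batch_semaphore:
                return self._call_model(item)

        return list(self._executor.map(_call, items))

    async def invoke_async(self, items):
        # Per-batch semaphore distributes the global async budget fairly between batches.
        batch_semaphore = asyncio.Semaphore(self._per_batch_limit)

        async def _call(item):
            async with batch_semaphore, BatchInvoker._global_async_semaphore:
                return await self._acall_model(item)

        tasks = [_call(item) for item in items]
        # gather() fails fast: it propagates the first raised exception.
        return await asyncio.gather(*tasks)
```

Usage would look like `asyncio.run(BatchInvoker().invoke_async(payloads))` for the async path, or `BatchInvoker().invoke_sync(payloads)` for the sync path.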


✅ Checklist

  • I updated the documentation (if applicable)
  • I have tested the changes in this PR
  • I confirmed whether my changes are covered by system tests
    • If yes, I ran all relevant system tests and ensured they passed before submitting this PR
    • I updated existing system tests and/or added new ones if needed to cover my changes
  • If I introduced a deprecation:

🧪 Testing


🔗 References

  • Ticket link: ML-11682
  • Design docs links:
  • External links:

🚨 Breaking Changes?

  • Yes (explain below)
  • No

🔍️ Additional Notes

# Conflicts:
#	tests/serving/test_async_flow.py
  • Updated the test to use v2modelserver as the model_mode rather than None.
  • Added a simpler test: test_monitoring_with_model_runner_batch_infer.
@royischoss (Contributor) left a comment


Hey, looks good; a few comments.

@davesh0812 (Contributor) left a comment


Looks good so far, just need to go over the tests.
One thing we should still think about is how the naive execution mechanism interacts with the asyncio event loop.
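To illustrate that concern (not code from this PR): if a naive synchronous batch call runs inside a coroutine, it blocks the asyncio event loop; offloading it, for example via run_in_executor, keeps the loop responsive. The function names here are hypothetical.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)


def sync_batch_invoke(items):
    # Hypothetical blocking (naive) batch call.
    return list(items)


async def handler(items):
    loop = asyncio.get_running_loop()
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await loop.run_in_executor(_executor, sync_batch_invoke, items)
```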

@davesh0812 (Contributor) left a comment


LGTM. One comment for a follow-up PR.

try:
    # gather() stops on first exception - fast fail
    return await asyncio.gather(*tasks)
except:

Please add a relevant list of exceptions here.
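For reference, a hedged sketch of what a narrowed exception clause could look like, assuming the OpenAI Python client is in use; the exact exception set chosen in the follow-up PR may differ.

```python
import asyncio

import openai  # assumes the OpenAI Python client is installed


async def gather_with_fast_fail(tasks):
    try:
        # gather() stops on the first exception - fast fail.
        return await asyncio.gather(*tasks)
    except (openai.OpenAIError, asyncio.TimeoutError) as exc:
        # Narrowed handling instead of a bare except; re-raise with context.
        raise RuntimeError("batch invocation failed") from exc
```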

@davesh0812 merged commit dd942e1 into mlrun:development Jan 15, 2026
14 checks passed