-
Notifications
You must be signed in to change notification settings - Fork 45
Description
Currently all the suggest methods (CLI command, REST API method, project and backend methods) always take just one document at a time. This is inefficient for backends that could process many documents in parallel.
We should introduce a batch version of suggest (called e.g. suggest_batch or suggest_many unless someone has a better idea?) for each of these contexts. Individual backends can then choose to implement it when it gives a performance boost; otherwise, the batch is simply passed to the regular suggest method one document at a time. I believe that at least NN ensemble, SVC, fastText and MLLM backends could benefit from parallel suggest operations. Also, this would be very useful for the proposed XTransformer backend.
A note on scope: This issue is about implementing the scaffolding necessary for batching suggest operations, as well as using them in at least some (not necessarily all) operations that would benefit from it: e.g. eval, hyperopt, optimize, index. Changes to individual backends are out of scope but separate issues for them should be opened after this basic scaffolding is in place.