Per discussion with @jpountz, it is unlikely that we can implement a circuit breaker on the memory used by the _analyze API when generating tokens. However, we may be able to implement a soft limit on the number of tokens produced, which is directly related to memory usage.
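To make the idea concrete, here is a rough sketch of what a token-count limit could look like. It is illustrative only and is not the Elasticsearch implementation: `limit_tokens`, `TokenLimitExceeded`, `max_tokens`, and the whitespace tokenizer are all hypothetical names invented for this example. The point it tries to show is that counting tokens as they are produced lets us stop early, before the full token list is held in memory.

```python
import re
from typing import Iterable, Iterator


class TokenLimitExceeded(Exception):
    """Hypothetical error raised when analysis yields too many tokens."""


def whitespace_tokens(text: str) -> Iterator[str]:
    # Stand-in for a real analyzer: lazily yields whitespace-separated tokens.
    for match in re.finditer(r"\S+", text):
        yield match.group(0)


def limit_tokens(tokens: Iterable[str], max_tokens: int) -> Iterator[str]:
    # Count tokens as they stream through and abort as soon as the
    # (assumed, configurable) limit is exceeded, so the caller never
    # accumulates more than max_tokens tokens.
    for count, token in enumerate(tokens, start=1):
        if count > max_tokens:
            raise TokenLimitExceeded(
                f"analysis produced more than {max_tokens} tokens"
            )
        yield token


if __name__ == "__main__":
    # Within the limit: all tokens are returned.
    print(list(limit_tokens(whitespace_tokens("quick brown fox"), max_tokens=3)))

    # Over the limit: the request fails instead of growing without bound.
    try:
        list(limit_tokens(whitespace_tokens("quick brown fox jumps"), max_tokens=3))
    except TokenLimitExceeded as exc:
        print(f"rejected: {exc}")
```

Whether the limit should reject the request (as above) or truncate the output is a design choice this sketch does not settle; rejecting seemed the simpler behavior to illustrate.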