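Right now, enforcing keep_checkpoints_num requires multiple actor calls that block the control loop. All of this could be implemented on the worker instead: the Trainable would keep track of its own checkpoint history and remove old checkpoints as needed. The driver would then mirror the worker's checkpoint directory by syncing with rsync --delete, so that checkpoints removed on the worker are also removed from the driver's copy.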
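
A minimal sketch of the worker-side bookkeeping, to make the proposal concrete. `CheckpointHistory` and `rsync_mirror_command` are hypothetical names used for illustration and are not part of the existing API. The sketch also assumes, as a simplification, that the most recent checkpoints are the ones to keep; ranking by a score attribute would need a different eviction rule.

```python
import shutil
from collections import deque


class CheckpointHistory:
    """Hypothetical helper a Trainable could own on the worker.

    Tracks checkpoint directories in creation order and deletes the
    oldest ones locally once more than `keep_num` exist, so no extra
    actor calls from the driver are needed.
    """

    def __init__(self, keep_num):
        if keep_num < 1:
            raise ValueError("keep_num must be at least 1")
        self.keep_num = keep_num
        self._paths = deque()

    def add(self, path):
        """Record a new checkpoint dir; return the paths that were pruned."""
        self._paths.append(path)
        removed = []
        while len(self._paths) > self.keep_num:
            old = self._paths.popleft()
            # Deletion happens on the worker's local disk.
            shutil.rmtree(old, ignore_errors=True)
            removed.append(old)
        return removed

    def paths(self):
        """Checkpoint dirs currently retained, oldest first."""
        return list(self._paths)


def rsync_mirror_command(source_dir, target_dir):
    """Build an rsync invocation that mirrors source_dir into target_dir.

    The trailing slash on the source copies the directory's contents, and
    --delete removes files in the target that no longer exist in the
    source, which is how pruned checkpoints would disappear on the driver.
    """
    return ["rsync", "-a", "--delete", source_dir.rstrip("/") + "/", target_dir]
```

With this shape, the Trainable would call `history.add(checkpoint_dir)` after each save, and the driver's periodic sync would pick up deletions without issuing per-checkpoint actor calls.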