Describe the problem
Currently RLlib disables TensorFlow V2 behavior entirely. We should allow eager execution, at least for a subset of algorithms. This will be easier once #4788 is done.
One possibility is to keep using graph mode, but allow eager execution inside the loss function and model by wrapping them with `tf.py_function`:
https://www.tensorflow.org/api_docs/python/tf/py_function
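
A rough sketch of what that could look like, not RLlib code: the names `eager_loss` and `graph_loss` are hypothetical, and a `tf.function` is used here only as a stand-in for RLlib's graph-mode code path (an assumption, since RLlib builds its graphs differently). The loss body runs eagerly, so `.numpy()` and plain Python control flow are available inside it.

```python
import tensorflow as tf


def eager_loss(predictions, targets):
    # Hypothetical loss. Inside tf.py_function the inputs arrive as eager
    # tensors, so .numpy() and ordinary Python branching are allowed.
    if predictions.numpy().size == 0:
        return tf.constant(0.0)
    return tf.reduce_mean(tf.square(predictions - targets))


@tf.function  # stand-in for graph mode (assumption, see note above)
def graph_loss(predictions, targets):
    # Embed the eagerly executed loss as a single op in the graph.
    loss = tf.py_function(
        func=eager_loss, inp=[predictions, targets], Tout=tf.float32
    )
    # py_function does not carry static shape information, so restore it.
    loss.set_shape([])
    return loss


predictions = tf.constant([1.0, 2.0, 3.0])
targets = tf.constant([1.0, 2.0, 5.0])
loss_value = float(graph_loss(predictions, targets))
print(loss_value)
```

Per the TensorFlow docs, a graph containing `tf.py_function` cannot be serialized and the wrapped function must run in the same process as the caller, so this would probably only suit a subset of algorithms.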