[rllib] Support eager execution in TF2

### Describe the problem

RLlib currently disables TF2 (V2) behavior entirely. We should allow eager execution, at least for a subset of algorithms. This will be easier once https://github.com/ray-project/ray/issues/4788 is done.

One possibility is to keep using graph mode, but let the loss function and model run eagerly by wrapping them in `tf.py_function`:
https://www.tensorflow.org/api_docs/python/tf/py_function
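
A rough sketch of what the `tf.py_function` approach could look like, assuming TF 2.x. `eager_loss` and `graph_loss` are hypothetical names and are not part of RLlib, and `tf.function` only stands in for RLlib's graph-mode code path here:

```python
import tensorflow as tf


def eager_loss(predictions, targets):
    # Hypothetical loss. tf.py_function calls this with EagerTensors, so
    # ordinary Python control flow on tensor values is allowed.
    err = predictions - targets
    if float(tf.reduce_max(tf.abs(err))) > 10.0:
        err = tf.clip_by_value(err, -10.0, 10.0)
    return tf.reduce_mean(tf.square(err))


@tf.function  # assumption: stands in for the surrounding graph-mode code
def graph_loss(predictions, targets):
    # The graph only contains a single py_function op; the body of
    # eager_loss is executed eagerly each time the op runs.
    return tf.py_function(func=eager_loss, inp=[predictions, targets], Tout=tf.float32)


loss = graph_loss(tf.constant([1.0, 2.0, 3.0]), tf.constant([1.0, 2.0, 5.0]))
print(float(loss))
```

Per the TensorFlow docs, `tf.py_function` is differentiable, but the wrapped body is not serialized with the graph and runs in the Python process that built it, which may matter for distributed workers.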