Facebear's RL implementations:\
Offline_RL:\
SBAC (Soft Behavior-regularized Actor Critic)\
TD3+BC (A Minimalist Approach to Offline RL)\
BCQ (Batch-Constrained Q-learning)\
BEAR (Bootstrapping Error Accumulation Reduction)\
Online_RL:\
PPO (Proximal Policy Optimization)\
TD3 (Twin Delayed Deep Deterministic Policy Gradient)\
SAC (Soft Actor-Critic)\