We introduce DeFlow, a framework that assigns the responsibility of manifold representation and policy improvement to distinct components in offline RL.
The codebase is built on FQL and floq; see their repositories for installation instructions.
The main implementation of DeFlow is in `agents/deflow.py`, and the combination of DeFlow and floq is in `agents/deflowvf.py`. Here are some example commands (see the section below for the complete list):
# DeFlow on OGBench antmaze-large (offline RL)
python main.py --env_name=antmaze-large-navigate-singletask-task1-v0 --agent.q_agg=min --agent.target_divergence=0.01 --agent.normalize_q_loss=True
# DeFlow on OGBench visual-cube-single (offline RL)
python main.py --env_name=visual-cube-single-play-singletask-task1-v0 --offline_steps=500000 --agent.target_divergence=0.001 --agent.encoder=impala_small --p_aug=0.5 --frame_stack=3 --agent.normalize_q_loss=True
# DeFlow on OGBench scene (offline-to-online RL)
python main.py --env_name=scene-play-singletask-v0 --online_steps=1000000 --agent.target_divergence=0.001 --agent.normalize_q_loss=True

To keep the multi-step BC flow fixed during the online stage, you can set --agent.fix_bc_flow_online=True.
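How this flag is implemented is not described here; as a purely illustrative mental model (the names and structure below are hypothetical, not the code in `agents/deflow.py`), "fixing" the flow online can be thought of as blocking gradients through the BC-flow loss once online fine-tuning starts, so that only the policy-improvement terms keep learning:

```python
# Hypothetical sketch only: a toy objective with a "BC flow" term and a
# "policy" term, where the flow is frozen (no gradient) during online updates.
import jax
import jax.numpy as jnp

def combined_loss(params, batch, fix_bc_flow_online=True, is_online=True):
    flow_pred = batch['obs'] @ params['flow_w']      # stand-in for the BC flow
    flow_loss = jnp.mean((flow_pred - batch['actions']) ** 2)
    policy_pred = batch['obs'] @ params['policy_w']  # stand-in for the policy head
    policy_loss = jnp.mean((policy_pred - batch['actions']) ** 2)
    if fix_bc_flow_online and is_online:
        # Keep the flow fixed online: its value can still be logged, but no
        # gradient reaches `flow_w`, so only the policy parameters update.
        flow_loss = jax.lax.stop_gradient(flow_loss)
    return flow_loss + policy_loss

params = {'flow_w': jnp.zeros((4, 2)), 'policy_w': jnp.zeros((4, 2))}
batch = {'obs': jnp.ones((8, 4)), 'actions': jnp.ones((8, 2))}
grads = jax.grad(combined_loss)(params, batch)
print(grads['flow_w'].sum())    # 0.0 -> BC flow frozen
print(grads['policy_w'].sum())  # nonzero -> policy still improves
```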
For the target divergence $\delta$ (`--agent.target_divergence`), we observe a strong empirical correlation between the Intrinsic Action Variance (IAV) and the optimal constraint; a rough way to estimate the IAV is sketched after this list.

- Fine manipulation tasks (e.g., cube-double-play, scene-play):
  - Estimated IAV: $\approx 0.01$
  - Recommended $\delta$: $0.001$ (i.e., $0.1 \times \text{IAV}$)
- Navigation/locomotion tasks (e.g., antsoccer, antmaze):
  - Estimated IAV: $\approx 0.01$
  - Recommended $\delta$: $0.01$ (i.e., $1.0 \times \text{IAV}$)
- D4RL benchmarks:
  - Recommended $\delta$: $10^{-3}$ (adjusted based on data quality)
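The exact IAV estimator is not spelled out in this README. As a rough, purely illustrative starting point (the helper below is hypothetical and not part of the repo), one can approximate the IAV by the variance of dataset actions among near-duplicate states, and then set --agent.target_divergence to a fraction of that estimate following the guidelines above:

```python
# Hypothetical IAV estimate: average variance of actions taken in the k states
# closest to each of a few randomly sampled query states (brute-force search).
import numpy as np

def estimate_iav(observations, actions, k=16, n_queries=1024, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(observations), size=min(n_queries, len(observations)), replace=False)
    variances = []
    for i in idx:
        dists = np.linalg.norm(observations - observations[i], axis=1)
        nearest = np.argsort(dists)[:k]          # k near-duplicate states
        variances.append(actions[nearest].var(axis=0).mean())
    return float(np.mean(variances))

# Toy example with random arrays; in practice, pass the (N, obs_dim) and
# (N, action_dim) arrays loaded from the OGBench or D4RL dataset.
obs = np.random.randn(5000, 29).astype(np.float32)
acts = np.random.randn(5000, 8).astype(np.float32)
print(estimate_iav(obs, acts))
```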
We provide the complete list of the exact command-line flags used to produce the main results of DeFlow in the paper.
Click to expand the full list of commands
# DeFlow on OGBench antmaze-large-navigate-singletask-v0 (=antmaze-large-navigate-singletask-task1-v0)
python main.py --env_name=antmaze-large-navigate-singletask-task1-v0 --agent.q_agg=min --agent.target_divergence=0.01 --agent.normalize_q_loss=True
# DeFlow on OGBench antmaze-giant-navigate-singletask-v0 (=antmaze-giant-navigate-singletask-task1-v0)
python main.py --env_name=antmaze-giant-navigate-singletask-task1-v0 --agent.discount=0.995 --agent.q_agg=min --agent.target_divergence=0.01 --agent.normalize_q_loss=True
# DeFlow on OGBench humanoidmaze-medium-navigate-singletask-v0 (=humanoidmaze-medium-navigate-singletask-task1-v0)
python main.py --env_name=humanoidmaze-medium-navigate-singletask-task1-v0 --agent.discount=0.995 --agent.target_divergence=0.001 --agent.normalize_q_loss=True
# DeFlow on OGBench humanoidmaze-large-navigate-singletask-v0 (=humanoidmaze-large-navigate-singletask-task1-v0)
python main.py --env_name=humanoidmaze-large-navigate-singletask-task1-v0 --agent.discount=0.995 --agent.target_divergence=0.001 --agent.normalize_q_loss=True
# DeFlow on OGBench antsoccer-arena-navigate-singletask-v0 (=antsoccer-arena-navigate-singletask-task4-v0)
python main.py --env_name=antsoccer-arena-navigate-singletask-task4-v0 --agent.discount=0.995 --agent.target_divergence=0.01 --agent.normalize_q_loss=True
# DeFlow on OGBench cube-single-play-singletask-v0 (=cube-single-play-singletask-task2-v0)
python main.py --env_name=cube-single-play-singletask-v0 --agent.target_divergence=0.001 --agent.normalize_q_loss=True
# DeFlow on OGBench cube-double-play-singletask-v0 (=cube-double-play-singletask-task2-v0)
python main.py --env_name=cube-double-play-singletask-v0 --agent.target_divergence=0.001 --agent.normalize_q_loss=True
# DeFlow on OGBench scene-play-singletask-v0 (=scene-play-singletask-task2-v0)
python main.py --env_name=scene-play-singletask-v0 --agent.target_divergence=0.001 --agent.normalize_q_loss=True
# DeFlow on OGBench puzzle-3x3-play-singletask-v0 (=puzzle-3x3-play-singletask-task4-v0)
python main.py --env_name=puzzle-3x3-play-singletask-v0 --agent.target_divergence=0.005 --agent.normalize_q_loss=True
# DeFlow on OGBench puzzle-4x4-play-singletask-v0 (=puzzle-4x4-play-singletask-task4-v0)
python main.py --env_name=puzzle-4x4-play-singletask-v0 --agent.target_divergence=0.005 --agent.normalize_q_loss=True

For other tasks, just replace the environment name accordingly.
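Since only the environment name changes between tasks, a sweep can be scripted. The snippet below is an optional convenience (not part of the repo); it mirrors the cube-double-play flags above and assumes the usual OGBench convention of task1 through task5:

```python
# Optional helper: sweep the single-task variants of one OGBench environment
# by substituting the task id into the environment name.
import subprocess

ENV_TEMPLATE = 'cube-double-play-singletask-task{}-v0'

for task_id in range(1, 6):  # OGBench single-task envs typically expose task1..task5
    cmd = [
        'python', 'main.py',
        f'--env_name={ENV_TEMPLATE.format(task_id)}',
        '--agent.target_divergence=0.001',   # per the recommendations above
        '--agent.normalize_q_loss=True',
    ]
    print('Running:', ' '.join(cmd))
    subprocess.run(cmd, check=True)
```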
# DeFlow on OGBench visual-cube-single-play-singletask-task1-v0
python main.py --env_name=visual-cube-single-play-singletask-task1-v0 --offline_steps=500000 --agent.normalize_q_loss=True --agent.target_divergence=0.001 --agent.encoder=impala_small --p_aug=0.5 --frame_stack=3
# DeFlow on OGBench visual-cube-double-play-singletask-task1-v0
python main.py --env_name=visual-cube-double-play-singletask-task1-v0 --offline_steps=500000 --agent.normalize_q_loss=True --agent.target_divergence=0.001 --agent.encoder=impala_small --p_aug=0.5 --frame_stack=3
# DeFlow on OGBench visual-scene-play-singletask-task1-v0
python main.py --env_name=visual-scene-play-singletask-task1-v0 --offline_steps=500000 --agent.normalize_q_loss=True --agent.target_divergence=0.001 --agent.encoder=impala_small --p_aug=0.5 --frame_stack=3
# DeFlow on OGBench visual-puzzle-3x3-play-singletask-task1-v0
python main.py --env_name=visual-puzzle-3x3-play-singletask-task1-v0 --offline_steps=500000 --agent.normalize_q_loss=True --agent.target_divergence=0.0005 --agent.encoder=impala_small --p_aug=0.5 --frame_stack=3
# DeFlow on OGBench visual-puzzle-4x4-play-singletask-task1-v0
python main.py --env_name=visual-puzzle-4x4-play-singletask-task1-v0 --offline_steps=500000 --agent.normalize_q_loss=True --agent.target_divergence=0.0005 --agent.encoder=impala_small --p_aug=0.5 --frame_stack=3
# DeFlow on D4RL antmaze-umaze-v2
python main.py --env_name=antmaze-umaze-v2 --offline_steps=500000 --agent.normalize_q_loss=True --agent.target_divergence=0.015
# DeFlow on D4RL antmaze-umaze-diverse-v2
python main.py --env_name=antmaze-umaze-diverse-v2 --offline_steps=500000 --agent.normalize_q_loss=True --agent.target_divergence=0.015
# DeFlow on D4RL antmaze-medium-play-v2
python main.py --env_name=antmaze-medium-play-v2 --offline_steps=500000 --agent.normalize_q_loss=True --agent.target_divergence=0.015
# DeFlow on D4RL antmaze-medium-diverse-v2
python main.py --env_name=antmaze-medium-diverse-v2 --offline_steps=500000 --agent.normalize_q_loss=True --agent.target_divergence=0.015
# DeFlow on D4RL antmaze-large-play-v2
python main.py --env_name=antmaze-large-play-v2 --offline_steps=500000 --agent.normalize_q_loss=True --agent.target_divergence=0.015
# DeFlow on D4RL antmaze-large-diverse-v2
python main.py --env_name=antmaze-large-diverse-v2 --offline_steps=500000 --agent.normalize_q_loss=True --agent.target_divergence=0.015
# DeFlow on D4RL pen-human-v1
python main.py --env_name=pen-human-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.01
# DeFlow on D4RL pen-cloned-v1
python main.py --env_name=pen-cloned-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.01
# DeFlow on D4RL pen-expert-v1
python main.py --env_name=pen-expert-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.01
# DeFlow on D4RL door-human-v1
python main.py --env_name=door-human-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.001
# DeFlow on D4RL door-cloned-v1
python main.py --env_name=door-cloned-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.001
# DeFlow on D4RL door-expert-v1
python main.py --env_name=door-expert-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.001
# DeFlow on D4RL hammer-human-v1
python main.py --env_name=hammer-human-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.001
# DeFlow on D4RL hammer-cloned-v1
python main.py --env_name=hammer-cloned-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.001
# DeFlow on D4RL hammer-expert-v1
python main.py --env_name=hammer-expert-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.001
# DeFlow on D4RL relocate-human-v1
python main.py --env_name=relocate-human-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.001
# DeFlow on D4RL relocate-cloned-v1
python main.py --env_name=relocate-cloned-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.001
# DeFlow on D4RL relocate-expert-v1
python main.py --env_name=relocate-expert-v1 --offline_steps=500000 --agent.q_agg=min --agent.normalize_q_loss=True --agent.target_divergence=0.001

For offline-to-online RL, just add --online_steps=1000000 to the offline RL commands above and change the environment names accordingly if needed.
This codebase is built on top of FQL and floq. We thank the authors for their great work.
