Forward solver and inverse learning in infinite-horizon MDP Games. Learning reward parameters that best explain the given multi-agent demonstrations, generated from ma_gym environments.
Data Generation: ma_gym
Using a policy trained by the algorithm VDN, 100 trajectories from all three players for ma_gym:PredatorPrey5x5-v1 are sampled, processed, and saved in the format of pickle files in directory /processed_data/ma_gym:PredatorPrey5x5-v1/.
Convert data to Julia
julia convert_data_to_julia.jl --env ma_gym:PredatorPrey5x5-v1 --type vdn
After converted to julia JLD2, files are saved in /julia_data/ma_gym:PredatorPrey5x5-v1/ directory.
julia test_game.jl --env ma_gym:PredatorPrey5x5-v1 --type vdn --seed 1
Results will be saved in directory results/ma_gym:PredatorPrey5x5-v1/vdn/.
julia test_decoupled.jl --env ma_gym:PredatorPrey5x5-v1 --type vdn --seed 1
Results will be saved in directory results/ma_gym:PredatorPrey5x5-v1/vdn_decoupled/.