Distributed TensorFlow: how to set up a cluster of TensorFlow servers and distribute a computation graph across that cluster, demonstrated by training a CIFAR-10 model with Horovod.
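In TensorFlow 1.x, a cluster is described as a mapping from job names to task addresses. Each task runs a server, and ops are pinned to tasks with device strings such as `/job:worker/task:0`. The sketch below shows that layout in pure Python so it runs without TensorFlow installed. It assumes a parameter-server/worker split, and the hostnames, the port 2222 and the helper names are hypothetical. With TensorFlow, the resulting dict would be passed to `tf.train.ClusterSpec`. The Horovod launch further down does not use parameter servers, because Horovod averages gradients with MPI allreduce instead. This sketch therefore only illustrates the cluster description itself.

```python
"""Sketch: describe a TensorFlow cluster as a job -> task-address mapping.

Hypothetical helpers; with TF 1.x the dict would be given to
tf.train.ClusterSpec and each process would start its own tf.train.Server.
TensorFlow is not imported so the sketch runs anywhere.
"""


def build_cluster(hosts, num_ps, port=2222):
    """Assign the first `num_ps` hosts to job "ps" and the rest to "worker".

    `hosts`, `num_ps` and the default port are illustrative assumptions.
    """
    if not 0 < num_ps < len(hosts):
        raise ValueError("need at least one ps host and one worker host")
    addrs = ["{}:{}".format(h, port) for h in hosts]
    return {"ps": addrs[:num_ps], "worker": addrs[num_ps:]}


def device_for(job, task_index):
    """Device string that places ops on one task, e.g. for tf.device(...)."""
    return "/job:{}/task:{}".format(job, task_index)


if __name__ == "__main__":
    cluster = build_cluster(["node0", "node1", "node2"], num_ps=1)
    print(cluster)
    print(device_for("worker", 1))
```

Here one host holds the variables and the other two compute. Changing `num_ps` shifts hosts between the two jobs without touching the device strings.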
Requirements:
- Python 3.5.2
- TensorFlow >= 1.4.0 (needed for `tf.data.FixedLengthRecordDataset`)
- Horovod
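`tf.data.FixedLengthRecordDataset(filenames, record_bytes)` splits each file into records of exactly `record_bytes` bytes. In the CIFAR-10 binary format, each record is 1 label byte followed by a 32x32x3 image stored channel-major (1024 red bytes, then green, then blue). That gives 3073 bytes per record. The following pure-Python sketch shows that record layout and the transpose to height x width x depth. The helper names are hypothetical, and this is not the training script's actual input code.

```python
LABEL_BYTES = 1
HEIGHT, WIDTH, DEPTH = 32, 32, 3
IMAGE_BYTES = HEIGHT * WIDTH * DEPTH       # 3072 pixel bytes per image
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES   # 3073 bytes per record


def fixed_length_records(data, record_bytes=RECORD_BYTES):
    """Yield consecutive fixed-size records, as FixedLengthRecordDataset does."""
    if len(data) % record_bytes:
        raise ValueError("data is not a whole number of records")
    for start in range(0, len(data), record_bytes):
        yield data[start:start + record_bytes]


def parse_record(record):
    """Split one record into (label, image), image indexed [row][col][channel].

    Pixels are stored channel-major (all red, then green, then blue),
    so they are regrouped here into height x width x depth order.
    """
    label = record[0]
    pixels = record[LABEL_BYTES:]
    plane = HEIGHT * WIDTH
    image = [[[pixels[c * plane + row * WIDTH + col] for c in range(DEPTH)]
              for col in range(WIDTH)]
             for row in range(HEIGHT)]
    return label, image
```

In TensorFlow the equivalent would be a `FixedLengthRecordDataset` with `record_bytes=3073`, followed by a map that decodes the raw bytes and performs the same transpose.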
Download and extract the CIFAR-10 data:
python cifar10_download_and_extract.py
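What a download step like this typically involves can be sketched with the standard library: fetch the archive once, then unpack it into the data directory. The URL below is the public CIFAR-10 binary archive. That `cifar10_download_and_extract.py` uses this URL, or the function name and directory layout shown here, is an assumption.

```python
import os
import tarfile
import urllib.request

# Assumption: the public CIFAR-10 binary archive is the download source.
DATA_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz"


def download_and_extract(data_dir, url=DATA_URL):
    """Download the archive into data_dir unless it is already there, then unpack it.

    Hypothetical helper; returns the local path of the archive.
    """
    os.makedirs(data_dir, exist_ok=True)
    filepath = os.path.join(data_dir, url.split("/")[-1])
    if not os.path.exists(filepath):
        # Skipped on re-runs, so only the extraction is repeated.
        urllib.request.urlretrieve(url, filepath)
    with tarfile.open(filepath, "r:gz") as tar:
        tar.extractall(data_dir)
    return filepath
```

The archive unpacks into a `cifar-10-batches-bin` directory holding `data_batch_1.bin` to `data_batch_5.bin` and `test_batch.bin`.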
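Train with Slurm, one MPI process per GPU:
srun -n 4 --mpi=pmi2 --partition=k80 --gres=gpu:4 python cifar10_main.py --data_dir=data/cifar10_data
Here `-n 4` launches four tasks, `--mpi=pmi2` lets MPI (and therefore Horovod) connect them through Slurm's PMI2 interface, `--partition=k80` selects the cluster's k80 partition, and `--gres=gpu:4` requests four GPUs per node.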
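Each of the four processes needs to know its own identity. That identity determines which GPU to use, which slice of the data to read, and how to scale the learning rate for the larger global batch. With Horovod these values come from `hvd.rank()`, `hvd.size()` and `hvd.local_rank()`, and the local rank is commonly used to set `config.gpu_options.visible_device_list`. The sketch below reads the equivalent Slurm variables (`SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_LOCALID`) so it runs without Horovod. The helper names and the linear learning-rate scaling are illustrative assumptions, not necessarily what `cifar10_main.py` does.

```python
import os


def worker_info(env=os.environ):
    """Return (rank, size, local_rank) for this process.

    Under srun, Slurm sets SLURM_PROCID, SLURM_NTASKS and SLURM_LOCALID;
    Horovod exposes the same values as hvd.rank(), hvd.size() and
    hvd.local_rank(). The defaults describe a single non-distributed process.
    """
    rank = int(env.get("SLURM_PROCID", 0))
    size = int(env.get("SLURM_NTASKS", 1))
    local_rank = int(env.get("SLURM_LOCALID", 0))
    return rank, size, local_rank


def shard(items, rank, size):
    """Keep every size-th item starting at rank (the tf.data.Dataset.shard rule)."""
    return items[rank::size]


def scaled_learning_rate(base_lr, size):
    """Linear scaling: the global batch is `size` times the per-process batch."""
    return base_lr * size


if __name__ == "__main__":
    rank, size, local_rank = worker_info()
    # local_rank would pick the GPU, e.g. visible_device_list = str(local_rank).
    print(rank, size, local_rank, shard(list(range(10)), rank, size))
```

Together, the shards cover every record exactly once. Only rank 0 usually writes checkpoints and logs, so the four processes do not overwrite each other's output.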