Hi,
I am getting the following error when running pretrain_gpt.sh:
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2+cu111
torch cuda version ............... 11.1
nvcc version ..................... 11.1
deepspeed install path ........... ['/qfs/people/shar703/scripts/mega_ai/deepspeed_megatron/DeepSpeed/deepspeed']
deepspeed info ................... 0.5.9+1d295ff, 1d295ff, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=1ac4a44 git_branch=main ****
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... infer
data_parallel_size .............................. 1
data_path ....................................... ['cord19/chemistry_cord19_abstract_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... False
deepspeed_activation_checkpointing .............. False
deepspeed_config ................................ None
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 1024
eod_mask_loss ................................... False
eval_interval ................................... 100
eval_iters ...................................... 10
evidence_data_path .............................. None
exit_duration_in_mins ........................... None
exit_interval ................................... None
ffn_hidden_size ................................. 4096
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 8
hidden_dropout .................................. 0.1
hidden_size ..................................... 1024
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 64
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ checkpoints/gpt2_345m
local_rank ...................................... None
log_batch_size_to_tensorboard ................... False
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... False
log_validation_ppl_to_tensorboard ............... False
loss_scale ...................................... None
loss_scale_window ............................... 1000
lr .............................................. 0.00015
lr_decay_iters .................................. 320000
lr_decay_samples ................................ None
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. 0.01
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 0
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 1024
memory_centric_tiled_linear ..................... False
merge_file ...................................... ../deepspeed_megatron/gpt_files/gpt2-merges.txt
micro_batch_size ................................ 4
min_loss_scale .................................. 1.0
min_lr .......................................... 0.0
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 1
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... None
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ checkpoints/gpt2_345m
save_interval ................................... 500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 1024
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 969, 30, 1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 1
tensorboard_dir ................................. None
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 1000
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... 500000
train_samples ................................... None
train_tokens .................................... None
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... ../deepspeed_megatron/gpt_files/gpt2-vocab.json
weight_decay .................................... 0.01
world_size ...................................... 1
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1.0
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 2
building GPT2BPETokenizer tokenizer ...
padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
initializing torch distributed ...
initializing tensor model parallel with size 1
initializing pipeline model parallel with size 1
setting random seeds to 1234 ...
initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
compiling dataset index builder ...
make: Entering directory `/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for `default'.
make: Leaving directory `/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/data'
done with dataset index builder. Compilation time: 0.051 seconds
compiling and loading fused kernels ...
Traceback (most recent call last):
File "/people/shar703/anaconda3/envs/deepspeed/bin/ninja", line 33, in
sys.exit(load_entry_point('ninja', 'console_scripts', 'ninja')())
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/init.py", line 51, in ninja
raise SystemExit(_program('ninja', sys.argv[1:]))
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/init.py", line 47, in _program
return subprocess.call([os.path.join(BIN_DIR, name)] + args)
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/subprocess.py", line 340, in call
with Popen(*popenargs, **kwargs) as p:
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/subprocess.py", line 858, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/subprocess.py", line 1704, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/data/bin/ninja'
Traceback (most recent call last):
File "pretrain_gpt.py", line 231, in
pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/training.py", line 96, in pretrain
initialize_megatron(extra_args_provider=extra_args_provider,
File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/initialize.py", line 89, in initialize_megatron
_compile_dependencies()
File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/initialize.py", line 137, in _compile_dependencies
fused_kernels.load(args)
File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/fused_kernels/init.py", line 71, in load
scaled_upper_triang_masked_softmax_cuda = _cpp_extention_load_helper(
File "/qfs/people/shar703/scripts/mega_ai/Megatron-DeepSpeed/megatron/fused_kernels/init.py", line 47, in _cpp_extention_load_helper
return cpp_extension.load(
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1079, in load
return _jit_compile(
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1292, in _jit_compile
_write_ninja_file_and_build_library(
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1373, in _write_ninja_file_and_build_library
verify_ninja_availability()
File "/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1429, in verify_ninja_availability
raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions
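
For what it's worth, the root failure appears to be the PermissionError above rather than the final RuntimeError: torch.utils.cpp_extension shells out to the ninja binary bundled inside the ninja egg, and that file is not executable in this environment, so the fused-kernel JIT build aborts. Below is a minimal diagnostic/workaround sketch; the NINJA_BIN path is copied verbatim from the traceback, and the chmod/reinstall steps are common workarounds, not a confirmed fix:

```bash
# Path copied from the PermissionError in the traceback above.
NINJA_BIN=/people/shar703/anaconda3/envs/deepspeed/lib/python3.8/site-packages/ninja-1.10.2.3-py3.8-linux-x86_64.egg/ninja/data/bin/ninja

# 1. Check whether the execute bit is missing.
ls -l "$NINJA_BIN"

# 2a. If it is, restore it in place.
chmod +x "$NINJA_BIN"

# 2b. Or replace the egg install with a wheel, which ships a working
#     ninja entry point on PATH.
pip install --force-reinstall ninja

# 3. Sanity check: cpp_extension only needs this command to succeed.
ninja --version
```

If the filesystem hosting the conda env forbids executing files there (e.g. a noexec mount, which would produce [Errno 13] even with the execute bit set), installing ninja via the system package manager or `conda install ninja` may be needed; that is speculation from the log, not something it confirms. The libaio warnings earlier in the report are unrelated to this failure and only mean the async_io op stays uninstalled.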