@stas00 @tjruwase fyi
Minimal script that reproduces the failure when loading the BLOOM-176B optimizer states:
>>> import torch
>>> torch.load("/net/llm-shared-nfs/data/BLOOM/models--bigscience--bloom-optimizer-states/snapshots/fffeb1434b96997490396f46df742fb0be8f7774/global_step95000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/net/llm-shared-nfs/nfs/yelkurdi/conda/miniconda3/envs/bloom/lib/python3.8/site-packages/torch/serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/net/llm-shared-nfs/nfs/yelkurdi/conda/miniconda3/envs/bloom/lib/python3.8/site-packages/torch/serialization.py", line 1049, in _load
    result = unpickler.load()
  File "/net/llm-shared-nfs/nfs/yelkurdi/conda/miniconda3/envs/bloom/lib/python3.8/site-packages/torch/serialization.py", line 1042, in find_class
    return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'fragment_address' on <module 'deepspeed.runtime.bf16_optimizer' from '/net/llm-shared-nfs/nfs/mayank/DeepSpeed/deepspeed/runtime/bf16_optimizer.py'>
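
For context, here is a small stdlib-only sketch of what I think is going on: pickle stores a class by its module path and name, so loading fails when that name is no longer defined in the module. It does not use torch or DeepSpeed. The module name `fake_bf16_optimizer` and the field names `numel` and `start` are assumptions made up for the illustration, not the actual DeepSpeed definition.

```python
import collections
import pickle
import sys
import types

# Hypothetical stand-in module; it plays the role of
# deepspeed.runtime.bf16_optimizer in this illustration only.
mod = types.ModuleType("fake_bf16_optimizer")
sys.modules[mod.__name__] = mod

# Assumed shape of the helper type. The field names are guesses.
FragmentAddress = collections.namedtuple("fragment_address", ["numel", "start"])
FragmentAddress.__module__ = mod.__name__
mod.fragment_address = FragmentAddress

# "Save a checkpoint": pickle records only the module path and the name
# of the class, not its definition.
payload = pickle.dumps(mod.fragment_address(numel=4, start=0))

# Simulate a newer library version where the name is gone from the module.
del mod.fragment_address
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = str(exc)

# Workaround sketch: bind a compatible definition back onto the module
# before loading, so the unpickler can resolve the name again.
mod.fragment_address = FragmentAddress
restored = pickle.loads(payload)
print(error)
print(restored)
```

If the same idea applies to the real checkpoint, the options would be to load it with the DeepSpeed version it was saved with, or to bind a compatible `fragment_address` onto `deepspeed.runtime.bf16_optimizer` before calling `torch.load`. I have not verified either against the actual checkpoint.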