The Trainer does not save the tokenizer or `config.json` when training under DeepSpeed ZeRO-3 with `stage3_gather_16bit_weights_on_model_save=False`.

Line 2776 raises a `ValueError`, so line 2778 (`self._save`) never runs, and the tokenizer and other files are never written. Is this the expected behavior?
transformers/src/transformers/trainer.py
Lines 2771 to 2784 in d4bd33c
```python
elif self.is_deepspeed_enabled:
    # this takes care of everything as long as we aren't under zero3
    if version.parse(accelerate_version) <= version.parse("0.20.3"):
        raise ValueError("Install Accelerate from main branch")
    try:
        state_dict = self.accelerator.get_state_dict(self.deepspeed)
        if self.args.should_save:
            self._save(output_dir, state_dict=state_dict)
    except ValueError:
        logger.warning(
            " stage3_gather_16bit_weights_on_model_save=false. Saving the full checkpoint instead, use"
            " zero_to_fp32.py to recover weights"
        )
        self.model_wrapped.save_checkpoint(output_dir)
```
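To make the control-flow problem concrete, here is a minimal standalone sketch (all names are hypothetical stand-ins, not the real transformers internals): because `self._save` is called *inside* the `try` block, a `ValueError` raised while gathering the state dict skips it entirely, and only the raw DeepSpeed checkpoint gets written.

```python
# Hypothetical sketch of the excerpt's control flow, not the actual Trainer code.
saved = []

def get_state_dict(gather_16bit_weights):
    # Stands in for accelerator.get_state_dict, which raises when
    # stage3_gather_16bit_weights_on_model_save is False under ZeRO-3.
    if not gather_16bit_weights:
        raise ValueError("weights were not gathered on save")
    return {"weights": "..."}

def _save(state_dict):
    # Stands in for Trainer._save, which also writes the tokenizer and config.json.
    saved.extend(["model", "tokenizer", "config.json"])

def save_model(gather_16bit_weights):
    try:
        state_dict = get_state_dict(gather_16bit_weights)
        _save(state_dict)  # never reached when get_state_dict raises
    except ValueError:
        # Only the DeepSpeed checkpoint is written; tokenizer/config are skipped.
        saved.append("deepspeed_checkpoint")

save_model(gather_16bit_weights=False)
print(saved)  # → ['deepspeed_checkpoint']
```

Moving the tokenizer/config saving out of the `try` block (or repeating it in the `except` branch) would avoid losing those files when the gather fails.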
Originally posted by @zjjMaiMai in #24728 (comment)