Describe the bug
Hi,
There's a bug in pipeline_utils.py which causes pipeline.from_pretrained to fail if the pipeline was only partially downloaded. Specifically, the code doesn't handle missing components in the feature_extractor, safety_checker, scheduler and tokenizer folders, at least on Windows.
The cause of the bug is this line
allow_patterns += [os.path.join(k, "*") for k in folder_names if k not in model_folder_names]
This turns some folder names into patterns that are later matched as regexps, but on Windows os.path.join joins with a backslash ({parent}\\{child}), which gives a pattern list like this:
[
'text_encoder/model.safetensors',
'vae/diffusion_pytorch_model.bin',
'vae/diffusion_pytorch_model.safetensors',
'text_encoder/pytorch_model.bin',
'unet/diffusion_pytorch_model.safetensors',
'unet/diffusion_pytorch_model.bin',
'feature_extractor\\*',
'safety_checker\\*',
'scheduler\\*',
'tokenizer\\*'
]
The \\* pattern doesn't play nice with the regexp matching later, because the file names being matched use forward slashes. This causes some files to be incorrectly excluded from the "consider list". After the expected_files = [f for f in expected_files if any(p.match(f) for p in re_allow_pattern)] call, expected_files is only:
[
'unet/diffusion_pytorch_model.bin',
'vae/diffusion_pytorch_model.bin',
'text_encoder/pytorch_model.bin',
'model_index.json'
]
This means that if some files are missing from the folders mentioned above, diffusers will not even try to download them, which causes loading errors down the road.
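The mismatch can be reproduced on any platform with a minimal sketch like the one below. It uses ntpath and posixpath to stand in for os.path on Windows and on POSIX systems, and it assumes the patterns are compiled with fnmatch.translate before matching; the helper name `matches` and the sample file name are hypothetical and only serve the illustration.

```python
import fnmatch
import ntpath      # the implementation behind os.path on Windows
import posixpath   # the implementation behind os.path on Linux/macOS
import re


def matches(pattern: str, filename: str) -> bool:
    """Hypothetical helper: compile a glob pattern and match it against a file name."""
    # Assumption: the patterns are turned into regexps via fnmatch.translate.
    return re.compile(fnmatch.translate(pattern)).match(filename) is not None


# What os.path.join(k, "*") yields on each platform.
win_pattern = ntpath.join("feature_extractor", "*")      # 'feature_extractor\\*'
posix_pattern = posixpath.join("feature_extractor", "*")  # 'feature_extractor/*'

# File names in the repo listing use forward slashes (sample name for illustration).
repo_file = "feature_extractor/preprocessor_config.json"

print(win_pattern, matches(win_pattern, repo_file))      # backslash pattern: no match
print(posix_pattern, matches(posix_pattern, repo_file))  # forward-slash pattern: match
```

The backslash in the Windows pattern is escaped into a literal backslash in the regexp, so it can never match a file name that uses a forward slash as the separator.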
To reproduce:
- Use Windows
- Call StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base") and wait until the pipeline is loaded
- Delete the feature_extractor folder from the pipeline cache folder C:\Users\<YOU>\.cache\huggingface\hub\models--stabilityai--stable-diffusion-2-1-base\snapshots\<HASH>
- Call StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base") again and observe the error
Fix:
I changed the line above to the one below, and it seems to fix the bug for me. This should be safe on non-Windows platforms as well, since a forward slash is what path joining produces there in the first place.
allow_patterns += [f"{k}/*" for k in folder_names if k not in model_folder_names]
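As a rough check of the proposed line, the sketch below runs the same filtering step with forward-slash patterns. The contents of folder_names, model_folder_names, filenames and the initial allow_patterns are made-up stand-ins, not the real values from pipeline_utils.py, and the use of fnmatch.translate is again an assumption about how the patterns are compiled.

```python
import fnmatch
import re

# Hypothetical stand-ins for the variables used in pipeline_utils.py.
folder_names = ["feature_extractor", "scheduler", "tokenizer", "unet"]
model_folder_names = {"unet"}
filenames = [
    "model_index.json",
    "feature_extractor/preprocessor_config.json",
    "scheduler/scheduler_config.json",
    "tokenizer/vocab.json",
    "unet/diffusion_pytorch_model.bin",
]

allow_patterns = ["unet/diffusion_pytorch_model.bin", "model_index.json"]
# Proposed fix: build the pattern with "/" instead of os.path.join.
allow_patterns += [f"{k}/*" for k in folder_names if k not in model_folder_names]

# Assumption: patterns are compiled with fnmatch.translate before matching.
re_allow_pattern = [re.compile(fnmatch.translate(p)) for p in allow_patterns]
expected_files = [f for f in filenames if any(p.match(f) for p in re_allow_pattern)]

print(expected_files)
```

Because the f-string never consults the platform's separator, the resulting patterns are identical on Windows and elsewhere, and the files in the non-model folders stay in expected_files.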
Reproduction
See above
Logs
No response
System Info
diffusers 0.16.1