new model: IDEFICS via HuggingFaceM4 by stas00 · Pull Request #24796 · huggingface/transformers

stas00 · 2023-07-12T23:01:59Z

**Important:** the following notes are for my teammates and won't work for anybody else, as the data isn't public yet. It should be made public next week.

Meanwhile, to try it out:

```shell
$ git clone https://github.com/huggingface/transformers -b add-model-idefics
$ cd transformers
$ cat generate.py
```

```python
import torch
from transformers import IdeficsForVisionText2Text, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

checkpoint = "HuggingFaceM4/idefics-9b"
# checkpoint = "HuggingFaceM4/tiny-random-idefics"

model = IdeficsForVisionText2Text.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to(device)
processor = AutoProcessor.from_pretrained(checkpoint)

prompts = [
    [
        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg",
        "Describe this image.",

        "Assistant: An image of two kittens in grass.",

        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/dog-puns-1581708208.jpg",
        "Describe this image.",

        "Assistant:",
    ],
    [
        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/dog-puns-1581708208.jpg",
        "Describe this image.",

        "Assistant: An image of a dog wearing funny glasses.",

        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg",
        "Describe this image.",

        "Assistant:",
    ],
]

# batched mode
inputs = processor(prompts, return_tensors="pt").to(device)
# single sample mode
# inputs = processor(prompts[0], return_tensors="pt").to(device)

generated_ids = model.generate(**inputs, max_length=100)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)
for i, t in enumerate(generated_text):
    print(f"{i}:\n{t}\n")
```

And then run:

```shell
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=src python generate.py
```
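Each prompt list in generate.py follows a fixed pattern. A turn is `"User:"`, then an image URL, then a question, then an `"Assistant: ..."` answer, and a bare `"Assistant:"` closes the conversation so the model continues from there. If you build many such prompts, a small helper keeps the layout consistent and avoids hand-typed list literals, where one missing comma silently merges two items.

This is a minimal sketch. `build_prompt` and `Turn` are hypothetical names for illustration and are not part of the transformers API. The helper only produces the plain list that the `processor(...)` call above consumes.

```python
from typing import List, Optional, Sequence, Tuple

# One dialogue turn: (image URL, user question, assistant answer or None).
Turn = Tuple[str, str, Optional[str]]


def build_prompt(turns: Sequence[Turn]) -> List[str]:
    """Hypothetical helper: flatten turns into the interleaved list used in generate.py."""
    prompt: List[str] = []
    for idx, (image_url, question, answer) in enumerate(turns):
        prompt += ["User:", image_url, question]
        if answer is not None:
            prompt.append(f"Assistant: {answer}")
        elif idx == len(turns) - 1:
            # Leave the final answer open so the model generates it.
            prompt.append("Assistant:")
        else:
            raise ValueError("only the final turn may omit the answer")
    return prompt


CATS = "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg"
DOG = "https://hips.hearstapps.com/hmg-prod/images/dog-puns-1581708208.jpg"

# Reproduces the two prompts from generate.py.
prompts = [
    build_prompt([(CATS, "Describe this image.", "An image of two kittens in grass."),
                  (DOG, "Describe this image.", None)]),
    build_prompt([(DOG, "Describe this image.", "An image of a dog wearing funny glasses."),
                  (CATS, "Describe this image.", None)]),
]
```

Only the final turn may leave its answer as `None`. Raising an error for an open answer mid-dialogue is a design choice of this sketch.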

Demos

A PR with examples/demos, including finetuning, is here: huggingface/notebooks#418

TODOs before merging

Make the models public, which coincides with the announcement/release.

HuggingFaceDocBuilderDev · 2023-07-12T23:20:30Z

The documentation is not available anymore as the PR was closed or merged.

flozi00 · 2023-07-13T10:01:00Z

Is it possibly a private repo? ;-) The m4 repo in the huggingface organisation does not exist.

stas00 · 2023-07-13T15:34:36Z

Thank you for your interest, @flozi00. Please give us some time. It says WIP because it's not ready for public consumption; I edited the OP to clarify that.

src/transformers/image_processing_utils.py

src/transformers/models/idefics/processing_idefics.py

stas00 · 2023-07-31T23:52:35Z

Thank you, @sgugger, @HugoLaurencon and @leot13 for your reviews - I have addressed everything you have raised.

sgugger

Thanks a lot for all your work on this, @stas00!

src/transformers/models/idefics/configuration_idefics.py

src/transformers/models/idefics/processing_idefics.py

amyeroberts · 2023-08-01T09:47:34Z

src/transformers/models/idefics/image_processing_idefics.py

```diff
+
+    def __init__(
+        self,
+        image_size: int = 224,
```


Because of the ambiguity in how size is handled in torchvision transforms (which is reflected in our feature extractors), the image-size parameter for image processors is a dictionary, `size`, containing one of:

- `{"height": h, "width": w}`
- `{"shortest_edge": x}`
- `{"shortest_edge": x, "longest_edge": y}`

See, e.g., the PVT and CLIP image processors.
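Here is a minimal sketch of how an image processor might turn each accepted `size` shape into a concrete output resolution. `resolve_output_size` and `ALLOWED_KEY_SETS` are hypothetical names, not the transformers API; the library has its own helpers for this. The rounding choice is also an assumption, and real processors differ in details such as rounding and whether a center crop follows the resize.

```python
from typing import Dict, Tuple

# The three shapes a `size` dict may take (hypothetical validation table).
ALLOWED_KEY_SETS = (
    {"height", "width"},
    {"shortest_edge"},
    {"shortest_edge", "longest_edge"},
)


def resolve_output_size(size: Dict[str, int], input_hw: Tuple[int, int]) -> Tuple[int, int]:
    """Hypothetical helper: map a `size` dict to a concrete (height, width)."""
    keys = set(size)
    if keys not in ALLOWED_KEY_SETS:
        raise ValueError(f"size must use one of {ALLOWED_KEY_SETS}, got {sorted(keys)}")
    if keys == {"height", "width"}:
        # Fixed output size; aspect ratio is not preserved.
        return size["height"], size["width"]
    h, w = input_hw
    short, long = min(h, w), max(h, w)
    # Scale so the shortest edge matches, preserving aspect ratio.
    scale = size["shortest_edge"] / short
    if "longest_edge" in size and long * scale > size["longest_edge"]:
        # Shrink further so the longest edge does not exceed its cap.
        scale = size["longest_edge"] / long
    return round(h * scale), round(w * scale)
```

With a 480x640 input, `{"height": 224, "width": 224}` always yields 224x224. `{"shortest_edge": 224}` keeps the 3:4 aspect ratio. Adding `"longest_edge": 256` caps the long side, which then drives the scale.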

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

VictorSanh · 2023-08-08T20:01:01Z

```python
prompts = [
    [
        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg",
        "Describe this image."

        "Assistant: An image of two kittens in grass.",

        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/dog-puns-1581708208.jpg",
        "Describe this image".

        "Assistant:",
    ],
    [
        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/dog-puns-1581708208.jpg",
        "Describe this image."

        "Assistant: An image of a dog wearing funny glasses.",

        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg",
        "Describe this image".

        "Assistant:",
    ],
]
```

For posterity: that part of the OP (I can't edit it, unfortunately) is missing some commas at the end of some strings (for instance, `"Describe this image".` should be `"Describe this image",`). This matters for tokenization, in particular when calling the processor with `add_end_of_utterance_token=True`.
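A missing comma is easy to overlook because Python concatenates adjacent string literals. When `"Describe this image."` is followed directly by `"Assistant: ..."`, the two silently become a single list element, so the turn boundary disappears from the prompt. The `"Describe this image".` variant fails loudly instead, with a `SyntaxError`, because a string literal cannot follow `.`.

The snippet below shows the silent case. It also sketches a hypothetical lint, `find_merged_turns`, that flags items where a turn marker appears mid-string. This check is not part of transformers.

```python
from typing import List, Sequence

# Missing comma: the two adjacent literals are joined at compile time.
broken = [
    "User:",
    "Describe this image."
    "Assistant: An image of two kittens in grass.",
]
fixed = [
    "User:",
    "Describe this image.",
    "Assistant: An image of two kittens in grass.",
]


def find_merged_turns(prompt: Sequence[object], markers: Sequence[str] = ("User:", "Assistant:")) -> List[int]:
    """Hypothetical lint: indices of string items where a turn marker appears after position 0."""
    return [
        i
        for i, item in enumerate(prompt)
        if isinstance(item, str) and any(item.find(m) > 0 for m in markers)
    ]


print(len(broken), len(fixed))  # 2 3
print(broken[1])                # Describe this image.Assistant: An image of two kittens in grass.
print(find_merged_turns(broken), find_merged_turns(fixed))  # [1] []
```

Images passed as URLs or objects are skipped by the `isinstance` check. Only string items are inspected for misplaced markers.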

sgugger · 2023-08-09T06:15:12Z

I can edit if need be. You should also be able to push commits to this branch, since it's in the main fork and you have write permissions @VictorSanh :-)

src/transformers/image_processing_utils.py

gante

Nit: we have been avoiding the variable name `past` in generate-related code, preferring the clearer `past_key_values` instead.

I've added suggested changes in the related lines :)

src/transformers/models/idefics/modeling_idefics.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

stas00 · 2023-08-18T14:35:27Z

Thanks a lot, @gante, for the suggestions - merged

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

…25442) * add image_embeddings option in generate-related methods * style * rename image_embeddings and allow perceiver embeddings precomputation * compute embeddings within generate * make is_encoder_decoder= True the default in config * nested if else fix * better triple check * switch if elif order for pixel values / img embeds * update model_kwargs perceiver only at the end * use _prepare_model_inputs instead of encoder_decoder logic * fix comment typo * fix config default for is_encoder_decoder * style * add typehints * precompute in forward * doc builder * style * pop instead of get image hidden states * Trigger CI * Update src/transformers/models/idefics/modeling_idefics.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/idefics/modeling_idefics.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * + indentation + style * simplify a bit the use_resampler logic using comments * update diocstrings * Trigger CI --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

sgugger

Thanks again for all the work on this!

* rename * restore * mappings * unedited tests+docs * docs * fixes * fix auto-sync breakage * cleanup * wip * wip * add fetch_images * remove einops dependency * update * fix * fix * fix * fix * fix * re-add * add batching * rework * fix * improve * add Leo as I am extending his work * cleanup * fix * cleanup * slow-test * fix * fix * fixes * deal with warning * rename modified llama classes * rework fetch_images * alternative implementation * cleanup * strict version * cleanup * [`IDEFICS`] Fix idefics ci (#25056) * Fix IDEFICS CI * fix test file * fixup * some changes to make tests pass * fix * fixup * Update src/transformers/models/idefics/configuration_idefics.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> --------- Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * remove compat checks * style * explain that Idefics is not for training from scratch * require pt>=2.0 * fix idefics vision config (#25092) * fix idefics vision config * fixup * clean * Update src/transformers/models/idefics/configuration_idefics.py --------- Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * cleanup * style * cleanup * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * upcase * sequence of images * handle the case with no images * Update src/transformers/image_processing_utils.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * support pure lm take 2 * support tokenizer options * parameterize num_channels * fix upcase * s|IdeficsForCausalLM|IdeficsForVisionText2Text|g * manual to one line * addressing review * unbreak * remove clip dependency * fix test * consistency * PIL import * Idefics prefix * Idefics prefix * hack to make tests work * style * fix * fix * revert * try/finally * cleanup * clean up * move * [`IDEFICS`] Fix idefics config refactor (#25149) * refactor config * nuke init weights * more refactor * oops * remove visual question answering pipeline support * 
Update src/transformers/models/idefics/clip.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Update src/transformers/models/idefics/modeling_idefics.py * cleanup * mv clip.py vision.py * tidyup --------- Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Stas Bekman <stas@stason.org> * fix * license * condition on pt * fix * style * fix * rm torchvision dependency, allow custom transforms * address review * rework device arg * add_eos_token * s/transforms/transform/ * fix top level imports * fix return value * cleanup * cleanup * fix * style * license * license * Update src/transformers/models/idefics/image_processing_idefics.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * add a wrapper to freeze vision layears * tidyup * use the correct std/mean settings * parameterize values from config * add tests/models/idefics/test_image_processing_idefics.py * add test_processor_idefics.py * cleanup * cleanups * fix * fix * move to the right group * style * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * add perceiver config * reset * missing arg docs * Apply suggestions from code review Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com> * address review comments * inject automatic end of utterance tokens (#25218) * inject automatic end of utterance tokens * fix * fix * fix * rework to not use the config * not end_of_utterance_token at the end * Update src/transformers/models/idefics/processing_idefics.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * address review * Apply suggestions from code review Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/image_processing_utils.py Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * [`Idefics`] add image_embeddings option in generate-related methods (#25442) * add image_embeddings option in generate-related 
methods * style * rename image_embeddings and allow perceiver embeddings precomputation * compute embeddings within generate * make is_encoder_decoder= True the default in config * nested if else fix * better triple check * switch if elif order for pixel values / img embeds * update model_kwargs perceiver only at the end * use _prepare_model_inputs instead of encoder_decoder logic * fix comment typo * fix config default for is_encoder_decoder * style * add typehints * precompute in forward * doc builder * style * pop instead of get image hidden states * Trigger CI * Update src/transformers/models/idefics/modeling_idefics.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/idefics/modeling_idefics.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * + indentation + style * simplify a bit the use_resampler logic using comments * update diocstrings * Trigger CI --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix rebase changes * unbreak #25237 - to be fixed in follow up PRs * is_composition = False * no longer needed --------- Co-authored-by: leot13 <leo.tronchon@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Victor SANH <victorsanh@gmail.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>


rename

c0fee5f

stas00 changed the title ~~[WIP] new model: IDEFIX via HuggingFaceM4~~ [WIP] new model: IDEFICS via HuggingFaceM4 Jul 12, 2023

stas00 force-pushed the add-model-idefics branch from 289a799 to c0fee5f July 19, 2023 22:09

stas00 added 11 commits July 19, 2023 15:11

restore

535c696

Merge remote-tracking branch 'origin/main' into add-model-idefics

22d369a

mappings

3041404

unedited tests+docs

702831f

docs

728154e

fixes

2f70508

fix auto-sync breakage

936fd87

cleanup

acdd4e6

wip

9b33761

wip

a9fc12b

add fetch_images

b07e5ad

stas00 commented Jul 20, 2023

src/transformers/image_processing_utils.py

stas00 commented Jul 20, 2023

src/transformers/models/idefics/processing_idefics.py (outdated)

stas00 added 11 commits July 20, 2023 11:41

remove einops dependency

9bdbae4

update

4ad1102

fix

64d1c6d

fix

569f10c

fix

78fab55

fix

ede00bd

fix

855b003

Merge remote-tracking branch 'origin/main' into add-model-idefics

c1953d5

re-add

d9b2dd1

add batching

17e9c81

rework

5db6061

sgugger approved these changes Aug 1, 2023

src/transformers/models/idefics/configuration_idefics.py (outdated)

src/transformers/models/idefics/processing_idefics.py (outdated)

amyeroberts mentioned this pull request Aug 1, 2023

Move rescale dtype recasting to match torchvision ToTensor #25229

Merged

amyeroberts reviewed Aug 1, 2023

stas00 and others added 3 commits August 1, 2023 07:09

Merge remote-tracking branch 'origin/main' into add-model-idefics

b597998

Update src/transformers/models/idefics/processing_idefics.py

76f7fdf

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

address review

b3a8a7c

fxmarty mentioned this pull request Aug 3, 2023

[Docs / BetterTransformer ] Added more details about flash attention + SDPA #25265

Merged

Narsil reviewed Aug 16, 2023

src/transformers/image_processing_utils.py (outdated)

gante reviewed Aug 16, 2023

gante mentioned this pull request Aug 16, 2023

Adding Idefics multi modal model. huggingface/text-generation-inference#842

Merged

Apply suggestions from code review

ae26401

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

stas00 and others added 5 commits August 18, 2023 07:36

Update src/transformers/image_processing_utils.py

58a2b2e

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

Merge remote-tracking branch 'origin/main' into add-model-idefics

35e28a8

fix rebase changes

49dd882

unbreak #25237 - to be fixed in follow up PRs

906485b

This was referenced Aug 18, 2023

25237 needs a follow up work #25597

Closed

Deal with nested configs better in base class #25237

Merged

stas00 added 2 commits August 18, 2023 10:45

is_composition = False

2e37e6c

no longer needed

1e416d9

sgugger approved these changes Aug 18, 2023

stas00 merged commit 6c811a3 into main Aug 18, 2023

stas00 deleted the add-model-idefics branch August 18, 2023 21:12


12 participants
