Conversation
Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
…d to the modeling file
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
# we need to expand on num_heads because there was no sharding done in the 7B model
# and we need to calculate mean/var over each head_dim
# for the sharded model we don't do expansion and simply do norm
```
You should be able to bake that in by updating the alpha and beta for model parallelism.
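If I understand the suggestion, something along these lines in the conversion script would bake the expansion into the checkpoint itself (a minimal sketch; the expand_qk_norm name and the shared (head_dim,) weight layout are assumptions, not the PR's actual code):

```python
import torch

# Sketch only: assumes the 7B checkpoint stores a single qk-norm weight
# ("alpha") and bias ("beta") of shape (head_dim,) shared across all heads.
def expand_qk_norm(weight, bias, num_heads):
    # Repeat the shared parameters per head so the per-head (sharded)
    # code path also covers 7B, with no expansion at runtime.
    return weight.repeat(num_heads, 1), bias.repeat(num_heads, 1)

w, b = torch.ones(128), torch.zeros(128)   # head_dim = 128
w_exp, b_exp = expand_qk_norm(w, b, num_heads=32)
print(w_exp.shape)                         # torch.Size([32, 128])
```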
```python
# permute key/value to use transformers RoPE implementation (see for more: https://github.com/huggingface/transformers/issues/25199)
# NOTE: permutation is done the same way as in the llama conversion script
key_states = key_states.view(-1, self.num_key_value_heads, self.head_dim // 2, 2).transpose(3, 2)
query_states = query_states.view(-1, self.num_heads, self.head_dim // 2, 2).transpose(3, 2)
```
IMO we should permute everything in the weights
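For reference, the llama conversion script handles this with a one-time weight-side permutation at conversion time; a sketch of that approach (shapes here are illustrative):

```python
import torch

# Permute a q/k projection weight once during conversion, mirroring the
# permute helper in convert_llama_weights_to_hf.py, so no reshuffling is
# needed in the modeling code at runtime.
def permute(w, n_heads, dim1, dim2):
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)

w_q = torch.randn(4096, 4096)              # (num_heads * head_dim, hidden_size)
w_q = permute(w_q, n_heads=32, dim1=4096, dim2=4096)
print(w_q.shape)                           # torch.Size([4096, 4096])
```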
```python
def __init__(self, hidden_size, *args, **kwargs):
    super().__init__(hidden_size, *args, **kwargs)
    self.normalized_shape = (hidden_size[-1],)
```
Does this mean we are computing over, say, "head_dim"?
How different is this from a normal nn.LayerNorm((head_dim,))?
Ah, the weights are shaped like hidden_size but the norm is applied over hidden_size[-1], is that it?
Yes, it's only the weights that are different for the 30B model. The 7B simply has the weights repeated over all heads.
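To make that concrete, a minimal sketch of such a per-head norm (PerHeadLayerNorm is a hypothetical name, and the forward pass is my reading of the snippet above, not necessarily the PR's exact code):

```python
import torch
from torch import nn
import torch.nn.functional as F

class PerHeadLayerNorm(nn.LayerNorm):
    # weight/bias keep the full hidden_size shape, e.g. (num_heads, head_dim),
    # but mean/var are computed over the last dim (head_dim) only
    def __init__(self, hidden_size, *args, **kwargs):
        super().__init__(hidden_size, *args, **kwargs)
        self.normalized_shape = (hidden_size[-1],)

    def forward(self, hidden_states):
        # normalize over head_dim, then apply the per-head affine parameters
        hidden_states = F.layer_norm(hidden_states, self.normalized_shape, None, None, eps=self.eps)
        return hidden_states * self.weight + self.bias

norm = PerHeadLayerNorm((32, 128))         # num_heads = 32, head_dim = 128
q = torch.randn(2, 16, 32, 128)            # (batch, seq, heads, head_dim)
print(norm(q).shape)                       # torch.Size([2, 16, 32, 128])
```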
```python
for token in tokenizer_config["added_tokens"]:
    if token["content"] == "<reserved08707>":
        token["content"] = "<image>"
```
@zucchini-nlp @ArthurZucker We should also set token["special"] = False so that we can decode this token.
What do you guys think?
(It's what I'm currently doing in my PR, btw, and I haven't encountered any errors yet.)
Hi! Special tokens should be decodable with skip_special_tokens=False.
For my understanding, why do we need to decode the image token? AFAIK it shouldn't affect image generation, because it's a token we added manually to keep track of where to add an image in the text.
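For illustration, this is the decoding behavior in question (the checkpoint id below is assumed for the example):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/chameleon-7b")  # assumed checkpoint id

ids = tok("Describe this image: <image>")["input_ids"]
# special tokens survive decoding with skip_special_tokens=False...
print(tok.decode(ids, skip_special_tokens=False))
# ...and are stripped with skip_special_tokens=True
print(tok.decode(ids, skip_special_tokens=True))
```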
What does this PR do?
Fixes #31505.
Adds Chameleon, a vision language model from Meta AI.
Project repo: https://github.com/facebookresearch/chameleon
Paper: https://arxiv.org/abs/2405.09818v1