Conversation
Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Leonid Shamis <leonid.shamis@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
…d to the modeling file
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
# we need to expand on num_heads because there was no sharding done in the 7B model
# and we need to calculate mean/var over each head_dim
# for the sharded model we don't do expansion and simply do norm
```
You should be able to bake that in by updating the alpha and beta for model parallelism.
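If I understand the suggestion, something along these lines in the conversion script would bake the expansion into the checkpoint itself (a minimal sketch; the expand_qk_norm name and the shared (head_dim,) weight layout are assumptions, not the PR's actual code):

```python
import torch

# Sketch only: assumes the 7B checkpoint stores a single qk-norm weight
# ("alpha") and bias ("beta") of shape (head_dim,) shared across all heads.
def expand_qk_norm(weight, bias, num_heads):
    # Repeat the shared parameters per head so the per-head (sharded)
    # code path also covers 7B, with no expansion at runtime.
    return weight.repeat(num_heads, 1), bias.repeat(num_heads, 1)

w, b = torch.ones(128), torch.zeros(128)   # head_dim = 128
w_exp, b_exp = expand_qk_norm(w, b, num_heads=32)
print(w_exp.shape)                         # torch.Size([32, 128])
```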
```python
# permute key/value to use transformers RoPE implementation (see for more: https://github.com/huggingface/transformers/issues/25199)
# NOTE: permutation is done the same way as in the llama conversion script
key_states = key_states.view(-1, self.num_key_value_heads, self.head_dim // 2, 2).transpose(3, 2)
query_states = query_states.view(-1, self.num_heads, self.head_dim // 2, 2).transpose(3, 2)
```
IMO we should permute everything in the weights
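For reference, the llama conversion script handles this with a one-time weight-side permutation at conversion time; a sketch of that approach (shapes here are illustrative):

```python
import torch

# Permute a q/k projection weight once during conversion, mirroring the
# permute helper in convert_llama_weights_to_hf.py, so no reshuffling is
# needed in the modeling code at runtime.
def permute(w, n_heads, dim1, dim2):
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)

w_q = torch.randn(4096, 4096)              # (num_heads * head_dim, hidden_size)
w_q = permute(w_q, n_heads=32, dim1=4096, dim2=4096)
print(w_q.shape)                           # torch.Size([4096, 4096])
```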
```python
def __init__(self, hidden_size, *args, **kwargs):
    super().__init__(hidden_size, *args, **kwargs)
    self.normalized_shape = (hidden_size[-1],)
```
Does this mean we are computing over, say, "head_dim"?
How different is this from a normal nn.LayerNorm((head_dim,))?
Ah, the weights are shaped like hidden_size but the norm is applied over hidden_size[-1], is that it?
Yes, it's only the weights that are different for the 30B model. The 7B simply has the weights repeated over all heads.
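To make that concrete, a minimal sketch of such a per-head norm (PerHeadLayerNorm is a hypothetical name, and the forward pass is my reading of the snippet above, not necessarily the PR's exact code):

```python
import torch
from torch import nn
import torch.nn.functional as F

class PerHeadLayerNorm(nn.LayerNorm):
    # weight/bias keep the full hidden_size shape, e.g. (num_heads, head_dim),
    # but mean/var are computed over the last dim (head_dim) only
    def __init__(self, hidden_size, *args, **kwargs):
        super().__init__(hidden_size, *args, **kwargs)
        self.normalized_shape = (hidden_size[-1],)

    def forward(self, hidden_states):
        # normalize over head_dim, then apply the per-head affine parameters
        hidden_states = F.layer_norm(hidden_states, self.normalized_shape, None, None, eps=self.eps)
        return hidden_states * self.weight + self.bias

norm = PerHeadLayerNorm((32, 128))         # num_heads = 32, head_dim = 128
q = torch.randn(2, 16, 32, 128)            # (batch, seq, heads, head_dim)
print(norm(q).shape)                       # torch.Size([2, 16, 32, 128])
```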
```python
for token in tokenizer_config["added_tokens"]:
    if token["content"] == "<reserved08707>":
        token["content"] = "<image>"
```
@zucchini-nlp @ArthurZucker We should also set token["special"] = False so that we can decode this token.
What do you guys think?
(It's what I'm currently doing in my PR, btw, and I haven't encountered any errors yet.)
Hi! Special tokens should be decodable with skip_special_tokens=False.
For my understanding, why do we need to decode the image token? AFAIK it shouldn't affect image generation, because it's a token we added manually to keep track of where to add an image in the text.
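For illustration, this is the decoding behavior in question (the checkpoint id below is assumed for the example):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/chameleon-7b")  # assumed checkpoint id

ids = tok("Describe this image: <image>")["input_ids"]
# special tokens survive decoding with skip_special_tokens=False...
print(tok.decode(ids, skip_special_tokens=False))
# ...and are stripped with skip_special_tokens=True
print(tok.decode(ids, skip_special_tokens=True))
```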
What does this PR do?
Fixes #31505.
Adds Chameleon, a vision language model from Meta AI.
Project repo: https://github.com/facebookresearch/chameleon
Paper: https://arxiv.org/abs/2405.09818v1