added mlp and attn bias option to flash and paged llama models #85
Merged
Conversation
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
njhill
approved these changes
May 6, 2024
Contributor
njhill
left a comment
Thanks @JRosenkranz @joerunde!
openshift-merge-bot bot
referenced
this pull request
in red-hat-data-services/text-generation-inference
May 7, 2024
added mlp and attn bias option to flash and paged llama models (#85)
Xaenalt
pushed a commit
to Xaenalt/text-generation-inference
that referenced
this pull request
Aug 1, 2024
[pull] main from IBM:main
Xaenalt
pushed a commit
to Xaenalt/text-generation-inference
that referenced
this pull request
Aug 7, 2024
#### Motivation
The `Calico` models currently set the mlp and attention bias to true, which was hard-coded to false in flash and paged llama implementations. This will use the config params set in huggingface/transformers#30031 to set those values properly.

#### Modifications
- added attention_bias, mlp_bias to config for Flash and Paged Llama implementations (default is False)
- set bias in attention and mlp to the config value

#### Result
Models should be able to load properly if containing attention and mlp bias

---------

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
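For context, the snippet below is an illustrative check (not part of this PR) of the upstream flags the change relies on. It assumes a transformers release that already includes huggingface/transformers#30031, which added `mlp_bias` alongside the existing `attention_bias` on `LlamaConfig`:

```python
# Illustrative only: confirm that the upstream LlamaConfig exposes both flags.
# Requires a transformers release containing huggingface/transformers#30031.
from transformers import LlamaConfig

config = LlamaConfig(attention_bias=True, mlp_bias=True)
print(config.attention_bias, config.mlp_bias)  # -> True True (both default to False when unset)
```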
Xaenalt
pushed a commit
to Xaenalt/text-generation-inference
that referenced
this pull request
Aug 12, 2024
added mlp and attn bias option to flash and paged llama models (#85)
Motivation
The `Calico` models currently set the mlp and attention bias to true, which was hard-coded to false in the flash and paged llama implementations. This change uses the config params added in huggingface/transformers#30031 to set those values properly.
Modifications
- added attention_bias and mlp_bias to the config for the Flash and Paged Llama implementations (default is False)
- set the bias in attention and mlp to the config value
Result
Models that contain attention and mlp bias should now load properly.
Related Issues
NA
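To make the change concrete, here is a minimal sketch of how the two config flags might be threaded into a llama-style block. This is not the actual flash/paged implementation from this repository; the class names below are illustrative stand-ins, and the only point is that `bias=` now comes from the config instead of being hard-coded to `False`:

```python
import torch.nn as nn
import torch.nn.functional as F


class SketchLlamaMLP(nn.Module):
    """Illustrative MLP block: bias follows config.mlp_bias instead of a hard-coded False."""

    def __init__(self, config):
        super().__init__()
        mlp_bias = getattr(config, "mlp_bias", False)  # default False, matching the PR
        self.gate_proj = nn.Linear(config.hidden_size, config.intermediate_size, bias=mlp_bias)
        self.up_proj = nn.Linear(config.hidden_size, config.intermediate_size, bias=mlp_bias)
        self.down_proj = nn.Linear(config.intermediate_size, config.hidden_size, bias=mlp_bias)

    def forward(self, hidden_states):
        return self.down_proj(F.silu(self.gate_proj(hidden_states)) * self.up_proj(hidden_states))


class SketchLlamaAttentionProj(nn.Module):
    """Illustrative attention projections: bias follows config.attention_bias."""

    def __init__(self, config):
        super().__init__()
        attn_bias = getattr(config, "attention_bias", False)  # default False, matching the PR
        self.q_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=attn_bias)
        self.k_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=attn_bias)
        self.v_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=attn_bias)
        self.o_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=attn_bias)
```

With wiring like this, a model whose config sets `attention_bias: true` and `mlp_bias: true` (as the `Calico` models do) gets its bias tensors created and loaded, whereas the previous hard-coded `bias=False` would fail to load those weights.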