@Yejing-Lai

  1. When the split shape (total_size) is smaller than 64, TP sharding fails. This PR adds support for split_shape < 64.
  2. falcon-40b fails on SNC3. The reason is that falcon-40b names its KV-head count "num_kv_heads". This PR adds that name to the mp_param list and the kv_head_names list.
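
To illustrate both points, here is a minimal Python sketch, not the code from this PR. The names `shard_size`, `find_kv_heads`, `GRAIN`, and `KV_HEAD_NAMES` are hypothetical, and the rule that sizes divisible by 64 are split in 64-element granules while everything else is split element by element is an assumption made for the example.

```python
from types import SimpleNamespace

GRAIN = 64  # assumed granule size, taken from the "< 64" threshold above

# Illustrative list of config attribute names that may hold the KV-head count.
# "num_kv_heads" is the name falcon-40b uses according to this PR.
KV_HEAD_NAMES = ["num_key_value_heads", "num_kv_heads"]


def shard_size(total_size, tp_size, rank):
    """Return how many elements of a dimension the given rank owns."""
    if total_size >= GRAIN and total_size % GRAIN == 0:
        # Large, aligned dimension: hand out whole 64-element granules.
        units, unit = total_size // GRAIN, GRAIN
    else:
        # Small (or unaligned) dimension: hand out single elements, so a
        # size below 64 no longer collapses to zero granules.
        units, unit = total_size, 1
    # Lower ranks absorb the remainder one unit at a time.
    return (units // tp_size + (1 if rank < units % tp_size else 0)) * unit


def find_kv_heads(config):
    """Return the first KV-head count found on the config, or None."""
    for name in KV_HEAD_NAMES:
        value = getattr(config, name, None)
        if value is not None:
            return value
    return None


if __name__ == "__main__":
    # A dimension of 32 split across 3 ranks.
    print([shard_size(32, 3, r) for r in range(3)])
    # A config object that only exposes "num_kv_heads".
    print(find_kv_heads(SimpleNamespace(num_kv_heads=8)))
```

Under these assumptions the per-rank sizes always sum to `total_size`, and a config that exposes only `num_kv_heads` is still recognized.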

@delock merged commit 14f5058 into delock:gma/run-opt-branch on Nov 21, 2023