### Feature request

Decouple `head_dim` from `hidden_size`, as Gemma and Mistral do.

### Motivation

Allow `head_dim` to be set independently instead of always being derived from `hidden_size`.

### Your contribution

I can make a small revision to the modeling code and submit a PR.
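To make the request concrete, here is a minimal sketch of the kind of change I have in mind. It is not this repository's actual modeling code: `AttnConfig`, `resolve_head_dim`, and `projection_shapes` are hypothetical names, and the numbers are illustrative. The idea is an optional `head_dim` config field that falls back to `hidden_size // num_attention_heads` when unset, so existing configs keep working.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AttnConfig:
    """Hypothetical attention config; field names are assumptions."""
    hidden_size: int
    num_attention_heads: int
    num_key_value_heads: int
    head_dim: Optional[int] = None  # new optional field; None keeps the old behavior


def resolve_head_dim(cfg: AttnConfig) -> int:
    """Use an explicit head_dim if given, else derive it from hidden_size."""
    if cfg.head_dim is not None:
        return cfg.head_dim
    if cfg.hidden_size % cfg.num_attention_heads != 0:
        raise ValueError(
            "hidden_size must be divisible by num_attention_heads "
            "when head_dim is not set"
        )
    return cfg.hidden_size // cfg.num_attention_heads


def projection_shapes(cfg: AttnConfig) -> Dict[str, Tuple[int, int]]:
    """Return (in_features, out_features) for each attention projection."""
    head_dim = resolve_head_dim(cfg)
    q_out = cfg.num_attention_heads * head_dim
    kv_out = cfg.num_key_value_heads * head_dim
    return {
        "q_proj": (cfg.hidden_size, q_out),
        "k_proj": (cfg.hidden_size, kv_out),
        "v_proj": (cfg.hidden_size, kv_out),
        # o_proj maps the concatenated heads back to hidden_size, so its
        # input width is num_heads * head_dim, not hidden_size.
        "o_proj": (q_out, cfg.hidden_size),
    }


# Explicit head_dim: num_heads * head_dim (4096) differs from hidden_size (3072).
decoupled = AttnConfig(hidden_size=3072, num_attention_heads=16,
                       num_key_value_heads=16, head_dim=256)
# No head_dim: falls back to 4096 // 32 = 128.
legacy = AttnConfig(hidden_size=4096, num_attention_heads=32,
                    num_key_value_heads=8)
```

The main places that would need touching are the projection layer sizes and any reshape that currently assumes `num_attention_heads * head_dim == hidden_size`, in particular the input width of the output projection.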