@giladgd
Member

@giladgd giladgd commented Mar 18, 2024

Description of change

  • feat: read tensor info from gguf files
  • feat: inspect gguf command
  • feat: inspect measure command
  • feat: readGgufFileInfo function
  • feat: GGUF file info on LlamaModel
  • feat: estimate the VRAM usage of the model and context for given options, adapt to the current VRAM state, and set good defaults for gpuLayers and contextSize. Manual configuration of those options is no longer needed to maximize performance
  • feat: JinjaTemplateChatWrapper
  • feat: use the tokenizer.chat_template header from the gguf file when available, either to find a better specialized chat wrapper or, as a fallback, to use JinjaTemplateChatWrapper with it
  • feat: improve resolveChatWrapper
  • feat: simplify generation CLI commands: chat, complete, infill
  • feat: read GPU device names
  • feat: get token type
  • refactor: gguf
  • test: separate gguf tests into model-dependent and model-independent tests
  • test: switch to new vitest test signature
  • fix: use the new llama.cpp CUDA flag
  • fix: improve chat wrappers tokenization
  • fix: bugs
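
To illustrate the idea behind the VRAM-based defaults for gpuLayers and contextSize, here is a minimal sketch of fitting both values into the currently free VRAM. It is not the estimator implemented in this PR: every name in it (`VramEstimateInput`, `VramFit`, `fitToFreeVram`) is hypothetical, and it assumes a uniform per-layer size, a linear per-token context cost, and that the context always lives in VRAM, all of which are simplifications.

```typescript
// Hypothetical sketch: none of these names or formulas come from the library.
interface VramEstimateInput {
    freeVramBytes: number;        // VRAM currently free on the GPU
    totalLayers: number;          // number of layers that could be offloaded
    bytesPerLayer: number;        // assumed uniform VRAM cost per offloaded layer
    contextBytesPerToken: number; // assumed linear VRAM cost per context token
    minContextSize: number;       // smallest context size considered acceptable
    maxContextSize: number;       // upper bound, e.g. the model's trained context length
}

interface VramFit {
    gpuLayers: number;
    contextSize: number;
}

// Prefer offloading as many layers as possible, then spend the
// remaining VRAM on the largest context that still fits.
function fitToFreeVram(input: VramEstimateInput): VramFit | null {
    const {
        freeVramBytes, totalLayers, bytesPerLayer,
        contextBytesPerToken, minContextSize, maxContextSize
    } = input;

    for (let gpuLayers = totalLayers; gpuLayers >= 0; gpuLayers--) {
        const remainingBytes = freeVramBytes - gpuLayers * bytesPerLayer;
        if (remainingBytes < 0)
            continue; // this many layers alone would not fit

        const affordableContext = Math.floor(remainingBytes / contextBytesPerToken);
        if (affordableContext >= minContextSize)
            return {gpuLayers, contextSize: Math.min(affordableContext, maxContextSize)};
    }

    // Even with zero offloaded layers, the minimum context does not fit
    return null;
}
```

Under these assumptions, 8 GiB of free VRAM, 32 layers of 200 MiB each, 128 KiB per context token, and a context range of 2048 to 8192 yields `{gpuLayers: 32, contextSize: 8192}`. With only 4 GiB free, the same inputs fall back to 19 layers and a context size of 2368.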

Fixes #133

Pull-Request Checklist

  • Code is up-to-date with the master branch
  • npm run format to apply eslint formatting
  • npm run test passes with this change
  • This pull request links relevant issues as Fixes #0000
  • There are new or updated unit tests validating the change
  • Documentation has been updated to reflect this change
  • The new commits and pull request title follow conventions explained in pull request guidelines (PRs that do not follow this convention will not be merged)

@giladgd giladgd requested a review from ido-pluto March 18, 2024 19:40
@giladgd giladgd self-assigned this Mar 18, 2024
@giladgd giladgd marked this pull request as draft March 20, 2024 00:54
@giladgd giladgd changed the title from "test: organize gguf tests" to "feat: read tensor info from gguf files" Mar 20, 2024
@giladgd giladgd changed the title from "feat: read tensor info from gguf files" to "feat: automatically adapt to current free VRAM state" Apr 2, 2024
@giladgd giladgd marked this pull request as ready for review April 2, 2024 23:22
Contributor

@ido-pluto ido-pluto left a comment

LGTM

@giladgd giladgd merged commit 35e6f50 into beta Apr 4, 2024
@giladgd giladgd deleted the gilad/bugFixes2 branch April 4, 2024 19:25
@github-actions

github-actions bot commented Apr 4, 2024

🎉 This PR is included in version 3.0.0-beta.15 🎉

The release is available on:

Your semantic-release bot 📦🚀

@giladgd giladgd mentioned this pull request Apr 4, 2024
17 tasks
@github-actions

github-actions bot commented Sep 24, 2024

🎉 This PR is included in version 3.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Development

Successfully merging this pull request may close these issues.

feat: max GPU layers param