
llama-cpp-python support#70

Closed
Maximilian-Winter wants to merge 45 commits into guidance-ai:main from Maximilian-Winter:main

Conversation

@Maximilian-Winter

I have added llama-cpp-python support. I also created an example notebook showing how to use it!

@Maximilian-Winter
Author

@microsoft-github-policy-service agree

@Maximilian-Winter Maximilian-Winter changed the title from "I have added llama-cpp-python support." to "llama-cpp-python support" on May 20, 2023
@alxspiker

Thank you!

@Maximilian-Winter
Author

@alxspiker I found a couple of problems with my implementation and am fixing them right now!

@alxspiker

Any way to support mmap? Seems like it's not supported.

@alxspiker

llama_print_timings:        load time =  4772.70 ms
llama_print_timings:      sample time =     3.01 ms /     1 runs   (    3.01 ms per run)
llama_print_timings: prompt eval time = 11246.46 ms /    23 tokens (  488.98 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 12235.80 ms
Traceback (most recent call last):
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 94, in run
    await self.visit(self.parse_tree)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 428, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 428, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\_program_executor.py", line 395, in visit
    command_output = await command_function(*positional_args, **named_args)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 158, in select
    option_logprobs = await recursive_select("")
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
    sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
    sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 131, in recursive_select
    sub_logprobs = await recursive_select(rec_prefix, allow_token_extension=False)
  [Previous line repeated 477 more times]
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\library\_select.py", line 107, in recursive_select
    gen_obj = await parser.llm_session(
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llama_cpp.py", line 244, in __call__
    key = self._cache_key(locals())
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 76, in _cache_key
    key = self._gen_key(args_dict)
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 69, in _gen_key
    return "_---_".join([str(v) for v in ([args_dict[k] for k in var_names] + [self.llm.model_name, self.llm.__class__.__name__, self.llm.cache_version])])
  File "C:\Users\Haley The Retard\Documents\GitHub\AI-X\guidance\llms\_llm.py", line 69, in <listcomp>
    return "_---_".join([str(v) for v in ([args_dict[k] for k in var_names] + [self.llm.model_name, self.llm.__class__.__name__, self.llm.cache_version])])
RecursionError: maximum recursion depth exceeded while getting the repr of an object

Error in program:  maximum recursion depth exceeded while getting the repr of an object
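The failure mode in the traceback can be reproduced in miniature: `recursive_select` nests one Python stack frame per extension step, so a long option search blows past CPython's default frame limit. A toy sketch (not guidance's actual code) showing the crash and the `sys.setrecursionlimit` stopgap:

```python
import sys

# Toy stand-in for the recursive_select chain in the traceback above:
# each recursion step consumes one Python stack frame.
def deep(n):
    if n == 0:
        return "done"
    return deep(n - 1)

limit = sys.getrecursionlimit()  # commonly 1000
try:
    deep(limit + 100)            # overshoots the frame budget
    hit_limit = False
except RecursionError:
    hit_limit = True

# Raising the limit is only a stopgap; bounding the recursion is the real fix.
sys.setrecursionlimit(limit * 2 + 1000)
result = deep(limit + 100)
print(hit_limit, result)
```

Raising the limit just moves the cliff; the durable fix is capping the recursion depth (or iterating) inside the select logic itself.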

@Maximilian-Winter
Author

@alxspiker I have fixed all the errors on my side but couldn't reproduce your error; I did add mmap to the settings!

@Maximilian-Winter
Author

@alxspiker At the moment you have to use my fork of llama-cpp-python to use guidance.
You will find the fork here:
https://github.com/Maximilian-Winter/llama-cpp-python

@Mihaiii
Contributor

Mihaiii commented May 20, 2023

Related PR in llama-cpp-python: abetlen/llama-cpp-python#252

It would be awesome to use guidance with llama.cpp! I'm excited :)

@slundberg
Collaborator

@Maximilian-Winter this is great, thanks! It will probably be Monday before I can review it properly. Are there any basic unit tests we can add for this? (with small LMs that don't slow down the test process too much) ...might not be possible with LLaMA, but even a file with tests that only run locally would be good so we can make sure this stays working :)

@slundberg
Collaborator

(I also just approved the unit tests to run for this)

@Maximilian-Winter
Author

@slundberg I have added a test file in the tests/llms folder called "test_llamacpp.py".
I used the test_transformers file as a template.
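A locally-gated test of this kind can be sketched as below. Everything here is illustrative, not the PR's actual `test_llamacpp.py`: the `LLAMA_TEST_MODEL` env var, the model path handling, and the `guidance` calls are assumptions; the point is that the suite skips cleanly when no local model is present.

```python
import os
import unittest

# Hypothetical env var pointing at a local GGML model file.
MODEL_PATH = os.environ.get("LLAMA_TEST_MODEL", "")

class TestLlamaCpp(unittest.TestCase):
    @unittest.skipUnless(os.path.exists(MODEL_PATH), "no local GGML model available")
    def test_basic_gen(self):
        import guidance  # imported lazily so the suite loads without guidance installed
        llm = guidance.llms.LlamaCpp(model=MODEL_PATH)
        program = guidance("Hello, {{gen 'name' max_tokens=5}}", llm=llm)
        out = program()
        self.assertTrue(out["name"])  # the generated variable should be non-empty

# Run the suite in-process; without a model the test is reported as skipped.
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(TestLlamaCpp).run(result)
print("skipped:", len(result.skipped))
```

The skip guard is what lets a file like this live in CI without a 3.8 GB model download: it only exercises the model when a developer points the env var at one.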

@DanielusG

After many attempts I could not get the chat roles to work.

I've used this code:

import re
import guidance

# define the model we will use

settings = guidance.llms.LlamaCppSettings()
settings.n_gpu_layers = 14
settings.n_threads = 16
settings.n_ctx = 1024
settings.use_mlock = True
settings.model = "path/to/model"
# Create a LlamaCpp instance and pass the settings to it.
llama = guidance.llms.LlamaCpp(settings=settings)
guidance.llm = llama
def parse_best(prosandcons, options):
    best = int(re.findall(r'Best=(\d+)', prosandcons)[0])
    return options[best]

create_plan = guidance('''
{{#system~}}
You are a helpful assistant.
{{~/system}}

{{! generate five potential ways to accomplish a goal }}
{{#block hidden=True}}
{{#user~}}
I want to {{goal}}.
{{~! generate potential options ~}}
Can you please generate one option for how to accomplish this?
Please make the option very short, at most one line.
{{~/user}}

{{#assistant~}}
{{gen 'options' n=5 temperature=1.0 max_tokens=500}}
{{~/assistant}}
{{/block}}

{{! generate pros and cons for each option and select the best option }}
{{#block hidden=True}}
{{#user~}}
I want to {{goal}}.

Can you please comment on the pros and cons of each of the following options, and then pick the best option?
---{{#each options}}
Option {{@index}}: {{this}}{{/each}}
---
Please discuss each option very briefly (one line for pros, one for cons), and end by saying Best=X, where X is the best option.
{{~/user}}

{{#assistant~}}
{{gen 'prosandcons' temperature=0.0 max_tokens=500}}
{{~/assistant}}
{{/block}}

{{! generate a plan to accomplish the chosen option }}
{{#user~}}
I want to {{goal}}.
{{~! Create a plan }}
Here is my plan:
{{parse_best prosandcons options}}
Please elaborate on this plan, and tell me how to best accomplish it.
{{~/user}}

{{#assistant~}}
{{gen 'plan' max_tokens=500}}
{{~/assistant}}''')
out = create_plan(
    goal='read more books',
    parse_best=parse_best # a custom python function we call in the program
)

@Maximilian-Winter
Author

@slundberg I have implemented proper role_end again, also implemented streaming support.

@Maximilian-Winter
Author

@slundberg I think the best way would be to test just locally. The smallest model right now is a 7B-parameter model, which already takes 3.8 GB of memory.

@slundberg
Collaborator

Just a note here, I was still getting some tokenization issues and realized it is going to be hard to maintain so much code that is similar between transformers and llamacpp, so I am going to try and push a proposal to share more code tonight.

@slundberg
Collaborator

I pushed a proposal in the form of LlamaCpp2, along with lots of updates to Transformers that are related because we will want to depend on them. I think we need to inherit from the Transformers LLM class because otherwise we duplicate lots of code that is tricky and should only live in one place :)

LlamaCpp2 does not work fully yet, but I am pushing it to see what you think @Maximilian-Winter.

thanks again for all the hard work pushing on this :)

@Maximilian-Winter
Author

@slundberg Will take a look later today

@Maximilian-Winter
Author

@slundberg Looks good to me! But I think the tokenizer of llama.cpp is bugged because it refuses to give me the eos or bos token.
It is always empty when I try to decode it from the id!

@slundberg
Collaborator

slundberg commented May 31, 2023

@slundberg Looks good to me! But I think the tokenizer of llama.cpp is bugged because it refuses to give me the eos or bos token.
It is always empty when I try to decode it from the id!

Yeah, I think we can just return </s> directly for now. I just pushed a few more fixes. Can I hand this back over to you to wrap up? There is some difference in the way the logprobs are returned that doesn't quite match how transformers returns them yet, but otherwise I think we are close!
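The "return </s> directly" workaround can be sketched as a manual special-token table in front of the detokenizer. The BOS/EOS ids (1 and 2) follow LLaMA's usual vocab but are assumptions here, and `fake_detok` is a toy stand-in, not llama.cpp's API:

```python
# Map special ids to their text by hand, since the detokenizer yields empty
# bytes for them. Ids 1/2 are LLaMA's usual BOS/EOS; verify per model.
SPECIAL_TOKENS = {1: "<s>", 2: "</s>"}

def decode_token(detokenize, token_id):
    if token_id in SPECIAL_TOKENS:
        return SPECIAL_TOKENS[token_id]
    return detokenize([token_id]).decode("utf-8", errors="ignore")

# Toy detokenizer standing in for llama.cpp's: empty bytes for special ids.
fake_detok = lambda ids: b"" if ids[0] in SPECIAL_TOKENS else b" hi"
print(decode_token(fake_detok, 2), decode_token(fake_detok, 7))
```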

I also noticed that the logit bias processor inside llama-cpp-python seems to save the bias values after the local logits variable is already set:
https://github.com/abetlen/llama-cpp-python/blob/232880cbc677db1998afa240c25e58090f399072/llama_cpp/llama.py#L373-L383
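The ordering concern reduces to this: a logit bias only changes what gets sampled if it is added to the logits before the sampler reads them. A toy illustration (dict-based, not llama-cpp-python's code):

```python
# Unbiased logits favor "yes"; the bias is meant to push "no" on top.
logits = {"yes": 2.0, "no": 1.0}
bias = {"no": 5.0}

def apply_bias(logits, bias):
    # Add each bias term to its token's logit before any sampling step.
    return {tok: lp + bias.get(tok, 0.0) for tok, lp in logits.items()}

unbiased_pick = max(logits, key=logits.get)   # bias never applied -> "yes"
biased = apply_bias(logits, bias)
biased_pick = max(biased, key=biased.get)     # bias applied first -> "no"
print(unbiased_pick, biased_pick)
```

If the bias is stored only after a local copy of the logits has been captured (as the linked lines suggest), sampling sees the unbiased values, which is the first case above.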

@Maximilian-Winter
Author

@slundberg I will try to make it work later!

return self.model_obj.detokenize(tokens).decode("utf-8", errors="ignore") # errors="ignore" is copied from llama-cpp-python

def convert_ids_to_tokens(self, ids):
    return [self.decode([id]) for id in ids]

return [self.decode(ids)]?
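The difference the review comment points at can be shown with a toy tokenizer (not llama.cpp's): decoding ids one at a time can split a multi-byte UTF-8 character across tokens, while decoding the whole id sequence at once keeps its bytes together.

```python
# Two toy tokens holding the two halves of the UTF-8 encoding of "é".
vocab = {1: "é".encode("utf-8")[:1], 2: "é".encode("utf-8")[1:]}

def decode(ids):
    data = b"".join(vocab[i] for i in ids)
    # Same errors="ignore" policy as the line under review.
    return data.decode("utf-8", errors="ignore")

per_id = [decode([i]) for i in [1, 2]]  # each half is invalid UTF-8 alone
whole = decode([1, 2])                  # joined bytes decode cleanly
print(per_id, whole)
```

With `errors="ignore"`, the per-id path silently drops both halves, which is one reason to prefer decoding the whole sequence where the call site allows it.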

@Jchang4

Jchang4 commented Jun 15, 2023

Please merge this, Microsoft, or at least help support it. This would be HUGE for guidance.

@vmajor

vmajor commented Jun 16, 2023

Is there progress with this? Oh I see there is already a message. Yes, there are a few of us spamming refresh on this...

@Maximilian-Winter
Author

Sorry, was very busy with other stuff at work! Will look into this!

@kongjiellx

Any progress?

@Blueoctopusinc

Any updates on this?

@charles-dyfis-net

charles-dyfis-net commented Jul 28, 2023

Hmm. Looks like there's a conflict with 47b1cd4. Trying to use a519012, a merge which brings the former commit into the PR...

>>> import guidance
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/__init__.py", line 7, in <module>
    from ._program import Program
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/_program.py", line 17, in <module>
    from .llms import _openai
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/llms/__init__.py", line 7, in <module>
    from ._llama_cpp import LlamaCpp
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/llms/_llama_cpp.py", line 17, in <module>
    class LlamaCpp(LLM):
  File "/nix/store/0j6zxl1kvqkvhcw0i2chdqxx22xsi5sf-python3-3.10.11-env/lib/python3.10/site-packages/guidance/llms/_llama_cpp.py", line 21, in LlamaCpp
    cache = LLM._open_cache("_llama_cpp.diskcache")
AttributeError: type object 'LLM' has no attribute '_open_cache'

@Jchang4

Jchang4 commented Jul 29, 2023

Yeah, this needs to be updated. I've tried forking Max's and pulling Microsoft's main branch, but there have been a lot of changes since June, so lots of things need tweaking.

@talhalatifkhan

Any updates on this?

@nielsrolf

Any plans on merging this at some point?

@freckletonj

guidance's templating is miles more friendly to use than lmql.

But... guidance, are you still alive?

@akashAD98

any update on this ???

@slundberg
Collaborator

@Maximilian-Winter thank you so much for all your hard work on this! Due to some external circumstances over the summer I couldn't come back to push it over the finish line until this fall (with v0.1). This PR strongly informed the design decisions we made for Llama.cpp support in v0.1 though so it was very useful.

I am closing this now since we now have full llama.cpp support in v0.1 :)

@slundberg slundberg closed this Dec 11, 2023