Load parallel.cpp -f file.txt external prompt file #3416

ggerganov merged 36 commits into ggml-org:master from
Conversation
Thank you for the review. I have made all the changes.
Unfortunately I made a mess of updating this branch out of inexperience
with GitHub processes, so please ignore what I have pushed and I will do it
again today.
Is your preference for a new PR after this kind of updating?
On Mon, Oct 2, 2023 at 12:17 PM Georgi Gerganov wrote:

Georgi Gerganov requested changes on this pull request.
------------------------------
On cmake_all.sh
<#3416 (comment)>:
Not needed
------------------------------
On ParallelQuestions.txt
<#3416 (comment)>:
Move to prompts change name to parallel-questions.txt
------------------------------
In common/common.h
<#3416 (comment)>:
> @@ -79,6 +79,7 @@ struct gpt_params {
std::string model_draft = ""; // draft model for speculative decoding
std::string model_alias = "unknown"; // model alias
std::string prompt = "";
+ std::string prompt_file = ""; // store for external prompt file name
⬇️ Suggested change
- std::string prompt_file = ""; // store for external prompt file name
+ std::string prompt_file = ""; // store the external prompt file name
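For context, a minimal hypothetical sketch of how the `-f` handler could populate both the prompt text and the new `prompt_file` field; `gpt_params_sketch` and `load_prompt_from_file` are illustrative names, not the actual common.cpp code:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Illustrative stand-in for the relevant gpt_params fields.
struct gpt_params_sketch {
    std::string prompt;
    std::string prompt_file;
};

// When parsing "-f <file>", store both the file's contents (prompt)
// and its name (prompt_file), mirroring the new struct field above.
static bool load_prompt_from_file(gpt_params_sketch & params, const std::string & fname) {
    std::ifstream file(fname);
    if (!file) {
        return false;
    }
    std::stringstream ss;
    ss << file.rdbuf();
    params.prompt      = ss.str();
    params.prompt_file = fname; // remember where the prompt came from
    return true;
}
```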
------------------------------
In examples/parallel/parallel.cpp
<#3416 (comment)>:
> @@ -70,6 +72,22 @@ struct client {
std::vector<llama_token> tokens_prev;
};
+static void printDateTime() {
+ std::time_t currentTime = std::time(nullptr);
+ std::cout << "\n\033[35mRUN PARAMETERS as at \033[0m" << std::ctime(&currentTime);
We don't use std::cout in this project
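A hedged sketch of how the helper might look without `std::cout`, using `fprintf` instead; the exact replacement here is illustrative, not the code that landed in the PR:

```cpp
#include <cstdio>
#include <ctime>

// Illustrative rewrite of printDateTime() without std::cout; formatting
// via strftime avoids the trailing newline that std::ctime appends.
static void print_date_time() {
    std::time_t current_time = std::time(nullptr);
    char buf[64];
    std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", std::localtime(&current_time));
    fprintf(stderr, "\n\033[35mRUN PARAMETERS as at %s\033[0m\n", buf);
}
```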
------------------------------
In examples/parallel/parallel.cpp
<#3416 (comment)>:
> @@ -70,6 +72,22 @@ struct client {
std::vector<llama_token> tokens_prev;
};
+static void printDateTime() {
+ std::time_t currentTime = std::time(nullptr);
+ std::cout << "\n\033[35mRUN PARAMETERS as at \033[0m" << std::ctime(&currentTime);
+}
+
+// Define a split string function to ...
+static std::vector<std::string> splitString(const std::string& input, char delimiter) {
snake_case
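A sketch of the splitter renamed to snake_case, as the review requests; the body is the standard getline-on-delimiter idiom and may differ from the merged version:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a string on a single-character delimiter (snake_case name per review).
static std::vector<std::string> split_string(const std::string & input, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream stream(input);
    std::string token;
    while (std::getline(stream, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}
```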
No, please keep the existing PR. You may …
Thank you. All changes to <origin/load-parallel-prompt-file> now pushed to <origin/Update-load-parallel-prompt-file>, which I hope I have done correctly this time.
It's interesting to use the 100 questions in the prompt file. Since memory is critical, it's worth noting the resources used and how parsimonious the system allocation is with `device.recommendedMaxWorkingSetSize`:
Could you please push your changes to the load-parallel-prompt-file branch so they appear here?
…Update-load-parallel-prompt-file with requested changes
…/pudepiedj/llama.cpp into Update-load-parallel-prompt-file
OK, this is what I did. I hope it's right. I've looked at the files in load-parallel-prompt-file and they appear to have been changed correctly. Please let me know if I have done something wrong (again)!
…edj/llama.cpp into load-parallel-prompt-file
…edj/llama.cpp into load-parallel-prompt-file
…edj/llama.cpp into load-parallel-prompt-file
…example
* 'master' of github.com:ggerganov/llama.cpp:
* kv cache slot search improvements (ggml-org#3493)
* prompts : fix editorconfig checks after ggml-org#3416
* parallel : add option to load external prompt file (ggml-org#3416)
* server : reuse llama_sample_token common util (ggml-org#3494)
* llama : correct hparams comparison (ggml-org#3446)
* ci : fix xcodebuild destinations (ggml-org#3491)
* convert : update Falcon script for new HF config (ggml-org#3448)
* build : use std::make_tuple() for compatibility with older GCC versions (ggml-org#3488)
* common : process escape sequences in reverse prompts (ggml-org#3461)
* CLBlast: Fix handling of on-device tensor data
* server : fix incorrect num_tokens_predicted (ggml-org#3480)
* swift : disable ACCELERATE_NEW_LAPACK (ggml-org#3481)
* ci : add swift build via xcodebuild (ggml-org#3482)
Thank you. I appreciate your patience!
On Fri, Oct 6, 2023 at 2:16 PM Georgi Gerganov wrote:
Merged #3416 into master.
* Enable external file and add datestamp
* Add name of external file at end
* Upload ToK2024
* Delete ToK2024.txt
* Experiments with jeopardy
* Move ParallelQuestions to /prompts and rename
* Interim commit
* Interim commit
* Final revision
* Remove trailing whitespace
* remove cmake_all.sh
* Remove cmake_all.sh
* Changed .gitignore
* Improved reporting and new question files.
* Corrected typo
* More LLM questions
* Update LLM-questions.txt
* Yet more LLM-questions
* Remove jeopardy results file
* Reinstate original jeopardy.sh
* Update examples/parallel/parallel.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This branch includes amendments to three files in `./llama.cpp/examples` necessary to implement the external prompt file option `-f file.txt` that arises in `./bin/parallel --help`. The three affected files are:

Command-line code (second run example with `-ns 128`):

Example output from two different runs on M2 MAX 32GB and macOS Sonoma 14.0 (omitting the initialisation):
Many lines omitted. This is from the end of a run with `-ns 128`.
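The external-prompt-file flow discussed in this PR can be sketched as a line-per-question loader feeding the parallel clients; this is an assumption-laden illustration (`load_prompts` is a hypothetical name), not the merged code:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Read an external prompt file and return one non-empty question per line,
// suitable for handing out to parallel decoding clients.
static std::vector<std::string> load_prompts(const std::string & path) {
    std::vector<std::string> prompts;
    std::ifstream file(path);
    std::string line;
    while (std::getline(file, line)) {
        if (!line.empty()) {
            prompts.push_back(line);
        }
    }
    return prompts;
}
```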