server : support reading arguments from environment variables #9105
Merged
Conversation
ggerganov approved these changes on Aug 21, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Nov 15, 2024:
…rg#9105)
* server : support reading arguments from environment variables
* add -fa and -dt
* readme : specify non-arg env var
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Nov 18, 2024:
…rg#9105)
* server : support reading arguments from environment variables
* add -fa and -dt
* readme : specify non-arg env var
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request on Feb 25, 2025:
…rg#9105)
* server : support reading arguments from environment variables
* add -fa and -dt
* readme : specify non-arg env var
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request on Dec 29, 2025:
server : handle models with missing EOS token (ggml-org#8997)
server : fix segfault on long system prompt (ggml-org#8987)
* server : fix segfault on long system prompt
* server : fix parallel generation with very small batch sizes
* server : fix typo in comment
server : init stop and error fields of the result struct (ggml-org#9026)
server : fix duplicated n_predict key in the generation_settings (ggml-org#8994)
server : support reading arguments from environment variables (ggml-org#9105)
* server : support reading arguments from environment variables
* add -fa and -dt
* readme : specify non-arg env var
server : add some missing env variables (ggml-org#9116)
* server : add some missing env variables
* add LLAMA_ARG_HOST to server dockerfile
* also add LLAMA_ARG_CONT_BATCHING
Credits are to the respective authors. Not a single merge conflict occurred. Compiled, then tested without bug.
Motivation
When deploying to an HF inference endpoint, we only have control over the environment variables that can be passed to Docker. That's why we currently need to build a custom container and specify these variables via LLAMACPP_ARGS (ref: #9041).

This PR adds environment-variable support for some server-related arguments (see the full list in server/README.md). Variables are prefixed with LLAMA_ARG_ to distinguish them from compile-time variables like LLAMA_CURL.

Example
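A minimal sketch of how the server could be configured through these variables. The model path and values below are illustrative, not taken from the PR:

```shell
# Illustrative only: configure llama-server via LLAMA_ARG_* environment
# variables instead of command-line flags (values are made up).
export LLAMA_ARG_MODEL=models/model.gguf
export LLAMA_ARG_CTX_SIZE=4096

# ./llama-server   # would pick up the variables above at startup
echo "model=$LLAMA_ARG_MODEL ctx=$LLAMA_ARG_CTX_SIZE"
# prints: model=models/model.gguf ctx=4096
```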
If the same variable is specified both as an environment variable and as a command-line argument, the environment variable takes priority:
On an HF inference endpoint, these variables can be set from the "Settings" tab. (In the near future, these variables will be exposed as pre-defined input fields in the UI.)