server : support reading arguments from environment variables #9105
Merged
Conversation
ggerganov approved these changes on Aug 21, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Nov 15, 2024:
…rg#9105)
* server : support reading arguments from environment variables
* add -fa and -dt
* readme : specify non-arg env var
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Nov 18, 2024:
…rg#9105)
* server : support reading arguments from environment variables
* add -fa and -dt
* readme : specify non-arg env var
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request on Feb 25, 2025:
…rg#9105)
* server : support reading arguments from environment variables
* add -fa and -dt
* readme : specify non-arg env var
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request on Dec 29, 2025:
server : handle models with missing EOS token (ggml-org#8997)
server : fix segfault on long system prompt (ggml-org#8987)
* server : fix segfault on long system prompt
* server : fix parallel generation with very small batch sizes
* server : fix typo in comment
server : init stop and error fields of the result struct (ggml-org#9026)
server : fix duplicated n_predict key in the generation_settings (ggml-org#8994)
server : support reading arguments from environment variables (ggml-org#9105)
* server : support reading arguments from environment variables
* add -fa and -dt
* readme : specify non-arg env var
server : add some missing env variables (ggml-org#9116)
* server : add some missing env variables
* add LLAMA_ARG_HOST to server dockerfile
* also add LLAMA_ARG_CONT_BATCHING
Credits are to the respective authors. Not a single merge conflict occurred. Compiled, then tested without bug.
Motivation
When deploying to an HF inference endpoint, we only have control over the environment variables that can be passed to Docker. That's why we currently need to build a custom container and specify these variables via LLAMACPP_ARGS (ref: #9041).

This PR adds environment-variable support for some server-related arguments (see the full list in server/README.md). Variables are prefixed with LLAMA_ARG_ to distinguish them from compile-time variables like LLAMA_CURL.

Example
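A minimal sketch of how the server could be configured through these variables. The model path and values below are illustrative, not taken from the PR:

```shell
# Illustrative only: configure llama-server via LLAMA_ARG_* environment
# variables instead of command-line flags (values are made up).
export LLAMA_ARG_MODEL=models/model.gguf
export LLAMA_ARG_CTX_SIZE=4096

# ./llama-server   # would pick up the variables above at startup
echo "model=$LLAMA_ARG_MODEL ctx=$LLAMA_ARG_CTX_SIZE"
# prints: model=models/model.gguf ctx=4096
```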
If the same variable is specified both as an environment variable and as a command-line argument, the environment variable takes priority:
On an HF inference endpoint, these variables can be set from the "Settings" tab. (In the near future, these variables will be exposed as pre-defined input fields in the UI.)