Commit 45f0571

feat: Upgrade to LlamaIndex to 0.10 (#1663)
* Extract optional dependencies
* Separate local mode into llms-llama-cpp and embeddings-huggingface for clarity
* Support Ollama embeddings
* Upgrade to llamaindex 0.10.14. Remove legacy use of ServiceContext in ContextChatEngine
* Fix vector retriever filters
1 parent 12f3a39 commit 45f0571

43 files changed: 1470 additions & 1392 deletions


.github/workflows/actions/install_dependencies/action.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -25,6 +25,6 @@ runs:
         python-version: ${{ inputs.python_version }}
         cache: "poetry"
     - name: Install Dependencies
-      run: poetry install --with ui --no-root
+      run: poetry install --extras "ui vector-stores-qdrant" --no-root
       shell: bash
```

Dockerfile.external

Lines changed: 1 addition & 1 deletion

```diff
@@ -14,7 +14,7 @@ FROM base as dependencies
 WORKDIR /home/worker/app
 COPY pyproject.toml poetry.lock ./
 
-RUN poetry install --with ui
+RUN poetry install --extras "ui vector-stores-qdrant"
 
 FROM base as app
```

Dockerfile.local

Lines changed: 1 addition & 2 deletions

```diff
@@ -24,8 +24,7 @@ FROM base as dependencies
 WORKDIR /home/worker/app
 COPY pyproject.toml poetry.lock ./
 
-RUN poetry install --with local
-RUN poetry install --with ui
+RUN poetry install --extras "ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant"
 
 FROM base as app
```

fern/docs.yml

Lines changed: 3 additions & 3 deletions

```diff
@@ -30,15 +30,15 @@ navigation:
     layout:
       - section: Welcome
         contents:
-          - page: Welcome
+          - page: Introduction
            path: ./docs/pages/overview/welcome.mdx
-          - page: Quickstart
-            path: ./docs/pages/overview/quickstart.mdx
  # How to install privateGPT, with FAQ and troubleshooting
  - tab: installation
    layout:
      - section: Getting started
        contents:
+          - page: Main Concepts
+            path: ./docs/pages/installation/concepts.mdx
          - page: Installation
            path: ./docs/pages/installation/installation.mdx
  # Manual of privateGPT: how to use it and configure it
```

fern/docs/pages/installation/concepts.mdx

Lines changed: 60 additions & 0 deletions

PrivateGPT is a service that wraps a set of AI RAG primitives in a comprehensive set of APIs, providing a private, secure, customizable and easy-to-use GenAI development framework.

It uses FastAPI and LlamaIndex as its core frameworks. Those can be customized by changing the codebase itself.

It supports a variety of LLM providers, embedding providers, and vector stores, both local and remote. Those can be easily swapped without changing the codebase.

# Different Setups support

## Setup configurations available

You get to decide the setup for these 3 main components:
- LLM: the large language model provider used for inference. It can be local, remote, or even OpenAI.
- Embeddings: the embeddings provider used to encode the input, the documents and the users' queries. As with the LLM, it can be local, remote, or even OpenAI.
- Vector store: the store used to index and retrieve the documents.

There is an extra component that can be enabled or disabled: the UI. It is a Gradio UI that allows you to interact with the API in a more user-friendly way.

### Setups and Dependencies

Your setup will be the combination of the different options available. You'll find recommended setups in the [installation](/installation) section.
PrivateGPT uses Poetry to manage its dependencies. You can install the dependencies for the different setups by running `poetry install --extras "<extra1> <extra2>..."`.
Extras are the different options available for each component. For example, to install the dependencies for a local setup with UI, Qdrant as vector database, Ollama as LLM and HuggingFace as local embeddings, you would run:

`poetry install --extras "ui vector-stores-qdrant llms-ollama embeddings-huggingface"`

Refer to the [installation](/installation) section for more details.

### Setups and Configuration

PrivateGPT uses yaml to define its configuration in files named `settings-<profile>.yaml`.
Different configuration files can be created in the root directory of the project.
PrivateGPT will load the configuration at startup from the profile specified in the `PGPT_PROFILES` environment variable.
For example, running:
```bash
PGPT_PROFILES=ollama make run
```
will load the configuration from `settings.yaml` and `settings-ollama.yaml`.
- `settings.yaml` is always loaded and contains the default configuration.
- `settings-ollama.yaml` is loaded only if the `ollama` profile is specified in the `PGPT_PROFILES` environment variable. It can override configuration from the default `settings.yaml`.
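
To make the profile layering concrete, here is a minimal sketch of what an override file like `settings-ollama.yaml` could look like. The keys shown are illustrative assumptions based on the description above, not a verbatim copy of the file shipped in the repository:

```yaml
# Hypothetical sketch of a settings-<profile>.yaml override.
# Only keys that differ from the defaults need to appear here; everything
# else is inherited from the always-loaded settings.yaml.
llm:
  mode: ollama

embedding:
  mode: ollama

ollama:
  llm_model: mistral              # model names assumed, matching the docs below
  embedding_model: nomic-embed-text
```

Running `PGPT_PROFILES=ollama make run` would then merge these values on top of the defaults from `settings.yaml`.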

## About Fully Local Setups

In order to run PrivateGPT in a fully local setup, you will need to run the LLM, Embeddings and Vector Store locally.

### Vector stores

The vector stores supported (Qdrant, ChromaDB and Postgres) run locally by default.

### Embeddings

For local Embeddings there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local models.
* You can use the 'embeddings-huggingface' option in PrivateGPT, which will use HuggingFace.

In order for local HuggingFace embeddings to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```

### LLM

For local LLM there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'llms-llama-cpp' option in PrivateGPT, which will use LlamaCPP. It works great on Mac with Metal most of the time (it leverages the Metal GPU), but it can be tricky on certain Linux and Windows distributions, depending on the GPU. In the installation document you'll find guides and troubleshooting.

In order for the LlamaCPP-powered LLM to work (the second option), you need to download the LLM model to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```

fern/docs/pages/installation/installation.mdx

Lines changed: 146 additions & 59 deletions

````diff
@@ -1,8 +1,8 @@
-## Installation and Settings
+It is important that you review the Main Concepts before you start the installation process.
 
-### Base requirements to run PrivateGPT
+## Base requirements to run PrivateGPT
 
-* Git clone PrivateGPT repository, and navigate to it:
+* Clone PrivateGPT repository, and navigate to it:
 
 ```bash
 git clone https://github.com/imartinez/privateGPT
````

````diff
@@ -21,93 +21,180 @@ pyenv local 3.11
 
 * Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management:
 
-* Have a valid C++ compiler like gcc. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.
-
-* Install `make` for scripts:
+* Install `make` to be able to run the different scripts:
 * osx: (Using homebrew): `brew install make`
 * windows: (Using chocolatey) `choco install make`
 
-### Install dependencies
+## Install and run your desired setup
+
+PrivateGPT allows you to customize the setup, from fully local to cloud-based, by deciding the modules to use.
+Here are the different options available:
 
-Install the dependencies:
+- LLM: "llama-cpp", "ollama", "sagemaker", "openai", "openailike"
+- Embeddings: "huggingface", "ollama", "openai", "sagemaker"
+- Vector stores: "qdrant", "chroma", "postgres"
+- UI: whether or not to enable the UI (Gradio) or just go with the API
+
+In order to install only the required dependencies, PrivateGPT offers different `extras` that can be combined during the installation process:
 
 ```bash
-poetry install --with ui
+poetry install --extras "<extra1> <extra2>..."
 ```
 
-Verify everything is working by running `make run` (or `poetry run python -m private_gpt`) and navigate to
-http://localhost:8001. You should see a [Gradio UI](https://gradio.app/) **configured with a mock LLM** that will
-echo back the input. Below we'll see how to configure a real LLM.
+Where `<extra>` can be any of the following:
+
+- ui: adds support for UI using Gradio
+- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running; requires Ollama running locally
+- llms-llama-cpp: adds support for local LLM using LlamaCPP; expect a messy installation process on some platforms
+- llms-sagemaker: adds support for Amazon Sagemaker LLM; requires Sagemaker inference endpoints
+- llms-openai: adds support for OpenAI LLM; requires OpenAI API key
+- llms-openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API
+- embeddings-ollama: adds support for Ollama Embeddings; requires Ollama running locally
+- embeddings-huggingface: adds support for local Embeddings using HuggingFace
+- embeddings-sagemaker: adds support for Amazon Sagemaker Embeddings; requires Sagemaker inference endpoints
+- embeddings-openai: adds support for OpenAI Embeddings; requires OpenAI API key
+- vector-stores-qdrant: adds support for the Qdrant vector store
+- vector-stores-chroma: adds support for the Chroma DB vector store
+- vector-stores-postgres: adds support for the Postgres vector store
+
+## Recommended Setups
+
+These are just some examples of recommended setups. You can mix and match the different options to fit your needs.
+You'll find more information in the Manual section of the documentation.
+
+> **Important for Windows**: In the examples below showing how to run PrivateGPT with `make run`, the `PGPT_PROFILES` env var is being set inline following Unix command line syntax (works on MacOS and Linux).
+If you are using Windows, you'll need to set the env var in a different way, for example:
+
+```powershell
+# Powershell
+$env:PGPT_PROFILES="ollama"
+make run
+```
+
+or
+
+```cmd
+# CMD
+set PGPT_PROFILES=ollama
+make run
+```
+
+### Local, Ollama-powered setup - RECOMMENDED
+
+**The easiest way to run PrivateGPT fully locally** is to depend on Ollama for the LLM. Ollama provides local LLM and Embeddings that are super easy to install and use, abstracting the complexity of GPU support. It's the recommended setup for local development.
 
-### Settings
+Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Ollama on your machine.
 
-<Callout intent="info">
-The default settings of PrivateGPT should work out-of-the-box for a 100% local setup. **However**, as is, it runs exclusively on your CPU.
-Skip this section if you just want to test PrivateGPT locally, and come back later to learn about more configuration options (and have better performances).
-</Callout>
+After the installation, make sure the Ollama desktop app is closed.
 
-<br />
+Install the models to be used; the default settings-ollama.yaml is configured to use the `mistral 7b` LLM (~4GB) and `nomic-embed-text` Embeddings (~275MB). Therefore:
 
-### Local LLM requirements
+```bash
+ollama pull mistral
+ollama pull nomic-embed-text
+```
+
+Now, start the Ollama service (it will start a local inference server, serving both the LLM and the Embeddings):
+```bash
+ollama serve
+```
+
+Once done, on a different terminal, you can install PrivateGPT with the following command:
+```bash
+poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
+```
 
-Install extra dependencies for local execution:
+Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command.
 
 ```bash
-poetry install --with local
+PGPT_PROFILES=ollama make run
 ```
 
-For PrivateGPT to run fully locally GPU acceleration is required
-(CPU execution is possible, but very slow), however,
-typical Macbook laptops or window desktops with mid-range GPUs lack VRAM to run
-even the smallest LLMs. For that reason
-**local execution is only supported for models compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp)**
+PrivateGPT will use the already existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM and Embeddings, and Qdrant. Review it and adapt it to your needs (different models, different Ollama port, etc.)
 
-These two models are known to work well:
+The UI will be available at http://localhost:8001
 
-* https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF
-* https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF (recommended)
+### Private, Sagemaker-powered setup
 
-To ease the installation process, use the `setup` script that will download both
-the embedding and the LLM model and place them in the correct location (under `models` folder):
+If you need more performance, you can run a version of PrivateGPT that relies on powerful AWS Sagemaker machines to serve the LLM and Embeddings.
 
+You need to have access to Sagemaker inference endpoints for the LLM and/or the embeddings, and have AWS credentials properly configured.
+
+Edit the `settings-sagemaker.yaml` file to include the correct Sagemaker endpoints.
+
+Then, install PrivateGPT with the following command:
 ```bash
-poetry run python scripts/setup
+poetry install --extras "ui llms-sagemaker embeddings-sagemaker vector-stores-qdrant"
 ```
 
-If you are ok with CPU execution, you can skip the rest of this section.
+Once installed, you can run PrivateGPT:
 
-As stated before, llama.cpp is required and in
-particular [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
-is used.
+```bash
+PGPT_PROFILES=sagemaker make run
+```
 
-> It's highly encouraged that you fully read llama-cpp and llama-cpp-python documentation relevant to your platform.
-> Running into installation issues is very likely, and you'll need to troubleshoot them yourself.
+PrivateGPT will use the already existing `settings-sagemaker.yaml` settings file, which is already configured to use Sagemaker LLM and Embeddings endpoints, and Qdrant.
+
+The UI will be available at http://localhost:8001
+
+### Non-Private, OpenAI-powered test setup
+
+If you want to test PrivateGPT with OpenAI's LLM and Embeddings (taking into account that your data is going to OpenAI!), you can use the following setup:
+
+You need an OpenAI API key to run this setup.
+
+Edit the `settings-openai.yaml` file to include the correct API KEY. Never commit it! It's a secret! As an alternative to editing `settings-openai.yaml`, you can just set the env var OPENAI_API_KEY.
+
+Then, install PrivateGPT with the following command:
+```bash
+poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
+```
+
+Once installed, you can run PrivateGPT.
+
+```bash
+PGPT_PROFILES=openai make run
+```
+
+PrivateGPT will use the already existing `settings-openai.yaml` settings file, which is already configured to use OpenAI LLM and Embeddings endpoints, and Qdrant.
 
-#### Customizing low level parameters
+The UI will be available at http://localhost:8001
 
-Currently, not all the parameters of `llama.cpp` and `llama-cpp-python` are available at PrivateGPT's `settings.yaml` file.
-In case you need to customize parameters such as the number of layers loaded into the GPU, you might change
-these at the `llm_component.py` file under the `private_gpt/components/llm/llm_component.py`.
+### Local, Llama-CPP powered setup
 
-##### Available LLM config options
+If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:
 
-The `llm` section of the settings allows for the following configurations:
+```bash
+poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
+```
 
-- `mode`: how to run your llm
-- `max_new_tokens`: this lets you configure the number of new tokens the LLM will generate and add to the context window (by default Llama.cpp uses `256`)
+In order for the local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
+```bash
+poetry run python scripts/setup
+```
 
-Example:
+Once installed, you can run PrivateGPT with the following command:
 
-```yaml
-llm:
-  mode: local
-  max_new_tokens: 256
+```bash
+PGPT_PROFILES=local make run
 ```
 
-If you are getting an out of memory error, you might also try a smaller model or stick to the proposed
-recommended models, instead of custom tuning the parameters.
+PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP LLM, HuggingFace embeddings and Qdrant.
+
+The UI will be available at http://localhost:8001
+
+#### Llama-CPP support
+
+For PrivateGPT to run fully locally without Ollama, Llama.cpp is required and, in
+particular, [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
+is used.
+
+You'll need to have a valid C++ compiler like gcc installed. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.
+
+> It's highly encouraged that you fully read the llama-cpp and llama-cpp-python documentation relevant to your platform.
+> Running into installation issues is very likely, and you'll need to troubleshoot them yourself.
 
-#### OSX GPU support
+##### Llama-CPP OSX GPU support
 
 You will need to build [llama.cpp](https://github.com/ggerganov/llama.cpp) with metal support.
 
````
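The Llama-CPP setup above runs with the `local` profile, i.e. `settings-local.yaml`. As a rough, non-authoritative sketch of the shape such a profile takes after this commit's llms-llama-cpp / embeddings-huggingface split (mode names and keys are assumptions; the actual `settings-local.yaml` in the repository is the reference):

```yaml
# Hypothetical sketch of settings-local.yaml; key names are assumed.
llm:
  mode: llamacpp       # assumed mode name for the LlamaCPP backend

embedding:
  mode: huggingface    # assumed mode name for local HuggingFace embeddings

vectorstore:
  database: qdrant     # Qdrant, as stated in the docs above
```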

````diff
@@ -127,7 +214,7 @@ More information is available in the documentation of the libraries themselves:
 * [llama-cpp-python's documentation](https://llama-cpp-python.readthedocs.io/en/latest/#installation-with-hardware-acceleration)
 * [llama.cpp](https://github.com/ggerganov/llama.cpp#build)
 
-#### Windows NVIDIA GPU support
+##### Llama-CPP Windows NVIDIA GPU support
 
 Windows GPU support is done through CUDA.
 Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
````

````diff
@@ -160,7 +247,7 @@ Note that llama.cpp offloads matrix calculations to the GPU but the performance
 still hit heavily due to latency between CPU and GPU communication. You might need to tweak
 batch sizes and other parameters to get the best performance for your particular system.
 
-#### Linux NVIDIA GPU support and Windows-WSL
+##### Llama-CPP Linux NVIDIA GPU support and Windows-WSL
 
 Linux GPU support is done through CUDA.
 Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
````

````diff
@@ -188,7 +275,7 @@ llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, co
 AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
 ```
 
-### Known issues and Troubleshooting
+##### Llama-CPP Known issues and Troubleshooting
 
 Execution of LLMs locally still has a lot of sharp edges, specially when running on non Linux platforms.
 You might encounter several issues:
````

````diff
@@ -205,7 +292,7 @@ If, during your installation, something does not go as planned, retry in *verbos
 
 For example, when installing packages with `pip install`, you can add the option `-vvv` to show the details of the installation.
 
-#### Troubleshooting: C++ Compiler
+##### Llama-CPP Troubleshooting: C++ Compiler
 
 If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++
 compiler on your computer.
````

````diff
@@ -227,7 +314,7 @@ To install a C++ compiler on Windows 10/11, follow these steps:
 Store and search for Xcode and install it. **Or** you can install the command line tools by running `xcode-select --install`.
 2. If not, you can install clang or gcc with homebrew `brew install gcc`
 
-#### Troubleshooting: Mac Running Intel
+##### Llama-CPP Troubleshooting: Mac Running Intel
 
 When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '
 -march=native'_ during pip install.
````
