{"id":168042,"date":"2026-05-29T14:00:00","date_gmt":"2026-05-29T11:00:00","guid":{"rendered":"https:\/\/computingforgeeks.com\/?p=168042"},"modified":"2026-05-27T00:35:30","modified_gmt":"2026-05-26T21:35:30","slug":"qdrant-collections-guide","status":"publish","type":"post","link":"https:\/\/computingforgeeks.com\/qdrant-collections-guide\/","title":{"rendered":"Qdrant Collections: Create, Configure, and Manage Vectors"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">A Qdrant collection is the box that holds your vectors, your payloads, and the indexes that make filtered search fast. Get the collection shape right and the rest of the application falls out of it. Get it wrong and you will pay for it twice, once in memory at runtime and again when you migrate. If you are coming from Postgres and weighing the trade-offs, our <a href=\"https:\/\/computingforgeeks.com\/install-pgvector-postgresql-linux\/\">pgvector install guide<\/a> is the place to see what a single-table approach gives up.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This guide walks every collection-level setting you actually need to know, with a working Python script for each pattern and the matching JSON view from the Qdrant Web UI. You will see how to mix dense and sparse vectors in one collection, when to flip the storage to disk, how to enable multi-tenancy without spinning up a second cluster, and how to swap a collection out from under a running app with a single atomic alias update.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Tested May 2026 on Ubuntu 24.04.4 LTS with Qdrant 1.18.1 and qdrant-client 1.18.0. All 11 collection patterns below were created and inspected on a real Docker cluster; the Web UI screenshots are from the running server, not mock-ups.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Anatomy of a Qdrant collection<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">A collection holds points. 
Each point has three parts: a unique ID, one or more vectors, and an optional payload (a JSON object). The collection itself layers on top of that with HNSW graph config, optimizer thresholds, a write-ahead log, optional quantization, and any payload indexes you create.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The companion code for this article lives at <a href=\"https:\/\/github.com\/c4geeks\/qdrant\/tree\/main\/collections\" target=\"_blank\" rel=\"noreferrer noopener\">github.com\/c4geeks\/qdrant\/tree\/main\/collections<\/a>. Spin up a local instance first with the <strong>install Qdrant on Ubuntu<\/strong>, <strong>install Qdrant on Rocky Linux<\/strong>, or <strong>install Qdrant on Debian<\/strong> guide, then follow the steps below against your own cluster.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Install the Python client and connect to the cluster:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>python3 -m venv venv\nsource venv\/bin\/activate\npip install \"qdrant-client[fastembed]\"<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Every script in this guide starts with the same two lines:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>from qdrant_client import QdrantClient, models\nclient = QdrantClient(url=\"http:\/\/localhost:6333\")<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Authenticated clusters pass <code>api_key=&quot;...&quot;<\/code> as a second argument. For TLS, use <code>https:\/\/<\/code> and set <code>prefer_grpc=True<\/code> if you want the binary protocol on port 6334.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Vector params: size and the four distance metrics<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Every dense vector collection takes two mandatory params: <code>size<\/code> (the dimensionality, which must match your embedding model) and <code>distance<\/code> (how Qdrant measures similarity). 
The simplest case is one dense vector per point:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.create_collection(\n    collection_name=\"basic_docs\",\n    vectors_config=models.VectorParams(\n        size=384,\n        distance=models.Distance.COSINE,\n    ),\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>size<\/code> must match the model that produced your vectors exactly. Use 384 for <code>all-MiniLM-L6-v2<\/code>, 768 for <code>BGE-base<\/code>, 1536 for OpenAI <code>text-embedding-3-small<\/code>, 3072 for <code>text-embedding-3-large<\/code>. If the size in your collection and the size of your vectors disagree by even one, upserts fail with a clear error and the points never land.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Qdrant supports four distance metrics. Each one suits a different family of embedding models:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Metric<\/th><th>Python enum<\/th><th>Best for<\/th><\/tr><\/thead><tbody><tr><td>Cosine<\/td><td><code>Distance.COSINE<\/code><\/td><td>Sentence transformers, MiniLM, BGE, MPNet, most text embedding models<\/td><\/tr><tr><td>Dot product<\/td><td><code>Distance.DOT<\/code><\/td><td>Models that produce un-normalised output or when magnitude carries meaning<\/td><\/tr><tr><td>Euclidean<\/td><td><code>Distance.EUCLID<\/code><\/td><td>Image features (older CNN-style), geographic vectors, when scale matters<\/td><\/tr><tr><td>Manhattan<\/td><td><code>Distance.MANHATTAN<\/code><\/td><td>Sparse high-dimensional features, taxicab-style metrics, hashing schemes<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Cosine is the default if you do not know. If you trained or are using a model with a stated similarity metric, follow what the model card says. 
Switching metrics later means re-indexing every vector.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A loop that builds one collection per metric is useful for testing:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>for dist in [models.Distance.COSINE, models.Distance.DOT,\n             models.Distance.EUCLID, models.Distance.MANHATTAN]:\n    name = f\"dist_{dist.value.lower()}\"\n    client.create_collection(\n        collection_name=name,\n        vectors_config=models.VectorParams(size=128, distance=dist),\n    )<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Each collection is a separate index. They are isolated from one another and you cannot search across them in a single call.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Named vectors: one collection, multiple vector spaces<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Real production setups rarely have just one vector per item. A product needs a text embedding for the title and description, an image embedding for the photo, and possibly a sparse keyword vector. Qdrant lets you attach all three to the same point with named vectors. 
The collection stores them in separate indexes but keeps them lined up via the shared point ID.<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.create_collection(\n    collection_name=\"named_vectors\",\n    vectors_config={\n        \"text\":  models.VectorParams(size=384, distance=models.Distance.COSINE),\n        \"image\": models.VectorParams(size=512, distance=models.Distance.COSINE),\n    },\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The Info tab in the Web UI shows the resulting config as a nested map of named spaces with independent <code>size<\/code> and <code>distance<\/code> per entry:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1440\" height=\"900\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-named-vectors-config.png\" alt=\"Qdrant named vectors config with text and image\" class=\"wp-image-168039\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-named-vectors-config.png 1440w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-named-vectors-config-300x188.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-named-vectors-config-1024x640.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-named-vectors-config-768x480.png 768w\" sizes=\"auto, (max-width: 1440px) 100vw, 1440px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">When you upsert points, supply each vector by name:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.upsert(\n    collection_name=\"named_vectors\",\n    points=[\n        models.PointStruct(\n            id=1,\n            vector={\n                \"text\":  [0.1] * 384,\n                \"image\": [0.2] * 512,\n            },\n            payload={\"sku\": \"ABC-123\"},\n        ),\n    
],\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Searches specify which named vector to use via the <code>using<\/code> param:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.query_points(\n    collection_name=\"named_vectors\",\n    query=[0.1] * 384,\n    using=\"text\",\n    limit=10,\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The pattern keeps query cost honest: each search touches only the vector space it names in <code>using<\/code>, and you can delete a named vector from points that no longer need it without rebuilding the rest of the collection.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Sparse vectors for keyword-style retrieval<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Sparse vectors are the production-ready alternative to TF-IDF. Each sparse vector is a list of (index, value) pairs covering only the tokens that actually appear, which makes them very large in theory (roughly 30,000 dimensions for the BERT vocabulary) but cheap to store in practice. Qdrant indexes them with an inverted index and can run them alongside dense vectors in the same collection.<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.create_collection(\n    collection_name=\"sparse_demo\",\n    vectors_config={\n        \"dense\": models.VectorParams(\n            size=384, distance=models.Distance.COSINE,\n        ),\n    },\n    sparse_vectors_config={\n        \"sparse_idx\": models.SparseVectorParams(\n            index=models.SparseIndexParams(on_disk=False),\n        ),\n    },\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The Info tab reports the two vector spaces independently:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1440\" height=\"900\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-sparse-vectors-config.png\" alt=\"Qdrant sparse vectors config alongside dense\" class=\"wp-image-168035\" title=\"\" 
srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-sparse-vectors-config.png 1440w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-sparse-vectors-config-300x188.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-sparse-vectors-config-1024x640.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-sparse-vectors-config-768x480.png 768w\" sizes=\"auto, (max-width: 1440px) 100vw, 1440px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Sparse upserts use a different vector shape, a <code>SparseVector<\/code> of parallel <code>indices<\/code> and <code>values<\/code> lists:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.upsert(\n    collection_name=\"sparse_demo\",\n    points=[\n        models.PointStruct(\n            id=1,\n            vector={\n                \"dense\": [0.1] * 384,\n                \"sparse_idx\": models.SparseVector(\n                    indices=[42, 1024, 5000],\n                    values=[0.7, 0.3, 0.9],\n                ),\n            },\n            payload={\"title\": \"Rust vector database\"},\n        ),\n    ],\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Typical generators for the sparse side are SPLADE, BM25 (via fastembed&apos;s <code>Qdrant\/bm25<\/code> model), or a custom TF-IDF pipeline. Hybrid search combines a dense and a sparse result list with reciprocal rank fusion or a similar fusion method; <strong>filters and complex queries<\/strong> later in this series covers the full hybrid-search pattern.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Payload schema and the seven index types<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">A payload index turns a JSON field into a B-tree (or inverted index, or geo grid) that the query planner can use to pre-filter points before running the vector search. Without it, a filter is a linear scan over the whole collection. 
With it, filtered queries stay fast as the collection grows, because only the matching points are considered.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This guide covers seven payload schemas. Create one index per field you plan to filter on:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.create_collection(\n    collection_name=\"payload_indexed\",\n    vectors_config=models.VectorParams(size=64, distance=models.Distance.COSINE),\n)\n\nschemas = {\n    \"category\":     models.PayloadSchemaType.KEYWORD,\n    \"view_count\":   models.PayloadSchemaType.INTEGER,\n    \"rating\":       models.PayloadSchemaType.FLOAT,\n    \"is_published\": models.PayloadSchemaType.BOOL,\n    \"location\":     models.PayloadSchemaType.GEO,\n    \"published_at\": models.PayloadSchemaType.DATETIME,\n}\nfor field, kind in schemas.items():\n    client.create_payload_index(\n        \"payload_indexed\", field_name=field, field_schema=kind,\n    )\n\n# Text index needs explicit tokenizer params\nclient.create_payload_index(\n    \"payload_indexed\", field_name=\"body\",\n    field_schema=models.TextIndexParams(\n        type=\"text\",\n        tokenizer=models.TokenizerType.WORD,\n        min_token_len=2, max_token_len=20, lowercase=True,\n    ),\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The Web UI Info tab confirms each index lands with its declared data type. 
Annotated JSON makes it easy to verify the result without writing a separate <code>get_collection<\/code> script:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1440\" height=\"900\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-info-tab-config-json.png\" alt=\"Qdrant collection Info tab with annotated config JSON\" class=\"wp-image-168036\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-info-tab-config-json.png 1440w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-info-tab-config-json-300x188.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-info-tab-config-json-1024x640.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-info-tab-config-json-768x480.png 768w\" sizes=\"auto, (max-width: 1440px) 100vw, 1440px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Use the right schema per field. 
The wrong choice is silently inefficient:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Schema<\/th><th>When to use<\/th><th>Filter type<\/th><\/tr><\/thead><tbody><tr><td><code>KEYWORD<\/code><\/td><td>Categorical strings: tags, SKUs, country codes<\/td><td>Exact <code>match<\/code>, <code>any<\/code>, <code>except<\/code><\/td><\/tr><tr><td><code>INTEGER<\/code><\/td><td>Counts, IDs from external systems<\/td><td><code>range<\/code>, exact match<\/td><\/tr><tr><td><code>FLOAT<\/code><\/td><td>Scores, prices, decimal weights<\/td><td><code>range<\/code><\/td><\/tr><tr><td><code>BOOL<\/code><\/td><td>Toggles: is_published, is_archived<\/td><td>Exact match<\/td><\/tr><tr><td><code>GEO<\/code><\/td><td>Lon\/lat pairs (in that order)<\/td><td><code>geo_radius<\/code>, <code>geo_bounding_box<\/code>, <code>geo_polygon<\/code><\/td><\/tr><tr><td><code>TEXT<\/code><\/td><td>Full-text search on a payload string field<\/td><td><code>match_text<\/code> with tokenizer-aware terms<\/td><\/tr><tr><td><code>DATETIME<\/code><\/td><td>ISO-8601 timestamps<\/td><td>Time-range <code>range<\/code><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">With the indexes in place, filtered queries are fast and predictable. A 1000-point smoke run takes a few milliseconds for any of the seven types:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>upserted 1000 points\n\nfiltered query (category=doc AND is_published=true) returned 3 in 4.6 ms\ngeo_radius (5000 km of (0,0)) returned 3 in 2.4 ms\nfull-text match (body~='sparse') returned 5 in 2.5 ms<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Skip the index and the same queries become full collection scans. The cost is invisible at 1000 points and brutal at 10 million.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">On-disk vectors and payload for memory savings<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">By default Qdrant keeps vectors and HNSW graphs in RAM. 
That gives you fast search but it caps collection size at whatever fits in physical memory. For a billion-point collection of 1536-dim OpenAI vectors you would need roughly 6 TB just for raw float32 vectors (10<sup>9<\/sup> points &times; 1536 dimensions &times; 4 bytes), plus the HNSW overhead on top.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Push it to disk with three flags:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.create_collection(\n    collection_name=\"on_disk_collection\",\n    vectors_config=models.VectorParams(\n        size=1536,\n        distance=models.Distance.COSINE,\n        on_disk=True,                       # vectors on disk (memory-mapped)\n    ),\n    on_disk_payload=True,                   # payload on disk\n    hnsw_config=models.HnswConfigDiff(\n        on_disk=True,                       # HNSW graph on disk\n    ),\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The Web UI Info tab shows all three flags lit at once, so a quick glance confirms every layer is mmap-backed:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1440\" height=\"900\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-on-disk-vectors-config.png\" alt=\"Qdrant on-disk vectors and payload config\" class=\"wp-image-168034\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-on-disk-vectors-config.png 1440w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-on-disk-vectors-config-300x188.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-on-disk-vectors-config-1024x640.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-on-disk-vectors-config-768x480.png 768w\" sizes=\"auto, (max-width: 1440px) 100vw, 1440px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Each flag changes a different layer. 
<code>on_disk=True<\/code> on the vector params writes the raw vectors to mmap files; the OS page cache pulls them into RAM on demand. <code>on_disk_payload=True<\/code> keeps payloads on disk too, which matters when each point has a large blob (raw HTML, image metadata, transcripts). <code>hnsw_config.on_disk=True<\/code> lifts the HNSW graph itself off the heap.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You give up some search latency for the memory headroom. On the Prefix Cache sample (163k points, 384-dim) the on-disk recall path is around 30% slower than the in-memory path, which is the trade-off you accept to fit 100M points on a single 64 GB machine.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Multi-tenancy via the tenant payload index<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Running 1000 small tenants in 1000 separate collections is expensive. Each collection has its own HNSW graph, its own segments, its own optimizer. The recommended pattern is one collection plus a tenant-aware payload index that Qdrant uses to physically isolate each tenant&apos;s vectors at the storage layer.<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.create_collection(\n    collection_name=\"tenants\",\n    vectors_config=models.VectorParams(\n        size=384, distance=models.Distance.COSINE,\n    ),\n)\n\nclient.create_payload_index(\n    \"tenants\", field_name=\"tenant_id\",\n    field_schema=models.KeywordIndexParams(\n        type=\"keyword\",\n        is_tenant=True,        # the key flag\n    ),\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">With <code>is_tenant=True<\/code> set, Qdrant partitions storage by the value of the indexed field. 
A filter that pins <code>tenant_id<\/code> hits only the points for that tenant, and the per-tenant performance stays flat as the cluster grows other tenants alongside.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Always include the tenant filter on every search:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.query_points(\n    collection_name=\"tenants\",\n    query=[0.1] * 384,\n    query_filter=models.Filter(\n        must=[\n            models.FieldCondition(\n                key=\"tenant_id\",\n                match=models.MatchValue(value=\"acme-corp\"),\n            ),\n        ],\n    ),\n    limit=10,\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Without the filter, the search ranges over every tenant&apos;s data. With it, the search is partition-aware and the latency curve flattens. JWT-signed tokens (covered in the <strong>API key and JWT security<\/strong> guide later in this series) can hardwire the tenant value into the token claims so the application code does not have to enforce it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Optimizer config: segments, indexing, mmap<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The optimizer thread runs in the background, merging segments and building the HNSW index. 
Its defaults are tuned for general use; you tune them when you have a specific workload shape (heavy ingest, big collections, low-RAM hosts).<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.create_collection(\n    collection_name=\"tuned\",\n    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),\n    optimizers_config=models.OptimizersConfigDiff(\n        default_segment_number=4,\n        indexing_threshold=20000,\n        memmap_threshold=200000,\n        max_optimization_threads=2,\n    ),\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Confirm the tune landed in the dashboard&apos;s Info tab:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1440\" height=\"900\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-optimizer-config.png\" alt=\"Qdrant optimizer config: segments, indexing, mmap\" class=\"wp-image-168038\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-optimizer-config.png 1440w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-optimizer-config-300x188.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-optimizer-config-1024x640.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-optimizer-config-768x480.png 768w\" sizes=\"auto, (max-width: 1440px) 100vw, 1440px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Each knob has a specific effect:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>default_segment_number<\/code>: how many parallel segments the optimizer targets. Match this roughly to your CPU core count for write-heavy ingest, lower it for low-RAM hosts.<\/li><li><code>indexing_threshold<\/code>: build the HNSW index once a segment&apos;s vector data exceeds this size, measured in kilobytes rather than points (1 kB is about one 256-dimension float32 vector). 
Lower values build the index sooner (better latency, more rebuild work). 20000 is the default.<\/li><li><code>memmap_threshold<\/code>: segments whose vector data exceeds this size in kilobytes get memory-mapped automatically, even if you did not pass <code>on_disk=True<\/code>. Bigger collections benefit from raising it.<\/li><li><code>max_optimization_threads<\/code>: cap on parallel optimizer work. Leaving it unset lets Qdrant pick the thread count dynamically to saturate the CPU, which can starve search threads on small hosts; an explicit 0 disables optimization altogether.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Badly chosen values are not catastrophic; they show up as slow ingest or stuck-yellow status in the dashboard. Bench, tune, redeploy.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">WAL config: capacity and segments-ahead<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The write-ahead log is Qdrant&apos;s durability layer. Every upsert and delete is appended to the WAL before the segment is flushed to disk. Two settings shape its behaviour:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.create_collection(\n    collection_name=\"wal_tuned\",\n    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),\n    wal_config=models.WalConfigDiff(\n        wal_capacity_mb=64,\n        wal_segments_ahead=2,\n    ),\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><code>wal_capacity_mb<\/code> is the size of each WAL segment. Raise it for bursty ingest with large payloads, lower it on small hosts. <code>wal_segments_ahead<\/code> is how many empty WAL segments to pre-allocate; the stock server configuration ships with 0, and the 2 used above is enough for most workloads. Bump it to 4 or 8 if your ingest is steady and you want to amortise allocation overhead.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The WAL lives in <code>{storage_dir}\/collections\/{name}\/wal<\/code>. 
You can mount it on a separate disk if you want to isolate write IOPS from search reads; the <strong>performance tuning<\/strong> guide later in this series covers that topology.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Update an existing collection in place<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Most settings can be changed without re-creating the collection. <code>update_collection<\/code> takes the same diff types you used to create it and applies them to the live collection:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.update_collection(\n    collection_name=\"basic_docs\",\n    hnsw_config=models.HnswConfigDiff(m=32, ef_construct=256),\n    optimizers_config=models.OptimizersConfigDiff(indexing_threshold=10000),\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The change is non-destructive. Existing points stay, new ingests pick up the new params, and segments rebuild in the background using the new HNSW graph settings. The output after running an update against the basic_docs collection shows the new values immediately:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>after update: hnsw.m=32  ef_construct=256\n              indexing_threshold=10000<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">What you cannot change in place: vector <code>size<\/code>, vector <code>distance<\/code>, the set of named vectors, and whether a vector is dense or sparse. Those require a new collection and a re-upsert (see the alias swap below).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Aliases for zero-downtime swap<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The pattern: build the new collection alongside the old, replay your data into it, then atomically point the alias from old to new. 
The application never sees a stale state because the alias rename is a single transaction.<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code># Two real collections, one with 384-dim vectors, one with 768-dim\nclient.create_collection(\"docs_v1\", models.VectorParams(size=384, distance=models.Distance.COSINE))\nclient.create_collection(\"docs_v2\", models.VectorParams(size=768, distance=models.Distance.COSINE))\n\n# Application code reads from \"docs_live\"\nclient.update_collection_aliases([\n    models.CreateAliasOperation(create_alias=models.CreateAlias(\n        collection_name=\"docs_v1\",\n        alias_name=\"docs_live\",\n    )),\n])<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The application points at <code>docs_live<\/code> and reads from <code>docs_v1<\/code>. Re-embed your corpus with the new model into <code>docs_v2<\/code> at your own pace, then swap with a single atomic call:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>client.update_collection_aliases([\n    models.DeleteAliasOperation(delete_alias=models.DeleteAlias(\n        alias_name=\"docs_live\")),\n    models.CreateAliasOperation(create_alias=models.CreateAlias(\n        collection_name=\"docs_v2\",\n        alias_name=\"docs_live\",\n    )),\n])<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The two operations execute atomically inside one call, so there is no window where <code>docs_live<\/code> points nowhere. Verify the swap with <code>get_aliases<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>after swap   -> all aliases:\n  alias=docs_live  -> collection=docs_v2<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Drop <code>docs_v1<\/code> at your leisure once you have confirmed the new collection serves traffic without errors. 
This is the safest pattern for embedding-model upgrades, vector-size changes, and any schema migration that cannot be done in place.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Verify everything from one place<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">After running every snippet in this guide against a fresh cluster, the Web UI Collections list looks like this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1440\" height=\"900\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-all-14-test-collections.png\" alt=\"Qdrant Collections list with 14 test collections\" class=\"wp-image-168033\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-all-14-test-collections.png 1440w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-all-14-test-collections-300x188.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-all-14-test-collections-1024x640.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-all-14-test-collections-768x480.png 768w\" sizes=\"auto, (max-width: 1440px) 100vw, 1440px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The Web UI panels are covered in detail in the <strong>Qdrant Web UI tour<\/strong> guide; the short version is that clicking a collection name opens the Info tab and lets you inspect every config field the Python SDK set above. 
The terminal view from the Python script run side-by-side with the dashboard makes the cross-check trivial:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"920\" height=\"800\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-python-sdk-terminal.png\" alt=\"Qdrant collections created via Python SDK terminal\" class=\"wp-image-168037\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-python-sdk-terminal.png 920w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-python-sdk-terminal-300x261.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-collections-python-sdk-terminal-768x668.png 768w\" sizes=\"auto, (max-width: 920px) 100vw, 920px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Anything the dashboard reports as green is durable on disk and ready for queries.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Gotchas worth knowing<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Five footguns that cost real time when missed:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Distance and vector size are immutable.<\/strong> Once a collection is created with <code>size=384, Cosine<\/code>, you cannot promote it to 768 or switch to Euclidean. Plan the alias-swap path in advance if you expect either to change.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Geo payload uses lon\/lat, not lat\/lon.<\/strong> The order matters and is the opposite of what most map APIs return. Mixing them silently puts points on the wrong continent and your geo_radius filter returns nothing.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>The text index does not auto-create on payload.<\/strong> A field can hold long text, but the <code>match_text<\/code> filter returns empty without an explicit <code>TextIndexParams<\/code> index. 
The tokenizer choice (<code>WORD<\/code>, <code>WHITESPACE<\/code>, <code>PREFIX<\/code>, <code>MULTILINGUAL<\/code>) is part of the index, not the query, so changing tokenizers means re-indexing the field.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Aliases are not collections.<\/strong> You cannot <code>get_collection(alias_name)<\/code>. The alias resolves on every API call to its target collection, but the alias name itself does not appear in <code>get_collections<\/code>. Use <code>get_aliases<\/code> (or <code>get_collection_aliases(collection_name)<\/code>) to inspect them.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Optimizer thresholds are per-segment, not per-collection.<\/strong> An <code>indexing_threshold<\/code> of 20000 is a size in kilobytes of vector data (roughly one 256-dimension float32 vector per kilobyte), not a point count, and each individual segment must exceed it before its HNSW index builds. A collection with 100,000 points spread across 8 segments may not have any of them indexed yet, depending on the vector size. Watch <code>indexed_vectors_count<\/code> in the Info tab if your filtered searches feel slow.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Decision tree: which knob to reach for<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">One rough triage:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Symptom<\/th><th>Reach for<\/th><\/tr><\/thead><tbody><tr><td>Filtered search slow<\/td><td>Add a payload index on the filtered field, schema matched to the field type<\/td><\/tr><tr><td>Out of RAM, billion-point goal<\/td><td><code>on_disk=True<\/code> on vectors, <code>on_disk_payload=True<\/code>, <code>hnsw_config.on_disk=True<\/code><\/td><\/tr><tr><td>Multi-tenant SaaS with 1000+ tenants<\/td><td>One collection plus <code>KeywordIndexParams(is_tenant=True)<\/code> on tenant_id<\/td><\/tr><tr><td>Need text similarity AND keyword recall<\/td><td>Named dense vector plus a sparse vector in the same collection<\/td><\/tr><tr><td>Ingest stalling, status stays yellow<\/td><td>Lower <code>indexing_threshold<\/code>, raise 
<code>default_segment_number<\/code><\/td><\/tr><tr><td>Need to upgrade the embedding model<\/td><td>Build <code>v2<\/code> alongside <code>v1<\/code>, swap the alias atomically<\/td><\/tr><tr><td>Different similarity per use case<\/td><td>One collection per distance metric (cannot mix in one)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The next article in this series, <strong>REST and gRPC APIs in practice<\/strong>, moves from the SDK layer to the wire protocols and shows how to call every operation above directly via curl and grpcurl. The HNSW graph tuning and quantization knobs are covered in <strong>performance tuning<\/strong>. For now, the seven snippets above cover the full collection lifecycle: create, configure, fill with payload indexes, push to disk, partition by tenant, tune the optimizer, and swap atomically when the embedding model evolves. To wire a finished collection into a working chatbot, the <a href=\"https:\/\/computingforgeeks.com\/self-hosted-rag-ollama-pgvector\/\">self-hosted RAG with Ollama<\/a> walkthrough shows the retrieval-augmented pattern end to end.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A Qdrant collection is the box that holds your vectors, your payloads, and the indexes that make filtered search fast. Get the collection shape right and the rest of the application falls out of it. Get it wrong and you will pay for it twice, once in memory at runtime and again when you migrate. 
&#8230; <a title=\"Qdrant Collections: Create, Configure, and Manage Vectors\" class=\"read-more\" href=\"https:\/\/computingforgeeks.com\/qdrant-collections-guide\/\" aria-label=\"Read more about Qdrant Collections: Create, Configure, and Manage Vectors\">Read more<\/a><\/p>\n","protected":false},"author":3,"featured_media":168040,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[39034,461,35913,50],"tags":[17245,218,324,669],"cfg_series":[39865],"class_list":["post-168042","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-databases","category-devops","category-linux-tutorials","tag-ai","tag-containers","tag-databases","tag-dev","cfg_series-qdrant-mastery"],"_links":{"self":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168042","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/comments?post=168042"}],"version-history":[{"count":2,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168042\/revisions"}],"predecessor-version":[{"id":168119,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168042\/revisions\/168119"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/media\/168040"}],"wp:attachment":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/media?parent=168042"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/categories?post=168042"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/ta
gs?post=168042"},{"taxonomy":"cfg_series","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/cfg_series?post=168042"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}