Skip to content

fix(core): optimize OCI Layers#7005

Merged
timvisee merged 3 commits intoqdrant:devfrom
siddjellali:master
Aug 14, 2025
Merged

fix(core): optimize OCI Layers#7005
timvisee merged 3 commits intoqdrant:devfrom
siddjellali:master

Conversation

@siddjellali
Copy link
Contributor

📦 Description

This PR optimizes the Qdrant Docker image (v1.15.1-unprivileged) by significantly reducing its size and improving layer efficiency.

🔍 Context

Analysis using the [dive] tool revealed several inefficiencies in the official image:

  • Files and permissions copied multiple times
  • APT cache data not cleaned
  • Unnecessary layers increasing the final size

🔧 Changes

  1. Permission handling optimization

    • Apply permissions before copying files to avoid duplication in final layers.
  2. APT cache cleanup

    • Removed APT-related cache (/var/lib/apt/lists/*) after installation to free up space.
  3. Dockerfile step reorganization

    • Merged apt-get update + apt-get install into a single step to reduce intermediate layers.
    • Strategically reordered commands to prevent unnecessary rebuilds.

📊 Results

Image Final Size Reduction
Official Qdrant 292 MB
Optimized (this PR) 171 MB -41% 🚀

Dive screenshots

Official image:
official

Optimized image:
optimized

📌 Notes

  • This optimization does not change any Qdrant binary functionality
  • Can be applied to both the unprivileged image and the base image

@coderabbitai

This comment was marked as resolved.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
Dockerfile (1)

148-153: GPU deps: add cleanup and no-recommends to keep layers slim

Currently apt lists aren’t cleaned here, so the GPU path leaves extra MBs behind.

-RUN if [ -n "$GPU" ]; then \
-    apt-get update && \
-    apt-get install -y \
-    libvulkan1 \
-    libvulkan-dev \
-    vulkan-tools \
-    ; fi
+RUN if [ -n "$GPU" ]; then \
+    apt-get update && \
+    apt-get install -y --no-install-recommends \
+      libvulkan1 \
+      libvulkan-dev \
+      vulkan-tools \
+    && rm -rf /var/lib/apt/lists/* \
+; fi
🧹 Nitpick comments (7)
.gitattributes (1)

5-5: Enforce LF for shell scripts — good hardening

This prevents CRLF issues at runtime inside Linux containers. Consider also ensuring executable bit for scripts in CI to avoid permission surprises.

Example CI check (optional):

#!/bin/bash
# fail if any .sh is not executable
fd -0 -e sh | xargs -0 -I{} test -x "{}" || { echo "Some .sh files are not executable"; exit 1; }
Dockerfile (6)

131-131: Use OCI-compliant authors label (maintainer is deprecated)

The Docker "MAINTAINER" concept is deprecated; prefer OCI label org.opencontainers.image.authors. You can keep the current label for compatibility, but add the OCI label.

-LABEL maintainer="Qdrant Team <info@qdrant.tech>"
+LABEL maintainer="Qdrant Team <info@qdrant.tech>"
+LABEL org.opencontainers.image.authors="Qdrant Team <info@qdrant.tech>"

Also applies to: 139-139


142-142: Hadolint DL1000 on FROM with parameter expansion — either ignore explicitly or refactor

Hadolint flags the colon in ${GPU:+gpu-} even though BuildKit supports this expansion. Two options:

  • Prefer adding an inline ignore just above this line:
# hadolint ignore=DL1000
FROM qdrant-${GPU:+gpu-}${GPU:-cpu} AS qdrant
  • Or precompute the base name (readability), though hadolint may still complain:
ARG GPU
ARG BASE_STAGE=qdrant-${GPU:+gpu-}${GPU:-cpu}
FROM ${BASE_STAGE} AS qdrant

If CI uses hadolint 2.12.0 strictly, the ignore pragma is the most reliable.


189-189: Defaulting to non-root is a good security posture; document compatibility impacts

Changing USER_ID default from 0 to 1000 is great. It may break setups mounting host volumes owned by root or expecting root. Add a release note and suggest passing --build-arg USER_ID=$(id -u) or -e PUID/PGID-style guidance.

I can draft release notes and a README snippet showing how to set USER_ID to match host ownership.


198-201: Ensure entrypoint is executable after copy

COPY preserves mode if the source has +x; if not, container might fail to start. Add an explicit chmod for robustness.

 COPY --from=builder --chown=$USER_ID:$USER_ID /qdrant/qdrant "$APP"/qdrant
 COPY --from=builder --chown=$USER_ID:$USER_ID /qdrant/config "$APP"/config
 COPY --from=builder --chown=$USER_ID:$USER_ID /qdrant/tools/entrypoint.sh "$APP"/entrypoint.sh
 COPY --from=builder --chown=$USER_ID:$USER_ID /static "$APP"/static
+RUN chmod +x "$APP"/entrypoint.sh

191-196: User/group creation can fail if UID/GID already exist

If the provided UID/GID already map to existing entities in the base image, groupadd/useradd will error. Guard with getent.

-RUN if [ "$USER_ID" != 0 ]; then \
-        groupadd --gid "$USER_ID" qdrant; \
-        useradd --uid "$USER_ID" --gid "$USER_ID" -m qdrant; \
+RUN if [ "$USER_ID" != 0 ]; then \
+        getent group "$USER_ID" >/dev/null || groupadd --gid "$USER_ID" qdrant; \
+        getent passwd "$USER_ID" >/dev/null || useradd --uid "$USER_ID" --gid "$USER_ID" -m qdrant; \
         mkdir -p "$APP"/storage "$APP"/snapshots; \
         chown -R "$USER_ID:$USER_ID" "$APP"; \
     fi

205-205: Drop quotes in USER for portability

Some parsers mishandle quoted USER values. Prefer unquoted form.

-USER "$USER_ID:$USER_ID"
+USER $USER_ID:$USER_ID
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af7ab5b and e27adc8.

📒 Files selected for processing (2)
  • .gitattributes (1 hunks)
  • Dockerfile (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-10T07:16:42.027Z
Learnt from: tellet-q
PR: qdrant/qdrant#6842
File: tests/e2e_tests/test_data/docker-compose.yaml:6-8
Timestamp: 2025-07-10T07:16:42.027Z
Learning: In Qdrant's e2e test framework (tests/e2e_tests/), the docker-compose.yaml files intentionally use short port syntax (e.g., "6333" instead of "6333:6333") to let Docker assign random host ports. The test framework has a `_extract_container_ports` function in conftest.py that dynamically discovers these randomly assigned ports, which is better for parallel test execution and avoids port conflicts.

Applied to files:

  • Dockerfile
🪛 Hadolint (2.12.0)
Dockerfile

[error] 142-142: unexpected ':'
expecting '@', '', a new line followed by the next instruction, or the image tag

(DL1000)

🔇 Additional comments (1)
Dockerfile (1)

203-204: WORKDIR after COPY — good to avoid creating root-owned path implicitly

This ordering helps prevent accidental root-owned directories. No action needed.

Dockerfile Outdated
Comment on lines +159 to +163
RUN apt-get update && \
apt-get install -y --no-install-recommends ca-certificates tzdata libunwind8 $PACKAGES \
&& rm -rf /var/lib/apt/lists/* /var/cache/debconf/* /var/lib/dpkg/status-old
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

CPU path: make installs non-interactive and tighten cleanup

Installing tzdata can block without noninteractive mode on the CPU base (Debian slim).

-RUN apt-get update &&  \
-    apt-get install -y --no-install-recommends ca-certificates tzdata libunwind8 $PACKAGES \
-    && rm -rf /var/lib/apt/lists/* /var/cache/debconf/* /var/lib/dpkg/status-old
+RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
+    apt-get install -y --no-install-recommends ca-certificates tzdata libunwind8 $PACKAGES && \
+    apt-get clean && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/* /var/cache/debconf/* /var/lib/dpkg/status-old

Note: removing dpkg’s status-old is safe but optional; you can omit it if you want to stay conservative.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
RUN apt-get update && \
apt-get install -y --no-install-recommends ca-certificates tzdata libunwind8 $PACKAGES \
&& rm -rf /var/lib/apt/lists/* /var/cache/debconf/* /var/lib/dpkg/status-old
RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
apt-get install -y --no-install-recommends ca-certificates tzdata libunwind8 $PACKAGES && \
apt-get clean && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/* /var/cache/debconf/* /var/lib/dpkg/status-old
🤖 Prompt for AI Agents
In Dockerfile lines 159 to 161, the installation of tzdata can block due to
interactive prompts. To fix this, set the environment variable
DEBIAN_FRONTEND=noninteractive before running apt-get install to make the
installation non-interactive. Also, keep the cleanup commands but you may
optionally omit removing /var/lib/dpkg/status-old if you prefer a more
conservative cleanup.

@siddjellali
Copy link
Contributor Author

I highly encourage you to add Dive to your CI.
Note: I have no affiliation with the software, i'm nobody — I just genuinely love this tool.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
Dockerfile (1)

148-153: GPU path: add --no-install-recommends and clean apt caches to avoid leftover layers

The conditional install leaves apt lists and archives behind, increasing layer size. Apply no-recommends and cleanup within the same RUN.

-RUN if [ -n "$GPU" ]; then \
-    apt-get update \
-    && apt-get install -y \
-    libvulkan1 \
-    libvulkan-dev \
-    vulkan-tools \
-    ; fi
+RUN if [ -n "$GPU" ]; then \
+    apt-get update && \
+    apt-get install -y --no-install-recommends \
+      libvulkan1 \
+      libvulkan-dev \
+      vulkan-tools && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/* /var/cache/debconf/*; \
+  fi
♻️ Duplicate comments (1)
Dockerfile (1)

159-163: CPU path: make installs non-interactive and tighten cleanup (prevents tzdata prompts, reduces size)

tzdata can block in Debian slim without DEBIAN_FRONTEND=noninteractive. Also remove /var/cache/apt/archives/* to reclaim space.

-RUN apt-get update \
-    && apt-get install -y --no-install-recommends ca-certificates tzdata libunwind8 $PACKAGES \
-    && apt-get clean \
-    && rm -rf /var/lib/apt/lists/* /var/cache/debconf/* /var/lib/dpkg/status-old
+RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
+    apt-get install -y --no-install-recommends ca-certificates tzdata libunwind8 $PACKAGES && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/* /var/cache/debconf/* /var/lib/dpkg/status-old
🧹 Nitpick comments (3)
Dockerfile (3)

131-131: Prefer OCI authors label instead of custom maintainer label

Use org.opencontainers.image.authors for standards compliance; current maintainer label is redundant given existing OCI labels.

Suggested change:

-LABEL maintainer="Qdrant Team <info@qdrant.tech>"
+LABEL org.opencontainers.image.authors="Qdrant Team <info@qdrant.tech>"

Also applies to: 139-139


199-203: Reduce COPY layers and ensure entrypoint is executable

  • Combine COPYs where destinations align to reduce layers.
  • Explicitly set entrypoint mode with --chmod to avoid relying on preserved mode across stages.
-COPY --from=builder --chown=$USER_ID:$USER_ID /qdrant/qdrant "$APP"/qdrant
-COPY --from=builder --chown=$USER_ID:$USER_ID /qdrant/config "$APP"/config
-COPY --from=builder --chown=$USER_ID:$USER_ID /qdrant/tools/entrypoint.sh "$APP"/entrypoint.sh
-COPY --from=builder --chown=$USER_ID:$USER_ID /static "$APP"/static
+COPY --from=builder --chown=$USER_ID:$USER_ID /qdrant/qdrant "$APP"/qdrant
+COPY --from=builder --chown=$USER_ID:$USER_ID /qdrant/config /static "$APP"/
+COPY --from=builder --chown=$USER_ID:$USER_ID --chmod=0755 /qdrant/tools/entrypoint.sh "$APP"/entrypoint.sh

Note: The combined COPY keeps config and static under $APP/, preserving directory names.


142-142: Hadolint DL1000: parser trips on parameter expansion in FROM; add ignore or refactor selector

Hadolint flags the dynamic FROM with error DL1000. Either add an ignore directive or refactor the pattern to avoid advanced expansion. The build itself is valid with BuildKit.

Option A (ignore hadolint):

+# hadolint ignore=DL1000
 FROM qdrant-${GPU:+gpu-}${GPU:-cpu} AS qdrant

Option B (refactor): use ARG to select a base and a simple FROM. Example:

# earlier:
# ARG GPU
# ...
# RUN case "$GPU" in "" ) echo cpu ;; * ) echo gpu-"$GPU" ;; esac > /tmp/base
# and build each base as qdrant-cpu, qdrant-gpu-nvidia, qdrant-gpu-amd

# here:
ARG BASE=qdrant-${GPU:+gpu-}${GPU:-cpu}
FROM ${BASE} AS qdrant

If CI enforces hadolint with fail-on-error, Option A is the quickest fix.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e27adc8 and cee76eb.

📒 Files selected for processing (1)
  • Dockerfile (4 hunks)
🧰 Additional context used
🪛 Hadolint (2.12.0)
Dockerfile

[error] 142-142: unexpected ':'
expecting '@', '', a new line followed by the next instruction, or the image tag

(DL1000)

🔇 Additional comments (2)
Dockerfile (2)

186-186: Resetting DIR is good

Explicitly clearing ENV DIR avoids accidental reuse in subsequent COPYs. LGTM.


204-205: WORKDIR after COPY is appropriate

Setting WORKDIR after content is in place prevents cache invalidation when source files change elsewhere.

@generall
Copy link
Member

it does look a bit smaller indeed
image

@generall generall requested a review from timvisee August 11, 2025 16:37
@siddjellali
Copy link
Contributor Author

any update here ?

@generall
Copy link
Member

could you please point this PR to dev branch? Otherwise LGTM

@siddjellali
Copy link
Contributor Author

could you please point this PR to dev branch? Otherwise LGTM

@generall Who are you asking? Me?

@generall
Copy link
Member

@generall Who are you asking? Me?

yes. If you don't have time, we will do it ourselves a bit later

Sid Ahmed DJELLALI added 3 commits August 14, 2025 15:49
Signed-off-by: Sid Ahmed Djellali <s.djellali@cyberdian.fr>
Signed-off-by: Sid Ahmed Djellali <s.djellali@cyberdian.fr>
Signed-off-by: Sid Ahmed Djellali <s.djellali@cyberdian.fr>
@timvisee timvisee changed the base branch from master to dev August 14, 2025 13:50
Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I've rebased the PR for you. Changes look good.

@timvisee
Copy link
Member

For completeness, I noticed that the difference is less significant.

Before this PR the image is 198MB, while the new image is 176MB. Since it's still an improvement, I'm happy to merge.

Thanks!

@timvisee timvisee merged commit fed67b7 into qdrant:dev Aug 14, 2025
@siddjellali
Copy link
Contributor Author

For completeness, I noticed that the difference is less significant.

Before this PR the image is 198MB, while the new image is 176MB. Since it's still an improvement, I'm happy to merge.

Thanks!

Check unprivileged image 😬🙏🏽

timvisee pushed a commit that referenced this pull request Aug 26, 2025
* fix(core): optimize OCI Layers

Signed-off-by: Sid Ahmed Djellali <s.djellali@cyberdian.fr>

* fix(core): optimize OCI Layers

Signed-off-by: Sid Ahmed Djellali <s.djellali@cyberdian.fr>

* fix(core): optimize OCI Layers

Signed-off-by: Sid Ahmed Djellali <s.djellali@cyberdian.fr>

---------

Signed-off-by: Sid Ahmed Djellali <s.djellali@cyberdian.fr>
Co-authored-by: Sid Ahmed DJELLALI <s.djellali@cyberdian.fr>
@timvisee timvisee mentioned this pull request Aug 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants