If you're starting out with Docker, one of the first things you'll likely encounter is the Dockerfile. When I first started learning Docker, I found Dockerfiles deceptively simple yet hard to master. The basic commands seemed easy enough to grasp, but I didn't fully understand how Docker used Dockerfiles behind the scenes to build images.
In this guide, I want to demystify Dockerfiles and give you an in-depth understanding of how they work under the hood. My goal is to take you from Dockerfile beginner to Dockerfile expert by covering:
- What exactly a Dockerfile is
- Core Dockerfile instructions and how to use them
- How Docker builds images from Dockerfiles
- Dockerfile best practices
- A real-world Dockerfile example
- Extending Dockerfiles with advanced features
- Integrating Dockerfiles into your development workflows
So let's get started on the path to mastering Dockerfiles!
What Exactly is a Dockerfile?
A Dockerfile is a plaintext file that contains a set of instructions and commands telling Docker how to build a Docker image. It automates the image creation process so you can repeatedly build new images with the same configurations.
Docker reads the Dockerfile instructions in order to automatically build an image layer by layer. Each instruction adds a new layer to the image, with each layer representing a portion of the image's filesystem that adds to or replaces the layer below it.
These layers get cached during builds, helping speed up subsequent image builds. This makes Dockerfiles powerful yet simple tools for defining consistent and shareable Docker environments.
Core Dockerfile Instructions
Dockerfiles support a number of basic instructions that you can use to build images. Here are some of the most common and important instructions:
FROM
The FROM instruction initializes a new build stage and sets the base image for subsequent instructions. A Dockerfile must start with a FROM command.
For example:
FROM ubuntu:20.04
This pulls the ubuntu:20.04 image from Docker Hub to use as the base for our new image.
You can also have multiple FROM statements to create multi-stage builds.
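As a quick sketch, a multi-stage build uses one FROM for compiling and a second FROM for the final runtime image (the stage name and paths here are illustrative):

```dockerfile
# Build stage: compile with the full toolchain
FROM node:14 AS builder
WORKDIR /app
COPY . .
RUN npm install && npm run build

# Final stage: copy only the built output into a slim image
FROM node:14-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]
```

Only the layers of the final stage end up in the published image, which keeps build-time dependencies out of production.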
COPY
The COPY instruction copies files and directories from the build context on the host into the container filesystem. For example:
COPY . /app
This would copy the contents of the current directory on the host into the /app directory in the container.
Some things to note about COPY:
- You can specify a source and destination
- A relative destination is resolved against the working directory (set with WORKDIR)
- The source can contain wildcards for matching multiple files or directories
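To make these points concrete, here are a few COPY variations (the paths are illustrative):

```dockerfile
# Copy a single file to an absolute destination
COPY package.json /app/package.json

# Wildcards: copy every matching file from the build context
COPY package*.json /app/

# A relative destination is resolved against the current WORKDIR
WORKDIR /app
COPY src/ ./src/
```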
RUN
The RUN instruction executes a command inside the image during the build, for example installing packages or building application code.
RUN apt-get update && apt-get install -y git
You can chain multiple commands together using &&. Each RUN statement creates a new layer that gets cached.
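A common pattern chains the install and cleanup into a single RUN so the package index never gets committed into a layer (the package names here are just examples):

```dockerfile
RUN apt-get update && \
    apt-get install -y --no-install-recommends git curl && \
    rm -rf /var/lib/apt/lists/*
```

Because all three commands run in one layer, the apt cache removed at the end never adds to the image size.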
EXPOSE
The EXPOSE instruction documents the ports that the running container will listen on at runtime. Note that it does not actually publish the ports; you still map them to the host with the -p flag of docker run.
For example:
EXPOSE 80 443
This exposes ports 80 and 443.
ENV
The ENV instruction sets environment variables for use in the container. For example:
ENV NODE_ENV production
This sets the NODE_ENV variable to production. You can access this using process.env.NODE_ENV in Node.js.
CMD
The CMD instruction provides the default command that runs when starting a container from the image. For example:
CMD ["npm", "start"]
This will run npm start when starting the container.
Only one CMD can be used in a Dockerfile. If you specify multiple, only the last CMD takes effect.
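CMD comes in two forms, and the difference matters in practice. The exec form (a JSON array) runs the binary directly; the shell form runs it through /bin/sh -c:

```dockerfile
# Exec form (preferred): no intermediate shell, so signals like
# SIGTERM reach the process directly
CMD ["npm", "start"]

# Shell form: runs via /bin/sh -c, so variable expansion works,
# but the process becomes a child of the shell
# CMD npm start
```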
ADD
The ADD instruction copies files and directories into the container, with some added functionality over the COPY instruction:
- The source can be a remote URL, not just a path in the build context
- Local tar archives (gzip, bzip2, xz) are automatically extracted into the destination
For example:
ADD https://example.com/file.tar.gz /app
This downloads the file into the /app directory. Note that remote archives are not auto-extracted; automatic extraction only applies to local tar files.
In most cases, COPY is recommended over ADD unless you need these extended features.
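For instance, local tar auto-extraction looks like this (the archive name is illustrative):

```dockerfile
# rootfs.tar.gz from the build context is unpacked into /app;
# a plain COPY would copy the archive file itself without extracting
ADD rootfs.tar.gz /app/
```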
This covers some of the most used Dockerfile instructions, but there are many more available for advanced workflows.
How Docker Builds Images
When you run docker build, Docker goes through the Dockerfile instructions in order to build the image layer-by-layer.
Each instruction creates a new intermediate container that is committed into a writable layer. All changes made in this container are saved to this layer before moving onto the next. The layers build on top of each other to create the final image.
Once built, images are made up of read-only layers stacked on top of each other like a layered cake. The layers represent changes and additions to the image. When you create containers from the image, you simply add a read-write layer on top.
The major advantage of layers is caching. During the build process, Docker caches each layer. On rebuild, Docker uses the cache and only builds layers again if their instructions changed. This greatly improves subsequent build times.
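You can watch the cache at work by building the same image twice; on the second run, unchanged steps are reported as cached (the image name is a placeholder):

```shell
docker build -t myimage .
# Edit only application source files, then rebuild:
docker build -t myimage .
# Steps before the changed COPY instruction are served from cache.
# Force a full rebuild when needed:
docker build --no-cache -t myimage .
```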
Let's visualize this build process with a simple Dockerfile:
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install flask
COPY . /app
ENTRYPOINT ["python3", "app.py"]
When building this image, Docker would:
- Pull the ubuntu:18.04 base image
- Create a new writable container layer and install python3 inside
- Commit this layer and cache it
- Install flask in another writable layer
- Commit this layer and cache it
- Copy files in a new layer and commit
- Configure the entrypoint
We now end up with an image containing our Ubuntu base layer plus four new layers, one for each instruction after the FROM.
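You can inspect the layers of any built image with docker history (the image name is a placeholder, and the exact output varies):

```shell
docker history myimage
# Lists each layer with the instruction that created it and its size
```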

This demonstrates the core concepts of how Docker leverages Dockerfiles and layering to build images.
Dockerfile Best Practices
When writing Dockerfiles, keep in mind these best practices:
- Start with a small base image: Use a minimal base like Alpine Linux to reduce size and attack surface area.
- Avoid installing unnecessary packages: Only install what you need in the image to reduce size and dependencies.
- Chain RUN commands: Chain together RUN commands using && to reduce the number of layers.
- Use .dockerignore files: Add a .dockerignore to avoid copying unnecessary files into the build context.
- Copy files late: Copy app files as late as possible to avoid cache busting.
- Use environment variables: Pass configuration via ENV rather than hardcoded values.
- Clean up artifacts: Clean up temporary files, caches, etc. in the same layer they were created to avoid accumulating cruft.
- Always tag images: Pin base image tags like node:14-alpine rather than relying on latest.
- Optimize for production: Remove all development tools/data not needed in production images.
- Leverage BuildKit: Use Docker BuildKit for advanced building capabilities.
Adopting these best practices will help streamline your image building process and lead to more secure, production-ready images.
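As one concrete example of the .dockerignore practice, a typical file for a Node.js project might look like this (the entries are illustrative):

```
node_modules
npm-debug.log
.git
*.md
.env
```

Excluding node_modules and .git keeps the build context small, speeds up docker build, and prevents host-installed dependencies from leaking into the image.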
Here is an example Dockerfile following best practices:
# Use a small Alpine node image
FROM node:14-alpine
# Install only necessary libraries
RUN apk add --no-cache python2 g++ make
# Set working directory
WORKDIR /app
# Install dependencies based on package.json
COPY package*.json ./
RUN npm install
# Copy all files
COPY . .
# Set env vars
ENV NODE_ENV production
# Expose port and start app
EXPOSE 3000
CMD ["node", "server.js"]
This takes advantage of a small pinned base image, installing dependencies before copying the app code for better caching, and configuration via ENV. Pair it with a .dockerignore file to keep the build context lean.
Real-World Example
Let's look at a more real-world Dockerfile example for a Python Flask application.
Dockerfile
# Use the official Python image
FROM python:3.8-alpine
# Set environment variables
ENV APP_HOME /app
ENV PORT 8000
# Set work directory
WORKDIR $APP_HOME
# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy app source code
COPY . .
# Expose port and run app
EXPOSE $PORT
CMD ["python", "app.py"]
requirements.txt
Flask==2.0.1
redis==4.1.0
app.py
from flask import Flask
import os
import socket

app = Flask(__name__)

@app.route("/")
def hello():
    html = "<h3>Hello World!</h3>" \
           "<b>Hostname:</b> {hostname}<br/>"
    return html.format(hostname=socket.gethostname())

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8000)))
Let's break down what this Dockerfile does:
- Uses the official Python image as a small but complete base
- Sets two environment variables: APP_HOME for the working directory and PORT for the exposed port
- Copies the requirements.txt file and installs dependencies early in the build process
- Copies over the application code only after dependencies are installed to take advantage of caching
- Exposes port 8000 and defines the default run command
This represents a simple yet robust Dockerfile for a Python application with dependencies.
To build the image:
docker build -t myflaskapp:1.0 .
And to run a container:
docker run -p 8000:8000 myflaskapp:1.0
The container will serve the "Hello World" app on port 8000.
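You can then verify the app is responding (assuming curl is available on the host):

```shell
curl http://localhost:8000/
# Should return the Hello World HTML including the container's hostname
```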
Extending with Docker BuildKit
Docker BuildKit is an advanced framework for building Docker images. It provides improved caching, parallel stages, and dynamic dependencies.
BuildKit allows use of more advanced syntax and capabilities compared to regular Docker builds:
- Multi-stage builds
- Inline caching for speed
- Conditional stages
- Parallel stages
- Dynamic local staging
- Dockerignore support
- Secret handling
To enable BuildKit, set the DOCKER_BUILDKIT environment variable when building:
DOCKER_BUILDKIT=1 docker build .
You can also enable it daemon-wide by adding { "features": { "buildkit": true } } to /etc/docker/daemon.json. On recent Docker releases, BuildKit is the default builder.
Your Dockerfile can then take advantage of new syntax like:
# syntax=docker/dockerfile:1.2
FROM golang:1.16 AS build
WORKDIR /src
COPY . .
RUN --mount=type=cache,target=/go/pkg go build -o /out/mybin
FROM alpine:latest
COPY --from=build /out/mybin /bin/mybin
This shows a multi-stage build using a Go image to build an executable, then copying the binary to a lightweight Alpine image.
BuildKit extends Dockerfiles to allow for more complex workflows and building capabilities.
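As one example of BuildKit's secret handling, a secret can be mounted for a single RUN step without ever being committed to a layer (the secret id and file name are placeholders):

```dockerfile
# syntax=docker/dockerfile:1.2
FROM alpine:latest
# The secret exists only during this RUN step and is not
# stored in the resulting image
RUN --mount=type=secret,id=mytoken \
    cat /run/secrets/mytoken > /dev/null
```

The secret is supplied at build time, for example with docker build --secret id=mytoken,src=token.txt .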
Integrating into Workflows
Dockerfiles are commonly integrated into developer workflows and CI/CD pipelines to build images.
For development environments, Docker Compose is a great option that allows using Dockerfiles to define services in a composable way.
Here is an example docker-compose.yml file:
version: '3.8'
services:
  backend:
    build: ./backend
    ports:
      - "5000:5000"
    volumes:
      - ./backend:/code
  db:
    image: postgres:13.3
    environment:
      - POSTGRES_DB=myapp
The backend service builds using a Dockerfile, mounting code into the container.
For CI/CD, Dockerfiles are used to build production images that get pushed to a container registry like Docker Hub. Tools like Jenkins, Travis CI, and CircleCI all integrate with Docker as part of the build pipeline.
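A minimal build-and-push sequence in a CI job might look like this (the registry, image name, and tag are placeholders):

```shell
docker build -t registry.example.com/myapp:1.0 .
docker login registry.example.com
docker push registry.example.com/myapp:1.0
```

Pinning an explicit version tag in the pipeline, rather than pushing latest, makes deployments reproducible and rollbacks straightforward.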
Conclusion
I hope this guide gave you a comprehensive yet easy-to-follow overview of Dockerfiles. We covered:
- Dockerfile instructions, syntax, and build process
- Best practices for optimizing Dockerfiles
- Real-world examples and use cases
- Advanced functionality with BuildKit
- Integrating Dockerfiles into workflows
Dockerfiles are a powerful tool for automating repeatable and shareable Docker image builds. Learning how to effectively use Dockerfiles unlocks the real benefits of Docker for simplifying deployments and reducing environment inconsistencies.
Please let me know in the comments if you have any other Dockerfile tips or questions!



