A lightweight checkpoint/restore tool that captures both filesystem and memory state with minimal overhead. Built on top of CRIU and OverlayFS for fast, isolated process state management.
checkpoint-lite provides a simple interface to checkpoint and restore running processes while capturing all their
memory state, live terminal sessions, and filesystem changes. Unlike heavyweight container solutions, this tool focuses
on minimal overhead by directly orchestrating existing kernel features and redesigning terminal session management.
- Hybrid State Capture: Combines filesystem (OverlayFS) and memory (CRIU) checkpointing
- Terminal Session Support: Preserves live terminal sessions and their state across checkpoints
- Multi-Session Support: Concurrent usage by multiple applications with isolated sessions
- Minimal Overhead: Direct system calls without unnecessary container abstractions
- Minimal File I/O: Uses a multi-lower-layer OverlayFS design to achieve true inter-checkpoint deduplication
- Simple CLI: Straightforward command-line interface for checkpoint operations
- Session Management: Automatic cleanup and resource management
After analyzing existing checkpoint/restore solutions with our analysis tool StateFork
and StraceTools, we found that many traditional solutions bundle
unnecessary features such as network isolation, security policies, and registry operations.
checkpoint-lite takes a minimalist approach:
- Filesystem State: Uses OverlayFS to capture directory changes without copying entire filesystems
- Memory State: Leverages CRIU for process memory and execution state
- Terminal Sessions: Implements a custom RPC-style PTY session management to preserve live terminal sessions across checkpoints
- Isolation: Session-based isolation instead of full containerization
- Performance: Direct tool orchestration minimizes call overhead
┌─────────────────┐    ┌─────────────────┐           ┌─────────────────┐
│   Filesystem    │    │     Memory      │───────────│   PTY Session   │
│   (OverlayFS)   │    │     (CRIU)      │           │   Management    │
└─────────────────┘    └─────────────────┘           └─────────────────┘
         │                      │
         └───────────┬──────────┘
                     │
            ┌─────────────────┐
            │ checkpoint-lite │
            │   Session Mgr   │
            └─────────────────┘
- OverlayFS Integration: Creates layered filesystem views with minimal storage overhead
- CRIU Orchestration: Manages process memory dumping and restoration
- PTY Session Management: Uses an RPC-style approach to capture and communicate with terminal sessions
- Session Manager: Handles concurrent usage and resource isolation
The tool is implemented in Go for its simplicity, performance, and strong concurrency support. See our architecture decision record for more details on why Go was chosen.
- Linux system with root privileges
- CRIU installed and configured
- OverlayFS support (most modern Linux distributions)
- Go 1.23 (for building from source)
- Optional: buildah for the build-from-Dockerfile approach (since v0.5.0)
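Before installing, it can help to confirm these prerequisites programmatically. The Go sketch below is not part of checkpoint-lite: it checks for root, looks for a criu binary on PATH, and scans /proc/filesystems for overlay. It does not replace sudo criu check, and OverlayFS may only appear in /proc/filesystems once the overlay module is loaded.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// overlaySupported reports whether a /proc/filesystems listing contains
// OverlayFS. Each line is "[nodev]\t<fstype>", so the last field is the type.
func overlaySupported(procFilesystems string) bool {
	sc := bufio.NewScanner(strings.NewReader(procFilesystems))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 0 && fields[len(fields)-1] == "overlay" {
			return true
		}
	}
	return false
}

func main() {
	// Root is needed for both CRIU and OverlayFS mounts.
	if os.Geteuid() != 0 {
		fmt.Println("warning: not running as root")
	}
	// Only checks that the binary is on PATH; `sudo criu check` does the real test.
	if _, err := exec.LookPath("criu"); err != nil {
		fmt.Println("criu not found in PATH")
	}
	data, err := os.ReadFile("/proc/filesystems")
	if err != nil {
		fmt.Println("cannot read /proc/filesystems:", err)
		return
	}
	// overlay may be listed only after the module is loaded (e.g. modprobe overlay).
	fmt.Println("OverlayFS listed:", overlaySupported(string(data)))
}
```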
# Install Go (version 1.23.1)
wget https://go.dev/dl/go1.23.1.linux-amd64.tar.gz
sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.23.1.linux-amd64.tar.gz
# Add to ~/.bashrc or ~/.profile
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME/go
export GOBIN=$GOPATH/bin
# Reload shell
source ~/.bashrc
# Verify installation
go version

# Ubuntu/Debian
sudo apt-get install criu
# or go to https://launchpad.net/~criu/+archive/ubuntu/ppa
# Verify installation
sudo criu check

git clone https://github.com/Alex-XJK/checkpoint-lite.git
cd checkpoint-lite
go build -o checkpoint-lite cmd/checkpoint-lite/main.go
go build -o bash_init cmd/bash-init/main.go

./checkpoint-lite version
# Output: checkpoint-lite version v0.5.0

You can create a configuration file to set global options. Example content:
{
"sessions_dir": "/custom/path/checkpoint-sessions",
"bash_init_src": "/custom/compiled/bash_init"
}
Note that configuration settings take effect in the following order of precedence:
- Environment variables CHECKPOINT_SESSIONS_DIR, CHECKPOINT_BASH_INIT_SRC, etc. (if set)
- A configuration file (if one exists), looked up in this order:
  - Explicit: the path given in the CHECKPOINT_CONFIG environment variable
  - Binary-side config: ./config.json (same directory as the executable)
  - User config: $XDG_CONFIG_HOME/checkpoint-lite/config.json or ~/.checkpoint-lite/config.json
  - System config: /etc/checkpoint-lite/config.json
- Default settings
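The precedence rules above can be sketched in Go as follows. This is an illustration, not checkpoint-lite's actual loader: it assumes only the first configuration file found is read, that an empty field in that file leaves the default in place, and it uses hypothetical defaults (/tmp/checkpoint-sessions, taken from the example output, and ./bash_init). The environment and file lookups are passed in as functions so the logic can be tested without touching the real filesystem.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Config mirrors the two keys shown in the example config.json.
type Config struct {
	SessionsDir string `json:"sessions_dir"`
	BashInitSrc string `json:"bash_init_src"`
}

// candidatePaths lists config file locations in the documented lookup order.
func candidatePaths(getenv func(string) string, exeDir, home string) []string {
	var paths []string
	if p := getenv("CHECKPOINT_CONFIG"); p != "" {
		paths = append(paths, p)
	}
	paths = append(paths, filepath.Join(exeDir, "config.json"))
	if xdg := getenv("XDG_CONFIG_HOME"); xdg != "" {
		paths = append(paths, filepath.Join(xdg, "checkpoint-lite", "config.json"))
	}
	paths = append(paths,
		filepath.Join(home, ".checkpoint-lite", "config.json"),
		"/etc/checkpoint-lite/config.json")
	return paths
}

// resolve layers settings: defaults, then the first config file found, then env vars.
func resolve(getenv func(string) string, lookup func(string) ([]byte, bool), paths []string) (Config, error) {
	// Hypothetical defaults; the real ones may differ.
	cfg := Config{SessionsDir: "/tmp/checkpoint-sessions", BashInitSrc: "./bash_init"}
	for _, p := range paths {
		data, ok := lookup(p)
		if !ok {
			continue // file absent: try the next location
		}
		var fromFile Config
		if err := json.Unmarshal(data, &fromFile); err != nil {
			return cfg, fmt.Errorf("parse %s: %w", p, err)
		}
		if fromFile.SessionsDir != "" {
			cfg.SessionsDir = fromFile.SessionsDir
		}
		if fromFile.BashInitSrc != "" {
			cfg.BashInitSrc = fromFile.BashInitSrc
		}
		break // assumption: only the first file found is used
	}
	// Environment variables have the highest precedence.
	if v := getenv("CHECKPOINT_SESSIONS_DIR"); v != "" {
		cfg.SessionsDir = v
	}
	if v := getenv("CHECKPOINT_BASH_INIT_SRC"); v != "" {
		cfg.BashInitSrc = v
	}
	return cfg, nil
}

// readIfExists adapts os.ReadFile to the lookup signature used by resolve.
func readIfExists(path string) ([]byte, bool) {
	data, err := os.ReadFile(path)
	return data, err == nil
}

func main() {
	exe, err := os.Executable()
	if err != nil {
		exe = "."
	}
	home, _ := os.UserHomeDir()
	paths := candidatePaths(os.Getenv, filepath.Dir(exe), home)
	cfg, err := resolve(os.Getenv, readIfExists, paths)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```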
Create a managed environment for your application:
sudo ./checkpoint-lite init /path/to/your/workspace

Output:
Environment initialized!
Session ID: a1b2c3d4e5f6g7h8
Work in this directory: /tmp/checkpoint-sessions/a1b2c3d4e5f6g7h8/work
Save the session ID for future operations!
Important: Save the session ID and work in the provided directory.
Special options:
- --quiet to output only the session ID and work directory, separated by a comma. (Since v0.2.1)
- --shell to start a shell in the managed environment immediately after initialization. (Since v0.5.0)
  - You should make sure the provided workspace contains the necessary files for the shell to work, e.g., /bin/bash.
You can alternatively build a sandbox environment directly with the build command, just like a Docker build.
This will set up a sandboxed environment with the provided Dockerfile and start a bash session in it.
sudo ./checkpoint-lite build /path/to/your/Dockerfile-directory

Output:
(Some build output from buildah...)
Sandbox environment built successfully!
Session ID: a1b2c3d4e5f6g7h8
Work in this directory: /tmp/checkpoint-sessions/a1b2c3d4e5f6g7h8/work
Sandbox bash PID: 1234
Save the session ID for future operations!
Special options:
- --quiet to output only the session ID, work directory, and bash PID, separated by commas.
Credit: This buildah-based workflow was originally designed by Tianle Zhou in his TBench integration for v0.2.0.
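When scripting around checkpoint-lite, the --quiet output of init (two fields) and build (three fields) can be parsed with a few lines of Go. This sketch assumes the fields are plain comma-separated values on a single line, as described above; a work directory path that itself contains a comma would break it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// QuietResult holds the fields printed by `init --quiet` (session ID, work
// directory) or `build --quiet` (session ID, work directory, bash PID).
type QuietResult struct {
	SessionID string
	WorkDir   string
	BashPID   int // 0 when the output had no PID field (init)
}

func parseQuiet(line string) (QuietResult, error) {
	parts := strings.Split(strings.TrimSpace(line), ",")
	for i := range parts {
		parts[i] = strings.TrimSpace(parts[i])
	}
	if len(parts) != 2 && len(parts) != 3 {
		return QuietResult{}, fmt.Errorf("expected 2 or 3 comma-separated fields, got %d", len(parts))
	}
	r := QuietResult{SessionID: parts[0], WorkDir: parts[1]}
	if len(parts) == 3 {
		pid, err := strconv.Atoi(parts[2])
		if err != nil {
			return QuietResult{}, fmt.Errorf("invalid PID %q: %w", parts[2], err)
		}
		r.BashPID = pid
	}
	return r, nil
}

func main() {
	r, err := parseQuiet("a1b2c3d4e5f6g7h8,/tmp/checkpoint-sessions/a1b2c3d4e5f6g7h8/work,1234")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(r.SessionID, r.WorkDir, r.BashPID)
}
```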
The simplest way is to just run your application in the provided work directory.
cd /tmp/checkpoint-sessions/a1b2c3d4e5f6g7h8/work
./your-application &
# Note the PID, e.g., 1234

Since v0.3.0, you can also execute shell commands directly in the managed environment.
Since v0.5.0, if you initialized with the --shell option or used the build command, you also get an isolated
shell session in the managed environment. You can run your bash commands there directly without worrying about workspace isolation.
sudo ./checkpoint-lite exec a1b2c3d4e5f6g7h8 cat hello_world.txt

Note that the exec command can always be used, regardless of whether you started a shell session.
If you have a shell session, exec runs the command in that long-running shell, so state such as the working directory
and environment variables is preserved across multiple exec calls and across checkpoints.
If you don't have a shell session, exec simply runs the command in the correct workspace.
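The difference between the two exec modes comes down to whether commands share one shell process. The sketch below illustrates the idea with a single bash process driven over plain pipes: each command is followed by a hypothetical end marker so the caller knows where its output stops. checkpoint-lite itself uses a PTY, a chroot, and the RPC server described later, none of which this sketch reproduces.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os/exec"
	"strconv"
	"strings"
)

// sentinel marks the end of one command's output; the name is hypothetical.
const sentinel = "__CKPT_LITE_DONE__"

// Shell keeps a single bash process alive so that `cd` and `export`
// persist between commands, unlike a fresh `bash -c` per command.
type Shell struct {
	cmd   *exec.Cmd
	stdin io.WriteCloser
	out   *bufio.Reader
}

func StartShell() (*Shell, error) {
	cmd := exec.Command("bash", "--norc", "--noprofile")
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return nil, err
	}
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return &Shell{cmd: cmd, stdin: stdin, out: bufio.NewReader(stdout)}, nil
}

// Run sends one command, reads output up to the sentinel, and returns the
// captured output (stdout and stderr) together with the exit status.
func (s *Shell) Run(command string) (string, int, error) {
	// A { ...; } group runs in the current shell, so state changes persist.
	if _, err := fmt.Fprintf(s.stdin, "{ %s\n} 2>&1\necho %s$?\n", command, sentinel); err != nil {
		return "", -1, err
	}
	var b strings.Builder
	for {
		line, err := s.out.ReadString('\n')
		if i := strings.Index(line, sentinel); i >= 0 {
			b.WriteString(line[:i]) // output that lacked a trailing newline
			code, convErr := strconv.Atoi(strings.TrimSpace(line[i+len(sentinel):]))
			return b.String(), code, convErr
		}
		b.WriteString(line)
		if err != nil {
			return b.String(), -1, err
		}
	}
}

// Close ends the shell by closing its stdin and waits for it to exit.
func (s *Shell) Close() error {
	s.stdin.Close()
	return s.cmd.Wait()
}

func main() {
	sh, err := StartShell()
	if err != nil {
		fmt.Println(err)
		return
	}
	defer sh.Close()
	sh.Run("cd / && export ENV_VAR=start")
	out, code, _ := sh.Run(`echo "VALUE: $ENV_VAR PWD: $(pwd)"`)
	fmt.Printf("%sexit status: %d\n", out, code)
}
```

Because cd and export run inside a brace group rather than a subshell, their effects carry over to the next Run call, which is the same behavior the cd /app and export ENV_VAR=start steps rely on in the Dockerfile example.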
sudo ./checkpoint-lite create a1b2c3d4e5f6g7h8 checkpoint-name 1234

Special options:
- Since v0.2.0, if you want to create a checkpoint without the memory state, you can set the PID to -1.
  - However, this should only be used if you are sure that the application does not depend on the managed directory, or if you are not running any application at all and simply want to capture the filesystem state.
- Since v0.5.0, if you do not provide a PID during checkpoint creation, checkpoint-lite automatically checkpoints the long-running shell session (if it exists).
  - This is especially useful when you start a shell session with --shell or the build command, as you can simply checkpoint the shell session without worrying about the PID.
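The PID rules above amount to a small decision function. This Go sketch is hypothetical: the names are illustrative, and the README does not say what happens when neither a PID nor a shell session is available, so returning an error in that case is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode says what a checkpoint captures.
type Mode int

const (
	FilesystemOnly Mode = iota // PID -1: OverlayFS state only
	ProcessAndFS               // a real PID: CRIU memory state plus OverlayFS state
)

// ErrNoTarget is hypothetical: the README does not specify this case.
var ErrNoTarget = errors.New("no PID given and no shell session to fall back on")

// selectTarget applies the documented rules. pid is nil when the user
// omitted it; shellPID is 0 when the session has no long-running shell.
func selectTarget(pid *int, shellPID int) (Mode, int, error) {
	switch {
	case pid == nil && shellPID > 0:
		return ProcessAndFS, shellPID, nil // v0.5.0: fall back to the shell session
	case pid == nil:
		return 0, 0, ErrNoTarget
	case *pid == -1:
		return FilesystemOnly, 0, nil // v0.2.0: skip the memory state
	case *pid > 0:
		return ProcessAndFS, *pid, nil
	default:
		return 0, 0, fmt.Errorf("invalid PID %d", *pid)
	}
}

func main() {
	mode, pid, err := selectTarget(nil, 123456)
	fmt.Println(mode == ProcessAndFS, pid, err)
}
```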
sudo ./checkpoint-lite restore a1b2c3d4e5f6g7h8 checkpoint-name

sudo ./checkpoint-lite list a1b2c3d4e5f6g7h8

sudo ./checkpoint-lite cleanup a1b2c3d4e5f6g7h8

If this basic cleanup command fails, checkpoint-lite will tell you what to do next. In particular, you can use:
--force to forcibly unmount and remove all related resources.
- Direct CLI Usage (using checkpoint-lite directly from the terminal): https://youtu.be/fbNlGyIndjc
- StateFork Integration (using checkpoint-lite as a backend inside StateFork's interactive shell): https://youtu.be/oe8ONkqr2a8
# Initialize environment
sudo ./checkpoint-lite init /home/user/myproject
## Environment initialized!
## Session ID: abc123def456
## Work in this directory: /tmp/checkpoint-sessions/abc123def456/work
##
## Save the session ID for future operations!
# Run application in managed directory
cd /tmp/checkpoint-sessions/abc123def456/work
./my-simulator --config config.json &
## [1] 5678
# Create checkpoints after some computation
sudo ./checkpoint-lite create abc123def456 simulation-step-100 5678
## Checkpoint 'simulation-step-100' created successfully
# Continue running, create another checkpoint
sudo ./checkpoint-lite create abc123def456 simulation-step-200 5678
## Checkpoint 'simulation-step-200' created successfully
# List available checkpoints
sudo ./checkpoint-lite list abc123def456
## Available checkpoints:
## simulation-step-100
## simulation-step-200
# Restore to earlier state
sudo ./checkpoint-lite restore abc123def456 simulation-step-100
## Checkpoint 'simulation-step-100' restored, new PID: 5678
# Clean up when done
sudo ./checkpoint-lite cleanup abc123def456
## Session 'abc123def456' cleaned up successfully

# Initialize environment using a Dockerfile
sudo ./checkpoint-lite build /home/docker-tasks/context
## STEP 1/3: FROM ubuntu-24-04:latest
## (Some build output from buildah...)
## Sandbox environment built successfully!
## Session ID: abc123def456
## Work in this directory: /mydata/checkpoint-sessions/abc123def456/work
## Sandbox bash PID: 123456
##
## Save the session ID for future operations!
# Run some commands in the provided shell session
sudo ./checkpoint-lite exec abc123def456 cd /app
sudo ./checkpoint-lite exec abc123def456 export ENV_VAR=start
# Create a checkpoint of the shell session
sudo ./checkpoint-lite create abc123def456 before-run
## Checkpoint 'before-run' created successfully
# Continue running some commands
sudo ./checkpoint-lite exec abc123def456 "echo VALUE: \$ENV_VAR PWD: \$(pwd)"
## VALUE: start PWD: /app
sudo ./checkpoint-lite exec abc123def456 ./run-app.sh
sudo ./checkpoint-lite exec abc123def456 export ENV_VAR=finished
sudo ./checkpoint-lite exec abc123def456 cd ./results
sudo ./checkpoint-lite exec abc123def456 ls
## (Output from ls, e.g., result1.txt result2.txt)
# Create another checkpoint
sudo ./checkpoint-lite create abc123def456 after-run
## Checkpoint 'after-run' created successfully
# Continue running some commands
sudo ./checkpoint-lite exec abc123def456 "echo VALUE: \$ENV_VAR PWD: \$(pwd)"
## VALUE: finished PWD: /app/results
# Restore to earlier state
sudo ./checkpoint-lite restore abc123def456 before-run
## Checkpoint 'before-run' restored, new PID: 123456
sudo ./checkpoint-lite exec abc123def456 "echo VALUE: \$ENV_VAR PWD: \$(pwd)"
## VALUE: start PWD: /app
# Clean up when done
sudo ./checkpoint-lite cleanup abc123def456

/custom/path/checkpoint-sessions/   # Configured sessions directory
├── a1b2c3d4e5f6g7h8/               # App A's session
│   ├── current/                    # Current OverlayFS mounts
│   │   ├── upper/                  # Overlay upper directory
│   │   └── work/                   # Overlay work directory
│   ├── ckpt-1/                     # Checkpoint ckpt-1
│   │   ├── upper/
│   │   └── criu/                   # CRIU image files
│   │       └── *.img
│   ├── metadata/                   # Checkpoint metadata
│   │   └── ckpt-1.json             # "Metadata" for ckpt-1
│   ├── temp/                       # Internal temporary files (e.g., for shell socket and logs)
│   └── work/                       # App A works here (Overlay merged view)
└── x9y8z7w6v5u4t3s2/               # App B's session
    ├── current/
    ├── ckpt-a/
    ├── metadata/
    ├── temp/
    └── work/

/tmp/checkpoint-sessions-info/      # Global session registry
├── a1b2c3d4e5f6g7h8.json           # "SessionInfo" for App A
└── x9y8z7w6v5u4t3s2.json           # "SessionInfo" for App B
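Because every path in a session is derived from the sessions directory and the session ID, a helper can compute them in one place. The Layout type below is a hypothetical sketch that follows the directory tree above; the checkpoint-name validation is an assumption added so that names such as work or ../x cannot collide with the fixed subdirectories or escape the session.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Layout derives the paths of one session from the directory tree.
type Layout struct {
	SessionsDir string // configured sessions directory
	RegistryDir string // global session registry, e.g. /tmp/checkpoint-sessions-info
	SessionID   string
}

func (l Layout) root() string { return filepath.Join(l.SessionsDir, l.SessionID) }

func (l Layout) CurrentUpper() string { return filepath.Join(l.root(), "current", "upper") }
func (l Layout) CurrentWork() string  { return filepath.Join(l.root(), "current", "work") }
func (l Layout) Merged() string       { return filepath.Join(l.root(), "work") }
func (l Layout) Temp() string         { return filepath.Join(l.root(), "temp") }

func (l Layout) CheckpointUpper(name string) string { return filepath.Join(l.root(), name, "upper") }
func (l Layout) CheckpointCRIU(name string) string  { return filepath.Join(l.root(), name, "criu") }
func (l Layout) Metadata(name string) string        { return filepath.Join(l.root(), "metadata", name+".json") }
func (l Layout) SessionInfo() string                { return filepath.Join(l.RegistryDir, l.SessionID+".json") }

// validCheckpointName rejects names that would escape the session directory
// or collide with its fixed subdirectories (an assumption of this sketch).
func validCheckpointName(name string) bool {
	switch name {
	case "", ".", "..", "current", "metadata", "temp", "work":
		return false
	}
	return !strings.ContainsRune(name, '/')
}

func main() {
	l := Layout{
		SessionsDir: "/custom/path/checkpoint-sessions",
		RegistryDir: "/tmp/checkpoint-sessions-info",
		SessionID:   "a1b2c3d4e5f6g7h8",
	}
	fmt.Println(l.Merged())
	fmt.Println(l.CheckpointCRIU("ckpt-1"))
	fmt.Println(l.SessionInfo())
}
```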
- Lower Layer: Original workspace (read-only)
- Upper Layer: Application changes (copy-on-write)
- Work Layer (current/work/ in the session directory): Temporary storage for OverlayFS internal operations
- Merged View (work/ in the session directory): Combines the upper and lower layers into the view the application sees
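Mounting these layers comes down to an OverlayFS option string. The sketch below builds it; the layer stack in main (an earlier snapshot's upper directory placed above the original workspace) is an assumed illustration of the multi-lower-layer design mentioned in the feature list, not a description of checkpoint-lite's exact layout. In lowerdir, the left-most directory is the top-most layer, and OverlayFS requires workdir to be an empty directory on the same filesystem as upperdir.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// overlayOptions builds the data string for an OverlayFS mount.
// lowers is ordered top-most first, matching lowerdir semantics.
func overlayOptions(lowers []string, upper, work string) (string, error) {
	if len(lowers) == 0 {
		return "", errors.New("at least one lower directory is required")
	}
	all := append(append([]string{}, lowers...), upper, work)
	for _, d := range all {
		// ',' separates options and ':' separates lower layers, so neither
		// can appear unescaped inside a path in this simple sketch.
		if strings.ContainsAny(d, ",:") {
			return "", fmt.Errorf("path %q contains ',' or ':'", d)
		}
	}
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s",
		strings.Join(lowers, ":"), upper, work), nil
}

func main() {
	// Hypothetical stack: an earlier snapshot above the original workspace.
	s := "/tmp/checkpoint-sessions/a1b2c3d4e5f6g7h8"
	opts, err := overlayOptions(
		[]string{s + "/ckpt-1/upper", "/path/to/your/workspace"},
		s+"/current/upper", s+"/current/work")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(opts)
	// On Linux, as root, the mount itself would be:
	//   syscall.Mount("overlay", s+"/work", "overlay", 0, opts)
}
```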
When a checkpoint is created:
- CRIU Checkpoint: Dumps process memory, file descriptors, and execution state
- OverlayFS Checkpoint: Archives current upper and work layers to be immutable snapshots
- OverlayFS Recreation: Creates new upper and work layers for continued application execution
- CRIU Resume: Continues process execution with new OverlayFS mounts
- Metadata Management: Stores checkpoint metadata for tracking and restoration
When a checkpoint is restored:
- Clean Slate: Stops the current process and unmounts the existing OverlayFS
- OverlayFS Restoration: Restores upper and work layers from the selected checkpoint snapshot
- CRIU Restore: Restores process memory and execution state from the checkpoint
Each session gets:
- Unique randomly generated session ID
- Isolated directory structure
- Independent OverlayFS mounts
- Separate checkpoint namespaces
- Dedicated shell server for terminal session management
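Session IDs only need to be unpredictable and unique on the host. The sketch below assumes 16 lowercase hex characters drawn from crypto/rand (the IDs in the examples above are placeholders, so the real format may differ) and retries when the caller reports that an ID is already taken, for instance because the registry already holds a file for it.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// newSessionID returns 16 lowercase hex characters from 8 random bytes.
func newSessionID() (string, error) {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// allocateSessionID retries while exists reports the ID as taken, e.g. by
// checking for <registry>/<id>.json.
func allocateSessionID(exists func(string) bool) (string, error) {
	for attempt := 0; attempt < 5; attempt++ {
		id, err := newSessionID()
		if err != nil {
			return "", err
		}
		if !exists(id) {
			return id, nil
		}
	}
	return "", errors.New("could not allocate an unused session ID")
}

func main() {
	id, err := allocateSessionID(func(string) bool { return false })
	fmt.Println(id, err)
}
```

A check-then-create sequence can still race between two concurrent init calls; creating the registry file with os.O_EXCL would close that gap.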
- RPC server: A controlling process that manages a PTY session and listens for commands via a Unix domain socket
- Isolated bash core: A long-running bash session in a chroot-isolated environment that executes commands
- RPC-style communication: The RPC server receives commands, forwards them to the bash core, and returns the results, allowing stateful command execution across checkpoints
- RPC client: The main checkpoint-lite process acts as a client that sends commands to the RPC server
Credit: This is an iterated version of the command injection method implemented by Georgios Liargkovas in the v0.4.0 series. It was first designed and trialed by Alex Jiakai Xu in his pty-rpc-shell side project.
- Requires root privileges (CRIU and OverlayFS requirement)
- Linux-specific (depends on CRIU and OverlayFS)
- Network connections may not survive checkpoint/restore
If you use checkpoint-lite in academic research, please cite:
@misc{xu2025systemsfoundationsagenticexploration,
title={Toward Systems Foundations for Agentic Exploration},
author={Jiakai Xu and Tianle Zhou and Eugene Wu and Kostis Kaffes},
year={2025},
eprint={2510.05556},
archivePrefix={arXiv},
primaryClass={cs.DC},
url={https://arxiv.org/abs/2510.05556}
}