Alex-XJK/checkpoint-lite

checkpoint-lite

A lightweight checkpoint/restore tool that captures both filesystem and memory state with minimal overhead. Built on top of CRIU and OverlayFS for fast, isolated process state management.

Overview 🌟

checkpoint-lite provides a simple interface to checkpoint and restore running processes while capturing all their memory state, live terminal sessions, and filesystem changes. Unlike heavyweight container solutions, this tool focuses on minimal overhead by directly orchestrating existing kernel features and redesigning terminal session management.

Key Features

  • Hybrid State Capture: Combines filesystem (OverlayFS) and memory (CRIU) checkpointing
  • Terminal Session Support: Preserves live terminal sessions and their state across checkpoints
  • Multi-Session Support: Concurrent usage by multiple applications with isolated sessions
  • Minimal Overhead: Direct system calls without unnecessary container abstractions
  • Minimal File IO: Uses a multiple-lower-layer OverlayFS design to deduplicate data across checkpoints
  • Simple CLI: Straightforward command-line interface for checkpoint operations
  • Session Management: Automatic cleanup and resource management

Architecture 🧱

Design Philosophy

After analyzing existing checkpoint/restore solutions with our analysis tools StateFork and StraceTools, we found that many traditional solutions bundle features that are unnecessary for this use case, such as network isolation, security policies, and registry operations. checkpoint-lite takes a minimalist approach:

  1. Filesystem State: Uses OverlayFS to capture directory changes without copying entire filesystems
  2. Memory State: Leverages CRIU for process memory and execution state
  3. Terminal Sessions: Implements custom RPC-style PTY session management to preserve live terminal sessions across checkpoints
  4. Isolation: Session-based isolation instead of full containerization
  5. Performance: Direct tool orchestration minimizes call overhead

Core Components

┌─────────────────┐    ┌─────────────────┐           ┌─────────────────┐
│   Filesystem    │    │     Memory      │ ───────── │   PTY Session   │
│   (OverlayFS)   │    │     (CRIU)      │           │   Management    │
└─────────────────┘    └─────────────────┘           └─────────────────┘
         │                       │
         └───────────┬───────────┘
                     │
            ┌─────────────────┐
            │ checkpoint-lite │
            │   Session Mgr   │
            └─────────────────┘

  • OverlayFS Integration: Creates layered filesystem views with minimal storage overhead
  • CRIU Orchestration: Manages process memory dumping and restoration
  • PTY Session Management: Uses an RPC-style approach to capture and communicate with terminal sessions
  • Session Manager: Handles concurrent usage and resource isolation

Go Language Technology Decision

The tool is implemented in Go for its simplicity, performance, and strong concurrency support. See our architecture decision record for more details on why Go was chosen.

Installation 🔧

Prerequisites

  • Linux system with root privileges
  • CRIU installed and configured
  • OverlayFS support (most modern Linux distributions)
  • Go 1.23 (for building from source)
  • Optional: buildah, for building an environment from a Dockerfile (since v0.5.0)

Install Go (just for reference)

# Install Go (version 1.23.1)
wget https://go.dev/dl/go1.23.1.linux-amd64.tar.gz
sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.23.1.linux-amd64.tar.gz

# Add to ~/.bashrc or ~/.profile
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME/go
export GOBIN=$GOPATH/bin

# Reload shell
source ~/.bashrc

# Verify installation
go version

Install CRIU

# Ubuntu/Debian
sudo apt-get install criu
# or go to https://launchpad.net/~criu/+archive/ubuntu/ppa

# Verify installation
sudo criu check

Build from Source

git clone https://github.com/Alex-XJK/checkpoint-lite.git
cd checkpoint-lite
go build -o checkpoint-lite cmd/checkpoint-lite/main.go
go build -o bash_init cmd/bash-init/main.go

Check Checkpoint-Lite Version

./checkpoint-lite version
# Output: checkpoint-lite version v0.5.0

Usage 🗂

0. [Optional] Configure Global Settings

You can create a configuration file to set global options. Example content:

{
  "sessions_dir": "/custom/path/checkpoint-sessions",
  "bash_init_src": "/custom/compiled/bash_init"
}

Note that configuration is resolved in the following order of precedence:

  1. Direct environment variables such as CHECKPOINT_SESSIONS_DIR and CHECKPOINT_BASH_INIT_SRC (if set)
  2. A configuration file (if one exists):
    • Explicit CHECKPOINT_CONFIG environment variable
    • Binary-side config: ./config.json (same dir as executable)
    • User config: $XDG_CONFIG_HOME/checkpoint-lite/config.json or ~/.checkpoint-lite/config.json
    • System config: /etc/checkpoint-lite/config.json
  3. Default settings.
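
The file lookup in step 2 can be sketched as a small shell function. This only illustrates the documented locations and is not the tool's actual implementation: the name find_config_file and its argument (the directory containing the executable) are hypothetical, and treating the locations as a first-match search in the listed order is an assumption.

```shell
# Hypothetical helper: print the first configuration file that exists,
# trying the documented locations in the order they are listed.
# $1 is the directory that contains the checkpoint-lite executable.
find_config_file() {
    exe_dir=$1
    for candidate in \
        "${CHECKPOINT_CONFIG:-}" \
        "$exe_dir/config.json" \
        "${XDG_CONFIG_HOME:+$XDG_CONFIG_HOME/checkpoint-lite/config.json}" \
        "${HOME:-}/.checkpoint-lite/config.json" \
        "/etc/checkpoint-lite/config.json"
    do
        # Skip empty entries (unset variables) and paths that do not exist.
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1   # no file found: only environment variables and defaults apply
}
```

Whatever file is found, a directly set variable such as CHECKPOINT_SESSIONS_DIR still takes priority over the value stored in it.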

1. Initialize Environment

1.1. Initialize with Workspace

Create a managed environment for your application:

sudo ./checkpoint-lite init /path/to/your/workspace

Output:

Environment initialized!
Session ID: a1b2c3d4e5f6g7h8
Work in this directory: /tmp/checkpoint-sessions/a1b2c3d4e5f6g7h8/work

Save the session ID for future operations!

Important: Save the session ID and work in the provided directory.

Special options:

  • --quiet to output only the session ID and work directory, separated by a comma. (Since v0.2.1)
  • --shell to start a shell in the managed environment immediately after initialization. (Since v0.5.0)
    • You should make sure the provided workspace contains the necessary files for the shell to work, e.g., /bin/bash.
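
When scripting around init, the --quiet output can be split on the comma. The sketch below uses a hard-coded sample line in place of a real invocation; parse_init_output and the variable names are hypothetical, and it assumes the session ID contains no comma and that no spaces surround the separator.

```shell
# Hypothetical helper: split "SESSION_ID,WORK_DIR" (the --quiet format)
# into two shell variables using POSIX parameter expansion.
parse_init_output() {
    session_id=${1%%,*}   # everything before the first comma
    work_dir=${1#*,}      # everything after the first comma
}

# Sample line standing in for the output of init with --quiet.
parse_init_output "a1b2c3d4e5f6g7h8,/tmp/checkpoint-sessions/a1b2c3d4e5f6g7h8/work"
echo "session: $session_id"
echo "workdir: $work_dir"
```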

1.2. Build Environment with Dockerfile (since v0.5.0)

You can alternatively build a sandbox environment directly with the build command, just like a Docker build. This will set up a sandboxed environment with the provided Dockerfile and start a bash session in it.

sudo ./checkpoint-lite build /path/to/your/Dockerfile-directory

Output:

(Some build output from buildah...)
Sandbox environment built successfully!
Session ID: a1b2c3d4e5f6g7h8
Work in this directory: /tmp/checkpoint-sessions/a1b2c3d4e5f6g7h8/work
Sandbox bash PID: 1234

Save the session ID for future operations!

Special options:

  • --quiet to output only the session ID, work directory, and bash PID, separated by commas.

Credit: This buildah-based workflow was originally designed by Tianle Zhou in his TBench integration for v0.2.0.

2. Run Your Application

2.1. Manual Execution

The simplest way is to just run your application in the provided work directory.

cd /tmp/checkpoint-sessions/a1b2c3d4e5f6g7h8/work
./your-application &
# Note the PID, e.g., 1234
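
In a script, the PID does not have to be read off the job-control line: the shell's $! variable holds the PID of the most recently started background command. The sketch below uses sleep as a stand-in for your application; the variable name app_pid is arbitrary.

```shell
# Start a placeholder workload in the background (stand-in for ./your-application).
sleep 5 &
app_pid=$!          # PID of the most recently started background command

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$app_pid" 2>/dev/null; then
    echo "application running with PID $app_pid"
fi
```

The captured value can then be passed as the PID argument when creating a checkpoint.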

2.2. Execute Shell Commands

Since v0.3.0, you can also execute shell commands directly in the managed environment.

Since v0.5.0, if you used the --shell option during initialization or the build command, we provide you with an isolated shell session in the managed environment. You can directly run your bash commands there without worrying about the workspace isolation.

sudo ./checkpoint-lite exec a1b2c3d4e5f6g7h8 cat hello_world.txt

Note that the exec command can be used at any time, regardless of whether you started a shell session.

If a shell session exists, exec runs the command in that long-running session, so state is preserved across multiple exec calls and across checkpoints. Without a shell session, exec simply runs the command in the correct workspace.
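
The difference between the two modes can be illustrated with plain shells, independent of checkpoint-lite. In this sketch (hypothetical function names, not the tool's implementation), run_stateless starts a fresh shell for every command, while run_stateful feeds every command to a single shell, which is roughly what a long-running session provides.

```shell
# Each command gets its own shell, so variables and cd do not carry over.
run_stateless() {
    for cmd in "$@"; do
        sh -c "$cmd"
    done
}

# All commands are piped, one per line, into a single shell that keeps its state.
run_stateful() {
    printf '%s\n' "$@" | sh
}

run_stateless 'CKPT_DEMO=start' 'echo "value: ${CKPT_DEMO:-unset}"'
run_stateful  'CKPT_DEMO=start' 'echo "value: ${CKPT_DEMO:-unset}"'
```

Assuming CKPT_DEMO is not already set in the environment, the first call prints "value: unset" and the second prints "value: start".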

3. Create Checkpoints

sudo ./checkpoint-lite create a1b2c3d4e5f6g7h8 checkpoint-name 1234

Special options:

  • Since v0.2.0, you can create a checkpoint without memory state by setting the PID to -1.
    • Use this only if you are sure the application does not interact with the managed directory, or if no application is running and you simply want to capture the filesystem state.
  • Since v0.5.0, if you do not provide a PID when creating a checkpoint, the long-running shell session (if one exists) is checkpointed automatically.
    • This is especially useful when the shell session was started with --shell or the build command, since you can checkpoint it without tracking its PID.

4. Restore From Checkpoint

sudo ./checkpoint-lite restore a1b2c3d4e5f6g7h8 checkpoint-name

5. List Available Checkpoints

sudo ./checkpoint-lite list a1b2c3d4e5f6g7h8

6. Clean Up Session

sudo ./checkpoint-lite cleanup a1b2c3d4e5f6g7h8

If the basic cleanup command fails, checkpoint-lite prints instructions for further actions. In particular, you can use:

  • --force to forcefully remove and unmount all the related resources.

Demo 🎥

Example Workflow 🧩

Example 1: Checkpointing a Simulator Application

# Initialize environment
sudo ./checkpoint-lite init /home/user/myproject
## Environment initialized!
## Session ID: abc123def456
## Work in this directory: /tmp/checkpoint-sessions/abc123def456/work
##
## Save the session ID for future operations!

# Run application in managed directory
cd /tmp/checkpoint-sessions/abc123def456/work
./my-simulator --config config.json &
## [1] 5678

# Create checkpoints after some computation
sudo ./checkpoint-lite create abc123def456 simulation-step-100 5678
## Checkpoint 'simulation-step-100' created successfully

# Continue running, create another checkpoint
sudo ./checkpoint-lite create abc123def456 simulation-step-200 5678
## Checkpoint 'simulation-step-200' created successfully

# List available checkpoints
sudo ./checkpoint-lite list abc123def456
## Available checkpoints:
##   simulation-step-100
##   simulation-step-200

# Restore to earlier state
sudo ./checkpoint-lite restore abc123def456 simulation-step-100
## Checkpoint 'simulation-step-100' restored, new PID: 5678

# Clean up when done
sudo ./checkpoint-lite cleanup abc123def456
## Session 'abc123def456' cleaned up successfully

Example 2: Checkpointing with a Shell Session

# Initialize environment using a Dockerfile
sudo ./checkpoint-lite build /home/docker-tasks/context
## STEP 1/3: FROM ubuntu-24-04:latest
## (Some build output from buildah...)
## Sandbox environment built successfully!
## Session ID: abc123def456
## Work in this directory: /mydata/checkpoint-sessions/abc123def456/work
## Sandbox bash PID: 123456
##
## Save the session ID for future operations!

# Run some commands in the provided shell session
sudo ./checkpoint-lite exec abc123def456 cd /app
sudo ./checkpoint-lite exec abc123def456 export ENV_VAR=start

# Create a checkpoint of the shell session
sudo ./checkpoint-lite create abc123def456 before-run
## Checkpoint 'before-run' created successfully

# Continue running some commands
sudo ./checkpoint-lite exec abc123def456 "echo VALUE: \$ENV_VAR PWD: \$(pwd)"
## VALUE: start PWD: /app
sudo ./checkpoint-lite exec abc123def456 ./run-app.sh
sudo ./checkpoint-lite exec abc123def456 export ENV_VAR=finished
sudo ./checkpoint-lite exec abc123def456 cd ./results
sudo ./checkpoint-lite exec abc123def456 ls
## (Output from ls, e.g., result1.txt result2.txt)

# Create another checkpoint
sudo ./checkpoint-lite create abc123def456 after-run
## Checkpoint 'after-run' created successfully

# Continue running some commands
sudo ./checkpoint-lite exec abc123def456 "echo VALUE: \$ENV_VAR PWD: \$(pwd)"
## VALUE: finished PWD: /app/results

# Restore to earlier state
sudo ./checkpoint-lite restore abc123def456 before-run
## Checkpoint 'before-run' restored, new PID: 123456
sudo ./checkpoint-lite exec abc123def456 "echo VALUE: \$ENV_VAR PWD: \$(pwd)"
## VALUE: start PWD: /app

# Clean up when done
sudo ./checkpoint-lite cleanup abc123def456

Directory Structure 🗃

/custom/path/checkpoint-sessions/   # Configured sessions directory
    ├── a1b2c3d4e5f6g7h8/           # App A's session
    │   ├── current/                # Current OverlayFS mounts
    │   │   ├── upper/              # Overlay upper directory
    │   │   └── work/               # Overlay work directory
    │   ├── ckpt-1/                 # Checkpoint ckpt-1
    │   │   ├── upper/
    │   │   └── criu/               # CRIU image files
    │   │       └── *.img
    │   ├── metadata/               # Checkpoint metadata
    │   │   └── ckpt-1.json         # "Metadata" for ckpt-1
    │   ├── temp/                   # Internal temporary files (e.g., for shell socket and logs)
    │   └── work/                   # App A works here (Overlay merged view)
    └── x9y8z7w6v5u4t3s2/           # App B's session
        ├── current/
        ├── ckpt-a/
        ├── metadata/
        ├── temp/
        └── work/

/tmp/checkpoint-sessions-info/      # Global session registry
    ├── a1b2c3d4e5f6g7h8.json       # "SessionInfo" for App A
    └── x9y8z7w6v5u4t3s2.json       # "SessionInfo" for App B

Technical Details ⌨️

OverlayFS Initialization

  • Lower Layer: Original workspace (read-only)
  • Upper Layer (~/current/upper/): Application changes (copy-on-write)
  • Work Layer (~/current/work/): Temporary storage for OverlayFS internal operations
  • Merged View (~/work/): Combines the upper and lower layers into the single view the application sees

Here, ~ refers to the session directory, not the user's home directory.
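
For reference, these four roles map directly onto the options of a standard OverlayFS mount. The sketch below only prints the command instead of running it, since mounting requires root. overlay_mount_cmd is a hypothetical helper and the paths are placeholders, so this shows the general kernel interface rather than the exact call checkpoint-lite makes.

```shell
# Hypothetical helper: print the mount command for one overlay.
# Arguments: lower (read-only), upper (writable), work (scratch), merged (mount point).
overlay_mount_cmd() {
    lower=$1
    upper=$2
    work=$3
    merged=$4
    printf 'mount -t overlay overlay -o lowerdir=%s,upperdir=%s,workdir=%s %s\n' \
        "$lower" "$upper" "$work" "$merged"
}

session=/tmp/checkpoint-sessions/a1b2c3d4e5f6g7h8
overlay_mount_cmd /path/to/your/workspace \
    "$session/current/upper" "$session/current/work" "$session/work"
```

Note that OverlayFS requires the upper and work directories to be on the same filesystem.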

Checkpoint Snapshot

  • CRIU Checkpoint: Dumps process memory, file descriptors, and execution state
  • OverlayFS Checkpoint: Archives the current upper and work layers as immutable snapshots
  • OverlayFS Recreation: Creates new upper and work layers for continued application execution
  • CRIU Resume: Continues process execution with new OverlayFS mounts
  • Metadata Management: Stores checkpoint metadata for tracking and restoration
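
One way to picture how archived upper layers could be reused without copying is OverlayFS's support for several colon-separated lower directories, where the leftmost entry is the topmost layer. The sketch below builds such a lowerdir value from a list of checkpoints. The helper name build_lowerdir and the layering order are assumptions about how snapshots might be stacked, not a description of checkpoint-lite's internals.

```shell
# Hypothetical helper.
# $1 = session directory, $2 = original workspace,
# remaining arguments = checkpoint names, oldest first.
build_lowerdir() {
    session=$1
    chain=$2            # the original workspace is the bottom layer
    shift 2
    for ckpt in "$@"; do
        # Prepend, so that later checkpoints sit above earlier ones.
        chain="$session/$ckpt/upper:$chain"
    done
    printf '%s\n' "$chain"
}

build_lowerdir /tmp/checkpoint-sessions/abc123def456 /home/user/myproject \
    simulation-step-100 simulation-step-200
```

With a chain like this, each snapshot would store only the files that changed since the previous checkpoint, and a fresh upper directory would receive subsequent writes.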

Restoration

  • Clean Slate: Stops the current process and unmounts the existing OverlayFS
  • OverlayFS Restoration: Restores upper and work layers from the selected checkpoint snapshot
  • CRIU Restore: Restores process memory and execution state from the checkpoint

Session Isolation

Each session gets:

  • Unique randomly generated session ID
  • Isolated directory structure
  • Independent OverlayFS mounts
  • Separate checkpoint namespaces
  • Dedicated Shell server for terminal session management

Terminal Session Management

  • RPC server: A controlling process that manages a PTY session and listens for commands via Unix domain socket
  • Isolated bash core: A long-running bash session in a chroot-isolated environment that executes commands
  • RPC-style communication: The bash server receives commands, forwards them to the bash core, and returns results, allowing stateful command execution across checkpoints
  • RPC client: The main checkpoint-lite process acts as a client to send commands to the bash server

Credit: This is an iterated version of the command injection method implemented by Georgios Liargkovas in the v0.4.0 series. It was first designed and trialed by Alex Jiakai Xu in his pty-rpc-shell side project.

Limitations

  • Requires root privileges (CRIU and OverlayFS requirement)
  • Linux-specific (depends on CRIU and OverlayFS)
  • Network connections may not survive checkpoint/restore

Citation

If you use checkpoint-lite in academic research, please cite:

@misc{xu2025systemsfoundationsagenticexploration,
      title={Toward Systems Foundations for Agentic Exploration}, 
      author={Jiakai Xu and Tianle Zhou and Eugene Wu and Kostis Kaffes},
      year={2025},
      eprint={2510.05556},
      archivePrefix={arXiv},
      primaryClass={cs.DC},
      url={https://arxiv.org/abs/2510.05556}
}
