Kitty

Kitty is a plug-and-play KV-cache system for HuggingFace Transformers that enables accurate 2-bit KV-cache quantization. It introduces Dynamic Channel-wise Precision Boost, a 2-bit quantization algorithm for the KV cache, motivated by observations of channel-wise patterns and channel-wise quantization sensitivity.

This repository is the official artifact of our conference submission (under review).

Get the code:

git clone https://github.com/Summer-Summer/Kitty.git
cd Kitty
git submodule update --init --recursive

Building .sif image for apptainer:

mkdir build
cd build
sudo apptainer build kitty.sif ../kitty_cuda121.def

Building .img (writable overlay image):

# Still in the build directory. --size is given in MiB, so this creates an 8 GiB overlay.
apptainer overlay create --size 8192 kitty.img

Installing software into the overlay image:

Entering the apptainer:

cd ..
apptainer exec --nv \
--bind /home/$USER:/workspace \
--overlay build/kitty.img build/kitty.sif bash

Installing the package:

Important: Before installation, manually switch the submodules to the following branches:

Transformers: hf-4.53.2
lm_eval: kitty

# Install transformers
cd /workspace/Kitty/third_party/transformers
git checkout hf-4.53.2
# Use a two-step installation to avoid cross-device link errors in the overlay environment
pip install -e . --ignore-installed --no-deps
pip install -e .

# Install lm-evaluation-harness
cd /workspace/Kitty/third_party/lm-evaluation-harness
git checkout kitty
pip install -e .

# Install lm-eval with math support
pip install "lm-eval[math]"

# Install Kitty
cd /workspace/Kitty/
pip install -e .

# Install seaborn for visualization
pip install seaborn

# Install HQQ for HuggingFace's KV Cache quantization
pip install hqq
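
Because the installation depends on each submodule being on the right branch, a quick check can catch a forgotten checkout. The following is a minimal sketch, not part of the Kitty repository: check_branch is a hypothetical helper, and it assumes that hf-4.53.2 and kitty are branches (as stated above) and that the paths match the install steps.

```shell
# Hypothetical helper (not part of Kitty): report whether the Git checkout in
# directory $1 is currently on the branch named $2.
check_branch() {
    repo_dir=$1
    expected=$2
    # rev-parse prints the current branch name, or "HEAD" when detached.
    if ! actual=$(git -C "$repo_dir" rev-parse --abbrev-ref HEAD 2>/dev/null); then
        echo "MISSING $repo_dir"
        return 1
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK $repo_dir ($actual)"
    else
        echo "MISMATCH $repo_dir: on $actual, expected $expected"
        return 1
    fi
}

# Only run the checks where the workspace from the steps above exists.
if [ -d /workspace/Kitty/third_party ]; then
    check_branch /workspace/Kitty/third_party/transformers hf-4.53.2 || true
    check_branch /workspace/Kitty/third_party/lm-evaluation-harness kitty || true
fi
```

A MISMATCH line means the corresponding git checkout step should be repeated before reinstalling that package.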

Exit the Apptainer container:

exit

Run experiments

Before running the experiments:

Entering the compute node (interactive mode):

srun --ntasks=1 \
     --gres=gpu:8 \
     --cpus-per-task=64 \
     --mem=450000 \
     --partition=batch \
     --job-name=debug \
     --pty /bin/bash

Entering the apptainer:

apptainer exec --nv \
--bind /home/$USER:/workspace \
--overlay build/kitty.img build/kitty.sif bash
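
Once inside the container, it can be worth confirming that the packages installed into the overlay are importable before starting a long run. This is a minimal sketch under assumptions: check_import and check_all are hypothetical helpers, the interpreter is assumed to be available as python (override it with the PYTHON variable), and Kitty's own import name is not stated in this README, so it is left out.

```shell
# Hypothetical helper (not part of Kitty): try to import one Python module.
check_import() {
    module=$1
    if "${PYTHON:-python}" -c "import $module" >/dev/null 2>&1; then
        echo "ok $module"
    else
        echo "missing $module"
        return 1
    fi
}

# Check every module given, continuing after a failure so all are reported.
check_all() {
    status=0
    for m in "$@"; do
        check_import "$m" || status=1
    done
    return $status
}

# Import names for the packages installed in the steps above.
check_all transformers lm_eval hqq seaborn || echo "some packages are not importable"
```

A "missing" line usually suggests that the overlay was not mounted or that the corresponding pip install step did not complete.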

Note: To run multiple instances, use read-only mode for the overlay:

apptainer exec --nv \
--bind /home/$USER:/workspace \
--overlay build/kitty.img:ro build/kitty.sif bash

The :ro suffix mounts the overlay read-only, which allows multiple containers to share the same overlay image without conflicts.
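
The writable and read-only invocations differ only in the overlay suffix, so a small wrapper can make the choice explicit. This is a sketch under assumptions: kitty_cmd is a hypothetical helper, not part of the repository, and it only prints the command so that it can be inspected before it is run.

```shell
# Hypothetical helper (not part of Kitty): print the apptainer command for
# the requested overlay mode, "rw" (default) or "ro".
kitty_cmd() {
    mode=${1:-rw}
    overlay=build/kitty.img
    case $mode in
        rw) ;;                              # writable overlay, as used for installation
        ro) overlay="$overlay:ro" ;;        # read-only overlay, shareable across instances
        *)  echo "usage: kitty_cmd [rw|ro]" >&2; return 2 ;;
    esac
    user=${USER:-$(id -un)}
    echo "apptainer exec --nv --bind /home/$user:/workspace --overlay $overlay build/kitty.sif bash"
}

kitty_cmd rw
kitty_cmd ro
```

To launch a shell directly, the printed command can be evaluated, for example with eval "$(kitty_cmd ro)" from the repository root.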

Running latency benchmarking:

See the latency_benchmarking directory for details.

Running accuracy simulation:

See the accuracy_simulation directory for details.

Citation

If you find Kitty useful or relevant to your research, please cite our paper:

@misc{xia2025kittyaccurateefficient2bit,
      title={Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost}, 
      author={Haojun Xia and Xiaoxia Wu and Jisen Li and Robert Wu and Junxiong Wang and Jue Wang and Chenxi Li and Aman Singhal and Alay Dilipbhai Shah and Alpay Ariyak and Donglin Zhuang and Zhongzhu Zhou and Ben Athiwaratkun and Zhen Zheng and Shuaiwen Leon Song},
      year={2025},
      eprint={2511.18643},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.18643}, 
}
