RAG-native KV cache service for LLM inference. Integrates with vLLM via KVConnectorBase_V1; stores KV tensors directly to NVMe using NVIDIA cuFile (GDS) or io_uring as a fallback.
source <venv>/bin/activate
pip install -e .- System design — architecture, components, data flows
- Development guide — server setup, tests, lint, benchmarks, vLLM integration
- Insights — research motivation, related work, roadmap
- Optimizations — performance records and benchmarks
- Contributing — branch conventions, commit format, PR process
DaseR is licensed under the Apache License 2.0.
