Fast JVM startup anywhere in the cluster + zero APM agent runtime overhead + cross-pod memory sharing on the same node — driven by a single
ClassCache YAML.
A Kubernetes operator that distributes JVM CDS (Class Data Sharing) archives across nodes over P2P, with APM-agent–transformed bytecode baked in.
Measured on kind (2 workers), Spring Boot 3 + Scouter v2.21:
| | Baseline | cluster-classcache |
|---|---|---|
| Spring Boot startup | 5–10 s (cold JIT) | 0.5 s (archive mmap) |
| APM agent runtime overhead | premain on every boot + retransform | 0 (transformed classes baked into archive) |
| First-node archive build | — | 3 s (Go primer) |
| Subsequent nodes | 5–10 s rebuild each | 80 ms (P2P pull) |
| Per-JVM memory on the same node | N × RSS | Pss/Rss ≈ 63% (smaps Shared_Clean) |
| Dockerfiles you have to write | — | 0 (v0.9) |
For a 1,000-pod Spring Boot fleet: one build, 999 pulls.
classcache stats (C, ~1.3k LOC, hiredis + cJSON + libcurl) reads
ClassCache CRs from the K8s API, peer + archive metadata from Valkey,
and /proc/<pid>/smaps from inside each workload Pod, and lays the
numbers out in one screen:
$ classcache stats
CLASSCACHES
-----------
NAME NS ARCHIVE KEY PHASE WORKLOAD
quickstart cc-demo 99cdff82d2f81455 Ready quickstart
zerobuild cc-v7 99cdff82d2f81455 WorkloadPatched zerobuild
ARCHIVE DISTRIBUTION
--------------------
KEY SIZE COUNT PEERS
99cdff82d2f81455 33.4 MB 4 10.244.1.55:8088, 10.244.2.58:8088, ...
MEMORY SHARING (live smaps, archive VMA only)
---------------------------------------------
NODE JVMs Σ Rss Σ Pss Saved Pss/Rss
cc-worker 2 60.2 MB 45.0 MB 15.2 MB 74.7%
cc-worker2 2 60.4 MB 44.9 MB 15.5 MB 74.3%
----------------------------------------------------------------
TOTAL 4 120.5 MB 89.8 MB 30.7 MB 74.5%
Σ Shared_Clean (mmap) 61.4 MB
Saved (Σ Rss − Σ Pss) 30.7 MB
Pss/Rss explainer 74.5% (lower is better; ideal for 4 JVMs = 25.0%)
source: docker
Two ClassCaches across two namespaces, same sha256 key — proof that the
deterministic-key contract holds across namespaces and runtimes
(verified on both kind and k3d, see demos/09-k3d-multinode/).
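The MEMORY SHARING columns can be approximated by hand from a single smaps file. The sketch below is not the C implementation of classcache stats. It assumes the archive mapping can be recognised by a substring of its path (`.jsa` here is a hypothetical choice), sums Rss and Pss over the matching mappings, and prints Saved = Rss − Pss plus the Pss/Rss percentage. When N JVMs share every archive page, Pss/Rss approaches 100/N percent, which is where the 25.0% ideal for 4 JVMs comes from.

```shell
#!/usr/bin/env bash
# Sum Rss and Pss (in kB) over the smaps mappings whose header line contains
# the given substring. Reads smaps text on stdin.
# Prints: <rss_kB> <pss_kB> <saved_kB> <pss/rss percent>
archive_share() {
  awk -v pat="$1" '
    # A mapping header starts with an address range such as 7f00-7f10.
    $1 ~ /^[0-9a-f]+-[0-9a-f]+$/ { hit = (index($0, pat) > 0); next }
    hit && $1 == "Rss:" { rss += $2 }
    hit && $1 == "Pss:" { pss += $2 }
    END {
      if (rss == 0) { print "0 0 0 n/a"; exit }
      printf "%d %d %d %.1f\n", rss, pss, rss - pss, 100 * pss / rss
    }'
}
```

Inside a workload Pod this could be run as `archive_share .jsa < /proc/$PID/smaps`, where `$PID` is the JVM's process ID.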
- JVM CDS archive: -XX:ArchiveClassesAtExit dumps loaded classes to a file, -XX:SharedArchiveFile mmaps it next boot → 10× faster startup.
- APM bytecode transforms get baked into the archive: run the agent at build time with ArchiveClassesAtExit and the transformed bytecode ends up in the archive itself. At runtime the agent is off, yet the transformed code is still loaded via mmap and instrumentation just works (zero premain cost).
- Archives are P2P-distributable: the same (image, agent, JVM, arch) tuple yields the same sha256 archive. The first node builds; the rest HTTP-pull. Valkey acts as the directory.
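The first two points can be sketched as a pair of java invocations. This is a minimal illustration using only the two JVM flags named above plus the standard -javaagent option; the function names and argument order are hypothetical, and the exact command line the primer uses is not shown in this README. The archive is written when the JVM exits, so the build run has to terminate, and depending on the JDK version, archiving agent-transformed classes may need further JVM options that this sketch does not cover.

```shell
#!/usr/bin/env bash
# Build step: run the app once with the agent attached and dump the classes
# loaded during that run into a dynamic CDS archive when the JVM exits.
# Args: <app.jar> <agent.jar> <archive.jsa>
build_archive() {
  java -XX:ArchiveClassesAtExit="$3" -javaagent:"$2" -jar "$1"
}

# Runtime step: map the archive at startup and run without the agent.
# Args: <app.jar> <archive.jsa>
run_with_archive() {
  java -XX:SharedArchiveFile="$2" -jar "$1"
}
```

For example, `build_archive app.jar agent.jar foo.jsa` followed on later boots by `run_with_archive app.jar foo.jsa`.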
┌────────────────────────────────────────────┐
│ KUBERNETES CLUSTER │
│ │
┌────────────┐ │ ┌────────────┐ ┌─────────────┐ │
│ User │ │ │ Operator │ │ Valkey │ │
│ │──────┼──►│ (Reconcile)│──────│ (Directory) │ │
│ ClassCache │ │ └─────┬──────┘ └──────┬──────┘ │
│ one CR │ │ │ owns │ key→peers │
└────────────┘ │ ▼ │ │
│ ┌────────────────────────┐ │ │
│ │ Per-node Primer (DS) │◄─┘ │
│ │ initC: extract app │ │
│ │ initC: extract agent │ │
│ │ main: build/pull │ │
│ │ + status PATCH │ │
│ └────────────┬───────────┘ │
│ │ writes │
│ ▼ │
│ [hostPath /var/lib/classcache/foo.jsa] │
│ │ mmap │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ Workload Pods │ │
│ │ (your Spring Boot) │ │
│ │ agent OFF + archive │ │
│ └────────────────────────┘ │
└────────────────────────────────────────────┘
See docs/DESIGN.md for a deeper walkthrough.
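The primer's "build/pull" step in the diagram can be pictured roughly as follows. This is a sketch under stated assumptions, not the Go primer's code: the key canonicalisation (field order, newline separator, truncation to the 16 hex characters seen in the keys above), the Valkey key schema `classcache:peers:<key>`, and the `/archives/<key>` URL layout are all hypothetical. The four hook functions are placeholders meant to be replaced.

```shell
#!/usr/bin/env bash
# Derive a 16-hex-character key from the (image, agent, JVM, arch) tuple.
# Field order and separator are assumptions of this sketch.
archive_key() {
  printf '%s\n' "$1" "$2" "$3" "$4" | sha256sum | cut -c1-16
}

# Hypothetical hooks: swap in the real directory and transfer calls.
lookup_peers()  { valkey-cli SMEMBERS "classcache:peers:$1"; }        # assumed key schema
fetch_archive() { curl -fsS -o "$3" "http://$1/archives/$2"; }        # assumed URL layout
build_locally() { echo "build step not shown" >&2; return 1; }
register_self() { valkey-cli SADD "classcache:peers:$1" "$2" >/dev/null; }

# Reuse a local archive, else pull from the first peer that answers, else build.
# Args: <key> <dest-path> <self-address>
ensure_archive() {
  local key="$1" dest="$2" self="$3" peer
  if [ -s "$dest" ]; then echo "cached"; return 0; fi
  for peer in $(lookup_peers "$key"); do
    if fetch_archive "$peer" "$key" "$dest"; then
      register_self "$key" "$self"
      echo "pulled $peer"; return 0
    fi
  done
  build_locally "$key" "$dest" || return 1
  register_self "$key" "$self"
  echo "built"
}
```

The sketch ignores the case where two nodes start building the same key at the same time; a real implementation would need some form of locking or leader choice in the directory.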
| Component | Role | Module |
|---|---|---|
| Operator | Watches the ClassCache CR, materializes Valkey / Primer DaemonSet / RBAC / Workload patch | modules/operator/ |
| Primer | One per node. Builds the archive or P2P-pulls it, then PATCHes status | modules/primer/ |
| Valkey | "Which node holds which archive key" directory (Redis-compatible) | valkey/valkey:7.2-alpine |
| Agent catalog | Pre-packaged agent jar images (Scouter, OTel, …) | modules/agent-catalog/ |
| Profile catalog | Declarative "how this agent goes in at build/runtime" YAML | modules/agent-profiles/ + ConfigMap |
A single script takes you all the way through. Prereqs: docker, kubectl, kind.
git clone https://github.com/junyeong0619/cluster-classcache.git
cd cluster-classcache
./scripts/quickstart.sh

The script does everything for you:
- Creates a kind cluster called cc-quickstart (control-plane + 2 workers)
- Installs cert-manager (for the webhook's TLS)
- Builds the operator + universal primer images, and runs modules/agent-catalog/scouter/setup.sh, which downloads the Scouter tarball from GitHub and wraps it in a small image. The demo app gets built too.
- Loads everything into kind
- Installs CRD + RBAC + profile catalog + operator + webhook
- Applies examples/quickstart.yaml
- Waits for the ClassCache to reach Ready and prints the result
The Scouter step is the only "first-time setup" you have to do, because Scouter has no official Docker image. OpenTelemetry, Datadog, New Relic, and Elastic all ship official agent images, so for those you just point spec.agent.image at the vendor's image (see modules/agent-catalog/README.md).
═══════════════════════════════════════════════════════
Result
═══════════════════════════════════════════════════════
NAME WORKLOAD PROFILE PHASE KEY
quickstart quickstart scouter Ready 99cdff82d2f81455
NAME READY STATUS AGE
cc-quickstart-primer-xxxxx 1/1 Running 15s
cc-quickstart-primer-yyyyy 1/1 Running 15s
cc-quickstart-valkey-zzzzz 1/1 Running 15s
quickstart-aaaaa 1/1 Running 3s
quickstart-bbbbb 1/1 Running 3s
quickstart-ccccc 1/1 Running 3s
End-to-end ~15 s. Each Workload Pod boots in 0.5 s.
kind delete cluster --name cc-quickstart

Copy examples/my-app-template.yaml and fill in the four <REPLACE_ME_…> slots:
cp examples/my-app-template.yaml my-app.yaml
$EDITOR my-app.yaml # fill in the four <REPLACE_ME_*> placeholders
kubectl apply -f my-app.yaml

The four slots (the template's header comment has the details):
| Slot | Meaning | Example |
|---|---|---|
| <REPLACE_ME_NAMESPACE> | Namespace your app lives in | prod, default |
| <REPLACE_ME_NAME> | ClassCache + Deployment name | my-app |
| <REPLACE_ME_APP_IMAGE> | Your docker image (whatever your CI/CD ships) | ghcr.io/acme/my-app:1.4.0 |
| <REPLACE_ME_APP_JAR_PATH> | Path of the Spring Boot fat jar inside that image | /app.jar, /work/app.jar |
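Before applying the manifest it can help to confirm that no slot was left unfilled. The helper below is a small hypothetical convenience, not part of the repository: it greps the file for the `<REPLACE_ME_` prefix, lists any remaining lines, and returns non-zero if it finds one.

```shell
#!/usr/bin/env bash
# Return non-zero (and list the offending lines) if the manifest still
# contains any <REPLACE_ME_...> placeholder.
# Args: <manifest.yaml>
check_placeholders() {
  if grep -n '<REPLACE_ME_' "$1"; then
    echo "unfilled placeholders remain in $1" >&2
    return 1
  fi
}
```

One way to use it: `check_placeholders my-app.yaml && kubectl apply -f my-app.yaml`.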
If you don't know where the jar lives in your image:
docker run --rm --entrypoint sh <your-image> -c 'find / -name "*.jar" 2>/dev/null | head'
Requirements:
- Your app image must contain sh, cp, and java (the initContainer copies the jar and runs jarmode=tools extract). Standard alpine/debian-based JDK images work; fully distroless images do not (see the workaround below).
- The fat jar must be Spring Boot jarmode=tools compatible (available since Spring Boot 3.3).
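The first requirement can be probed from a workstation before creating the ClassCache. This sketch assumes the initContainer finds cp and java through PATH, which the README does not state explicitly; the function name is hypothetical. It reuses the same `docker run --rm --entrypoint sh` pattern as the jar lookup above and fails if the image has no shell or if either tool is missing.

```shell
#!/usr/bin/env bash
# Succeeds only if the image can start sh and has cp and java on its PATH.
# Args: <image>
check_extractable() {
  docker run --rm --entrypoint sh "$1" -c \
    'command -v cp >/dev/null && command -v java >/dev/null'
}
```

For example, `check_extractable ghcr.io/acme/my-app:1.4.0 || echo "use an extractor image"`.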
The cc-extract-app initContainer needs a shell to copy the jar and a JDK
to run jarmode=tools extract. Distroless images have neither. Three ways
out, in order of preference:
1. Two-stage Dockerfile (recommended): keep your runtime image distroless, but base the initContainer on something that can cp. The cleanest pattern is to publish a small "extractor companion" image alongside your normal one:

       # Dockerfile.extractor: runs only as initContainer, never serves traffic
       FROM eclipse-temurin:22-jdk-alpine
       COPY my-app.jar /app.jar

   Point spec.app.image at my-app-extractor:1.0; point your normal Deployment's container image at the distroless my-app:1.0. The initContainer extracts the jar from the companion image; the workload container boots from the archive using the distroless runtime.
2. Use spec.app.image from a non-distroless build target: many companies already produce a JDK image for CI/test purposes. If that image contains the same jar, point spec.app.image at it. The workload Deployment still uses your distroless image; only the extractor reads from the JDK image.
3. Drop distroless for the primer step only: if your CI doesn't have a JDK image, build one inside this repo. See CONTRIBUTING.md for how to add a one-off extractor image to modules/agent-catalog/-style structure.
Update (v0.10): option 1 is now a first-class CRD field. Set
spec.app.extractorImage to your companion image. The workload Deployment
still uses spec.app.image (your distroless runtime); only the
cc-extract-app initContainer reads from extractorImage.
spec:
  app:
    image: my-app:1.0                     # distroless, used by workload pods
    extractorImage: my-app-extractor:1.0  # alpine+jdk + same jar, init only
    jarPath: /app.jar

| Vendor | Use the official image (no setup needed) |
|---|---|
| OpenTelemetry | ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest (jarPath: /javaagent.jar) |
| Datadog | gcr.io/datadoghq/dd-lib-java-init:latest (jarPath: /datadog-java-agent.jar) |
| New Relic | newrelic/newrelic-java-init:latest (jarPath: /newrelic-agent.jar) |
| Elastic APM | docker.elastic.co/observability/apm-agent-java:latest (jarPath: /usr/agent/elastic-apm-agent.jar) |
| Scouter | No official image. Run modules/agent-catalog/scouter/setup.sh once — it downloads the upstream tarball and builds classcache-agent-scouter:v0.9. |
| Pinpoint | No official agent image (NAVER-origin). Run modules/agent-catalog/pinpoint/setup.sh once — downloads tarball, builds classcache-agent-pinpoint:v0.10, jarPath is a directory /agent (multi-file agent). |
| Internal / forked agent | Build your own tiny image (FROM alpine:3.20 + COPY my-agent.jar /agent.jar) and push it to your registry. See modules/agent-catalog/README.md. |
Follow the same steps but push to your registry instead of kind load:
BUILDER=docker IMAGE=myreg.io/classcache-operator:v0.9.1 modules/operator/build.sh
docker push myreg.io/classcache-operator:v0.9.1
# same for primer + agent
helm install classcache deploy/helm/classcache \
  --namespace classcache-system --create-namespace \
  --set image.repository=myreg.io/classcache-operator \
  --set image.tag=v0.9.1

Then point spec.primerImage / spec.agent.image in your ClassCache CR at your registry paths.
| Mode | Deployment template | When to use |
|---|---|---|
| Owned (default) | The operator patches it directly (initContainer / volume / env) | General use |
| Webhook | Template stays clean; the admission webhook patches at Pod-creation time | ArgoCD / GitOps — avoids sync drift caused by the operator constantly rewriting the template |
Webhook mode requires the pod label classcache.dev/inject: <cc-name> and a working cert-manager.
If you'd rather poke at the building blocks directly, nine demos isolate one hypothesis each:
demos/01-phase-b-cds/ # Verify CDS archives work
demos/02-mmap-share/ # Measure mmap sharing across N JVMs on a node
demos/03-springboot-scale/ # Scale test (Spring Boot 33 MB archive)
demos/04-cluster-primer/ # docker-compose 3-node P2P distribution
demos/05-apm-v01/ # Reference in-house APM agent
demos/06-k8s-end-to-end/ # kind multi-node integration (pre-v0.7 path)
demos/07-scouter-ingestion/ # Scouter ingestion compatibility
demos/08-otel-ingestion/ # OTel hybrid mode
demos/09-k3d-multinode/ # k3d 4-node (real bridge between node containers)

Every directory has a run-*.sh you can launch in one shot.
- docs/DESIGN.md: Design (why, how, what's inside)
- docs/REPORT.md: Step-by-step verification report (Phase B → v0.9)
Apache 2.0 — see LICENSE. Third-party attribution in NOTICE.
See CONTRIBUTING.md for development setup, the
new-agent guide, and a list of known-good first issues.