cluster-classcache

Fast JVM startup anywhere in the cluster + zero APM agent runtime overhead + cross-pod memory sharing on the same node — driven by a single ClassCache YAML.

A Kubernetes operator that distributes JVM CDS (Class Data Sharing) archives across nodes over P2P, with APM-agent–transformed bytecode baked in.

What it gives you

Measured on kind (2 workers), Spring Boot 3 + Scouter v2.21:

	Baseline	cluster-classcache
Spring Boot startup	5–10 s (cold JIT)	0.5 s (archive mmap)
APM agent runtime overhead	premain on every boot + retransform	0 (transformed classes baked into archive)
First-node archive build	—	3 s (Go primer)
Subsequent nodes	5–10 s rebuild each	80 ms (P2P pull)
Per-JVM memory on the same node	N × RSS	Pss/Rss ≈ 63% (smaps Shared_Clean)
Dockerfiles you have to write	—	0 (v0.9)

For a Spring Boot fleet spread over 1,000 nodes: one build, 999 pulls. Every further Pod on a node mmaps the archive that node already holds.

Live look at a running cluster

classcache stats (C, ~1.3k LOC, hiredis + cJSON + libcurl) reads ClassCache CRs from the K8s API, peer and archive metadata from Valkey, and /proc/<pid>/smaps from inside each workload Pod, then lays the numbers out on one screen:

$ classcache stats
CLASSCACHES
-----------
  NAME              NS          ARCHIVE KEY         PHASE               WORKLOAD
  quickstart        cc-demo     99cdff82d2f81455    Ready               quickstart
  zerobuild         cc-v7       99cdff82d2f81455    WorkloadPatched     zerobuild

ARCHIVE DISTRIBUTION
--------------------
  KEY                       SIZE  COUNT  PEERS
  99cdff82d2f81455       33.4 MB      4  10.244.1.55:8088, 10.244.2.58:8088, ...

MEMORY SHARING (live smaps, archive VMA only)
---------------------------------------------
  NODE                     JVMs      Σ Rss      Σ Pss       Saved  Pss/Rss
  cc-worker                   2     60.2 MB     45.0 MB     15.2 MB    74.7%
  cc-worker2                  2     60.4 MB     44.9 MB     15.5 MB    74.3%
  ----------------------------------------------------------------
  TOTAL                       4    120.5 MB     89.8 MB     30.7 MB    74.5%

  Σ Shared_Clean (mmap)    61.4 MB
  Saved (Σ Rss − Σ Pss)    30.7 MB
  Pss/Rss explainer            74.5% (lower is better; ideal for 4 JVMs = 25.0%)

  source: docker

Two ClassCaches in two namespaces resolve to the same sha256-derived key, which shows that the deterministic-key contract holds across namespaces and runtimes (verified on both kind and k3d; see demos/09-k3d-multinode/).
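The MEMORY SHARING numbers come from two smaps fields per mapping. Below is a minimal Go sketch of that aggregation (classcache stats itself is written in C), assuming the archive mapping is recognised by a .jsa path suffix; the type and function names are hypothetical. Saved is Σ Rss − Σ Pss and the ratio is Σ Pss / Σ Rss, so two JVMs on one node that share every resident archive page would land at 50%.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// archiveUsage holds one JVM's numbers for the CDS archive mapping(s), in kB.
type archiveUsage struct {
	Rss, Pss, SharedClean int
}

// parseArchiveSmaps scans the text of /proc/<pid>/smaps and sums Rss, Pss and
// Shared_Clean over every mapping whose pathname ends in suffix.
// Assumption: the archive is recognised by its file suffix (e.g. ".jsa").
func parseArchiveSmaps(smaps, suffix string) archiveUsage {
	var u archiveUsage
	inArchive := false
	sc := bufio.NewScanner(strings.NewReader(smaps))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		// Field lines start with "Key:"; every other line opens a new mapping.
		if !strings.HasSuffix(fields[0], ":") {
			inArchive = strings.HasSuffix(fields[len(fields)-1], suffix)
			continue
		}
		if !inArchive || len(fields) < 2 {
			continue
		}
		kb, err := strconv.Atoi(fields[1])
		if err != nil {
			continue // e.g. VmFlags, whose values are not numbers
		}
		switch fields[0] {
		case "Rss:":
			u.Rss += kb
		case "Pss:":
			u.Pss += kb
		case "Shared_Clean:":
			u.SharedClean += kb
		}
	}
	return u
}

// summarize aggregates several JVMs the way the MEMORY SHARING table does:
// Saved = ΣRss - ΣPss, and ratio = ΣPss / ΣRss (lower means more sharing).
func summarize(jvms []archiveUsage) (saved int, ratio float64) {
	var rss, pss int
	for _, j := range jvms {
		rss += j.Rss
		pss += j.Pss
	}
	if rss == 0 {
		return 0, 0
	}
	return rss - pss, float64(pss) / float64(rss)
}

// sample is a made-up, trimmed smaps excerpt: the archive plus one anonymous mapping.
const sample = `7f0000000000-7f0001d4c000 r--p 00000000 08:01 42   /var/lib/classcache/foo.jsa
Rss:               30000 kB
Pss:               15000 kB
Shared_Clean:      30000 kB
7f0002000000-7f0002100000 rw-p 00000000 00:00 0
Rss:                1024 kB
Pss:                1024 kB
Shared_Clean:          0 kB
`

func main() {
	u := parseArchiveSmaps(sample, ".jsa")
	fmt.Printf("archive VMA per JVM: %+v kB\n", u)
	saved, ratio := summarize([]archiveUsage{u, u}) // two JVMs on one node
	fmt.Printf("saved %d kB, Pss/Rss %.1f%%\n", saved, ratio*100)
}
```

For the sample, main reports 30000 kB saved and a 50.0% ratio. The live cluster sits higher (about 74%), which is what you would expect if part of each archive mapping is not shared between the JVMs.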

Three key ideas

JVM CDS archive: -XX:ArchiveClassesAtExit dumps the classes loaded during a run to a file when the JVM exits, and -XX:SharedArchiveFile mmaps that file on the next boot, taking Spring Boot startup from 5–10 s to about 0.5 s.
APM bytecode transforms get baked into the archive: run the agent at build time together with ArchiveClassesAtExit, and the transformed bytecode ends up in the archive itself. At runtime the agent is off, yet the transformed classes are still loaded via mmap and the instrumentation keeps working (zero premain cost).
Archives are P2P-distributable: the same (image, agent, JVM, arch) tuple always yields the same sha256 archive key. The first node builds the archive, the rest pull it over HTTP, and Valkey acts as the directory of which node holds which key.

Architecture

                       ┌────────────────────────────────────────────┐
                       │              KUBERNETES CLUSTER             │
                       │                                            │
   ┌────────────┐      │   ┌────────────┐      ┌─────────────┐      │
   │    User    │      │   │  Operator  │      │   Valkey    │      │
   │            │──────┼──►│ (Reconcile)│──────│ (Directory) │      │
   │ ClassCache │      │   └─────┬──────┘      └──────┬──────┘      │
   │   one CR   │      │         │ owns                │ key→peers  │
   └────────────┘      │         ▼                     │            │
                       │   ┌────────────────────────┐  │            │
                       │   │  Per-node Primer (DS)  │◄─┘            │
                       │   │  initC: extract app    │               │
                       │   │  initC: extract agent  │               │
                       │   │  main:  build/pull     │               │
                       │   │         + status PATCH │               │
                       │   └────────────┬───────────┘               │
                       │                │ writes                    │
                       │                ▼                           │
                       │   [hostPath /var/lib/classcache/foo.jsa]   │
                       │                │ mmap                      │
                       │                ▼                           │
                       │   ┌────────────────────────┐               │
                       │   │   Workload Pods        │               │
                       │   │  (your Spring Boot)    │               │
                       │   │  agent OFF + archive   │               │
                       │   └────────────────────────┘               │
                       └────────────────────────────────────────────┘

See docs/DESIGN.md for a deeper walkthrough.

Components at a glance

Component	Role	Module
Operator	Watches the `ClassCache` CR, materializes Valkey / Primer DaemonSet / RBAC / Workload patch	`modules/operator/`
Primer	One per node. Builds the archive or P2P-pulls it, then PATCHes status	`modules/primer/`
Valkey	"Which node holds which archive key" directory (Redis-compatible)	`valkey/valkey:7.2-alpine`
Agent catalog	Pre-packaged agent jar images (Scouter, OTel, …)	`modules/agent-catalog/`
Profile catalog	Declarative "how this agent goes in at build/runtime" YAML	`modules/agent-profiles/` + ConfigMap

Quick Start

A single script takes you all the way through. Prereqs: docker, kubectl, kind.

5-minute quickstart (drive the demo app once)

git clone https://github.com/junyeong0619/cluster-classcache.git
cd cluster-classcache
./scripts/quickstart.sh

The script does everything for you:

Creates a kind cluster called cc-quickstart (control-plane + 2 workers)
Installs cert-manager (for the webhook's TLS)
Builds the operator and universal primer images, runs modules/agent-catalog/scouter/setup.sh (which downloads the Scouter tarball from GitHub and wraps it in a small image), and builds the demo app
Loads everything into kind
Installs CRD + RBAC + profile catalog + operator + webhook
Applies examples/quickstart.yaml
Waits for the ClassCache to reach Ready and prints the result

The Scouter step is the only "first-time setup" you have to do — Scouter has no official Docker image. OpenTelemetry, Datadog, New Relic, and Elastic all ship official agent images, so for those you just point spec.agent.image at the vendor's image (see modules/agent-catalog/README.md).

What you'll see

═══════════════════════════════════════════════════════
  Result
═══════════════════════════════════════════════════════
NAME         WORKLOAD     PROFILE   PHASE   KEY
quickstart   quickstart   scouter   Ready   99cdff82d2f81455

NAME                                    READY   STATUS    AGE
cc-quickstart-primer-xxxxx              1/1     Running   15s
cc-quickstart-primer-yyyyy              1/1     Running   15s
cc-quickstart-valkey-zzzzz              1/1     Running   15s
quickstart-aaaaa                        1/1     Running   3s
quickstart-bbbbb                        1/1     Running   3s
quickstart-ccccc                        1/1     Running   3s

End-to-end ~15 s. Each Workload Pod boots in 0.5 s.

Tear-down

kind delete cluster --name cc-quickstart

Applying it to your own app

Copy examples/my-app-template.yaml and fill in the four <REPLACE_ME_…> slots:

cp examples/my-app-template.yaml my-app.yaml
$EDITOR my-app.yaml      # fill in the four <REPLACE_ME_*> placeholders
kubectl apply -f my-app.yaml

The four slots (the template's header comment has the details):

Slot	Meaning	Example
`<REPLACE_ME_NAMESPACE>`	Namespace your app lives in	`prod`, `default`
`<REPLACE_ME_NAME>`	ClassCache + Deployment name	`my-app`
`<REPLACE_ME_APP_IMAGE>`	Your docker image (whatever your CI/CD ships)	`ghcr.io/acme/my-app:1.4.0`
`<REPLACE_ME_APP_JAR_PATH>`	Path of the Spring Boot fat jar inside that image	`/app.jar`, `/work/app.jar`

If you don't know where the jar lives in your image: docker run --rm --entrypoint sh <your-image> -c 'find / -name "*.jar" 2>/dev/null | head'
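Before kubectl apply, it can help to confirm that no slot was missed. A small Go sketch, assuming only the <REPLACE_ME_…> naming pattern from the template; this checker is hypothetical and not part of the repo.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// placeholder matches the template's slots, e.g. <REPLACE_ME_NAMESPACE>.
var placeholder = regexp.MustCompile(`<REPLACE_ME_[A-Z_]+>`)

// unfilled returns every placeholder still present, in order of appearance.
func unfilled(manifest string) []string {
	return placeholder.FindAllString(manifest, -1)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: check <manifest.yaml>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if left := unfilled(string(data)); len(left) > 0 {
		fmt.Println("still to fill in:", left)
		os.Exit(1)
	}
	fmt.Println("ok: no <REPLACE_ME_*> placeholders left in", os.Args[1])
}
```

Saved as check.go, `go run check.go my-app.yaml` exits non-zero and lists the remaining slots if any are left.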

Requirements:

Your app image must contain sh, cp, and java (the initContainer copies the jar and runs jarmode=tools extract). Standard alpine/debian-based JDK images work; fully distroless images do not — see the workaround below.
The fat jar must be compatible with Spring Boot's jarmode=tools (available since Spring Boot 3.3).

If your app image is distroless

The cc-extract-app initContainer needs a shell to copy the jar and a JDK to run jarmode=tools extract. Distroless images have neither. Three ways out, in order of preference:

1. Companion extractor image (recommended): keep your runtime image distroless, but base the initContainer on an image that has sh, cp, and a JDK. The cleanest pattern is to publish a small "extractor companion" image alongside your normal one:
```
# Dockerfile.extractor: runs only as an initContainer, never serves traffic
FROM eclipse-temurin:22-jdk-alpine
COPY my-app.jar /app.jar
```
Point spec.app.image at my-app-extractor:1.0 and your normal Deployment's container image at the distroless my-app:1.0. The initContainer extracts the jar from the companion image; the workload container boots from the archive on the distroless runtime.
2. Point spec.app.image at a non-distroless build target: many companies already produce a JDK image for CI/test purposes. If that image contains the same jar, point spec.app.image at it. The workload Deployment still uses your distroless image; only the extractor reads from the JDK image.
3. Drop distroless for the primer step only: if your CI has no JDK image, build one inside this repo. See CONTRIBUTING.md for how to add a one-off extractor image following the modules/agent-catalog/ layout.

Update (v0.10): option 1 is now a first-class CRD field. Set spec.app.extractorImage to your companion image. The workload Deployment still uses spec.app.image (your distroless runtime); only the cc-extract-app initContainer reads from extractorImage.

spec:
  app:
    image:          my-app:1.0                  # distroless, used by workload pods
    extractorImage: my-app-extractor:1.0        # alpine+jdk + same jar, init only
    jarPath:        /app.jar
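The v0.10 rule for which image each container uses can be written as a small function. This is a sketch: the Go type and the images helper are hypothetical, only the three field names come from the YAML above, and falling back to spec.app.image when extractorImage is unset is an assumption based on the pre-v0.10 behaviour described in option 1.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AppSpec mirrors the spec.app fields shown above; the Go type is hypothetical.
type AppSpec struct {
	Image          string `json:"image"`
	ExtractorImage string `json:"extractorImage,omitempty"`
	JarPath        string `json:"jarPath"`
}

// images returns the image for the cc-extract-app initContainer and the image
// for the workload container. Only the initContainer honours extractorImage.
func images(app AppSpec) (initImage, workloadImage string) {
	if app.ExtractorImage != "" {
		return app.ExtractorImage, app.Image
	}
	return app.Image, app.Image
}

func main() {
	var app AppSpec
	raw := `{"image":"my-app:1.0","extractorImage":"my-app-extractor:1.0","jarPath":"/app.jar"}`
	if err := json.Unmarshal([]byte(raw), &app); err != nil {
		panic(err)
	}
	initImg, workImg := images(app)
	fmt.Println("cc-extract-app:", initImg)
	fmt.Println("workload:      ", workImg)
}
```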

Picking an agent image

Vendor	Agent image to use
OpenTelemetry	`ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest` (`jarPath: /javaagent.jar`)
Datadog	`gcr.io/datadoghq/dd-lib-java-init:latest` (`jarPath: /datadog-java-agent.jar`)
New Relic	`newrelic/newrelic-java-init:latest` (`jarPath: /newrelic-agent.jar`)
Elastic APM	`docker.elastic.co/observability/apm-agent-java:latest` (`jarPath: /usr/agent/elastic-apm-agent.jar`)
Scouter	No official image. Run `modules/agent-catalog/scouter/setup.sh` once — it downloads the upstream tarball and builds `classcache-agent-scouter:v0.9`.
Pinpoint	No official agent image (NAVER-origin). Run `modules/agent-catalog/pinpoint/setup.sh` once: it downloads the upstream tarball and builds `classcache-agent-pinpoint:v0.10`. Here jarPath is the directory `/agent` (multi-file agent).
Internal / forked agent	Build your own tiny image (`FROM alpine:3.20` + `COPY my-agent.jar /agent.jar`) and push it to your registry. See `modules/agent-catalog/README.md`.

Targeting your own cluster (EKS, GKE, …, not kind)

Follow the same steps, but push the images to your registry instead of using kind load:

BUILDER=docker IMAGE=myreg.io/classcache-operator:v0.9.1 modules/operator/build.sh
docker push myreg.io/classcache-operator:v0.9.1
# same for primer + agent

helm install classcache deploy/helm/classcache \
  --namespace classcache-system --create-namespace \
  --set image.repository=myreg.io/classcache-operator \
  --set image.tag=v0.9.1

Then point spec.primerImage / spec.agent.image in your ClassCache CR at your registry paths.

Two patch modes

Mode	Deployment template	When to use
Owned (default)	The operator patches it directly (initContainer / volume / env)	General use
Webhook	Template stays clean; the admission webhook patches at Pod-creation time	ArgoCD / GitOps — avoids sync drift caused by the operator constantly rewriting the template

Webhook mode requires the pod label classcache.dev/inject: <cc-name> and a working cert-manager.
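A sketch of the opt-in check that webhook mode implies: a Pod is patched only when it carries the classcache.dev/inject label set to the ClassCache's name. The function is hypothetical and covers only the label match, not the cert-manager side.

```go
package main

import "fmt"

// injectLabel is the Pod label that webhook mode requires.
const injectLabel = "classcache.dev/inject"

// shouldInject reports whether a Pod with these labels opts in to the named ClassCache.
// Reading from a nil map is safe in Go and simply finds no label.
func shouldInject(podLabels map[string]string, ccName string) bool {
	v, ok := podLabels[injectLabel]
	return ok && v == ccName
}

func main() {
	pod := map[string]string{"app": "quickstart", injectLabel: "quickstart"}
	fmt.Println(shouldInject(pod, "quickstart"))
	fmt.Println(shouldInject(pod, "zerobuild"))
}
```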

Demos

If you'd rather poke at the building blocks directly, nine demos each isolate one hypothesis:

demos/01-phase-b-cds/         # Verify CDS archives work
demos/02-mmap-share/          # Measure mmap sharing across N JVMs on a node
demos/03-springboot-scale/    # Scale test (Spring Boot 33 MB archive)
demos/04-cluster-primer/      # docker-compose 3-node P2P distribution
demos/05-apm-v01/             # Reference in-house APM agent
demos/06-k8s-end-to-end/      # kind multi-node integration (pre-v0.7 path)
demos/07-scouter-ingestion/   # Scouter ingestion compatibility
demos/08-otel-ingestion/      # OTel hybrid mode
demos/09-k3d-multinode/       # k3d 4-node (real bridge between node containers)

Every directory has a run-*.sh you can launch in one shot.

Documentation

docs/DESIGN.md — Design (why, how, what's inside)
docs/REPORT.md — Step-by-step verification report (Phase B → v0.9)

License

Apache 2.0 — see LICENSE. Third-party attribution in NOTICE.

Contributing

See CONTRIBUTING.md for development setup, the new-agent guide, and a list of known-good first issues.
