feat(helm): Phase 3.5 - Observability Stack #7
Merged
Pull request overview
This PR adds a comprehensive observability stack to the CDK Erigon Kubernetes deployment, including:
- Integrated Prometheus/Grafana monitoring via kube-prometheus-stack dependency
- NATS prometheus-exporter sidecar for JetStream metrics
- ServiceMonitors for automated scrape configuration
- Grafana dashboards for NATS and cdk-erigon performance
- Development tooling including Makefile targets and validation scripts
- Extensive documentation for security, shutdown procedures, and conventions
Reviewed changes
Copilot reviewed 42 out of 44 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| k8s/l1-proxy/Dockerfile | L1 RPC cache proxy container image |
| k8s/scripts/*.sh | Build, validation, and testing automation scripts |
| k8s/helm/values*.yaml | Helm configuration for various environments |
| k8s/helm/templates/**/*.yaml | Kubernetes resource templates |
| k8s/helm/tests/**/*.yaml | Helm unit tests |
| k8s/helm/dashboards/*.json | Grafana dashboard definitions |
| k8s/helm/Chart.yaml | Helm chart metadata with kube-prometheus-stack dependency |
| k8s/docs/**/*.md | Comprehensive documentation |
- Add NATS http_port config (8222) to sequencer
- Expose monitoring port in StatefulSet and Service
- Create ServiceMonitor for Prometheus scraping
- Add 6 helm-unittest tests (105/105 passing)
- Document Prometheus Operator prerequisite

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add global monitoring values schema
- Add erigon metrics port (6060) to sequencer and RPC
- Expose metrics ports in Services
- Create ServiceMonitor for sequencer erigon metrics
- Create ServiceMonitor for RPC erigon metrics
- Add 14 helm-unittest tests (119/119 passing)
- Update monitoring README with new ServiceMonitors
- Add .helmignore file

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
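For illustration, a ServiceMonitor for the erigon metrics port could look roughly like the sketch below. The resource name, label selector, named port `metrics`, and scrape interval are assumptions, not values taken from the chart. The path is the one erigon conventionally serves Prometheus metrics on and should be checked against the deployed version.

```yaml
# Hypothetical ServiceMonitor; all names and labels are illustrative.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cdk-erigon-sequencer-metrics   # assumed name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: sequencer   # assumed Service label
  endpoints:
    - port: metrics                    # assumed name of the Service port mapped to 6060
      path: /debug/metrics/prometheus  # assumed erigon metrics path
      interval: 30s                    # assumed scrape interval
```

The `port` field refers to the name of a port on the Service, not a number, which is why the metrics port has to be exposed in the Service as well as in the pod spec.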
…595)
- Add NATS JetStream dashboard (from official repo)
- Create custom cdk-erigon performance dashboard
- Create Grafana ConfigMap for dashboard provisioning
- Add 5 helm-unittest tests (124/124 passing)
- Document dashboard usage and configuration

Dashboards included:
- NATS JetStream: stream metrics, storage, consumer lag, throughput
- cdk-erigon Performance: block sync, memory/CPU, RPC rate, DB size

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add kube-prometheus-stack as optional subchart for one-command deployment:
- Add prometheus dependency in Chart.yaml (v65.0.0)
- Configure Grafana sidecar for auto-dashboard discovery
- Enable single-command monitoring deployment
- Add .gitignore for generated charts/
- Update README with integrated setup instructions
Usage:
helm install cdk-erigon . \
  --set monitoring.enabled=true \
  --set monitoring.prometheus.enabled=true \
  --set monitoring.grafana.dashboards.enabled=true
Dashboards auto-load into Grafana at http://localhost:3000
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
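A minimal sketch of how the optional subchart might be declared in Chart.yaml. The `condition` key is an assumption inferred from the `--set monitoring.prometheus.enabled=true` flag in the usage above; the actual chart may use a different condition or an alias.

```yaml
# Chart.yaml fragment (sketch, not the chart's actual contents)
dependencies:
  - name: kube-prometheus-stack
    version: "65.0.0"
    repository: https://prometheus-community.github.io/helm-charts
    condition: monitoring.prometheus.enabled   # assumed toggle
```

With a declaration like this, `helm dependency update` downloads the subchart into `charts/`, which is why that directory is gitignored.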
- Upgrade kube-prometheus-stack v65→v80 for built-in CRD installation
- Enable prometheus.crds.upgradeJob for idempotent CRD handling
- Remove custom prometheus-crds-install.yaml (upstream handles it)
- Add comprehensive README.md with quick start guide
- Consolidate monitoring docs into main README

The upgradeJob runs as a Helm pre-install/pre-upgrade hook, using kubectl apply --server-side to install Prometheus Operator CRDs before ServiceMonitor resources are created. This enables seamless deployment on fresh clusters without manual CRD installation.

Tests: 124/124 helm-unittest passing
The L1 proxy doesn't have a dedicated /health endpoint that works without query parameters. Switch to TCP socket probe which only verifies the port is listening.
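A TCP socket probe of this kind could be written roughly as follows. The named port `http` and the timing values are assumptions for illustration; the chart's real port name and thresholds may differ.

```yaml
# Container spec fragment (sketch)
livenessProbe:
  tcpSocket:
    port: http              # assumed container port name
  initialDelaySeconds: 5    # assumed
  periodSeconds: 10         # assumed
readinessProbe:
  tcpSocket:
    port: http              # assumed container port name
  periodSeconds: 10         # assumed
```

The trade-off is that a TCP probe passes as soon as the socket accepts connections, so it cannot detect a proxy that is listening but unable to serve requests.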
Helm validates templates before running hooks, so CRDs must be installed before helm install/upgrade when monitoring is enabled. The kubectl apply commands are idempotent and safe to re-run.
Replace ${DS_PROMETHEUS} and ${DS__NATS-PROMETHEUS} placeholders with
the actual datasource UID 'prometheus' at template render time using
Helm's replace function. This enables provisioned dashboards to work
correctly since Grafana doesn't process __inputs for sidecar-loaded
dashboards.
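The substitution described above might be sketched as the following ConfigMap template. The ConfigMap name, the dashboard file path, and the `grafana_dashboard` label (the sidecar's usual default) are assumptions here; only the two placeholders and the `prometheus` UID come from the commit message.

```yaml
# templates/grafana-dashboards.yaml (hypothetical file name)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cdk-erigon-grafana-dashboards   # assumed name
  labels:
    grafana_dashboard: "1"              # assumed sidecar discovery label
data:
  nats-jetstream.json: |-
{{ .Files.Get "dashboards/nats-jetstream.json" | replace "${DS_PROMETHEUS}" "prometheus" | replace "${DS__NATS-PROMETHEUS}" "prometheus" | indent 4 }}
```

Because the replacement happens at render time, the JSON files in the repository can stay identical to their upstream exports.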
- Add Helm/K8s make targets (docker-build, helm-deps, helm-crds, etc.)
- Enable NATS HTTP monitoring on 0.0.0.0:8222 in backend.go
- Add prometheus-nats-exporter sidecar container to sequencer
- Add cdk-erigon-sync.json Grafana dashboard
- Add values-dev.yaml for development environment settings
- Add values-local.yaml.example as template for local secrets
- Remove values-local.yaml from tracking (now gitignored)
- Add NATS stream/KV mismatch troubleshooting guide
- Configure Grafana NodePort (30300) for local access
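As a rough sketch, an exporter sidecar that reads the NATS monitoring endpoint on 8222 could be declared like this. The container name, image tag, selected flags, and port name are assumptions; 7777 is the exporter's usual default listen port, and the actual template in this PR may be configured differently.

```yaml
# Sequencer pod spec fragment (sketch)
containers:
  - name: nats-exporter                               # assumed name
    image: natsio/prometheus-nats-exporter:latest     # assumed tag; pin a version in practice
    args:
      - -varz                        # general server metrics
      - -jsz=all                     # JetStream metrics
      - http://localhost:8222        # NATS HTTP monitoring endpoint in the same pod
    ports:
      - name: nats-metrics           # assumed port name
        containerPort: 7777          # exporter default port
```

The sidecar is needed because the NATS monitoring port returns JSON, while Prometheus expects its text exposition format.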
fc1d9f9 to
78b8d04
Summary
Changes
Monitoring
Developer Experience
- make docker-build / make docker-build-all - Build images
- make helm-deps - Update chart dependencies
- make helm-crds - Install Prometheus Operator CRDs
- values-dev.yaml - Development environment settings
- values-local.yaml.example - Template for local secrets (API keys)
- Remove values-local.yaml from tracking (now gitignored)
Documentation
Related Issues