Add Core Concepts Tutorial #217
nvrohanv commented on Oct 15, 2025
- Add a tutorial introducing the core Grove primitives. The examples can be run on a local kind cluster.
- Allow `make kind-up` to create an arbitrary number of fake nodes.
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
gflarity left a comment
Looks good overall; just a few suggestions, mostly around organization. Please take a look and let me know if you have any questions.
Oh, one more thing: I think a quickstart that doesn't involve the fake nodes would also be useful. It's the first thing I look for in a POC.
## Motivation

During hands-on testing of the Grove installation process, several critical usability issues were discovered that would block new users from successfully deploying Grove. Additionally, the README was too verbose and didn't quickly communicate the core value proposition to developers evaluating the project.

## Changes Made

### installation.md - Fixed Critical Blockers

**Working Directory Confusion**
- Added explicit "Navigate to operator directory" instructions
- Impact: Users can now follow the guide linearly without trial and error

**KUBECONFIG Setup Broken**
- The kind-up script has a bug and doesn't export KUBECONFIG properly
- Added a manual workaround using `kind get kubeconfig`
- Impact: Users can now deploy successfully after creating the kind cluster

**Wrong Resource Names**
- Fixed: simple1-0-pcsg → simple1-0-sga (actual resource name)
- Impact: Scaling examples now work as documented

**Added Troubleshooting Section**
- Covers deployment issues, runtime issues, and community resources
- Impact: Users can self-serve when they hit common issues

### README.md - Refocused on Problem → Solution → Action

**Shortened from ~80 lines to ~40 lines of core content**

New structure:
1. Problem First: What's broken in K8s for AI inference
2. Solution: Grove's one-liner positioning
3. Quick Start: 4 commands to deploy in 5 minutes
4. What Grove Solves: Table mapping scenarios to capabilities
5. How It Works: Simplified concept explanations

Roadmap simplified to Q4 2025 / Q1 2026 (removed specific outdated dates).

Impact: Users understand the value proposition in 30 seconds and can start immediately.

### quickstart.md - New 10-Minute Tutorial

- Explains the 4-component example architecture
- Step-by-step deployment with expected outputs
- Demonstrates both PCSG and PCS scaling
- Includes hierarchy visualization
- Kind-specific troubleshooting tips

Impact: New users get a working deployment within 10 minutes.

## Testing Performed

All changes were validated through a fresh kind cluster deployment on macOS, following installation.md step by step and verifying that all examples work.

Co-authored-by: Claude <noreply@anthropic.com>
…badge

- Replace verbose technical description with problem-first approach
- Add "One API. Any inference architecture." tagline for clarity
- Include Quick Start section for immediate value demonstration
- Add "What Grove Solves" table mapping use cases to capabilities
- Simplify "How It Works" section with concise concept table
- Add DeepWiki badge for community Q&A support
- Update roadmap to use Q4 2025/Q1 2026 format

Co-Authored-By: Claude <noreply@anthropic.com>
renormalize left a comment
1/n, as I haven't had a chance to look through the entire PR yet.
Co-authored-by: Geoff Flarity <geoff.flarity@gmail.com> Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com> Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
gflarity left a comment
Just moving this to approve to avoid friction. We discussed some of the comments in a meeting.
Co-authored-by: Geoff Flarity <geoff.flarity@gmail.com> Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com> Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
**README.md:**
- Remove `kind get kubeconfig` command (already handled by Makefile)
- Add `--watch` flag to demonstrate actual watching behavior

**User guide improvements:**
- Add inline comments to all podSpec examples clarifying they are standard Kubernetes PodSpecs
- Change PodClique comparison from "Deployment" to "ReplicaSet" with gang termination behavior
- Clarify blue-green deployment mentions with more specific use cases (canary deployments, A/B testing, high availability)
- Add "When to scale what" section explaining when to scale PodCliqueScalingGroup vs individual PodCliques

Addresses feedback from:
- gflarity: podSpec comments, scaling clarification
- renormalize: kubeconfig removal, watch command, blue-green justification, PodClique comparison
- athreesh: (previously addressed in earlier commits)

Co-Authored-By: Claude <noreply@anthropic.com>
Add "Understanding Scaling Levels" section to overview.md that clearly explains when to scale PCS vs PCSG vs PodClique replicas.

This addresses gflarity's feedback requesting clarification on when to increase PCS replicas vs PCSG replicas. The new section provides clear guidance:
- Scale PCS for system-level operations (canary, A/B, availability zones)
- Scale PCSG to add more multi-node component instances
- Scale PodClique to fine-tune individual component pods

Co-Authored-By: Claude <noreply@anthropic.com>
Remove `export KUBECONFIG` line from Quick Start section, as the Makefile already handles KUBECONFIG configuration automatically for make targets (see operator/Makefile line 30).

Addresses renormalize's feedback that this line is not needed.

Co-Authored-By: Claude <noreply@anthropic.com>
- Add DeepWiki badge at top of README
- Keep improved Quick Start without redundant KUBECONFIG export (per renormalize feedback)
- Keep improved installation.md KUBECONFIG instructions
- Remove reference to non-existent quickstart.md

Resolves conflicts between improved documentation and main branch.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Removed construction note and adjusted badge placement.
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Co-authored-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com> Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
@nvrohanv, a reminder to clean up the git commit message when merging this PR. The branch has been merged with main multiple times and there are a significant number of commits, so it would be nice to keep the commit message short.
Co-authored-by: Sanjay Chatterjee <sanjay.chatterjee@gmail.com> Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>