Add Core Concepts Tutorial #217

Merged: nvrohanv merged 29 commits into ai-dynamo:main from nvrohanv:nvrohanv/add_overview_tutorial on Nov 6, 2025
@nvrohanv

Contributor
  • Adds a tutorial introducing the core Grove primitives. The examples can be run on a local kind cluster.
  • Allows `make kind-up` to create an arbitrary number of fake nodes.

Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Comment thread docs/installation.md

@gflarity gflarity left a comment

Contributor

Looks good overall, just a few suggestions, mostly around organization. Please take a look and let me know if you have any questions.

Comment thread docs/user_guide/overview.md Outdated
Comment thread docs/user_guide/overview.md Outdated
Comment thread docs/user_guide/overview.md Outdated
Comment thread docs/user-guide/core-concepts/overview.md
Comment thread docs/user_guide/overview.md Outdated
Comment thread docs/user_guide/pcs_and_pclq_intro.md Outdated
Comment thread docs/user-guide/core-concepts/pcsg_intro.md
Comment thread docs/user-guide/core-concepts/pcsg_intro.md
Comment thread docs/user-guide/core-concepts/takeaways.md
Comment thread operator/hack/kind-up.sh
@gflarity

Contributor

Oh, one more thing: I think a quickstart (one that doesn't involve the fakes) would also be useful. It's the first thing I look for when doing a POC.

athreesh and others added 2 commits October 19, 2025 15:05
## Motivation
During hands-on testing of the Grove installation process, several critical
usability issues were discovered that would block new users from successfully
deploying Grove. Additionally, the README was too verbose and didn't quickly
communicate the core value proposition to developers evaluating the project.

## Changes Made

### installation.md - Fixed Critical Blockers

**Working Directory Confusion**
- Added explicit "Navigate to operator directory" instructions
- Impact: Users can now follow the guide linearly without trial-and-error

**KUBECONFIG Setup Broken**
- kind-up script has a bug and doesn't export KUBECONFIG properly
- Added manual workaround using `kind get kubeconfig`
- Impact: Users can now successfully deploy after creating kind cluster

**Wrong Resource Names**
- Fixed: simple1-0-pcsg → simple1-0-sga (actual resource name)
- Impact: Scaling examples now work as documented

**Added Troubleshooting Section**
- Covers deployment issues, runtime issues, and community resources
- Impact: Users can self-serve when encountering common issues

### README.md - Refocused on Problem → Solution → Action

**Shortened from ~80 lines to ~40 lines of core content**

New structure:
1. Problem First: What's broken in K8s for AI inference
2. Solution: Grove's one-liner positioning
3. Quick Start: 4 commands to deploy in 5 minutes
4. What Grove Solves: Table mapping scenarios to capabilities
5. How It Works: Simplified concept explanations

Roadmap simplified to Q4 2025 / Q1 2026 (removed specific outdated dates)

Impact: Users understand value prop in 30 seconds and can start immediately

### quickstart.md - New 10-Minute Tutorial

- Explains the 4-component example architecture
- Step-by-step deployment with expected outputs
- Demonstrates both PCSG and PCS scaling
- Includes hierarchy visualization
- Kind-specific troubleshooting tips

Impact: New users get immediate success experience in 10 minutes

## Testing Performed
All changes validated through fresh kind cluster deployment on macOS,
following installation.md step-by-step, and verifying all examples work.

Co-authored-by: Claude <noreply@anthropic.com>
…badge

- Replace verbose technical description with problem-first approach
- Add "One API. Any inference architecture." tagline for clarity
- Include Quick Start section for immediate value demonstration
- Add "What Grove Solves" table mapping use cases to capabilities
- Simplify "How It Works" section with concise concept table
- Add DeepWiki badge for community Q&A support
- Update roadmap to use Q4 2025/Q1 2026 format

Co-Authored-By: Claude <noreply@anthropic.com>

@renormalize renormalize left a comment

Contributor

1/n, as I haven't had a chance to look through the entire PR yet.

Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread docs/installation.md
Comment thread README.md Outdated
Comment thread docs/user-guide/core-concepts/overview.md
Comment thread docs/user_guide/overview.md Outdated
Comment thread docs/user_guide/overview.md Outdated
Comment thread docs/user_guide/pcs_and_pclq_intro.md Outdated
Comment thread docs/user-guide/core-concepts/pcsg_intro.md
Comment thread docs/user_guide/pcsg_intro.md Outdated
Comment thread docs/user_guide/pcsg_intro.md Outdated
Comment thread docs/user_guide/takeaways.md Outdated
Comment thread operator/hack/kind-up.sh
athreesh and others added 5 commits October 24, 2025 11:55
Co-authored-by: Geoff Flarity <geoff.flarity@gmail.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Geoff Flarity <geoff.flarity@gmail.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Geoff Flarity <geoff.flarity@gmail.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
gflarity previously approved these changes Oct 27, 2025

@gflarity gflarity left a comment

Contributor

Just moving this to approve to avoid friction. We discussed some of the comments in a meeting.

Co-authored-by: Geoff Flarity <geoff.flarity@gmail.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
athreesh and others added 10 commits October 28, 2025 09:43
Co-authored-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
**README.md:**
- Remove `kind get kubeconfig` command (already handled by Makefile)
- Add `--watch` flag to demonstrate actual watching behavior

**User guide improvements:**
- Add inline comments to all podSpec examples clarifying they are standard Kubernetes PodSpecs
- Change PodClique comparison from "Deployment" to "ReplicaSet" with gang termination behavior
- Clarify blue-green deployment mentions with more specific use cases (canary deployments, A/B testing, high availability)
- Add "When to scale what" section explaining when to scale PodCliqueScalingGroup vs individual PodCliques

Addresses feedback from:
- gflarity: podSpec comments, scaling clarification
- renormalize: kubeconfig removal, watch command, blue-green justification, PodClique comparison
- athreesh: (previously addressed in earlier commits)

Co-Authored-By: Claude <noreply@anthropic.com>
Add "Understanding Scaling Levels" section to overview.md that clearly
explains when to scale PCS vs PCSG vs PodClique replicas. This addresses
gflarity's feedback requesting clarification on when to increase PCS
replicas vs PCSG replicas.

The new section provides clear guidance:
- Scale PCS for system-level operations (canary, A/B, availability zones)
- Scale PCSG to add more multi-node component instances
- Scale PodClique to fine-tune individual component pods
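
The three scaling levels above can be illustrated with a small pod-count model. This is a minimal sketch, not Grove's API: the class names (`Clique`, `ScalingGroup`, `CliqueSet`), the `build_example` helper, and the component names are hypothetical, and it assumes replica counts multiply down the hierarchy (each PCS replica contains every PCSG replica, and each PCSG replica contains every member PodClique's pods).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Clique:
    """Hypothetical stand-in for a PodClique: a named group of identical pods."""
    name: str
    replicas: int  # number of pods in this clique


@dataclass
class ScalingGroup:
    """Hypothetical stand-in for a PodCliqueScalingGroup (PCSG)."""
    name: str
    replicas: int  # number of copies of the whole group
    cliques: List[Clique] = field(default_factory=list)

    def pods(self) -> int:
        # Assumption: each PCSG replica holds one full copy of every member clique.
        return self.replicas * sum(c.replicas for c in self.cliques)


@dataclass
class CliqueSet:
    """Hypothetical stand-in for a PodCliqueSet (PCS)."""
    name: str
    replicas: int  # number of copies of the whole system
    standalone: List[Clique] = field(default_factory=list)
    groups: List[ScalingGroup] = field(default_factory=list)

    def pods(self) -> int:
        # Assumption: each PCS replica holds every standalone clique and every group.
        per_replica = sum(c.replicas for c in self.standalone)
        per_replica += sum(g.pods() for g in self.groups)
        return self.replicas * per_replica


def build_example(pcs: int = 1, pcsg: int = 1, workers: int = 3) -> CliqueSet:
    """One frontend pod plus a multi-node group made of a leader and some workers."""
    group = ScalingGroup(
        name="model-instance",
        replicas=pcsg,
        cliques=[Clique("leader", 1), Clique("worker", workers)],
    )
    return CliqueSet(
        name="example",
        replicas=pcs,
        standalone=[Clique("frontend", 1)],
        groups=[group],
    )


if __name__ == "__main__":
    print("baseline:", build_example().pods())
    print("PCS x2:", build_example(pcs=2).pods())
    print("PCSG x2:", build_example(pcsg=2).pods())
    print("worker clique 3 -> 4:", build_example(workers=4).pods())
```

Under these assumptions, doubling the PCS replicas duplicates the whole system (frontend included), doubling the PCSG replicas adds another leader-plus-workers instance behind the same frontend, and raising the PodClique replicas adds a single worker pod.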

Co-Authored-By: Claude <noreply@anthropic.com>
Remove `export KUBECONFIG` line from Quick Start section as the Makefile
already handles KUBECONFIG configuration automatically for make targets
(see operator/Makefile line 30).

Addresses renormalize's feedback that this line is not needed.

Co-Authored-By: Claude <noreply@anthropic.com>
- Add DeepWiki badge at top of README
- Keep improved Quick Start without redundant KUBECONFIG export (per renormalize feedback)
- Keep improved installation.md KUBECONFIG instructions
- Remove reference to non-existent quickstart.md

Resolves conflicts between improved documentation and main branch.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Removed construction note and adjusted badge placement.
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread docs/installation.md
Comment thread operator/hack/kind-up.sh
nvrohanv and others added 2 commits November 3, 2025 08:49
Co-authored-by: Saketh Kalaga <51327242+renormalize@users.noreply.github.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
renormalize previously approved these changes Nov 3, 2025
@renormalize

Contributor

@nvrohanv reminder to clean up the git commit message when merging this PR: main has been merged into this branch multiple times, and there are a significant number of commits. It would be nice to keep the commit message short.

Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Co-authored-by: Sanjay Chatterjee <sanjay.chatterjee@gmail.com>
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
@nvrohanv nvrohanv merged commit 6b9ae3f into ai-dynamo:main Nov 6, 2025
3 checks passed
5 participants