
refactor: Use the scheduler backend to implement topology scheduling #515

Merged
enoodle merged 5 commits into ai-dynamo:main from enoodle:erez/tas-scheduler-backend
Apr 9, 2026

Conversation

@enoodle

@enoodle enoodle commented Apr 7, 2026

Contributor

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Refactor the topology-aware scheduling code to use the scheduler backend interfaces, adding a new interface, TopologyAwareSchedBackend.

This is a first step toward implementing the updated Topology Aware Scheduling design.

Which issue(s) this PR fixes:

References #369

Special notes for your reviewer:

Does this PR introduce an API change?

NONE

Additional documentation e.g., enhancement proposals, usage docs, etc.:


@copy-pr-bot

copy-pr-bot Bot commented Apr 7, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@enoodle enoodle force-pushed the erez/tas-scheduler-backend branch 2 times, most recently from 8d91bda to 00cefd7 on April 7, 2026 11:04
@enoodle enoodle assigned enoodle and unassigned enoodle Apr 7, 2026
@enoodle enoodle marked this pull request as ready for review April 7, 2026 11:05
@enoodle enoodle force-pushed the erez/tas-scheduler-backend branch 2 times, most recently from 6f6e833 to 7b4d7bf on April 7, 2026 14:28
enoodle added 3 commits April 7, 2026 17:21
Scheduler backends that manage topology CRDs implement this interface.
The ClusterTopology controller will type-assert backends to delegate
scheduler-specific topology resource management.

SyncTopology and OnTopologyDelete accept a k8sClient parameter so
callers can provide a non-cached client before the manager cache starts.
If nil, the backend falls back to its own (cached) client.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: Erez Freiberger <enoodle@gmail.com>
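
For readers following along, the nil-client fallback described in this commit could look roughly like the sketch below. Only the names TopologyAwareSchedBackend, SyncTopology, OnTopologyDelete and k8sClient come from the commit message; the `Client` type, the parameter lists, `exampleBackend` and `clientOrDefault` are hypothetical stand-ins, not the code in this PR.

```go
package main

import (
	"context"
	"fmt"
)

// Client is a stand-in for a Kubernetes client (hypothetical; not the real
// controller-runtime client interface).
type Client interface {
	Name() string
}

// namedClient is a trivial Client used only for this illustration.
type namedClient string

func (c namedClient) Name() string { return string(c) }

// TopologyAwareSchedBackend uses the method names from the commit message;
// the parameter lists are assumptions.
type TopologyAwareSchedBackend interface {
	SyncTopology(ctx context.Context, k8sClient Client, topologyName string) error
	OnTopologyDelete(ctx context.Context, k8sClient Client, topologyName string) error
}

// exampleBackend records which client each call ended up using.
type exampleBackend struct {
	cached Client
	calls  []string
}

// clientOrDefault returns the caller's client, or the backend's own cached
// client when the caller passes nil.
func (b *exampleBackend) clientOrDefault(k8sClient Client) Client {
	if k8sClient == nil {
		return b.cached
	}
	return k8sClient
}

func (b *exampleBackend) SyncTopology(_ context.Context, k8sClient Client, topologyName string) error {
	c := b.clientOrDefault(k8sClient)
	b.calls = append(b.calls, fmt.Sprintf("sync %s via %s", topologyName, c.Name()))
	return nil
}

func (b *exampleBackend) OnTopologyDelete(_ context.Context, k8sClient Client, topologyName string) error {
	c := b.clientOrDefault(k8sClient)
	b.calls = append(b.calls, fmt.Sprintf("delete %s via %s", topologyName, c.Name()))
	return nil
}

// Compile-time check that exampleBackend satisfies the optional interface.
var _ TopologyAwareSchedBackend = (*exampleBackend)(nil)

// runExample simulates a startup sync with a non-cached client, followed by
// a delete that passes nil and therefore uses the cached client.
func runExample() []string {
	b := &exampleBackend{cached: namedClient("cached")}
	ctx := context.Background()
	_ = b.SyncTopology(ctx, namedClient("uncached"), "example-topology")
	_ = b.OnTopologyDelete(ctx, nil, "example-topology")
	return b.calls
}

func main() {
	for _, line := range runExample() {
		fmt.Println(line)
	}
}
```

Resolving the fallback in one helper keeps both methods consistent about which client they use.
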
Returns all registered scheduler backends as a map keyed by name.
This enables the ClusterTopology controller to iterate all backends
and delegate topology management to those that support it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: Erez Freiberger <enoodle@gmail.com>
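
As an illustration of the accessor this commit describes, here is a minimal sketch of a registry that returns all backends as a map keyed by name. The names `SchedBackend`, `Registry`, `Register`, `All` and `sortedNames` are hypothetical, and returning a copy of the map is an assumption made for the sketch, not something the commit states.

```go
package main

import (
	"fmt"
	"sort"
)

// SchedBackend is a hypothetical minimal backend interface.
type SchedBackend interface {
	Name() string
}

// simpleBackend is a trivial backend used only for this illustration.
type simpleBackend string

func (b simpleBackend) Name() string { return string(b) }

// Registry holds the registered backends by name (hypothetical type).
type Registry struct {
	backends map[string]SchedBackend
}

func NewRegistry() *Registry {
	return &Registry{backends: make(map[string]SchedBackend)}
}

// Register stores the backend under its own name.
func (r *Registry) Register(b SchedBackend) {
	r.backends[b.Name()] = b
}

// All returns every registered backend keyed by name. It returns a copy so
// that callers cannot mutate the registry through the returned map.
func (r *Registry) All() map[string]SchedBackend {
	out := make(map[string]SchedBackend, len(r.backends))
	for name, b := range r.backends {
		out[name] = b
	}
	return out
}

// sortedNames lists the map keys in a stable order, since Go map iteration
// order is not deterministic.
func sortedNames(m map[string]SchedBackend) []string {
	names := make([]string, 0, len(m))
	for name := range m {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	r := NewRegistry()
	r.Register(simpleBackend("kai"))
	r.Register(simpleBackend("other"))
	fmt.Println(sortedNames(r.All()))
}
```
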
Moves KAI-specific topology logic (create, update via delete+recreate,
owner reference management) from clustertopology package into the KAI
scheduler backend. The backend now satisfies the TopologyAwareSchedBackend
optional interface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: Erez Freiberger <enoodle@gmail.com>
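
The create / update-via-delete+recreate flow with an owner reference can be sketched against an in-memory store as below. Every type and function here (`OwnerRef`, `SchedTopology`, `store`, `syncTopology`, `equalLevels`) is hypothetical and stands in for the real Kubernetes objects and client calls; the comparison used to detect a change is also an assumption.

```go
package main

import "fmt"

// OwnerRef is a simplified stand-in for a Kubernetes owner reference.
type OwnerRef struct {
	Kind string
	Name string
}

// SchedTopology is a hypothetical scheduler-specific topology resource.
type SchedTopology struct {
	Name   string
	Levels []string
	Owner  OwnerRef
}

// store is an in-memory substitute for the API server, keyed by name.
type store map[string]SchedTopology

// equalLevels reports whether two level lists match element by element.
func equalLevels(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// syncTopology creates the resource if it is missing, leaves it alone if it
// already matches, and otherwise deletes and recreates it. The resource is
// always owned by the ClusterTopology of the same name.
func syncTopology(s store, name string, levels []string) string {
	desired := SchedTopology{
		Name:   name,
		Levels: append([]string(nil), levels...), // copy to avoid aliasing
		Owner:  OwnerRef{Kind: "ClusterTopology", Name: name},
	}
	existing, found := s[name]
	switch {
	case !found:
		s[name] = desired
		return "created"
	case equalLevels(existing.Levels, desired.Levels) && existing.Owner == desired.Owner:
		return "unchanged"
	default:
		delete(s, name) // update is modelled as delete followed by create
		s[name] = desired
		return "recreated"
	}
}

func main() {
	s := store{}
	fmt.Println(syncTopology(s, "example-topology", []string{"zone", "rack"}))
	fmt.Println(syncTopology(s, "example-topology", []string{"zone", "rack"}))
	fmt.Println(syncTopology(s, "example-topology", []string{"zone", "rack", "host"}))
}
```
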
@enoodle enoodle force-pushed the erez/tas-scheduler-backend branch from 7b4d7bf to dd45fcf on April 7, 2026 20:30
@sanjaychatterjee

Collaborator

Thanks, will review it.

danbar2 previously approved these changes Apr 9, 2026
Comment thread operator/internal/clustertopology/clustertopology.go
shayasoolin previously approved these changes Apr 9, 2026
SynchronizeTopology now iterates all registered backends and delegates
scheduler-specific topology management to those implementing
TopologyAwareSchedBackend. KAI-specific logic removed from the
clustertopology package — it now lives in the KAI backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: Erez Freiberger <enoodle@gmail.com>
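
A rough sketch of the delegation loop this commit describes: iterate the registered backends, type-assert each one against the optional interface, and skip those that do not implement it. The signatures, the `plainBackend` and `topoBackend` types, and the choice to stop at the first error are assumptions for illustration; only the TopologyAwareSchedBackend and SyncTopology names and the KAI backend come from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// SchedBackend is a hypothetical minimal backend interface.
type SchedBackend interface {
	Name() string
}

// TopologyAwareSchedBackend is the optional interface; the method signature
// here is simplified and assumed.
type TopologyAwareSchedBackend interface {
	SchedBackend
	SyncTopology(topologyName string) error
}

// plainBackend does not manage topology resources.
type plainBackend string

func (b plainBackend) Name() string { return string(b) }

// topoBackend implements the optional interface and records what it synced.
type topoBackend struct {
	name   string
	synced []string
}

func (b *topoBackend) Name() string { return b.name }

func (b *topoBackend) SyncTopology(topologyName string) error {
	b.synced = append(b.synced, topologyName)
	return nil
}

// synchronizeTopology delegates to every backend that implements
// TopologyAwareSchedBackend and returns the names it delegated to.
// Stopping at the first error is an assumption of this sketch.
func synchronizeTopology(backends map[string]SchedBackend, topologyName string) ([]string, error) {
	names := make([]string, 0, len(backends))
	for name := range backends {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for the example

	var delegated []string
	for _, name := range names {
		tas, ok := backends[name].(TopologyAwareSchedBackend)
		if !ok {
			continue // backend has no topology support
		}
		if err := tas.SyncTopology(topologyName); err != nil {
			return delegated, fmt.Errorf("backend %q: %w", name, err)
		}
		delegated = append(delegated, name)
	}
	return delegated, nil
}

func main() {
	backends := map[string]SchedBackend{
		"kai":   &topoBackend{name: "kai"},
		"plain": plainBackend("plain"),
	}
	delegated, err := synchronizeTopology(backends, "example-topology")
	fmt.Println(delegated, err)
}
```

Because the interface is optional, adding a new backend without topology support requires no change to the loop.
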
@enoodle enoodle dismissed stale reviews from shayasoolin and danbar2 via 98d9701 April 9, 2026 12:19
@enoodle enoodle force-pushed the erez/tas-scheduler-backend branch from dd45fcf to 98d9701 on April 9, 2026 12:19
@enoodle enoodle requested a review from shayasoolin April 9, 2026 12:33
@enoodle enoodle merged commit e089df5 into ai-dynamo:main Apr 9, 2026
14 checks passed

4 participants