Skip to content

Conversation

@carlory
Copy link
Member

@carlory carlory commented Aug 28, 2024

What this PR does / why we need it

If an object is not created by llmaz controllers, we shouldn't handle it.

Which issue(s) this PR fixes

Fixes #

Special notes for your reviewer

Does this PR introduce a user-facing change?

Fix watch for changes to LeaderWorkerSet and trigger a Reconcile for the owner

@carlory
Copy link
Member Author

carlory commented Aug 28, 2024

/cc @kerthcet

@InftyAI-Agent InftyAI-Agent added needs-triage Indicates an issue or PR lacks a label and requires one. needs-priority Indicates a PR lacks a label and requires one. do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Aug 28, 2024
@carlory
Copy link
Member Author

carlory commented Aug 28, 2024

/kind bug

@InftyAI-Agent InftyAI-Agent requested a review from kerthcet August 28, 2024 07:10
@InftyAI-Agent InftyAI-Agent added bug Categorizes issue or PR as related to a bug. and removed do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Aug 28, 2024
return ctrl.NewControllerManagedBy(mgr).
For(&inferenceapi.Playground{}).
Watches(&inferenceapi.Service{}, &handler.EnqueueRequestForObject{},
Watches(&inferenceapi.Service{}, handler.EnqueueRequestForOwner(r.Scheme, r.RESTMapper(), &inferenceapi.Playground{}, handler.OnlyControllerOwner()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inference service can be created manually, so maybe this is not right? Playground is for quick start, once people has forked inference backend, like vllm-on-ascend-chip, playground couldn't support such backends, but inference could.

Copy link
Member Author

@carlory carlory Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the inference service is manually created by users or external controllers, Can the playground own the inference service?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then there's no Playground, so

  • Playground is for quick start, it's optional, if not found, skip.
  • Inference Service is for advanced deployment, it's the real service.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. A user (Bob) creates an Inference Service (foo)
  2. Another user (Li) doesn't know the foo service already exists, then creates a Playground with the same name.
  3. The service foo will be owned by the Playground, and the service foo may be deleted by Li via the Playground.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to handle this case when reconcile Playground

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, then the Playground is illegal, but seems we can't block the Playground creation, not supported in webhook, we may need to add another condition in Playground to report the error, what do you think?

		condition := metav1.Condition{
			Type:    inferenceapi.PlaygroundProgressing,
			Status:  metav1.ConditionFalse,
			Reason:  "AbortProcessing",
			Message: "Playground owns the same name with a Service",
		}

@carlory carlory force-pushed the EnqueueRequestForOwner branch from 77a6977 to 9469714 Compare August 28, 2024 09:20
@carlory carlory changed the title Fix watch for changes to Services/LeaderWorkerSet created by llmaz and trigger a Reconcile for the owner Fix watch for changes to LeaderWorkerSet created by llmaz and trigger a Reconcile for the owner Aug 28, 2024
@carlory carlory force-pushed the EnqueueRequestForOwner branch from 9469714 to bc74b66 Compare August 30, 2024 09:05
service := &inferenceapi.Service{}
if err := r.Get(ctx, types.NamespacedName{Name: playground.Name, Namespace: playground.Namespace}, service); err == nil {
if !controllerutil.HasControllerReference(service) {
condition := metav1.Condition{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should collect all the condition handlers to one place just like setPlaygroundCondition but this can be a follow up when implementing the Pending status for waiting for model creation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@kerthcet
Copy link
Member

/lgtm
/approve

Thanks!

@kerthcet
Copy link
Member

Seems we have a test failure.

@InftyAI-Agent InftyAI-Agent added lgtm Looks good to me, indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 30, 2024
@carlory carlory force-pushed the EnqueueRequestForOwner branch from bc74b66 to 2216220 Compare August 30, 2024 10:43
@InftyAI-Agent InftyAI-Agent removed the lgtm Looks good to me, indicates that a PR is ready to be merged. label Aug 30, 2024
@carlory
Copy link
Member Author

carlory commented Aug 30, 2024

/hold

For review

@InftyAI-Agent InftyAI-Agent added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 30, 2024
@carlory carlory force-pushed the EnqueueRequestForOwner branch from 2216220 to fa158df Compare August 30, 2024 10:47
@carlory
Copy link
Member Author

carlory commented Aug 30, 2024

/hold cancel

@InftyAI-Agent InftyAI-Agent removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 30, 2024
func setPlaygroundCondition(playground *inferenceapi.Playground, service *inferenceapi.Service) {
// For the start up.
if len(playground.Status.Conditions) == 0 {
if len(playground.Status.Conditions) == 0 || !apimeta.IsStatusConditionTrue(service.Status.Conditions, inferenceapi.PlaygroundProgressing) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be refactored once introducing the pending for model creation condition. Because not progressing doesn't mean it's waiting for inferenceService ready.

@kerthcet
Copy link
Member

/lgtm

We need a followup to address the concerns then.

@InftyAI-Agent InftyAI-Agent added the lgtm Looks good to me, indicates that a PR is ready to be merged. label Aug 30, 2024
@InftyAI-Agent InftyAI-Agent merged commit 69a8728 into InftyAI:main Aug 30, 2024
@carlory carlory deleted the EnqueueRequestForOwner branch September 2, 2024 07:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bug Categorizes issue or PR as related to a bug. lgtm Looks good to me, indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a label and requires one. needs-triage Indicates an issue or PR lacks a label and requires one.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants