feat(appliance): Adopt Frontend Service + Ingress#63893
Conversation
| resourceVersion: NORMALIZED_FOR_TESTING | ||
| uid: NORMALIZED_FOR_TESTING | ||
| - apiVersion: v1 | ||
| kind: Service |
There was a problem hiding this comment.
indeed, here's the service we provisioned, collected by the golden test framework. It has no ControllerReference, so we haven't adopted it.
There are no other resources apart from our configmap, which was puzzling to me at first, but read on
Running go test -v ./internal/appliance/reconciler -run TestApplianceTestSuite/TestAdoptFrontend, we see the following reconciler logs:
core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 reconciling sourcegraph appliance {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0"}
core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 didn't find existing object, creating it {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend"}
I0718 10:03:50.810672 49837 controller.go:624] quota admission added evaluator for: deployments.apps
core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 refusing to update non-owned resource {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend"}
core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 didn't find existing object, creating it {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend-internal"}
I0718 10:03:50.817441 49837 alloc.go:330] "allocated clusterIPs" service="test-appliance-dc9fea/sourcegraph-frontend-internal" clusterIPs={"IPv4":"10.0.0.89"}
core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 didn't find existing object, creating it {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend"}
I0718 10:03:50.819103 49837 controller.go:624] quota admission added evaluator for: serviceaccounts
core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 didn't find existing object, creating it {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend"}
core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 didn't find existing object, creating it {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend"}
core.go:57: DEBUG TestApplianceTestSuite logr@v1.4.2/logr.go:280 Reconcile successful {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appli
Things that stand out:
- we do see our "refusing to update non-owned resource", as we expect. That's no surprise, this PR doesn't add any logic for adoption of specific GVKNs, yet
- we can't see what resources each line refers to (
"kind": "/, Kind=",- wat). I'll start investigating that separately, will probably open an issue. - We don't see the frontend Deployment and other resources in this golden fixture 🤔. We see a bunch of messages "didn't find existing object, creating it", and the refusal to update non-owned resources is non-fatal. This appears to be because your test has a subtle race condition in it. See my note on the makeGoldenAssertions() call for details. When I fix that, I see the other resources, and
diff internal/appliance/reconciler/testdata/golden-fixtures/frontend/default.yaml internal/appliance/reconciler/testdata/golden-fixtures/frontend/adopt-service.yamllooks as we expect (ownerReference stuff missing)
There was a problem hiding this comment.
| if existingRes.GetObjectKind().GroupVersionKind().Kind == "Service" { | ||
| logger.Info(fmt.Sprintf("Taking control of previously unowned object: %s", existingRes.GetName())) | ||
| if err := ctrl.SetControllerReference(owner, existingRes, r.Scheme); err != nil { | ||
| return errors.Newf("setting controller reference for service", err) | ||
| } | ||
| } else { | ||
| logger.Info("refusing to update non-owned resource") | ||
| return nil |
There was a problem hiding this comment.
I'm a little wary that we could inadvertently cobble other services here. Given the namespace check above, it should be enough?
There was a problem hiding this comment.
I think, if we relax this check for whole kinds, we may as well drop it altogether. I've outlined an alternative plan in another comment, LMK what you think 🙏
| - apiVersion: networking.k8s.io/v1 | ||
| kind: Ingress | ||
| metadata: | ||
| annotations: | ||
| appliance.sourcegraph.com/configHash: d3eb623947fedba566dfc56adc4733ff5ef1b2887a5cb63d75dbc1df452b0b5c | ||
| creationTimestamp: "2024-04-19T00:00:00Z" | ||
| generation: 2 | ||
| labels: | ||
| deploy: sourcegraph | ||
| name: sourcegraph-frontend | ||
| namespace: NORMALIZED_FOR_TESTING | ||
| ownerReferences: | ||
| - apiVersion: v1 | ||
| blockOwnerDeletion: true | ||
| controller: true | ||
| kind: ConfigMap | ||
| name: sg | ||
| uid: NORMALIZED_FOR_TESTING | ||
| resourceVersion: NORMALIZED_FOR_TESTING | ||
| uid: NORMALIZED_FOR_TESTING | ||
| spec: | ||
| rules: | ||
| - http: | ||
| paths: | ||
| - backend: | ||
| service: | ||
| name: sourcegraph-frontend | ||
| port: | ||
| number: 30080 | ||
| path: / | ||
| pathType: Prefix | ||
| status: | ||
| loadBalancer: {} |
There was a problem hiding this comment.
We successfully clobbered the ingress as well, but didn't keep any of the previous definitions (intended behaviour for now)
| if existingRes.GetObjectKind().GroupVersionKind().Kind == "Service" { | ||
| logger.Info(fmt.Sprintf("Taking control of previously unowned object: %s", existingRes.GetName())) | ||
| if err := ctrl.SetControllerReference(owner, existingRes, r.Scheme); err != nil { | ||
| return errors.Newf("setting controller reference for service", err) | ||
| } | ||
| } else { | ||
| logger.Info("refusing to update non-owned resource") | ||
| return nil |
There was a problem hiding this comment.
I think, if we relax this check for whole kinds, we may as well drop it altogether. I've outlined an alternative plan in another comment, LMK what you think 🙏
| protocol: TCP | ||
| targetPort: http | ||
| selector: | ||
| app: sourcegraph-frontend |
There was a problem hiding this comment.
I know we are still figuring out where/when the appliance will flip the service selector between itself/frontend when frontend is unhealthy. One potential issue with allowing the service definition to do it, is that on first installation, the user might be looking at a progress bar, which will stop working as soon as installation starts, because we point the ingress-facing service away from the appliance.
I think we can leave this as-is, and I've left a comment in an issue to figure this out: https://linear.app/sourcegraph/issue/REL-78/when-sourcegraph-frontend-is-down-a-user-trying-to-access-sourcegraph#comment-d3b6cd0d
cc @jdpleiness
Co-authored-by: Craig Furman <craig.furman@sourcegraph.com>
Co-authored-by: Craig Furman <craig.furman@sourcegraph.com>
<!-- PR description tips: https://www.notion.so/sourcegraph/Write-a-good-pull-request-description-610a7fd3e613496eb76f450db5a49b6e --> Resolving section 1 [REL-213](https://linear.app/sourcegraph/issue/REL-218/ingressservice-consistency-for-helm-deployment): Setting controllerOwner references to k8s objects that will exist before appliance is installed. Part 2 (serializing helm values into an annotation and using that as input) will happen in a subsequent PR We need to adopt Frontend's `Service` and `Ingress` for Appliance to manage. ~~This is still a work in progress PR.~~ Currently handles and adopts frontend's Service (golden test included) Yet to adopt frontend's Service. ## Test plan Golden test included <!-- REQUIRED; info at https://docs-legacy.sourcegraph.com/dev/background-information/testing_principles --> ## Changelog <!-- OPTIONAL; info at https://www.notion.so/sourcegraph/Writing-a-changelog-entry-dd997f411d524caabf0d8d38a24a878c --> - feat(appliance): Appliance adopts Frontend's `Ingress` + `Service` --------- Co-authored-by: Craig Furman <craig.furman@sourcegraph.com>
Resolving section 1 REL-213: Setting controllerOwner references to k8s objects that will exist before appliance is installed.
Part 2 (serializing helm values into an annotation and using that as input) will happen in a subsequent PR
We need to adopt Frontend's
ServiceandIngressfor Appliance to manage.This is still a work in progress PR.Currently handles and adopts frontend's Service (golden test included)
Yet to adopt frontend's Service.
Test plan
Golden test included
Changelog
Ingress+Service