Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

feat(appliance): Adopt Frontend Service + Ingress#63893

Merged
Chickensoupwithrice merged 22 commits into
mainfrom
al/REL-218/adopt-frontend
Jul 22, 2024
Merged

feat(appliance): Adopt Frontend Service + Ingress#63893
Chickensoupwithrice merged 22 commits into
mainfrom
al/REL-218/adopt-frontend

Conversation

@Chickensoupwithrice

@Chickensoupwithrice Chickensoupwithrice commented Jul 17, 2024

Copy link
Copy Markdown
Contributor

Resolving section 1 REL-213: Setting controllerOwner references to k8s objects that will exist before appliance is installed.
Part 2 (serializing helm values into an annotation and using that as input) will happen in a subsequent PR

We need to adopt Frontend's Service and Ingress for Appliance to manage.

This is still a work in progress PR.

Currently handles and adopts frontend's Service (golden test included)
Yet to adopt frontend's Service.

Test plan

Golden test included

Changelog

  • feat(appliance): Appliance adopts Frontend's Ingress + Service

@cla-bot cla-bot Bot added the cla-signed label Jul 17, 2024
@Chickensoupwithrice Chickensoupwithrice requested review from a team and craigfurman July 18, 2024 01:37
@craigfurman craigfurman self-assigned this Jul 18, 2024
Comment thread internal/appliance/reconciler/frontend_test.go Outdated
Comment thread internal/appliance/reconciler/testdata/sg/frontend/adopt-service.yaml Outdated
resourceVersion: NORMALIZED_FOR_TESTING
uid: NORMALIZED_FOR_TESTING
- apiVersion: v1
kind: Service

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, here's the service we provisioned, collected by the golden test framework. It has no ControllerReference, so we haven't adopted it.

There are no other resources apart from our configmap, which was puzzling to me at first, but read on

Running go test -v ./internal/appliance/reconciler -run TestApplianceTestSuite/TestAdoptFrontend, we see the following reconciler logs:

    core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 reconciling sourcegraph appliance {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0"}
    core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 didn't find existing object, creating it {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend"}
I0718 10:03:50.810672   49837 controller.go:624] quota admission added evaluator for: deployments.apps
    core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 refusing to update non-owned resource {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend"}
    core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 didn't find existing object, creating it {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend-internal"}
I0718 10:03:50.817441   49837 alloc.go:330] "allocated clusterIPs" service="test-appliance-dc9fea/sourcegraph-frontend-internal" clusterIPs={"IPv4":"10.0.0.89"}
    core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 didn't find existing object, creating it {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend"}
I0718 10:03:50.819103   49837 controller.go:624] quota admission added evaluator for: serviceaccounts
    core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 didn't find existing object, creating it {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend"}
    core.go:57: INFO TestApplianceTestSuite logr@v1.4.2/logr.go:280 didn't find existing object, creating it {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appliance-dc9fea/sg", "namespace": "test-appliance-dc9fea", "name": "sg", "reconcileID": "e1657450-ee2c-4b5d-8fa3-2223fa4af7f0", "kind": "/, Kind=", "namespace": "test-appliance-dc9fea", "name": "sourcegraph-frontend"}
    core.go:57: DEBUG TestApplianceTestSuite logr@v1.4.2/logr.go:280 Reconcile successful {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": "test-appli

Things that stand out:

  • we do see our "refusing to update non-owned resource", as we expect. That's no surprise, this PR doesn't add any logic for adoption of specific GVKNs, yet
  • we can't see what resources each line refers to ("kind": "/, Kind=", - wat). I'll start investigating that separately, will probably open an issue.
  • We don't see the frontend Deployment and other resources in this golden fixture 🤔. We see a bunch of messages "didn't find existing object, creating it", and the refusal to update non-owned resources is non-fatal. This appears to be because your test has a subtle race condition in it. See my note on the makeGoldenAssertions() call for details. When I fix that, I see the other resources, and diff internal/appliance/reconciler/testdata/golden-fixtures/frontend/default.yaml internal/appliance/reconciler/testdata/golden-fixtures/frontend/adopt-service.yaml looks as we expect (ownerReference stuff missing)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread internal/appliance/reconciler/frontend_test.go Outdated
Comment on lines +107 to +114
if existingRes.GetObjectKind().GroupVersionKind().Kind == "Service" {
logger.Info(fmt.Sprintf("Taking control of previously unowned object: %s", existingRes.GetName()))
if err := ctrl.SetControllerReference(owner, existingRes, r.Scheme); err != nil {
return errors.Newf("setting controller reference for service", err)
}
} else {
logger.Info("refusing to update non-owned resource")
return nil

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little wary that we could inadvertently cobble other services here. Given the namespace check above, it should be enough?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, if we relax this check for whole kinds, we may as well drop it altogether. I've outlined an alternative plan in another comment, LMK what you think 🙏

Comment on lines +547 to +579
- apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
appliance.sourcegraph.com/configHash: d3eb623947fedba566dfc56adc4733ff5ef1b2887a5cb63d75dbc1df452b0b5c
creationTimestamp: "2024-04-19T00:00:00Z"
generation: 2
labels:
deploy: sourcegraph
name: sourcegraph-frontend
namespace: NORMALIZED_FOR_TESTING
ownerReferences:
- apiVersion: v1
blockOwnerDeletion: true
controller: true
kind: ConfigMap
name: sg
uid: NORMALIZED_FOR_TESTING
resourceVersion: NORMALIZED_FOR_TESTING
uid: NORMALIZED_FOR_TESTING
spec:
rules:
- http:
paths:
- backend:
service:
name: sourcegraph-frontend
port:
number: 30080
path: /
pathType: Prefix
status:
loadBalancer: {}

@Chickensoupwithrice Chickensoupwithrice Jul 18, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We successfully clobbered the ingress as well, but didn't keep any of the previous definitions (intended behaviour for now)

@Chickensoupwithrice Chickensoupwithrice marked this pull request as ready for review July 18, 2024 23:17
Comment thread internal/appliance/reconciler/kubernetes.go Outdated
Comment on lines +107 to +114
if existingRes.GetObjectKind().GroupVersionKind().Kind == "Service" {
logger.Info(fmt.Sprintf("Taking control of previously unowned object: %s", existingRes.GetName()))
if err := ctrl.SetControllerReference(owner, existingRes, r.Scheme); err != nil {
return errors.Newf("setting controller reference for service", err)
}
} else {
logger.Info("refusing to update non-owned resource")
return nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, if we relax this check for whole kinds, we may as well drop it altogether. I've outlined an alternative plan in another comment, LMK what you think 🙏

protocol: TCP
targetPort: http
selector:
app: sourcegraph-frontend

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we are still figuring out where/when the appliance will flip the service selector between itself/frontend when frontend is unhealthy. One potential issue with allowing the service definition to do it, is that on first installation, the user might be looking at a progress bar, which will stop working as soon as installation starts, because we point the ingress-facing service away from the appliance.

I think we can leave this as-is, and I've left a comment in an issue to figure this out: https://linear.app/sourcegraph/issue/REL-78/when-sourcegraph-frontend-is-down-a-user-trying-to-access-sourcegraph#comment-d3b6cd0d

cc @jdpleiness

@craigfurman craigfurman left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

nice!

Comment thread internal/appliance/reconciler/frontend_test.go Outdated
Comment thread internal/appliance/reconciler/frontend_test.go Outdated
Chickensoupwithrice and others added 3 commits July 22, 2024 10:46
Co-authored-by: Craig Furman <craig.furman@sourcegraph.com>
Co-authored-by: Craig Furman <craig.furman@sourcegraph.com>
@Chickensoupwithrice Chickensoupwithrice enabled auto-merge (squash) July 22, 2024 18:05
@Chickensoupwithrice Chickensoupwithrice merged commit a61f353 into main Jul 22, 2024
@Chickensoupwithrice Chickensoupwithrice deleted the al/REL-218/adopt-frontend branch July 22, 2024 19:37
craigfurman pushed a commit that referenced this pull request Jul 31, 2024
<!-- PR description tips:
https://www.notion.so/sourcegraph/Write-a-good-pull-request-description-610a7fd3e613496eb76f450db5a49b6e
-->
Resolving section 1
[REL-213](https://linear.app/sourcegraph/issue/REL-218/ingressservice-consistency-for-helm-deployment):
Setting controllerOwner references to k8s objects that will exist before
appliance is installed.
Part 2 (serializing helm values into an annotation and using that as
input) will happen in a subsequent PR

We need to adopt Frontend's `Service` and `Ingress` for Appliance to
manage.

~~This is still a work in progress PR.~~

Currently handles and adopts frontend's Service (golden test included)
Yet to adopt frontend's Service. 

## Test plan
Golden test included
<!-- REQUIRED; info at
https://docs-legacy.sourcegraph.com/dev/background-information/testing_principles
-->

## Changelog

<!-- OPTIONAL; info at
https://www.notion.so/sourcegraph/Writing-a-changelog-entry-dd997f411d524caabf0d8d38a24a878c
-->
- feat(appliance): Appliance adopts Frontend's `Ingress` + `Service`

---------

Co-authored-by: Craig Furman <craig.furman@sourcegraph.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants