Operator CRD spec too limited for production use #447
Description
This issue is a follow-up to #446 - after finishing the work on that PR, I realised that my team also needs to inject a `serviceAccountName` property into the worker pods launched by the operator. So naturally, I started putting together a PR for this patch too - but it occurred to me that this is a problem that will just keep expanding as other developers need more k8s configuration injected into their pods. For example:
- Service Account*
- Pod Affinity*
- Security Policy*
- Image Pull Secrets
- Volumes*
- Container Resources
- Volume Mounts*
- Sidecar containers
- Prometheus endpoints*
(* denotes features my team already requires)
It occurred to me that we would ultimately just be recreating the Pod `core/v1` spec and embedding it inside the generated CRD files, leading to a significant maintenance burden to keep the CRD up to date with the latest Pod API, not to mention possible bugs introduced across multiple versions of the Pod spec. Plus it's not DRY, KISS or any of that stuff - personally, I'd hate having to maintain this.
Proposed Solution
Any solution to this problem should aim to eliminate the overhead and problems associated with re-writing k8s standard configurations (like Pods/Deployments), while enabling customisation of a CRD that works with the current operator structure (i.e. I'm not proposing rewriting the operator). Since the operator does not currently have a build/release pipeline and CRDs aren't currently published in any standardised way, we could use a build process to convert OpenAPI V3 references (`$ref`) into a "compiled" yaml template. This isn't a new idea; the developers at Agones [1] have CRD definitions that are generated directly from their go source code.
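To illustrate what the "compile" step does conceptually, here is a minimal sketch of resolving `$ref` entries in a schema against a definitions map. The function name `resolve_refs` and the toy `definitions` dict are illustrative only - they are not part of k8s-crd-resolver's actual API:

```python
def resolve_refs(node, definitions):
    """Recursively inline any {'$ref': name} found in a schema fragment."""
    if isinstance(node, dict):
        if "$ref" in node:
            # Look the reference up and resolve it too (refs can nest).
            return resolve_refs(definitions[node["$ref"]], definitions)
        return {k: resolve_refs(v, definitions) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, definitions) for item in node]
    return node

# A toy definitions map standing in for the published k8s API schemas.
definitions = {
    "io.k8s.api.core.v1.PodTemplateSpec": {
        "type": "object",
        "properties": {"spec": {"type": "object"}},
    },
}

schema = {
    "type": "object",
    "properties": {"template": {"$ref": "io.k8s.api.core.v1.PodTemplateSpec"}},
}

compiled = resolve_refs(schema, definitions)
```

Note that real k8s schemas contain recursive references, which this naive sketch would loop on forever - one more reason to use an existing tool like k8s-crd-resolver rather than rolling our own.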
An example with pseudocode:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: daskworkergroups.kubernetes.dask.org
spec:
  scope: Namespaced
  group: kubernetes.dask.org
  names:
    kind: DaskWorkerGroup
    # ...
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            template:
              # TODO: Point this to something that can be resolved somehow
              $ref: 'io.k8s.api.core.v1.PodTemplateSpec'
            status:
              type: object
          # ...
```

After running a build process using something like k8s-crd-resolver, you would get something like:
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: daskworkergroups.kubernetes.dask.org
spec:
  scope: Namespaced
  group: kubernetes.dask.org
  names:
    kind: DaskWorkerGroup
    # ...
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            template:
              description: PodTemplateSpec describes the data a pod should have when created from a template
              properties:
                metadata: ...
                spec:
                  description: PodSpec is a description of a pod.
                  # ...
            status:
              type: object
          # ...
```

Note that the second yaml snippet is the real configuration - I put the actual output of k8s-crd-resolver in this gist: https://gist.github.com/samdyzon/2082109b588fef3031b250e3def01feb
Theoretically this methodology could also be used to build a pseudo-CRD for each Dask resource (scheduler and worker) individually, and the build process could create one or more CRD specifications for DaskCluster and DaskWorkerGroup without duplicating source code.
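As a sketch of that idea, the build could compose one shared pod-template fragment into several CRDs, so the source is written once. The helper name `make_crd` is hypothetical; only the two kinds and the group come from this proposal:

```python
import copy

# Single shared schema fragment, resolved later by the build process.
POD_TEMPLATE_REF = {"$ref": "io.k8s.api.core.v1.PodTemplateSpec"}

def make_crd(kind, plural, extra_properties=None):
    """Build a CRD dict that embeds the shared pod-template reference."""
    properties = {"template": copy.deepcopy(POD_TEMPLATE_REF)}
    properties.update(extra_properties or {})
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": f"{plural}.kubernetes.dask.org"},
        "spec": {
            "group": "kubernetes.dask.org",
            "scope": "Namespaced",
            "names": {"kind": kind, "plural": plural},
            "versions": [{
                "name": "v1",
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": properties,
                }},
            }],
        },
    }

# Both CRDs reuse the same template fragment without duplicating it in source.
worker_crd = make_crd("DaskWorkerGroup", "daskworkergroups")
cluster_crd = make_crd("DaskCluster", "daskclusters")
```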
After installing the built CRD into a cluster, a developer could then deploy their resources with as much or as little configuration as needed, but with the same API as deploying a standard pod:
```yaml
apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: simple-cluster
  namespace: dask
spec:
  template:
    metadata: ...
    spec:
      serviceAccountName: dask-service-account
      containers:
        - name: worker
          image: dask/dask:latest
          args: ...
          ports:
            - ...
          env:
            - name: TEST_ENVIRONMENT
              value: hello-world
```

Finally, once the output CRD spec has everything that is needed to spin up a pod, it is trivial to include the pod spec in build_worker_pod_spec and build_scheduler_service_spec:
```python
def build_worker_pod_spec(name, namespace, image, n, template):
    # `template` refers to the CRD property we generated during build; see the gist for more
    worker_name = f"{name}-worker-{n}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # The resolved schema also provides metadata under template["metadata"],
        # but we exclude it here and only use the "spec" part of the template.
        "metadata": {
            "name": worker_name,
            "labels": {
                # scheduler_name is assumed to be in scope, as in the existing operator code
                "dask.org/cluster-name": scheduler_name,
                "dask.org/workergroup-name": name,
                "dask.org/component": "worker",
            },
        },
        "spec": template["spec"],
    }
```

Outcomes:
- Developers can provide their own fully specified pod templates
- No need to maintain CRD specification for K8s standard templates
- Current operator architecture still works
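To make the `build_worker_pod_spec` sketch concrete, here is a self-contained usage example. The function is repeated inline with `scheduler_name` passed as an explicit argument (rather than taken from the enclosing scope), and the `template` dict mimics what a user would put under `spec.template` in the DaskCluster example above:

```python
def build_worker_pod_spec(name, namespace, image, n, template, scheduler_name):
    # Same sketch as above, but scheduler_name is an explicit argument
    # so this snippet runs on its own.
    worker_name = f"{name}-worker-{n}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": worker_name,
            "labels": {
                "dask.org/cluster-name": scheduler_name,
                "dask.org/workergroup-name": name,
                "dask.org/component": "worker",
            },
        },
        "spec": template["spec"],
    }

# A template as it would arrive from a deployed resource:
template = {
    "spec": {
        "serviceAccountName": "dask-service-account",
        "containers": [{"name": "worker", "image": "dask/dask:latest"}],
    }
}

pod = build_worker_pod_spec(
    name="simple-cluster", namespace="dask", image="dask/dask:latest",
    n=0, template=template, scheduler_name="simple-cluster-scheduler",
)
```

The user's `serviceAccountName` (and anything else from the Pod spec) flows through to the generated pod untouched, which is exactly the point of the proposal.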
Dependencies
- k8s-crd-resolver - looks well maintained, comes with API schemas for many versions of K8s, and worked well during my 15 minutes of testing while putting this issue together.
As I mentioned in my PR, my team and I are extremely keen to get this operator into production - our current methodology for managing our Dask workloads is a massive source of technical debt and reliability issues which I want to remove ASAP. With that in mind, I'd be willing to contribute to the development of this process and help design the full workflow. I'm not keen on starting this work without some feedback from the maintainers though, since I'd be working on this out-of-office hours.
What do you guys think? If there is interest, I can put together some more detailed design documentation.
References:
[1] - They use a build script (export-openapi.sh) that produces a YAML file with the embedded spec inside it (and parses some comments to add documentation, etc.) - see the generated CRD from their build tools.