Match claims to classes using label selectors #870
Description
What problem are you facing?
In Crossplane 0.3 we introduced support for "provider specific" (aka "non-portable", aka "strongly typed") resource classes. During the development cycle we identified that if a claim must explicitly reference a provider specific resource class in order to enable dynamic provisioning, then the claim itself is now provider specific and no longer portable across providers. We solved this issue by introducing portable resource classes.
A portable resource class is effectively an indirection to a non-portable resource class. Portable resource classes can be set as the default for a particular namespace, and will be used by any resource claims of their corresponding kind that do not specify a resource class for dynamic provisioning, or a managed resource for static provisioning. This can be thought of as "publishing" non-portable resource classes (which may exist in a distinct namespace used to logically group infrastructure) to other namespaces, where applications may use them to satisfy claims for their infrastructure requirements.
This pattern is flexible and powerful, but Crossplane maintainers and community members have observed that it's verbose; three distinct Kubernetes resources (a claim, portable class, and non-portable class) must exist for dynamic provisioning to occur.
How could Crossplane help solve your problem?
Crossplane could remove the need for portable resource classes by using label selectors at the resource claim level, rather than the concept of a default class.
User experience
Resource claim authors who have a specific, non-portable resource class in mind set the claim's `classRef` to that resource class (including its type metadata), just as they did before #723. Resource claim authors who have a specific managed resource in mind set the claim's `resourceRef`, as they do today for static provisioning. Resource claim authors who don't know exactly which resource class they want omit both the `classRef` and the `resourceRef`, and supply zero or more `classSelector` labels to match.
For example:
```yaml
---
# Imagine this resource claim is created by a hypothetical GitLab stack
# controller to satisfy a request for a new GitLab installation. It does not
# specifically name a resource class, but declares it wants an experimental
# grade PostgreSQLInstance that is compatible with the GitLab stack, in
# region us-east-1.
apiVersion: database.crossplane.io/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: gitlab-database
  namespace: gitlab-experiments
spec:
  classSelector:
    matchLabels:
      stack: gitlab
      grade: experimental
      region: us-east-1
  writeConnectionSecretToRef:
    name: app-postgresql-connection
```

Note that under this proposal the concept of a "portable class" is removed, so any reference to "class" or "resource class" means "non-portable resource class", for example `CloudSQLInstanceClass`.
From the claim author's perspective Crossplane would handle dynamic provisioning of the above PostgreSQLInstance claim by:
- "Finding" all classes that match its
.spec.classSelector.matchLabels. - "Picking one at random".
- Seting the
PostgreSQLInstance.spec.classRefto the randomly selected resource class. - Dynamically provisioning and binding as we do today.
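The first three steps above could be sketched as follows. This is an illustrative sketch only, not Crossplane code: the `Class` type and `selectClass` helper are hypothetical simplifications of what a class selector controller would do.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Class is a hypothetical, simplified stand-in for a resource class,
// reduced to the fields the selection logic needs.
type Class struct {
	Name   string
	Labels map[string]string
}

// matches reports whether the class carries every label in the claim's
// classSelector.matchLabels.
func matches(c Class, matchLabels map[string]string) bool {
	for k, v := range matchLabels {
		if c.Labels[k] != v {
			return false
		}
	}
	return true
}

// selectClass finds all classes matching the selector, then picks one at
// random, mirroring the "find, then pick one at random" steps above. It
// returns false if no class matched.
func selectClass(classes []Class, matchLabels map[string]string) (Class, bool) {
	var candidates []Class
	for _, c := range classes {
		if matches(c, matchLabels) {
			candidates = append(candidates, c)
		}
	}
	if len(candidates) == 0 {
		return Class{}, false
	}
	return candidates[rand.Intn(len(candidates))], true
}

func main() {
	classes := []Class{
		{Name: "gitlab-development-default", Labels: map[string]string{"stack": "gitlab", "grade": "experimental", "region": "us-east-1"}},
		{Name: "gitlab-production", Labels: map[string]string{"stack": "gitlab", "grade": "production"}},
	}
	selector := map[string]string{"stack": "gitlab", "grade": "experimental"}
	if c, ok := selectClass(classes, selector); ok {
		fmt.Println("selected:", c.Name)
	}
}
```

Note that a class may carry labels the claim does not select (`region` above); matching only requires that every selected label be present and equal.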
This proposal has the following properties:
- It supports portable, strongly typed, dynamic provisioning using only two concepts: resource claims and resource classes.
- It works the same for a hypothetical non-portable `CloudSQLInstanceClaim` as it does for a portable `PostgreSQLInstanceClaim`.
- It approximates support for "default resource classes". Label a resource class `{default: "true"}` to make it the global default for compatible resource claims (that select that label). Label a resource class `{default: "true", namespace: app-project1-dev}` to make it the "default for a namespace" (actually, the default for any claim that selects those labels).
- It makes resource class selection non-deterministic. If multiple resource classes match the labels, one is chosen at random.
- It removes the concept of "publishing" a class to a namespace. This means a cluster operator can set a class as the "default" (kind of - see above) for app namespaces that don't exist yet, but it also makes it harder for app operators to determine which resource classes they could choose from, presuming the resource classes live in a distinct infrastructure namespace.
- It enables a rudimentary form of "matching" (dare I say scheduling?) of resource claims by decoupling the language used to match claims to classes from the (strongly typed) language used to describe classes.
Elaborating on that last bullet, it's hard to match the requirements of a resource claim to the capabilities of a resource class on spec alone due to the "lowest common denominator" problem of multi-cloud. The `RedisCluster` claim, for example, exposes a field named `engineVersion`, which specifies its desired version of Redis (e.g. `"3.2"`). Ideally we'd use this snippet of information to intelligently match a `RedisCluster` claim to a resource class for dynamic provisioning, but that is hard™. The following resource classes could satisfy a `RedisCluster` claim today:
- GCP's `CloudMemorystoreInstanceClass` has an equivalent field to `engineVersion` - it's named `redisVersion` and expects the version to be written as `"REDIS_3_2"`, not `"3.2"`.
- AWS's `ReplicationGroupClass` has an `engineVersion` field that expects the patch version to be included, i.e. `"3.2.0"`.
- Azure's `RedisClass` has no field for engine version. It's just always version 3.2.
We could try to solve this by having resource claims expose the intersection (or the union?) of every possible resource class's set of fields in some standardised fashion, then teach each resource claim controller how to match the standardised resource claim fields to the provider specific resource class fields, or we could just use labels.
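To make the contrast concrete, here is a sketch of the kind of per-provider translation each claim controller would need under the "standardised fields" approach. The translation helpers are hypothetical; they only illustrate the mapping problem described above.

```go
package main

import (
	"fmt"
	"strings"
)

// toGCPRedisVersion maps a claim-level version like "3.2" to GCP's
// "REDIS_3_2" form, as expected by CloudMemorystoreInstanceClass's
// redisVersion field.
func toGCPRedisVersion(v string) string {
	return "REDIS_" + strings.ReplaceAll(v, ".", "_")
}

// toAWSEngineVersion maps "3.2" to AWS's patch-inclusive "3.2.0" form,
// as expected by ReplicationGroupClass's engineVersion field.
func toAWSEngineVersion(v string) string {
	return v + ".0"
}

func main() {
	v := "3.2"
	fmt.Println(toGCPRedisVersion(v))  // REDIS_3_2
	fmt.Println(toAWSEngineVersion(v)) // 3.2.0
	// Azure's RedisClass has no engine version field at all, so there is
	// nothing to translate - the mapping is inherently lossy, which is
	// part of why label matching is the simpler approach.
}
```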
For example this resource claim specifies that it needs Redis 3.2:
```yaml
---
apiVersion: cache.crossplane.io/v1alpha1
kind: RedisCluster
metadata:
  name: gitlab-cache
  namespace: gitlab-experiments
spec:
  classSelector:
    matchLabels:
      stack: gitlab
      engineVersion: "3.2"
  writeConnectionSecretToRef:
    name: gitlab-cache-connection
```

While these resource classes specify that they can satisfy claims that need Redis 3.2, while also using documented, high fidelity, validated schemas when dynamically provisioning managed resources:
```yaml
---
apiVersion: cache.gcp.crossplane.io/v1alpha2
kind: CloudMemorystoreInstanceClass
metadata:
  name: gitlab-development-default
  namespace: gcp-infra-dev
  labels:
    engineVersion: "3.2"
specTemplate:
  tier: STANDARD_HA
  region: us-west2
  memorySizeGb: 1
  redisVersion: REDIS_3_2
  providerRef:
    name: example
    namespace: gcp-infra-dev
  reclaimPolicy: Delete
---
apiVersion: cache.azure.crossplane.io/v1alpha2
kind: RedisClass
metadata:
  name: azure-redis-standard
  namespace: azure-infra-dev
  labels:
    engineVersion: "3.2"
specTemplate:
  resourceGroupName: group-westus-1
  location: West US
  sku:
    name: Basic
    family: C
    capacity: 0
  enableNonSslPort: true
  providerRef:
    name: example
    namespace: azure-infra-dev
  reclaimPolicy: Delete
```

Implementation Details
There are a lot of quotation marks in the steps above because what would actually happen under the hood is more like a race to satisfy the claim. In Crossplane there is not one controller for each resource claim kind, but rather one controller for each possible (resource claim, managed resource) tuple. This allows an infrastructure stack that adds support for a managed resource that could satisfy a resource claim to implement the dynamic provisioning and claim binding logic for that claim without touching Crossplane core. Put another way, crossplaneio/stack-gcp can add support for `CloudSQLInstance` resources to satisfy `MySQLInstance` claims in crossplaneio/stack-gcp, instead of teaching crossplaneio/crossplane how to dynamically provision a `CloudSQLInstance`. Hence the race. In reality the following would happen for the example above:
- All resource claim controllers are updated to only reconcile resources with a `resourceRef` or a `classRef`.
- A new "class selector" controller is introduced for each (resource claim, resource class) tuple. Its job is to set the `classRef` for resource claims that omit it, by using their `classSelector`.
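The first step above amounts to a watch predicate on the claim. A minimal sketch, assuming a hypothetical simplified `Claim` type (the real implementation would express this as a controller-runtime event predicate):

```go
package main

import "fmt"

// Reference is a hypothetical, simplified object reference.
type Reference struct {
	Name string
}

// Claim is a hypothetical, simplified resource claim carrying only the
// two references the predicate cares about.
type Claim struct {
	ClassRef    *Reference
	ResourceRef *Reference
}

// hasRefs reports whether a claim already names a resource class or a
// managed resource. Resource claim controllers would only reconcile
// claims for which this returns true; claims without either reference
// are left to the class selector controllers.
func hasRefs(c Claim) bool {
	return c.ClassRef != nil || c.ResourceRef != nil
}

func main() {
	fmt.Println(hasRefs(Claim{}))                                  // false
	fmt.Println(hasRefs(Claim{ClassRef: &Reference{Name: "gcp"}})) // true
}
```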
Upon the creation of a `PostgreSQLInstance` without a `classRef` or `resourceRef`:
- Every class selector controller watching for `PostgreSQLInstance` claims has a reconcile queued. I'll use the (`PostgreSQLInstance`, `CloudSQLInstanceClass`) controller for this example, but the (`PostgreSQLInstance`, `RDSInstanceClass`) and (`PostgreSQLInstance`, `MysqlServerClass`) controllers would run through the process in parallel.
- The controller lists all `CloudSQLInstanceClass` resources (in any namespace) that match the `PostgreSQLInstance` claim's label selectors.
- If no `CloudSQLInstanceClass` matches the labels, the reconcile is done. Otherwise, one of the matching `CloudSQLInstanceClass` resources is selected at random.
- The controller sleeps for a small, jittered amount of time, then sets the `classRef` of the `PostgreSQLInstance` to the selected `CloudSQLInstanceClass`. The jitter reduces the chance of two controllers trying to set the `classRef` at the same time. If two controllers do try to set the `classRef` at the same time, one will fail to commit the change because the `PostgreSQLInstance` claim's resource version has changed since it was read.
- The reconcile is done. With the `classRef` set, the `PostgreSQLInstance` now passes the watch predicates of the (`PostgreSQLInstance`, `CloudSQLInstance`) resource claim reconciler, which dynamically provisions and binds it.