Match claims to classes using label selectors #870

@negz

Description

What problem are you facing?

In Crossplane 0.3 we introduced support for "provider specific" (aka "non-portable", aka "strongly typed") resource classes. During the development cycle we identified that if a claim must explicitly reference a provider specific resource class in order to enable dynamic provisioning, then the claim itself is now provider specific and no longer portable across providers. We solved this issue by introducing portable resource classes.

A portable resource class is effectively an indirection to a non-portable resource class. Portable resource classes can be set as the default for a particular namespace, and will be used by any resource claims of their corresponding kind that do not specify a resource class for dynamic provisioning, or a managed resource for static provisioning. This can be thought of as "publishing" non-portable resource classes (which may exist in a distinct namespace used to logically group infrastructure) to other namespaces, where applications may use them to satisfy claims for their infrastructure requirements.

This pattern is flexible and powerful, but Crossplane maintainers and community members have observed that it's verbose; three distinct Kubernetes resources (a claim, portable class, and non-portable class) must exist for dynamic provisioning to occur.

How could Crossplane help solve your problem?

Crossplane could remove the need for portable resource classes by using label selectors at the resource claim level, rather than the concept of a default class.

User experience

Resource claim authors who have a specific, non-portable resource class in mind set the claim's classRef to that resource class (including its type metadata), just as they did before #723. Resource claim authors who have a specific managed resource in mind set the claim's resourceRef, as they do today for static provisioning. Resource claim authors who don't know exactly which resource class they want omit both the classRef and the resourceRef, and supply zero or more classSelector labels to match.

For example:

---
# Imagine this resource claim is created by a hypothetical GitLab stack
# controller to satisfy a request for a new GitLab installation. It does not
# specifically name a resource class, but declares that it wants an
# experimental-grade PostgreSQLInstance compatible with the GitLab stack,
# in region us-east-1.
apiVersion: database.crossplane.io/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: gitlab-database
  namespace: gitlab-experiments
spec:
  classSelector:
    matchLabels:
      stack: gitlab
      grade: experimental
      region: us-east-1
  writeConnectionSecretToRef:
    name: app-postgresql-connection

Note that under this proposal the concept of a "portable class" is removed, so any reference to "class" or "resource class" means "non-portable resource class", for example CloudSQLInstanceClass.

From the claim author's perspective Crossplane would handle dynamic provisioning of the above PostgreSQLInstance claim by:

  1. "Finding" all classes that match its .spec.classSelector.matchLabels.
  2. "Picking one at random".
  3. Setting the PostgreSQLInstance .spec.classRef to the randomly selected resource class.
  4. Dynamically provisioning and binding as we do today.
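From the claim author's perspective, the first three steps amount to a label-subset match followed by a random pick. A minimal sketch, using plain dicts in place of Crossplane API types (all names here are illustrative, not real Crossplane APIs):

```python
import random

def matches(selector, labels):
    """True if every matchLabels entry in the selector is present on the class."""
    return all(labels.get(k) == v for k, v in selector.items())

def select_class(selector, classes):
    """Steps 1-2: find all classes matching the selector, pick one at random.

    Returns None when nothing matches, leaving the claim pending.
    """
    candidates = [c for c in classes if matches(selector, c["labels"])]
    return random.choice(candidates) if candidates else None

classes = [
    {"name": "prod-db", "labels": {"grade": "production"}},
    {"name": "dev-db", "labels": {"stack": "gitlab", "grade": "experimental", "region": "us-east-1"}},
]
selector = {"stack": "gitlab", "grade": "experimental", "region": "us-east-1"}
chosen = select_class(selector, classes)  # only dev-db carries all three labels
```

Note that an empty selector matches every class, which is consistent with "zero or more classSelector labels" above.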

This proposal has the following properties:

  • It supports portable, strongly typed, dynamic provisioning using only two concepts: resource claims and resource classes.
  • It works the same for a hypothetical non-portable CloudSQLInstanceClaim as it does for a portable PostgreSQLInstanceClaim.
  • It approximates support for "default resource classes". Label a resource class {default: "true"} to make it the global default for compatible resource claims (that select that label), or label it {default: "true", namespace: app-project1-dev} to make it the "default for a namespace" (really, the default for any claim that selects those labels).
  • It makes resource class selection non-deterministic. If multiple resource classes match the labels, one is chosen at random.
  • It removes the concept of "publishing" a class to a namespace. This means a cluster operator can set a class as the "default" (kind of - see above) for app namespaces that don't exist yet, but also makes it harder for app operators to determine what resource classes they could choose from, presuming the resource classes lived in a distinct infrastructure namespace.
  • It enables a rudimentary form of "matching" (dare I say scheduling?) of resource claims by decoupling the language used to match claims to classes from the (strongly typed) language used to describe classes.

Elaborating on that last bullet, it's hard to match the requirements of a resource claim to the capabilities of a resource class on spec alone due to the "lowest common denominator" problem of multi-cloud. The RedisCluster claim, for example, exposes a field named engineVersion, which specifies its desired version of Redis (e.g. "3.2"). Ideally we'd use this snippet of information to intelligently match a RedisCluster claim to a resource class for dynamic provisioning, but that is hard(TM). The following resource classes could satisfy a RedisCluster claim today:

  • GCP CloudMemorystoreInstanceClass has an equivalent field to engineVersion - it's named redisVersion and expects the version to be written as "REDIS_3_2", not "3.2".
  • AWS ReplicationGroupClass has an engineVersion field that expects the patch version to be included, i.e. "3.2.0".
  • Azure RedisClass has no field for engine version. It's just always version 3.2.

We could try to solve this by having resource claims expose the intersection (or the union?) of every possible resource class's set of fields in some standardised fashion, then teach each resource claim controller how to map those standardised claim fields to the provider specific resource class fields. Or we could just use labels.
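To make the field-mapping alternative concrete, here is the kind of per-provider translation a claim controller would have to maintain for this one field. The function is hypothetical; the field names and formats come from the three classes listed above:

```python
def translate_engine_version(provider, engine_version):
    """Map the portable engineVersion to a provider class (field, value) pair.

    Returns None when the provider class has no way to express it.
    """
    if provider == "gcp":
        # CloudMemorystoreInstanceClass writes "3.2" as redisVersion: REDIS_3_2.
        major, minor = engine_version.split(".")
        return ("redisVersion", "REDIS_{}_{}".format(major, minor))
    if provider == "aws":
        # ReplicationGroupClass expects the patch version included, i.e. "3.2.0".
        return ("engineVersion", engine_version + ".0")
    if provider == "azure":
        # RedisClass has no engine version field at all; it's always 3.2,
        # so there is nothing to match on spec alone.
        return None
    return None
```

Every new claim field multiplies this bookkeeping across providers, which is what the label-based approach sidesteps.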

For example this resource claim specifies that it needs Redis 3.2:

---
apiVersion: cache.crossplane.io/v1alpha1
kind: RedisCluster
metadata:
  name: gitlab-cache
  namespace: gitlab-experiments
spec:
  classSelector:
    matchLabels:
      stack: gitlab
      engineVersion: "3.2"
  writeConnectionSecretToRef:
    name: gitlab-cache-connection

Meanwhile, these resource classes declare that they can satisfy claims that need Redis 3.2, while still using documented, high fidelity, validated schemas when dynamically provisioning managed resources:

---
apiVersion: cache.gcp.crossplane.io/v1alpha2
kind: CloudMemorystoreInstanceClass
metadata:
  name: gitlab-development-default
  namespace: gcp-infra-dev
  labels:
    engineVersion: "3.2"
specTemplate:
  tier: STANDARD_HA
  region: us-west2
  memorySizeGb: 1
  redisVersion: REDIS_3_2
  providerRef:
    name: example
    namespace: gcp-infra-dev
  reclaimPolicy: Delete
---
apiVersion: cache.azure.crossplane.io/v1alpha2
kind: RedisClass
metadata:
  name: azure-redis-standard
  namespace: azure-infra-dev
  labels:
    engineVersion: "3.2"
specTemplate:
  resourceGroupName: group-westus-1
  location: West US
  sku:
    name: Basic
    family: C
    capacity: 0
  enableNonSslPort: true
  providerRef:
    name: example
    namespace: azure-infra-dev
  reclaimPolicy: Delete

Implementation Details

There are a lot of quotation marks in the above steps because what would actually be happening under the hood would be more like a race to satisfy the claim. In Crossplane there is not one controller for each resource claim kind, but rather one controller for each possible (resource claim, managed resource) tuple. This allows an infrastructure stack that adds support for a managed resource that could satisfy a resource claim to implement the dynamic provisioning and claim binding logic for said claim without having to touch Crossplane core. Put otherwise, crossplaneio/stack-gcp can add support for CloudSQLInstance resources to satisfy MySQLInstance claims in crossplaneio/stack-gcp, instead of teaching crossplaneio/crossplane how to dynamically provision a CloudSQLInstance. Hence the race. In reality the following would happen for the example above:

  • All resource claim controllers are updated to reconcile only claims that have a resourceRef or a classRef.
  • A new "class selector" controller is introduced for each (resource claim, resource class) tuple. Its job is to set the classRef of resource claims that omit it, using their classSelector.

Upon the creation of a PostgreSQLInstance without a classRef or resourceRef:

  1. Every class selector controller watching for PostgreSQLInstance claims has a reconcile queued. I'll use the (PostgreSQLInstance, CloudSQLInstanceClass) controller for this example, but (PostgreSQLInstance, RDSInstanceClass) and (PostgreSQLInstance, MysqlServerClass) controllers would run through the process in parallel.
  2. The controller would list all CloudSQLInstanceClass resources (in any namespace) that matched the PostgreSQLInstance claim's label selector.
  3. If no CloudSQLInstanceClass matched the labels, the reconcile is done. Otherwise, one of the matching CloudSQLInstanceClass resources is selected at random.
  4. The controller sleeps for a small, jittered amount of time then sets the classRef of the PostgreSQLInstance to the selected CloudSQLInstanceClass. The jitter reduces the chance of two controllers trying to set the classRef at the same time. If two controllers do try to set the classRef at the same time one will fail to commit the change due to the PostgreSQLInstance claim's resource version having changed since it was read.
  5. The reconcile is done. With the classRef set the PostgreSQLInstance now passes the watch predicates of the (PostgreSQLInstance, CloudSQLInstance) resource claim reconciler, which dynamically provisions and binds it.
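The conflict resolution in step 4 can be sketched as a compare-and-set keyed on the claim's resource version. This toy model stands in for the API server's optimistic concurrency; the class names in it are illustrative:

```python
class Claim:
    """Simulates a resource claim under optimistic concurrency control."""

    def __init__(self):
        self.resource_version = 1
        self.class_ref = None

    def set_class_ref(self, read_version, ref):
        """Commit the classRef only if the claim is unchanged since it was read.

        Exactly one of the racing class selector controllers succeeds; the
        loser sees a version conflict and gives up, because the claim no
        longer lacks a classRef.
        """
        if self.resource_version != read_version:
            return False  # conflict: another controller committed first
        self.class_ref = ref
        self.resource_version += 1
        return True

claim = Claim()
read = claim.resource_version  # both controllers read the same version
# Two controllers race after their jittered sleeps; the second loses.
won_gcp = claim.set_class_ref(read, "CloudSQLInstanceClass/gitlab-development-default")
won_aws = claim.set_class_ref(read, "RDSInstanceClass/some-class")
```

The jitter only reduces how often this conflict path is exercised; correctness comes from the version check.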
