`include-file` function
A weakness of the execution configuration pattern is that it interleaves configuration and code. For example, Gatekeeper constraint templates wrap Rego code in KRM. Similarly, Starlark code is wrapped in a StarlarkRun resource. We can improve the development UX by providing an imperative pre-processor function that injects the content of a file into a particular field of a KRM resource. Note that this is useful for the Gatekeeper admission controller even when not using the gatekeeper kpt function.
Strawman:
Assume one or more resources (which may be deeply nested) in the current directory have a comment directive pointing to a file containing code, e.g.:
```yaml
apiVersion: fn.kpt.dev/v1alpha1
kind: StarlarkRun
metadata:
  name: set-namespace-to-prod
# kpt-include: myscript.star
source: |
  ...
```
This resource references `myscript.star` in the same directory:
```python
def setnamespace(resources, namespace):
    for resource in resources:
        # mutate the resource
        resource["metadata"]["namespace"] = namespace

setnamespace(ctx.resource_list["items"], "prod")
```
And we have a Gatekeeper constraint template:
```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata: # kpt-merge: /k8sbannedconfigmapkeysv1
  name: k8sbannedconfigmapkeysv1
spec:
  crd:
    spec:
      names:
        kind: K8sBannedConfigMapKeysV1
      validation:
        openAPIV3Schema:
          properties:
            keys:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      # kpt-include: myscript.rego
      rego: |
        ...
```
That references myscript.rego:
```rego
package ban_keys

violation[{"msg": sprintf("%v", [val])}] {
  keys = {key | input.review.object.data[key]}
  banned = {key | input.parameters.keys[_] = key}
  overlap = keys & banned
  count(overlap) > 0
  val := sprintf("The following banned keys are being used in the ConfigMap: %v", [overlap])
}
```
After running the imperative function:

```shell
$ kpt fn eval -i gcr.io/kpt-fn/include-file --mount-current
```

(See #2351.)
All resources containing the kpt-include comment directive are expanded:
```yaml
apiVersion: fn.kpt.dev/v1alpha1
kind: StarlarkRun
metadata:
  name: set-namespace-to-prod
# kpt-include: myscript.star
source: |
  # set the namespace on all resources
  def setnamespace(resources, namespace):
      for resource in resources:
          # mutate the resource
          resource["metadata"]["namespace"] = namespace

  setnamespace(ctx.resource_list["items"], "prod")
```
```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata: # kpt-merge: /k8sbannedconfigmapkeysv1
  name: k8sbannedconfigmapkeysv1
spec:
  crd:
    spec:
      names:
        kind: K8sBannedConfigMapKeysV1
      validation:
        openAPIV3Schema:
          properties:
            keys:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      # kpt-include: myscript.rego
      rego: |-
        package ban_keys

        violation[{"msg": sprintf("%v", [val])}] {
          keys = {key | input.review.object.data[key]}
          banned = {key | input.parameters.keys[_] = key}
          overlap = keys & banned
          count(overlap) > 0
          val := sprintf("The following banned keys are being used in the ConfigMap: %v", [overlap])
        }
```
Considerations:
- Potential for abuse of this function. We don't want to encourage its use for maintaining partial resource configuration as a DRY technique (i.e. injecting KRM into KRM). Injecting into a single field of type string mitigates this. We should also consider a more descriptive name.
- Whether to support relative paths or only allow the file in the same directory as the KRM. The latter is fine to start with to keep things simple.
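To make the strawman concrete, the expansion could be sketched as a small text-level pass over each document. This is an illustrative Python sketch, not the real function's implementation; it assumes the directive's target is the scalar field on the line immediately after the comment (e.g. `source: |`), and it only reads files from the resource's own directory.

```python
import os
import re

INCLUDE_RE = re.compile(r"#\s*kpt-include:\s*(\S+)")

def expand_includes(yaml_text, base_dir):
    """Expand '# kpt-include: <file>' directives in a YAML document.

    The scalar field on the line after the directive keeps its key; its
    old body is replaced by the file's contents, re-indented two spaces
    past the key. A minimal sketch, not the real kpt function.
    """
    lines = yaml_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        out.append(lines[i])
        m = INCLUDE_RE.search(lines[i])
        if m and i + 1 < len(lines):
            # Same-directory only: strip any path components for safety.
            name = os.path.basename(m.group(1))
            with open(os.path.join(base_dir, name)) as f:
                body = f.read()
            i += 1
            field = lines[i]  # e.g. 'source: |'
            out.append(field)
            field_indent = len(field) - len(field.lstrip())
            # Skip the old scalar body: all following deeper-indented lines.
            while i + 1 < len(lines) and (
                not lines[i + 1].strip()
                or len(lines[i + 1]) - len(lines[i + 1].lstrip()) > field_indent
            ):
                i += 1
            pad = " " * (field_indent + 2)
            out.extend(pad + l for l in body.splitlines())
        i += 1
    return "\n".join(out)
```

A real implementation would operate on parsed KRM nodes rather than raw text, but the text version shows the intended behavior: the directive comment is preserved (so the function is re-runnable) and only the targeted string field changes.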
I agree with the injection idea: it reduces boilerplate, facilitates separation of code and data, and could enable other preprocessing. It could also facilitate testing.
A non-kpt ConstraintTemplate generation example: https://github.com/plexsystems/konstraint/tree/main/examples/container-deny-latest-tag
A separate issue may be warranted, but functions are currently too verbose. For example, setting an annotation should only require a couple of lines of code, yet the current function contains about 150: https://github.com/GoogleContainerTools/kpt-functions-catalog/blob/master/functions/go/set-annotations/main.go
Other Go implementations that set annotations are also fairly long, largely because they include a lot of code that could be in a more generic harness: https://github.com/kubernetes-sigs/kustomize/blob/master/plugin/builtin/annotationstransformer/AnnotationsTransformer.go https://github.com/kubernetes-sigs/kustomize/blob/master/api/filters/annotations/annotations.go https://github.com/jenkins-x-plugins/jx-gitops/blob/main/pkg/cmd/annotate/annotate.go https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kubectl/pkg/cmd/annotate/annotate.go
Interpreted languages especially have the potential for simplification, but even our typescript functions are much longer than is ideal and would benefit from kernel injection or composition: https://github.com/GoogleContainerTools/kpt-functions-catalog/blob/master/functions/ts/kubeval/src/kubeval.ts
Metacontroller did a decent job of facilitating simple transformation kernels: https://github.com/GoogleCloudPlatform/metacontroller/blob/master/examples/catset/sync.js
In particular, transformers and validators would benefit from a simple filter kernel framework, with filtering implemented in the orchestrator, as discussed in #2364.
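To make the point concrete, here is what a transformation kernel could look like if a generic harness handled parsing, iteration, and I/O. The harness API (`run_kernel`) is hypothetical and sketched in Python; the point is only that the kernel itself shrinks to a few lines.

```python
def set_annotations(resource, params):
    """Kernel: mutate one resource dict using the function params.

    A hypothetical harness would handle reading the ResourceList,
    iterating over items, and writing results back out.
    """
    annotations = resource.setdefault("metadata", {}).setdefault("annotations", {})
    annotations.update(params)
    return resource

# Minimal stand-in for the harness: map the kernel over a resource list.
def run_kernel(kernel, resources, params):
    return [kernel(r, params) for r in resources]
```

With filtering moved into the orchestrator as discussed in #2364, even the iteration above would disappear from user code.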
@karlkfi @droot
Coming from https://github.com/GoogleContainerTools/kpt/issues/1151#issuecomment-877836890, supporting such an include comment directive sounds interesting.
My use cases that could benefit from such an include functionality are as follows:
- Loading a kustomization including JSON patches as input of a kustomize pipeline function.
- Loading a local Helm chart's templates as input of a khelm function.
In both cases the function could define a custom resource kind (e.g. FileBlob) used as a wrapper resource for non-KRM content, with the comment directive used to add a file's contents to it.
However, creating a separate resource for each required non-KRM file could be very cumbersome. Ideally users should be able to just point to a directory (within a Kptfile?) whose files should be wrapped into FileBlob resources (defining the resource kind within the kpt project); in that case the include comment directive wouldn't be involved.
The Helm chart use case comes with an additional problem: Helm templates need to be maintained outside the kpt function source directory, since they contain *.yaml files that aren't actually valid YAML, which would otherwise make kpt's YAML parser fail.
Therefore one would either need to let the comment directive refer to files outside the function source directory (which should be avoided for security reasons) or prevent kpt from parsing YAML files that are included as wrapped KRM resources.
@maxsmythe and @julianKatz and @willbeason -- just FYI, not sure if this is on your radar
Would this function only modify files in-place, or could it redirect the output to a separate directory, leaving the original YAML untouched?
Also, at what granularity could/does this operate? If I point it to a directory tree, will it perform all substitutions in that directory tree?
If I'm not modifying the file in-place, will the structure of the output's dirtree be maintained?
Rather than mounting the current working directory, another approach could be including non-KRM files in the ResourceList by wrapping them with KRM: #3118. That's probably the approach we'll need for porch. Like the FileBlob proposal above.
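A sketch of that wrapping in Python, assuming a hypothetical FileBlob kind (the group/version and field names here are illustrative, not from any existing API):

```python
import base64
import os

def wrap_file(path):
    """Wrap a non-KRM file in a hypothetical FileBlob resource so it can
    travel through a ResourceList alongside regular KRM objects."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "apiVersion": "fn.kpt.dev/v1alpha1",  # hypothetical group/version
        "kind": "FileBlob",
        "metadata": {
            "name": os.path.basename(path),
            "annotations": {
                # keep the wrapper out of applied output
                "config.kubernetes.io/local-config": "true",
            },
        },
        # base64 keeps non-YAML content (Helm templates, scripts) from
        # confusing the YAML parser, addressing the Helm problem above
        "binaryData": base64.b64encode(data).decode("ascii"),
    }
```

Encoding the payload sidesteps the parsing problem mentioned for Helm templates: the wrapped file never has to be valid YAML itself.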
Hi, a few more use cases for this feature.
Include non-YAML files (sh scripts, Python scripts, etc.)
Tekton allows adding the content of a script to the YAML file:
```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: kubectl-run
spec:
  taskRef:
    name: kubernetes-actions
  params:
    - name: script
      value: |
        kubectl get pods
        echo "-----------"
        kubectl get deploy
```
This is fine when the script is small, but when it becomes bigger you may want to move it to a separate file to get IDE support, e.g. my-script.sh:
```shell
#!/bin/bash
kubectl get pods
echo "-----------"
kubectl get deploy
```
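With the comment directive proposed earlier, the TaskRun could then reference the script file instead of embedding it. This is hypothetical and assumes the directive can target any string-typed field, such as a param value:

```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: kubectl-run
spec:
  taskRef:
    name: kubernetes-actions
  params:
    - name: script
      # kpt-include: my-script.sh
      value: |
        ...
```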
Include Markdown files
In Tekton, the description field can be used to describe a Pipeline:
```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: cep0001
  namespace: tekton-namespace
spec:
  description: |
    TBD. include-file @cep0001.md
```
I would like to store the description separately from the pipeline and include it into the Pipeline with kpt.
cep0001.md
```markdown
Chaos Scenario CEP0001
Link: TBD
Flow:
1. Get pods
2. Kill master redis pod
3. Get pods
```
Include YAML files
I can have a PipelineRun whose input parameter is YAML as well, and I want IDE support to edit that YAML separately:
```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: cep0002-run
spec:
  params:
    - name: chaos-mesh-yaml
      value: |
        TBD include pod-kill.yaml
```
pod-kill.yaml
```yaml
kind: PodChaos
apiVersion: chaos-mesh.org/v1alpha1
metadata:
  namespace: target-namespace
  name: pod-kill
  annotations:
    config.kubernetes.io/local-config: "true"
spec:
  selector:
    namespaces:
      - target-namespace
    labelSelectors:
      app.kubernetes.io/component: master
      app.kubernetes.io/name: redis
  mode: one
  action: pod-kill
  gracePeriod: 0
```
Note: I would like to modify pod-kill.yaml with kpt before including it.
Partial YAMLs included into one YAML
https://github.com/argoproj/argo-workflows/blob/master/examples/handle-large-output-results.yaml
This huge YAML could be split into different parts, with functions determining where to upsert each part:
```yaml
templates:
  - name: echo
    inputs:
      parameters:
        - name: index
      artifacts:
        - name: items
          path: /tmp/items
    container:
      image: stedolan/jq:latest
      command: [sh, -c]
      # jq is just one way to get the appropriate item.
      args: ["cat /tmp/items | jq '.[{{inputs.parameters.index}}]'"]
```