qa/tasks/kubeadm: set up tigera resources via kubectl create #47854

Merged: adk3798 merged 1 commit into ceph:main from phlogistonjohn:jjm-fix-57268 on Sep 12, 2022.

phlogistonjohn commented:

Fixes: https://tracker.ceph.com/issues/57268

The tigera operator for the calico CNI has some pretty large resource
definitions. The length of the definitions can cause "client-side
apply", the default mode for `kubectl apply ...`, to fail because of the
length of the `kubectl.kubernetes.io/last-applied-configuration`
annotation it would need to create:

```
2022-08-22T20:24:55.636 INFO:teuthology.orchestra.run.smithi087.stdout:clusterrolebinding.rbac.authorization.k8s.io/tigera-operator created
2022-08-22T20:24:55.670 INFO:teuthology.orchestra.run.smithi087.stdout:deployment.apps/tigera-operator created
2022-08-22T20:24:55.671 INFO:teuthology.orchestra.run.smithi087.stderr:The CustomResourceDefinition "installations.operator.tigera.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
2022-08-22T20:24:55.674 DEBUG:teuthology.orchestra.run:got remote process result: 1
```
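The 262144-byte limit in that error is the API server's cap on the combined size (keys plus values) of all annotations on one object, and client-side apply stores a JSON copy of the whole object in its last-applied annotation. The rough Python sketch below estimates whether a manifest would trip the limit. The function names are hypothetical, and it assumes kubectl writes a compact JSON serialization; the exact bytes kubectl writes, and any other annotations already on the object, will shift the real number somewhat.

```python
import json

# Cap on the combined size of all annotations on a Kubernetes object (256 KiB).
MAX_ANNOTATIONS_BYTES = 262144
# Annotation written by client-side `kubectl apply`.
LAST_APPLIED_KEY = "kubectl.kubernetes.io/last-applied-configuration"


def last_applied_size(manifest: dict) -> int:
    """Approximate bytes the last-applied annotation would need (key + value).

    Assumption: kubectl stores a compact JSON serialization of the object;
    this is an estimate, not kubectl's exact encoding.
    """
    payload = json.dumps(manifest, separators=(",", ":"))
    return len(LAST_APPLIED_KEY.encode("utf-8")) + len(payload.encode("utf-8"))


def client_side_apply_fits(manifest: dict) -> bool:
    """True if the estimated annotation stays within the API server limit."""
    return last_applied_size(manifest) <= MAX_ANNOTATIONS_BYTES
```

A small ConfigMap fits easily; a CRD whose schema text runs to a few hundred kilobytes, as the tigera CRDs do, does not.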

There are two simple options for avoiding this error. One is to use
`kubectl create`. The create command does not write this lengthy
annotation, but it fails if any of the resources already exist. The
other option is to use server-side apply, via the `kubectl apply
--server-side ...` command. It is new in k8s 1.18 and does not create
the annotation either.
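The two options differ only in the kubectl invocation, so a helper that builds the argument list can switch between them. This is a hypothetical sketch (the helper name and `mode` values are illustrative, not part of the qa tasks); the kubectl subcommands and the `--server-side` and `-f` flags are the real ones.

```python
from typing import List, Sequence


def kubectl_install_args(manifests: Sequence[str], mode: str = "create") -> List[str]:
    """Build a kubectl argv that avoids the last-applied annotation.

    mode="create": `kubectl create -f ...`; fails if a resource already exists.
    mode="server-side": `kubectl apply --server-side -f ...`; field ownership
    is tracked by the server in metadata.managedFields instead.
    """
    if mode == "create":
        args = ["kubectl", "create"]
    elif mode == "server-side":
        args = ["kubectl", "apply", "--server-side"]
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    for path in manifests:
        args += ["-f", path]
    return args
```

The resulting list could be handed to whatever runs remote commands (in a teuthology task, something like `remote.run(args=...)`). `create` works on any cluster version but is not idempotent; server-side apply can be re-run but needs a cluster that supports it.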

The block of code setting up the CNI already uses `kubectl create` to
create the custom resources that configure the tigera operator.
Therefore it should be safe to assume the block of code in question
doesn't need to be idempotent and we can also use `kubectl create`
elsewhere in the same block.
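The idempotency argument can be illustrated with a toy model. This sketch does not talk to Kubernetes: `FakeCluster` is a hypothetical in-memory stand-in that mimics only the create-vs-apply semantics, and the resource names are placeholders loosely based on the log above.

```python
from typing import Iterable, Set


class FakeCluster:
    """Toy stand-in for the API server that only tracks resource names."""

    def __init__(self) -> None:
        self.resources: Set[str] = set()

    def create(self, names: Iterable[str]) -> None:
        # Like `kubectl create -f`: new resources are created, but any
        # resource that already exists is reported as an error.
        existing = []
        for name in names:
            if name in self.resources:
                existing.append(name)
            else:
                self.resources.add(name)
        if existing:
            raise RuntimeError(f"AlreadyExists: {sorted(existing)}")

    def apply(self, names: Iterable[str]) -> None:
        # Like `kubectl apply -f`: create-or-update, so re-running is harmless.
        self.resources.update(names)


# Hypothetical placeholders for the operator manifest and its custom resources.
OPERATOR_RESOURCES = ["crd/installations.operator.tigera.io", "deployment/tigera-operator"]
CUSTOM_RESOURCES = ["installation/default"]


def setup_cni(cluster: FakeCluster, operator_via_create: bool = True) -> None:
    if operator_via_create:
        cluster.create(OPERATOR_RESOURCES)
    else:
        cluster.apply(OPERATOR_RESOURCES)
    # The custom resources are created with `create`, so the block as a
    # whole was never idempotent to begin with.
    cluster.create(CUSTOM_RESOURCES)
```

Running `setup_cni` twice on the same cluster raises either way, which is the point of the argument: switching the operator step from `apply` to `create` does not make the block any less idempotent than it already was.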

Signed-off-by: John Mulligan <jmulligan@redhat.com>

github-actions bot added the tests label on Aug 29, 2022.
phlogistonjohn requested review from a team and ljflores on Aug 29, 2022 14:19.
phlogistonjohn commented:

jenkins test make check

adk3798 commented Aug 29, 2022

Do you think this would also fix https://tracker.ceph.com/issues/57269? @phlogistonjohn

adk3798 commented Aug 29, 2022

jenkins test api

adk3798 commented Sep 1, 2022

https://pulpito.ceph.com/adking-2022-08-30_17:06:47-orch:cephadm-wip-adk-testing-2022-08-29-1644-distro-default-smithi/ and reruns https://pulpito.ceph.com/adking-2022-09-01_12:53:01-orch:cephadm-wip-adk-testing-2022-08-29-1644-distro-default-smithi/

Failures tracked by:

The only new and surprising failures in this run were the iscsi test failures (https://tracker.ceph.com/issues/57371). As a result, I'm not going to merge any PRs in this run that seem related to iscsi. I think the other PRs will still be okay to merge, though.

None of that is really relevant to this specific PR. I scheduled a run of some rook jobs that use calico, with the same build that includes this PR: https://pulpito.ceph.com/adking-2022-09-01_17:38:25-orch:rook-wip-adk-testing-2022-08-29-1644-distro-default-smithi/

adk3798 commented Sep 1, 2022

@phlogistonjohn the tests seem to be failing on the `kubectl create` rather than the `apply`, BUT the error doesn't reference tigera.

```
2022-09-01T18:38:13.403 INFO:teuthology.orchestra.run.smithi008.stderr:error: resource mapping not found for name: "00-rook-privileged" namespace: "" from "rook/cluster/examples/kubernetes/ceph/common.yaml": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
2022-09-01T18:38:13.404 INFO:teuthology.orchestra.run.smithi008.stderr:ensure CRDs are installed first
2022-09-01T18:38:13.410 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-09-01T18:38:13.411 ERROR:tasks.rook:Command failed on smithi008 with status 1: 'kubectl create -f rook/cluster/examples/kubernetes/ceph/crds.yaml -f rook/cluster/examples/kubernetes/ceph/common.yaml -f operator.yaml'
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36f4d04b435bfae43936a7c648e95cb826d90419/qa/tasks/rook.py", line 128, in rook_operator
    '-f', 'operator.yaml',
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36f4d04b435bfae43936a7c648e95cb826d90419/qa/tasks/rook.py", line 38, in _kubectl
    **kwargs
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b58f2c18636eb10faa77ed3614abd00cb85dfc2c/teuthology/orchestra/remote.py", line 510, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b58f2c18636eb10faa77ed3614abd00cb85dfc2c/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b58f2c18636eb10faa77ed3614abd00cb85dfc2c/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b58f2c18636eb10faa77ed3614abd00cb85dfc2c/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi008 with status 1: 'kubectl create -f rook/cluster/examples/kubernetes/ceph/crds.yaml -f rook/cluster/examples/kubernetes/ceph/common.yaml -f operator.yaml'
```

https://pulpito.ceph.com/adking-2022-09-01_17:38:25-orch:rook-wip-adk-testing-2022-08-29-1644-distro-default-smithi/

Do you think this is just another issue that was masked by the one being addressed in this PR? If so, we could still merge this and a new tracker could be opened for the new issue.

phlogistonjohn commented:

There's another tracker for the rook PodSecurityPolicy issue: https://tracker.ceph.com/issues/57311
The current title is a bit misleading. That issue is only hit when using K8s 1.25, whereas the tigera CRD size issue also affects 1.24 (that's where I first hit it, independent of Ceph).
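The version split described above comes from the removal of the `policy/v1beta1` PodSecurityPolicy API in Kubernetes 1.25, which is why rook's `common.yaml` fails there but not on 1.24. A test task could gate on the server version; the minimal sketch below assumes a `gitVersion`-style string such as `v1.25.0`, and the function name is hypothetical, not part of the rook or kubeadm qa tasks.

```python
import re


def serves_psp_v1beta1(server_version: str) -> bool:
    """True if a cluster at this version still serves policy/v1beta1
    PodSecurityPolicy (the API was removed in Kubernetes 1.25)."""
    m = re.match(r"v?(\d+)\.(\d+)", server_version)
    if m is None:
        raise ValueError(f"unrecognized version: {server_version!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    return (major, minor) < (1, 25)
```

Such a check only explains which clusters hit the PSP error; it does not address the separate tigera annotation-size problem, which shows up on 1.24 as well.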
