Kafka e2e: Bump Strimzi/Kafka for Kube 1.33, fix offset test flakes #6929
JorTurFer merged 3 commits into kedacore:main
f6b071c to 0c36265
Wow, that Kafka test! I'm so sorry about that! But thanks for running it to ground and fixing it! /lgtm
dttung2905 left a comment
Thanks a lot for the fix. I agree it is high time we upgraded the Kafka test suites, considering Kafka 4.0 with KRaft support is out.
tests/helper/helper.go (outdated)
      StringTrue = "true"
    - StrimziVersion = "0.35.0"
    + StrimziVersion = "0.46.0"
I see that the latest is 0.47 now. Should we bump to 0.47, or is there another reason you specifically chose 0.46?
https://github.com/strimzi/strimzi-kafka-operator/releases
I went to 0.46.0 because that was the first version where it would work again. We sat on 0.35.0 for a while, and that was from like...June of 2023? 😄
I can bump this to 0.47.0 if you'd prefer that; I was going for the least change.
There is another PR updating the Strimzi version, so please go to 0.47.0 :)
Thanks! Bumped this to 0.47.0. I was like "how are they passing e2e over in that other one without the rest of the fixes" but it looks like they only ran cron?
Sorry, that was my mistake. That should indeed have been Kafka.
Kube 1.33 added an emulationMajor field to the version API, which breaks the version parsing of Strimzi older than 0.47.0, which causes our Strimzi setup to fail to start on kube >= 1.33. Additionally, the failure mode for this was silent, as all we currently test for is whether the helm chart for Strimzi was successfully applied.

To rectify this, this does the following to the kafka scaler tests:
- Waits for the Strimzi deployment to become available on setup
- Bumps Strimzi version for tests to 0.47.0
- Bumps Kafka version for tests to 4.0.0 (3.4.0 is too old for Strimzi)
- Configures Kafka for KRaft since Zookeeper has been deprecated
- Disables topic finalization so topics don't block namespace deletion

Signed-off-by: John Kyros <jkyros@redhat.com>
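The "wait for the deployment to become available" step is what turns the silent failure into a loud one. A minimal sketch of that polling pattern, using only the standard library, might look like the following. The `availabilityCheck` callback and the `pollsUntil` fake are hypothetical stand-ins for a real query against the Kubernetes API; this is not the helper the PR actually adds.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// availabilityCheck is a hypothetical stand-in for "does the Strimzi operator
// deployment report itself as available?". A real test would ask the
// Kubernetes API here.
type availabilityCheck func() (bool, error)

// waitUntilAvailable polls check every interval until it reports true,
// returns an error, or the timeout elapses.
func waitUntilAvailable(check availabilityCheck, interval, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		ok, err := check()
		if err != nil {
			return fmt.Errorf("checking availability: %w", err)
		}
		if ok {
			return nil
		}
		select {
		case <-ctx.Done():
			return errors.New("timed out waiting for deployment to become available")
		case <-ticker.C:
		}
	}
}

// pollsUntil returns a fake check that starts reporting true on the nth call.
func pollsUntil(n int) availabilityCheck {
	calls := 0
	return func() (bool, error) {
		calls++
		return calls >= n, nil
	}
}

func main() {
	// Becomes available on the third poll, so setup can proceed.
	err := waitUntilAvailable(pollsUntil(3), 10*time.Millisecond, time.Second)
	fmt.Println("eventually available:", err)

	// Never becomes available, so setup fails instead of silently continuing.
	never := func() (bool, error) { return false, nil }
	err = waitUntilAvailable(never, 10*time.Millisecond, 50*time.Millisecond)
	fmt.Println("never available:", err)
}
```

Checking only that the chart was applied corresponds to skipping this loop entirely, which is why a crashing operator went unnoticed.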
The Kafka offset tests flake in some environments if you move through the test cases too fast -- the state from the consumer group in the previous test seems to leak through to the next one because they are sharing a consumer group, and thus will share offsets.

This fixes these flakes by moving each of these tests to its own consumer group to prevent this test pollution. They will still share the same topic, which is fine, since the offset is consumer-group specific.

Signed-off-by: John Kyros <jkyros@redhat.com>
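To see why separate consumer groups are enough even though the topic is shared, here is a toy in-memory model of committed offsets keyed by group and topic. It is only an illustration: `offsetStore`, `groupFor`, and the `kafka-e2e-` naming scheme are hypothetical and do not come from the KEDA test code, and partitions are left out to keep it short.

```go
package main

import "fmt"

// offsetKey mirrors how committed offsets are scoped: per consumer group and
// per topic (partitions are omitted to keep the sketch small).
type offsetKey struct {
	group string
	topic string
}

// offsetStore is a toy in-memory stand-in for the broker's committed offsets.
type offsetStore map[offsetKey]int64

// commit records the offset a group has reached on a topic.
func (s offsetStore) commit(group, topic string, offset int64) {
	s[offsetKey{group, topic}] = offset
}

// committed returns the stored offset and whether the group has committed
// anything for the topic yet.
func (s offsetStore) committed(group, topic string) (int64, bool) {
	off, ok := s[offsetKey{group, topic}]
	return off, ok
}

// groupFor derives a per-test consumer group name (hypothetical scheme).
func groupFor(testName string) string {
	return "kafka-e2e-" + testName
}

func main() {
	store := offsetStore{}
	const topic = "shared-topic"

	// The first test consumes and commits under its own group.
	store.commit(groupFor("earliest"), topic, 42)

	// The second test uses a different group on the same topic, so it starts clean.
	_, inherited := store.committed(groupFor("latest"), topic)
	fmt.Println("second test inherits an offset:", inherited)

	// Had it reused the first group, it would have seen the earlier commit.
	_, leaked := store.committed(groupFor("earliest"), topic)
	fmt.Println("reusing the group sees the previous offset:", leaked)
}
```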
0c36265 to 8b4dfe4
Rebased and bumped Strimzi to 0.47.0 (latest).
/run-e2e kafka
I did run these locally before I PR'd it, I promise 😄. In the new CRD definition ZooKeeper isn't required. So hmm...somehow...are we getting the old one during the test run?
Strimzi installs its CRDs in the cluster when it does a helm install during the e2e test run, but helm is a big chicken and won't overwrite or remove any CRDs during cleanup/reinstall, so we're stuck with the first versions helm installed unless something explicitly removes them.

That wouldn't be a problem if we were grabbing a fresh cluster every time, but we're not. We just scale up an existing one and create some testing namespaces, so those old CRDs conflict with the newer versions of Strimzi we're trying to move to.

This just adds strimzi CRD cleanup to the e2e cleanup script so they get removed at the end of a test run, so the next test run can install the proper ones.

Signed-off-by: John Kyros <jkyros@redhat.com>
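As a rough illustration of the selection step such a cleanup needs, the sketch below picks out Strimzi's CRDs from a list of CRD names by their API group suffix. The function name and the sample listing are hypothetical, and the actual deletion call is deliberately left out; the real cleanup script may select and delete the CRDs differently.

```go
package main

import (
	"fmt"
	"strings"
)

// strimziGroupSuffix matches the API groups Strimzi uses for its CRDs,
// such as kafka.strimzi.io and core.strimzi.io.
const strimziGroupSuffix = ".strimzi.io"

// strimziCRDs returns the subset of CRD names that belong to Strimzi,
// preserving the input order.
func strimziCRDs(all []string) []string {
	var out []string
	for _, name := range all {
		if strings.HasSuffix(name, strimziGroupSuffix) {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// Hypothetical listing of CRDs left behind in a long-lived test cluster.
	crds := []string{
		"kafkas.kafka.strimzi.io",
		"kafkatopics.kafka.strimzi.io",
		"scaledobjects.keda.sh",
		"strimzipodsets.core.strimzi.io",
	}
	for _, name := range strimziCRDs(crds) {
		// A real cleanup would delete the CRD here; this sketch only prints it.
		fmt.Println("would delete:", name)
	}
}
```

Filtering by group keeps the cleanup from touching CRDs owned by anything else sharing the cluster.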
So it looks like:
So we're stuck with those old CRDs unless we delete them. I added a "clean up Strimzi CRDs" section to the e2e-cleanup, but I'm having "race condition" feelings. Like if two concurrent runs of kafka e2e happen on separate PRs, the first one will probably blow the CRDs out from under the second one. It looks like we already have most of that risk given that we install Strimzi to just the
/run-e2e kafka
I've deleted all the Kafka CRDs from both clusters and triggered the e2e test again. In any case, old CRDs can be reinstalled by other e2e tests. Currently, @zroubalik, @wozniakjan and I have access to the cluster, so just ping us if they have to be deleted again.
…edacore#6929)

* Update strimzi to 0.47.0 for Kube 1.33, fix setup

Kube 1.33 added an emulationMajor field to the version API, which breaks the version parsing of Strimzi older than 0.47.0, which causes our Strimzi setup to fail to start on kube >= 1.33. Additionally, the failure mode for this was silent, as all we currently test for is whether the helm chart for Strimzi was successfully applied.

To rectify this, this does the following to the kafka scaler tests:
- Waits for the Strimzi deployment to become available on setup
- Bumps Strimzi version for tests to 0.47.0
- Bumps Kafka version for tests to 4.0.0 (3.4.0 is too old for Strimzi)
- Configures Kafka for KRaft since Zookeeper has been deprecated
- Disables topic finalization so topics don't block namespace deletion

Signed-off-by: John Kyros <jkyros@redhat.com>

* Move Kafka offset tests to own consumer group

The Kafka offset tests flake in some environments if you move through the test cases too fast -- the state from the consumer group in the previous test seems to leak through to the next one because they are sharing a consumer group, and thus will share offsets.

This fixes these flakes by moving each of these tests to its own consumer group to prevent this test pollution. They will still share the same topic, which is fine, since the offset is consumer-group specific.

Signed-off-by: John Kyros <jkyros@redhat.com>

* Delete strimzi CRDs during e2e test cleanup

Strimzi installs its CRDs in the cluster when it does a helm install during the e2e test run, but helm is a big chicken and won't overwrite or remove any CRDs during cleanup/reinstall, so we're stuck with the first versions helm installed unless something explicitly removes them.

That wouldn't be a problem if we were grabbing a fresh cluster every time, but we're not. We just scale up an existing one and create some testing namespaces, so those old CRDs conflict with the newer versions of Strimzi we're trying to move to.

This just adds strimzi CRD cleanup to the e2e cleanup script so they get removed at the end of a test run, so the next test run can install the proper ones.

Signed-off-by: John Kyros <jkyros@redhat.com>

---------

Signed-off-by: John Kyros <jkyros@redhat.com>
Running our Kafka tests against kube 1.33, the version of Strimzi we're pegged to doesn't like it:
Strimzi fixed it in 0.46.0: strimzi/strimzi-kafka-operator#11456 (comment), but they aren't backporting, so we need to upgrade to a new version.
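For readers unfamiliar with this class of failure, the Go sketch below shows how a strict parser that rejects unknown fields breaks as soon as the server adds one, while a lenient parser keeps working. It is an analogy only: Strimzi is written in Java and its real parsing code is not shown here, and the `versionInfo` struct, the `parseVersion` helper, and the payload values are made up for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// versionInfo models only the fields an older client knows about.
type versionInfo struct {
	Major      string `json:"major"`
	Minor      string `json:"minor"`
	GitVersion string `json:"gitVersion"`
}

// parseVersion decodes a version payload. With strict set, unknown fields
// are rejected, which mimics a parser that breaks when the server adds one.
func parseVersion(payload []byte, strict bool) (versionInfo, error) {
	var v versionInfo
	dec := json.NewDecoder(bytes.NewReader(payload))
	if strict {
		dec.DisallowUnknownFields()
	}
	err := dec.Decode(&v)
	return v, err
}

func main() {
	// Illustrative payload; the values are made up and emulationMajor is the
	// field the older struct does not know about.
	payload := []byte(`{"major":"1","minor":"33","emulationMajor":"1","gitVersion":"v1.33.0"}`)

	_, err := parseVersion(payload, true)
	fmt.Println("strict parser error:", err)

	v, err := parseVersion(payload, false)
	fmt.Println("lenient parser:", v.GitVersion, err)
}
```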
This:
Also:
Checklist
[ ] When introducing a new scaler, I agree with the scaling governance policy
[ ] Tests have been added
[ ] Changelog has been updated and is aligned with our changelog requirements
[ ] A PR is opened to update our Helm chart (repo) (if applicable, i.e. when deployment manifests are modified)
[ ] A PR is opened to update the documentation on (repo) (if applicable)

Fixes #
Relates to #