Remove Ginkgo-based tests #37837

@joestringer

We've gone through several iterations of the primary test harness used for building end-to-end tests in Cilium. Initially we built tests in bash and Vagrant, then Ginkgo with Jenkins, and now a combination of cilium-cli and GitHub Actions. Over time, we lose the specialized knowledge necessary to adequately maintain older test harnesses and they begin to deteriorate. To provide reliable, maintainable testing, we should gradually retire older frameworks so we can collectively focus effort on a smaller, simpler set of infrastructure. To that end, this issue is focused on removing Ginkgo testing from the tree.

Our use of Ginkgo had several key flaws: we tied bootstrap into the tests themselves, which caused test instability and slowness; we developed extensions on top of the Ginkgo framework that raised the barrier to contribution; and the tests were almost exclusively end-to-end tests. To remedy these issues, the CLI-based framework split bootstrap into dedicated GitHub workflow steps to better isolate deploying a testable environment from the tests themselves. This is the basis for dedicated workflows per cloud, for instance, or for specific primary configurations of Cilium. With this simplification of the architecture, we could focus the CLI testing framework on providing just a set of test scenarios with individual actions inside. We've seen a significant reduction in tests failing due to leftover state compared with the older style tests.

One area that the CLI-based tests do not specifically address is the emphasis on end-to-end testing. The end-to-end tests will continue to provide confirmation about the overall behavior of Cilium; however, for us to build reliable end-to-end tests, we need to isolate the factors that may introduce instability into those broader tests. The current thinking about how to approach this is to build component testing using Go tests around Cells. The Hive library has hive/script and scripttest capabilities which allow developers to express operations and desired state in a simple manner using txtar. An example of this style of testing can be seen in the CiliumEnvoyConfig tests, which are triggered via hive/script. To find other examples, run git grep -e scripttest -e txtar.
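For readers who haven't seen txtar before, here is a rough sketch of the file layout such tests rely on: a leading comment section (which script-style tests use for the commands to run) followed by named files introduced by `-- name --` marker lines. This is a hand-rolled toy parser using only the standard library, not the hive/script or scripttest API, and the script commands and file name in the example are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// File is one named section of a txtar archive.
type File struct {
	Name string
	Data string
}

// Archive is the leading comment plus the files that follow it.
// In script-style tests the comment holds the commands to run and the
// files hold inputs or expected state.
type Archive struct {
	Comment string
	Files   []File
}

// markerName reports whether line is a txtar file marker ("-- name --")
// and, if so, returns the file name.
func markerName(line string) (string, bool) {
	if len(line) < len("-- x --") ||
		!strings.HasPrefix(line, "-- ") ||
		!strings.HasSuffix(line, " --") {
		return "", false
	}
	name := strings.TrimSpace(line[3 : len(line)-3])
	return name, name != ""
}

// Parse splits src into the leading comment and the named files.
func Parse(src string) Archive {
	var a Archive
	cur := -1 // index of the file currently being filled, -1 while in the comment
	for _, line := range strings.SplitAfter(src, "\n") {
		if line == "" {
			continue
		}
		if name, ok := markerName(strings.TrimRight(line, "\r\n")); ok {
			a.Files = append(a.Files, File{Name: name})
			cur = len(a.Files) - 1
			continue
		}
		if cur < 0 {
			a.Comment += line
		} else {
			a.Files[cur].Data += line
		}
	}
	return a
}

// example is a hypothetical test file; the commands are NOT real
// hive/script commands, they only show where a script would live.
const example = `# hypothetical commands, for illustration only
start-component
compare-state expected.table
-- expected.table --
Name   Status
demo   ready
`

func main() {
	a := Parse(example)
	fmt.Printf("script:\n%s", a.Comment)
	for _, f := range a.Files {
		fmt.Printf("file %q (%d bytes)\n", f.Name, len(f.Data))
	}
}
```

The point of the format is that the operations and the desired state sit side by side in one small text file, so a component test can be read and reviewed without any bootstrap logic around it.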

How should we remove Ginkgo-based tests? For a given test, we should evaluate what coverage that test provides. If the test case is fully covered already by a test within cilium-cli, then we can simply remove the corresponding test from Ginkgo. If a test case is not yet covered by other test logic in the tree, then we can evaluate whether to implement the corresponding test in cilium-cli or (preferred) build component testing to provide that coverage, using hive/script as outlined in the developer docs.
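The triage described above could be sketched roughly as follows. The type and field names are hypothetical, and in particular the `FitsComponentTest` flag stands in for a human judgment call that this issue does not define; the only rules taken from the text are "already covered means remove" and "component testing is preferred over a new cilium-cli test".

```go
package main

import "fmt"

// Action is the follow-up chosen for one Ginkgo test case.
type Action string

const (
	RemoveOnly        Action = "remove the Ginkgo test"
	AddComponentTest  Action = "add a hive/script component test, then remove the Ginkgo test"
	AddCiliumCLITest  Action = "add a cilium-cli test, then remove the Ginkgo test"
)

// Candidate describes what we know about one Ginkgo test case.
// Both fields are assumptions made for this sketch.
type Candidate struct {
	Name string
	// CoveredElsewhere: the behavior is already exercised by cilium-cli
	// or by other test logic in the tree.
	CoveredElsewhere bool
	// FitsComponentTest: a reviewer judged that the behavior can be
	// isolated to a component. This criterion is not defined in the issue.
	FitsComponentTest bool
}

// Triage applies the decision order described in the issue text.
func Triage(c Candidate) Action {
	switch {
	case c.CoveredElsewhere:
		return RemoveOnly
	case c.FitsComponentTest:
		return AddComponentTest // preferred when coverage is missing
	default:
		return AddCiliumCLITest
	}
}

func main() {
	// Hypothetical candidates, not real test names from the tree.
	for _, c := range []Candidate{
		{Name: "example-covered", CoveredElsewhere: true},
		{Name: "example-component", FitsComponentTest: true},
		{Name: "example-e2e"},
	} {
		fmt.Printf("%s: %s\n", c.Name, Triage(c))
	}
}
```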

Related analysis of Ginkgo runtime test coverage: #12258

Subtasks for this meta issue:

  • Create tracking issues for categories of runtime tests to migrate
  • Create tracking issues for categories of k8s tests to migrate

Metadata

    Labels

    area/CI-improvement: Topic or proposal to improve the Continuous Integration workflow
    help-wanted: You can help! Post a detailed plan on the issue or create a PR to solve this issue.
    kind/meta: Meta-task for co-ordination.
