Description
Currently, testing for CloudNativePG involves a series of unit tests that evaluate small components, as well as a comprehensive set of end-to-end tests that assess complete features, including responses to simulated failures. However, we lack a systematic approach to address failures that occur unexpectedly and randomly.
We want to expand the scope of our tests by introducing a full-fledged chaos testing framework that can better validate CloudNativePG's resilience, fault tolerance, and recovery mechanisms.
By adopting chaos testing, we aim to:
- Increase confidence that our services remain functional under adverse conditions.
- Identify weak points that traditional testing may not uncover.
This self-contained project is perfect for the LFX Mentorship Program. The mentee can become a component owner of the project.
Expected outcomes
- Selection of a Kubernetes-native chaos testing framework (e.g., LitmusChaos or Chaos Mesh).
- Design and automation of an initial set of chaos experiments covering common failure scenarios.
- Integration of these experiments into CI/CD to ensure reproducible testing.
- Collection of clear observability metrics (e.g., failover time, data consistency) to assess resilience and recovery.
- Documentation and guidelines to help contributors create and run new chaos experiments safely.
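As a rough illustration of the "failover time" metric mentioned above, the sketch below derives it from a series of write-availability probes taken while an experiment runs. The `Probe` type and the `failover_time` helper are hypothetical names invented for this example; they are not part of CloudNativePG, LitmusChaos, or Chaos Mesh, and a real implementation would likely take its samples from Prometheus or the chosen framework's own probes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Probe:
    """One write-availability check against the cluster (hypothetical shape)."""
    timestamp: float  # seconds since the experiment started
    ok: bool          # True if the test write succeeded


def failover_time(probes: Iterable[Probe]) -> Optional[float]:
    """Return the seconds between the first failed probe and the next success.

    Returns None if no outage was observed or the cluster never recovered
    within the sampled window.
    """
    outage_start: Optional[float] = None
    for probe in sorted(probes, key=lambda p: p.timestamp):
        if not probe.ok and outage_start is None:
            # First failed write marks the start of the outage.
            outage_start = probe.timestamp
        elif probe.ok and outage_start is not None:
            # First successful write after the outage marks recovery.
            return probe.timestamp - outage_start
    return None


# Assumed sample data: the primary is killed shortly before t=2.0.
samples = [
    Probe(0.0, True),
    Probe(1.0, True),
    Probe(2.0, False),
    Probe(3.0, False),
    Probe(4.5, True),
]
print(failover_time(samples))  # 2.5
```

The result is only as precise as the probe interval, so an experiment that reports failover time should also record how often it sampled.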
Recommended Skills
- Experience with chaos testing frameworks (preferably LitmusChaos or Chaos Mesh).
- Familiarity with Kubernetes, PostgreSQL, and CloudNativePG.
- Understanding of observability tools such as Prometheus or Grafana.
Additional context
This should be a separate project under the CloudNativePG organisation.