
Add Preferred-site node failure scenario for metro resiliency #320

Merged
xuluna merged 11 commits into main from usr/luna/preferred-site on Aug 12, 2025

xuluna commented Aug 7, 2025

Contributor

Description

Adds a new integration scenario that fails the nodes labeled as the preferred site, to test metro volumes with Resiliency.
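The scenario can be modeled in plain Go: pick the nodes carrying a preferred-site label, fail them, and check that every pod that was on a failed node now runs on a node outside the failed set. This is a minimal stdlib-only sketch, not the code in integration_steps_test.go; the `Node` type, the label key `example.com/site`, and the helpers `preferredNodes` and `podsMovedOff` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, minimal stand-in for a Kubernetes node.
type Node struct {
	Name   string
	Labels map[string]string
}

// preferredNodes returns, sorted, the names of nodes whose label key equals value.
func preferredNodes(nodes []Node, key, value string) []string {
	var out []string
	for _, n := range nodes {
		if n.Labels[key] == value { // a nil Labels map simply yields ""
			out = append(out, n.Name)
		}
	}
	sort.Strings(out)
	return out
}

// podsMovedOff reports whether every pod that sat on a failed node before the
// failure is now on a node outside the failed set. before/after map pod -> node.
func podsMovedOff(before, after map[string]string, failed []string) bool {
	failedSet := make(map[string]bool, len(failed))
	for _, f := range failed {
		failedSet[f] = true
	}
	for pod, node := range before {
		if !failedSet[node] {
			continue // pod was not on a failed node; nothing to check
		}
		newNode, ok := after[pod]
		if !ok || failedSet[newNode] {
			return false // pod missing, or still on a failed node
		}
	}
	return true
}

func main() {
	const siteKey = "example.com/site" // hypothetical label key
	nodes := []Node{
		{Name: "worker-1", Labels: map[string]string{siteKey: "preferred"}},
		{Name: "worker-2"},
		{Name: "worker-3"},
	}
	failed := preferredNodes(nodes, siteKey, "preferred")
	before := map[string]string{"pod-a": "worker-1"}
	after := map[string]string{"pod-a": "worker-2"}
	fmt.Println(failed, podsMovedOff(before, after, failed)) // [worker-1] true
}
```

Keeping the check as a pure function over pod-to-node maps makes it easy to reason about separately from the cluster polling a real test would need.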

GitHub Issues

List the GitHub issues impacted by this PR:

GitHub Issue #
https://github.com/dell/csm/issues/1961

Checklist:

  • I have performed a self-review of my own code to ensure there are no formatting, vetting, linting, or security issues
  • I have verified that new and existing unit tests pass locally with my changes
  • I have not allowed coverage numbers to degrade
  • I have maintained at least 90% code coverage
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • Backward compatibility is not broken

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration.

Unit test:

make unit-test:
PASS
coverage: 96.8% of statements
status 0
ok      podmon/cmd/podmon       1.578s  coverage: 96.8% of statements
make[1]: Leaving directory '/root/workspace/karavi-resiliency/cmd/podmon'
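The checklist sets a 90% coverage floor, and this run reports 96.8%. A minimal sketch of checking that floor against a `go test -cover` summary line follows; the regular expression and the `coverageAtLeast` helper are illustrative assumptions, not part of the repository's Makefile.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// coverageRe matches the "coverage: NN.N% of statements" fragment printed by go test -cover.
var coverageRe = regexp.MustCompile(`coverage: ([0-9]+(?:\.[0-9]+)?)% of statements`)

// coverageAtLeast extracts the coverage percentage from a go test output line and
// reports whether it meets floor. The last result is false if no figure was found.
func coverageAtLeast(line string, floor float64) (float64, bool, bool) {
	m := coverageRe.FindStringSubmatch(line)
	if m == nil {
		return 0, false, false
	}
	pct, err := strconv.ParseFloat(m[1], 64)
	if err != nil {
		return 0, false, false
	}
	return pct, pct >= floor, true
}

func main() {
	line := "ok      podmon/cmd/podmon       1.578s  coverage: 96.8% of statements"
	pct, meets, ok := coverageAtLeast(line, 90.0)
	fmt.Println(pct, meets, ok) // 96.8 true true
}
```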

Integration test:

time="2025-08-06T21:10:01+01:00" level=info msg="k8sPoll: Node: worker-2-rjcs0evho6ev5.domain Ready:true Taints: "
    Given a kubernetes <kubeConfig>                                                                                                                        # <autogenerated>:1 -> *integration
    And cluster is clean of test pods                                                                                                                      # <autogenerated>:1 -> *integration
    And wait <nodeCleanSecs> to see there are no taints                                                                                                    # <autogenerated>:1 -> *integration
    And label <workers> node as <preferred> site                                                                                                           # <autogenerated>:1 -> *integration
    And <podsPerNode> pods per node with <nVol> volumes and <nDev> devices using <driverType> and <storageClass> in <deploySecs> with <preferred> affinity # <autogenerated>:1 -> *integration
    Then validate that all pods are running within <deploySecs> seconds                                                                                    # <autogenerated>:1 -> *integration
    And all pods are running on <preferred> node                                                                                                           # <autogenerated>:1 -> *integration
    When I fail labeled <preferred> nodes with <failure> failure for <failSecs> seconds                                                                    # <autogenerated>:1 -> *integration
    Then validate that all pods are running within <runSecs> seconds                                                                                       # <autogenerated>:1 -> *integration
    And labeled pods are on a different node                                                                                                               # <autogenerated>:1 -> *integration
    And the taints for the failed nodes are removed within <nodeCleanSecs> seconds                                                                         # <autogenerated>:1 -> *integration
    Then finally cleanup everything                                                                                                                        # <autogenerated>:1 -> *integration

    Examples:
      | kubeConfig | podsPerNode | nVol  | nDev  | driverType   | storageClass       | workers     | primary | failure         | failSecs | deploySecs | runSecs | nodeCleanSecs | preferred |
      | ""         | "1-1"       | "1-1" | "0-0" | "powerstore" | "powerstore-metro" | "one-third" | "zero"  | "interfacedown" | 240      | 600        | 600     | 600           | "site"    |

1 scenarios (1 passed)
12 steps (12 passed)
7m37.729621083s
time="2025-08-06T21:10:03+01:00" level=info msg="Integration test finished"
--- PASS: TestPowerStoreIntegration (457.76s)
PASS
status 0
ok      podmon/internal/monitor 479.952s
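In the Examples row, counts such as podsPerNode, nVol and nDev are written as "min-max" strings ("1-1", "0-0"), and workers is a fraction name ("one-third"). The suite's exact semantics are not shown on this page, so the sketch below is only one plausible interpretation: `parseRange`, `workerCount`, the extra "all" fraction, and the round-down-but-at-least-one rule are all assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRange parses a hypothetical "min-max" parameter such as "1-1" or "0-3".
func parseRange(s string) (int, int, error) {
	parts := strings.SplitN(s, "-", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("range %q: want min-max", s)
	}
	lo, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("range %q: %w", s, err)
	}
	hi, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("range %q: %w", s, err)
	}
	if lo < 0 || hi < lo {
		return 0, 0, fmt.Errorf("range %q: need 0 <= min <= max", s)
	}
	return lo, hi, nil
}

// workerCount maps a fraction name to a number of worker nodes out of total.
// Assumption: the result is rounded down but never below one node.
func workerCount(fraction string, total int) (int, error) {
	var n int
	switch fraction {
	case "one-third":
		n = total / 3
	case "all": // hypothetical extra value
		n = total
	default:
		return 0, fmt.Errorf("unknown fraction %q", fraction)
	}
	if n < 1 && total > 0 {
		n = 1
	}
	return n, nil
}

func main() {
	lo, hi, _ := parseRange("1-1")
	n, _ := workerCount("one-third", 3)
	fmt.Println(lo, hi, n) // 1 1 1
}
```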

Outdated review comment threads:
  • internal/monitor/features/integration.feature (3 threads)
  • internal/monitor/integration_steps_test.go (3 threads)
  • test/podmontest/deploy/templates/test.yaml (1 thread)
xuluna requested a review from falfaroc on August 11, 2025 19:01
lukeatdell previously approved these changes Aug 11, 2025
falfaroc previously approved these changes Aug 11, 2025
xuluna dismissed stale reviews from falfaroc and lukeatdell via 33c1bb5 on August 12, 2025 14:53
github-actions


Merging this branch will not change overall coverage

Impacted Packages                                     Coverage Δ
github.com/dell/karavi-resiliency/internal/monitor    0.00% (ø)

Coverage by file

Changed unit test files

  • github.com/dell/karavi-resiliency/internal/monitor/integration_steps_test.go
  • github.com/dell/karavi-resiliency/internal/monitor/integration_test.go

falfaroc previously approved these changes Aug 12, 2025

falfaroc

Contributor

Rebase needed

xuluna force-pushed the usr/luna/preferred-site branch from 4b9bf2f to 75571ab on August 12, 2025 15:06

xuluna commented Aug 12, 2025

Contributor Author

> Rebase needed

Rebased

xuluna merged commit 8b512c7 into main on Aug 12, 2025
6 checks passed
xuluna deleted the usr/luna/preferred-site branch on August 12, 2025 18:13
xuluna restored the usr/luna/preferred-site branch on August 12, 2025 18:39

Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues.
4 participants