Add PowerStore Metro, Multiple Preferred Nodes E2E Test by falfaroc · Pull Request #321 · dell/karavi-resiliency

falfaroc · 2025-08-08T16:11:28Z

Description

Add a new scenario that tests and verifies PowerStore Metro with Resiliency when there are multiple preferred nodes and the preferred node hosting the application pod goes down. The scenario ensures that the pod migrates to another preferred node as expected.
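
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `node` type, the `preferredLabelKey` constant, and `checkMigration` are hypothetical names, and the real test works against the Kubernetes API rather than an in-memory map.

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for a Kubernetes node.
type node struct {
	name   string
	labels map[string]string
}

// preferredLabelKey is an assumed label key; the real key used by the
// integration tests may differ.
const preferredLabelKey = "preferred"

// checkMigration returns an error unless the pod moved off its original
// node and landed on a different node that carries the preferred label.
func checkMigration(nodes map[string]node, before, after string) error {
	if before == after {
		return fmt.Errorf("pod did not move off node %s", before)
	}
	n, ok := nodes[after]
	if !ok {
		return fmt.Errorf("unknown node %s", after)
	}
	if _, ok := n.labels[preferredLabelKey]; !ok {
		return fmt.Errorf("pod landed on non-preferred node %s", after)
	}
	return nil
}

func main() {
	nodes := map[string]node{
		"worker-1": {name: "worker-1", labels: map[string]string{preferredLabelKey: "site"}},
		"worker-2": {name: "worker-2", labels: map[string]string{preferredLabelKey: "site"}},
		"worker-3": {name: "worker-3", labels: map[string]string{}},
	}
	// worker-1 fails; the pod should end up on worker-2, not worker-3.
	fmt.Println(checkMigration(nodes, "worker-1", "worker-2"))
	fmt.Println(checkMigration(nodes, "worker-1", "worker-3"))
}
```

The point of the sketch is that "pod is on a different node" alone is not sufficient for this scenario; the destination must also be labeled as preferred.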

GitHub Issues

List the GitHub issues impacted by this PR:

GitHub Issue #
https://github.com/dell/csm/issues/1961

Checklist:

I have performed a self-review of my own code to ensure there are no formatting, vetting, linting, or security issues
I have verified that new and existing unit tests pass locally with my changes
I have not allowed coverage numbers to degenerate
I have maintained at least 90% code coverage
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
Backward compatibility is not broken

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration

Added and ran the PowerStore Metro scenario with multiple preferred nodes.

Clean Run:

$ make powerstore-metro-integration-test
RESILIENCY_INT_TEST="true" \
RESILIENCY_TEST_CLEANUP="true" \
POLL_K8S="true" \
SCRIPTS_DIR="../../test/sh" \
POWERSTORE_METRO="true" \
go test -timeout 6h -test.v -test.run "^\QTestPowerStoreFirstCheck\E|\QTestPowerStoreMetroIntegration\E"
=== RUN   TestPowerStoreFirstCheck
INFO[0000] RESILIENCY_INT_TEST_STOP_ON_FAILURE = true
Feature: Integration Test
  As a CSM for Resiliency developer
  I want to test CSM for Resiliency in a kubernetes environment
  So that it is known to work on various pod clean up cases and give consistent results
INFO[0000] attempting k8sapi connection
INFO[0000] Using kubeconfig /home/falfaroc/.kube/config
INFO[0000] connected to k8sapi
...

  Scenario Outline: Validate that we have a valid k8s configuration for the PowerStore metro integration tests # features/integration.feature:59
INFO[0000] Driver csi-powerstore.dellemc.com exists on the cluster
...
    Given a kubernetes <kubeConfig>                                                                            # <autogenerated>:1 -> *integration
    And test environmental variables are set                                                                   # <autogenerated>:1 -> *integration
    And these CSI driver <driverNames> are configured on the system                                            # <autogenerated>:1 -> *integration
    And these storageClasses <storageClasses> exist in the cluster                                             # <autogenerated>:1 -> *integration
    And there is a <namespace> in the cluster                                                                  # <autogenerated>:1 -> *integration
    And there are driver pods in <namespace> with this <name> prefix                                           # <autogenerated>:1 -> *integration
    And can logon to nodes and drop test scripts                                                               # <autogenerated>:1 -> *integration

    Examples:
      | kubeConfig | driverNames                  | namespace    | name         | storageClasses     |
      | ""         | "csi-powerstore.dellemc.com" | "powerstore" | "powerstore" | "powerstore-metro" |

1 scenarios (1 passed)
7 steps (7 passed)
1m32.717367134s
INFO[0092] Integration setup check finished
--- PASS: TestPowerStoreFirstCheck (92.73s)
=== RUN   TestPowerStoreMetroIntegration
INFO[0092] RESILIENCY_INT_TEST_STOP_ON_FAILURE = true
INFO[0092] Starting PowerStore Metro integration test
Feature: Integration Test
  As a CSM for Resiliency developer
  I want to test CSM for Resiliency in a kubernetes environment
  So that it is known to work on various pod clean up cases and give consistent results
INFO[0092] attempting k8sapi connection
...
  Scenario Outline: Preferred site node failover testing using test StatefulSet pods (node interface down)                                                 # features/integration.feature:215
INFO[0093] Removing preferred labels from nodes
INFO[0093] Checking if all the nodes are in 'Ready' state
INFO[0093] Checking if nodes have taints
INFO[0093] Taints were not found on the nodes.
...
NAME: pmtps1
LAST DEPLOYED: Fri Aug  8 14:45:01 2025
NAMESPACE: pmtps1
STATUS: deployed
REVISION: 1
TEST SUITE: None
INFO[0155] Waiting up to 600 seconds for pods to deploy
...
    Given a kubernetes <kubeConfig>                                                                                                                        # <autogenerated>:1 -> *integration
    And cluster is clean of test pods                                                                                                                      # <autogenerated>:1 -> *integration
    And wait <nodeCleanSecs> to see there are no taints                                                                                                    # <autogenerated>:1 -> *integration
    And label <workers> node as <preferred> site                                                                                                           # <autogenerated>:1 -> *integration
    And <podsPerNode> pods per node with <nVol> volumes and <nDev> devices using <driverType> and <storageClass> in <deploySecs> with <preferred> affinity # <autogenerated>:1 -> *integration
    Then validate that all pods are running within <deploySecs> seconds                                                                                    # <autogenerated>:1 -> *integration
    And all pods are running on <preferred> node                                                                                                           # <autogenerated>:1 -> *integration
    When I fail labeled <preferred> nodes with <failure> failure for <failSecs> seconds                                                                    # <autogenerated>:1 -> *integration
    Then validate that all pods are running within <runSecs> seconds                                                                                       # <autogenerated>:1 -> *integration
    And labeled pods are on a different node                                                                                                               # <autogenerated>:1 -> *integration
    And the taints for the failed nodes are removed within <nodeCleanSecs> seconds                                                                         # <autogenerated>:1 -> *integration
    Then finally cleanup everything                                                                                                                        # <autogenerated>:1 -> *integration

    Examples:
      | kubeConfig | podsPerNode | nVol  | nDev  | driverType   | storageClass       | workers     | primary | failure         | failSecs | deploySecs | runSecs | nodeCleanSecs | preferred |
      | ""         | "1-1"       | "1-1" | "0-0" | "powerstore" | "powerstore-metro" | "one-third" | "zero"  | "interfacedown" | 240      | 600        | 600     | 600           | "site"    |
INFO[0585] attempting k8sapi connection
INFO[0585] Using kubeconfig /home/falfaroc/.kube/config
INFO[0585] connected to k8sapi
...

  Scenario Outline: Preferred site node failover to preferred node (w/ metro, multiple preferred nodes)                                                    # features/integration.feature:234
INFO[0585] Node master-1-up9snjq0d5fyy.domain is a control plane node
INFO[0585] Attempting to clean up everything for driverType 'powerstore'
INFO[0585] Removing preferred labels from nodes
INFO[0585] Checking if all the nodes are in 'Ready' state
INFO[0585] Checking if nodes have taints
INFO[0585] Taints were not found on the nodes.
...
INFO[1076] Removing preferred labels from nodes
    Given a kubernetes <kubeConfig>                                                                                                                        # <autogenerated>:1 -> *integration
    And there are at least <nNodes> worker nodes which are ready                                                                                           # <autogenerated>:1 -> *integration
    And cluster is clean of test pods                                                                                                                      # <autogenerated>:1 -> *integration
    And wait <nodeCleanSecs> to see there are no taints                                                                                                    # <autogenerated>:1 -> *integration
    And label <workers> node as <preferred> site                                                                                                           # <autogenerated>:1 -> *integration
    And <podsPerNode> pods per node with <nVol> volumes and <nDev> devices using <driverType> and <storageClass> in <deploySecs> with <preferred> affinity # <autogenerated>:1 -> *integration
    Then validate that all pods are running within <deploySecs> seconds                                                                                    # <autogenerated>:1 -> *integration
    And all pods are running on <preferred> node                                                                                                           # <autogenerated>:1 -> *integration
    When I fail <workers> nodes with label <preferred> with <failure> failure for <failSecs> seconds                                                       # <autogenerated>:1 -> *integration
    Then validate that all pods are running within <runSecs> seconds                                                                                       # <autogenerated>:1 -> *integration
    And labeled pods are on a different node                                                                                                               # <autogenerated>:1 -> *integration
    And the taints for the failed nodes are removed within <nodeCleanSecs> seconds                                                                         # <autogenerated>:1 -> *integration
    Then finally cleanup everything                                                                                                                        # <autogenerated>:1 -> *integration

    Examples:
      | kubeConfig | nNodes | podsPerNode | nVol  | nDev  | driverType   | storageClass       | workers    | failure         | failSecs | deploySecs | runSecs | nodeCleanSecs | preferred |
      | ""         | 4      | "1-1"       | "1-1" | "0-0" | "powerstore" | "powerstore-metro" | "one-half" | "interfacedown" | 240      | 600        | 600     | 600           | "site"    |

2 scenarios (2 passed)
25 steps (25 passed)
16m23.829281099s
INFO[1076] Integration test finished
--- PASS: TestPowerStoreMetroIntegration (983.84s)
PASS
status 0
ok      podmon/internal/monitor 1076.600s
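
The `workers` column in the Examples tables uses keywords such as "one-half" and "one-third". One way such a keyword could be turned into a node count is sketched below. This is an assumption for illustration only: `preferredCount` is a hypothetical helper, and the rounding rule (integer division with a minimum of one node) is not confirmed by this PR.

```go
package main

import "fmt"

// preferredCount maps a fraction keyword to a number of worker nodes.
// Hypothetical helper: the rounding behaviour here is an assumption.
func preferredCount(fraction string, workers int) (int, error) {
	var n int
	switch fraction {
	case "one-half":
		n = workers / 2
	case "one-third":
		n = workers / 3
	default:
		return 0, fmt.Errorf("unsupported fraction %q", fraction)
	}
	// Always select at least one node when any workers exist.
	if n < 1 && workers > 0 {
		n = 1
	}
	return n, nil
}

func main() {
	// With 4 ready workers, "one-half" yields 2 preferred nodes, so one
	// preferred node can fail while another remains available.
	n, err := preferredCount("one-half", 4)
	fmt.Println(n, err)
}
```

Under this assumption, the `nNodes` value of 4 in the new scenario is what guarantees more than one preferred node when `workers` is "one-half".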

Verified that the scenario requires multiple preferred nodes and is skipped when the cluster does not have enough ready worker nodes (here, 4 expected but 2 found):

  Scenario Outline: Preferred site node failover to preferred node (w/ metro, multiple preferred nodes)                                                    # features/integration.feature:234
INFO[0104] Node master-1-njo3bz3jjysjb is a control plane node
WARN[0104] Skipping this scenario. Expected at least 4 but found 2
    Given a kubernetes <kubeConfig>                                                                                                                        # <autogenerated>:1 -> *integration
    And there are at least <nNodes> worker nodes which are ready                                                                                           # <autogenerated>:1 -> *integration
    And cluster is clean of test pods                                                                                                                      # <autogenerated>:1 -> *integration
    And wait <nodeCleanSecs> to see there are no taints                                                                                                    # <autogenerated>:1 -> *integration
    And label <workers> node as <preferred> site                                                                                                           # <autogenerated>:1 -> *integration
    And <podsPerNode> pods per node with <nVol> volumes and <nDev> devices using <driverType> and <storageClass> in <deploySecs> with <preferred> affinity # <autogenerated>:1 -> *integration
    Then validate that all pods are running within <deploySecs> seconds                                                                                    # <autogenerated>:1 -> *integration
    And all pods are running on <preferred> node                                                                                                           # <autogenerated>:1 -> *integration
    When I fail <workers> nodes with label <preferred> with <failure> failure for <failSecs> seconds                                                       # <autogenerated>:1 -> *integration
    Then validate that all pods are running within <runSecs> seconds                                                                                       # <autogenerated>:1 -> *integration
    And labeled pods are on a different node                                                                                                               # <autogenerated>:1 -> *integration
    And the taints for the failed nodes are removed within <nodeCleanSecs> seconds                                                                         # <autogenerated>:1 -> *integration
    Then finally cleanup everything                                                                                                                        # <autogenerated>:1 -> *integration

    Examples:
      | kubeConfig | nNodes | podsPerNode | nVol  | nDev  | driverType   | storageClass       | workers    | failure         | failSecs | deploySecs | runSecs | nodeCleanSecs | preferred |
      | ""         | 4      | "1-1"       | "1-1" | "0-0" | "powerstore" | "powerstore-metro" | "one-half" | "interfacedown" | 240      | 600        | 600     | 600           | "site"    |

1 scenarios (1 passed)
13 steps (1 passed, 12 skipped)
322.868553ms
INFO[0104] Integration test finished
--- PASS: TestPowerStoreMetroIntegration (0.33s)
PASS
status 0
ok      podmon/internal/monitor 104.399s

falfaroc · 2025-08-12T15:42:26Z

Setting to draft until dependent PR is merged.

The base branch was changed.

github-actions · 2025-08-13T14:01:21Z

Merging this branch will not change overall coverage

Impacted Packages	Coverage Δ	🤖
github.com/dell/karavi-resiliency/internal/monitor	0.00% (ø)
github.com/dell/karavi-resiliency/test/ssh	0.00% (ø)

Coverage by file

Changed files (no unit tests)

Changed File	Coverage Δ	Total	Covered	Missed	🤖
github.com/dell/karavi-resiliency/test/ssh/client.go	0.00% (ø)	0	0	0

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

github.com/dell/karavi-resiliency/internal/monitor/integration_steps_test.go

Add PowerStore metro integration test

d1657fd

falfaroc force-pushed the usr/falfaroc/add-metro-node-failure-test branch from aad054c to b279a18 Compare August 8, 2025 19:08

falfaroc marked this pull request as ready for review August 8, 2025 19:12

falfaroc requested review from EvgenyUglov, HarishH-DELL, alexemc, alikdell, anathoodell, atye, chimanjain, nitesh3108, panigs7, rajendraindukuri, rbo54 and shaynafinocchiaro August 8, 2025 19:12

lukeatdell previously approved these changes Aug 11, 2025


Comment thread internal/monitor/features/integration.feature Outdated

lukeatdell mentioned this pull request Aug 11, 2025

Support metro volumes in csm-resiliency dell/csi-powerstore#533

Merged

8 tasks

Check to ensure migrated pods are on a preferred node

7efef4b

xuluna force-pushed the usr/luna/preferred-site branch from 4b9bf2f to 75571ab Compare August 12, 2025 15:06

falfaroc marked this pull request as draft August 12, 2025 15:42

Base automatically changed from usr/luna/preferred-site to main August 12, 2025 18:13

falfaroc force-pushed the usr/falfaroc/add-metro-node-failure-test branch from 898929b to 7efef4b Compare August 12, 2025 18:26

falfaroc marked this pull request as ready for review August 12, 2025 18:26

Clean up rebased code

a607de4

falfaroc requested a review from lukeatdell August 12, 2025 18:28

falfaroc force-pushed the usr/falfaroc/add-metro-node-failure-test branch 3 times, most recently from 49bc15e to a607de4 Compare August 12, 2025 19:19

lukeatdell previously approved these changes Aug 12, 2025


Comment thread internal/monitor/integration_steps_test.go

Address PR comments

484503a

falfaroc dismissed lukeatdell’s stale review via 9830304 August 12, 2025 20:11

falfaroc requested a review from lukeatdell August 12, 2025 20:12

lukeatdell previously approved these changes Aug 12, 2025


shaynafinocchiaro reviewed Aug 13, 2025


Comment thread test/podmontest/Makefile Outdated

falfaroc dismissed lukeatdell’s stale review via bffda3c August 13, 2025 13:56

anathoodell previously approved these changes Aug 13, 2025


falfaroc dismissed anathoodell’s stale review via 484503a August 13, 2025 13:57

falfaroc force-pushed the usr/falfaroc/add-metro-node-failure-test branch from bffda3c to 484503a Compare August 13, 2025 13:57

falfaroc requested review from anathoodell, lukeatdell and shaynafinocchiaro August 13, 2025 13:58

lukeatdell approved these changes Aug 13, 2025


shaynafinocchiaro approved these changes Aug 13, 2025


falfaroc merged commit 6c1fe06 into main Aug 13, 2025
6 checks passed

falfaroc deleted the usr/falfaroc/add-metro-node-failure-test branch August 13, 2025 14:11

falfaroc merged 4 commits into main from usr/falfaroc/add-metro-node-failure-test