Skip to content

Add reachable IP check to avoid node pod crash#459

Merged
santhoshatdell merged 6 commits into
mainfrom
reachable-ip
Feb 28, 2025
Merged

Add reachable IP check to avoid node pod crash#459
santhoshatdell merged 6 commits into
mainfrom
reachable-ip

Conversation

@santhoshatdell

Copy link
Copy Markdown
Contributor

Description

  • PowerMax node pods are crashing when the IP interfaces for first array are not reachable and when it processes the second/next iSCSI array - if 2-minute timeout is reached for NodeGetInfo() call, the port-group API call fails with 'context cancelled' error.

  • This is addressed by adding an IP reachability check similar to other iSCSI supported drivers - PowerStore and Unity. We can potentially avoid the node pod crash and topology map will contain at least the second array.

GitHub Issues

List the GitHub issues impacted by this PR:

GitHub Issue #
https://github.com/dell/csm/issues/1769

Checklist:

  • Have you run format,vet & lint checks against your submission?
  • Have you made sure that the code compiles?
  • Did you run the unit & integration tests successfully?
  • Have you maintained at least 90% code coverage?
  • Have you commented your code, particularly in hard-to-understand areas
  • Have you done corresponding changes to the documentation
  • Did you run tests in a real Kubernetes cluster?
  • Backward compatibility is not broken

How Has This Been Tested?

  • Tested the driver with and without the fix. With fix, the node pod didn't crash and NodeGetInfo() had second array in its topology list.
[root@master-1-0BX8wDGBJidld samples]# k get csm -A
NAMESPACE   NAME       CREATIONTIME   CSIDRIVERTYPE   CONFIGVERSION   STATE
powermax    powermax   117s           powermax        v2.13.0         Succeeded

[root@master-1-0BX8wDGBJidld samples]# k get pod -n powermax
NAME                                   READY   STATUS    RESTARTS       AGE
powermax-controller-5dcd5c9fc4-szlvp   6/6     Running   8 (103s ago)   2m
powermax-node-6rnr4                    2/2     Running   2 (75s ago)    2m
powermax-node-8x5fh                    2/2     Running   1 (79s ago)    2m

time="2025-02-20T15:13:43Z" level=info msg="TransportProtocol ISCSI FC portWWNs: [] ... IQNs: [iqn.1994-05.com.redhat:11ca66677777] ... HostNQNs: [] "
time="2025-02-20T15:13:43Z" level=debug msg="GetSymmetrixIDList returned: [000120000001 000120000002]"
time="2025-02-20T15:14:03Z" level=info msg="/csi.v1.Node/NodeGetInfo: REQ 0002: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2025-02-20T15:14:05Z" level=info msg="IP interface(10.20.30.41) of array(000120000001) is not rechable from the node"
time="2025-02-20T15:14:07Z" level=info msg="IP interface(10.20.30.42) of array(000120000001) is not rechable from the node"
time="2025-02-20T15:14:07Z" level=info msg="IP interface(10.20.30.43) of array(000120000001) is not rechable from the node"
time="2025-02-20T15:14:07Z" level=info msg="Node label 'max-powermax-volumes-per-node' is not available. Retrieving the value from yaml file"
time="2025-02-20T15:14:07Z" level=info msg="/csi.v1.Node/NodeGetInfo: REP 0002: NodeId=worker-1-0bx8wdgbjidld, MaxVolumesPerNode=0, AccessibleTopology=segments:<key:\"csi-powermax.dellemc.com/000120000002\" value:\"csi-powermax.dellemc.com\" > segments:<key:\"csi-powermax.dellemc.com/000120000002.iscsi\" value:\"csi-powermax.dellemc.com\" > segments:<key:\"csi-powermax.dellemc.com/000120000002.nfs\" value:\"csi-powermax.dellemc.com\" > , XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"

  • cert-csi basic tests passed.

@santhoshatdell

Copy link
Copy Markdown
Contributor Author

UT check is failing with coverage check which is currently addressed by another PR. This PR will be updated once that is merged.
Please review the current changes in this PR.

jooseppi-luna
jooseppi-luna previously approved these changes Feb 24, 2025

@jooseppi-luna jooseppi-luna left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

                                                                           
                                                                           
lllllll                               tttt                                  
lllllll                            ttttttt                                  
lllllll                            ttttttt                                  
lllllll                            ttttttt                                  
 llllll     ggggggggg   ggggg ttttttttttttttttttt        mmmmmmm    mmmmmmm   
 llllll    gggggggggggggggggg ttttttttttttttttttt      mmmmmmmmmm  mmmmmmmmmm 
 llllll   ggggggggggggggggggg ttttttttttttttttttt     mmmmmmmmmmmmmmmmmmmmmmmm
 llllll  gggggggggggggggggggg ttttttttttttttttttt     mmmmmmmmmmmmmmmmmmmmmmmm
 llllll  ggggggg     ggggggg        ttttttt           mmmmmmmmmmmmmmmmmmmmmmmm
 llllll  ggggggg     ggggggg        ttttttt           mmmmmm   mmmmmm   mmmmmm
 llllll  ggggggg     ggggggg        ttttttt           mmmmmm   mmmmmm   mmmmmm
 llllll  gggggggg    ggggggg        ttttttt    tttttt mmmmmm   mmmmmm   mmmmmm
llllllll ggggggggggggggggggg        ttttttttttttttttt mmmmmm   mmmmmm   mmmmmm
llllllll  gggggggggggggggggg        ttttttttttttttttt mmmmmm   mmmmmm   mmmmmm
llllllll   ggggggggggggggggg          ttttttttttttttt mmmmmm   mmmmmm   mmmmmm
llllllll     ggggggggggggggg            ttttttttttt   mmmmmm   mmmmmm   mmmmmm
                     ggggggg                                                
         gggggg      ggggggg                                                
         gggggggg   gggggggg                                                
          gggggggggggggggggg                                                
           ggggggggggggggggg                                                 
             ggggggggggggg                                                   
                gggggg                                                      

abhi16394
abhi16394 previously approved these changes Feb 24, 2025
adarsh-dell
adarsh-dell previously approved these changes Feb 25, 2025

@adarsh-dell adarsh-dell left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@github-actions

Copy link
Copy Markdown
Contributor

Merging this branch will not change overall coverage

Impacted Packages Coverage Δ 🤖
github.com/dell/csi-powermax/service 0.00% (ø)

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/dell/csi-powermax/service/node.go 0.00% (ø) 0 0 0

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/dell/csi-powermax/service/node_test.go

Comment thread service/node_test.go

@donatwork donatwork left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@santhoshatdell santhoshatdell merged commit 2858acb into main Feb 28, 2025
@santhoshatdell santhoshatdell deleted the reachable-ip branch February 28, 2025 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants