Skip to content

Set session scanning to manual to avoid discovering all iSCSI devices…#90985

Merged
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
mrobson:iscsi-nodescan-manual
May 20, 2020
Merged

Set session scanning to manual to avoid discovering all iSCSI devices…#90985
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
mrobson:iscsi-nodescan-manual

Conversation

@mrobson
Copy link
Copy Markdown
Contributor

@mrobson mrobson commented May 11, 2020

… during login

What type of PR is this?
/kind bug

What this PR does / why we need it:
This PR sets node.session.scan to manual to avoid unused LUNs being discovered and mapped on a node when it does its initial login. If that happens, the devices are never removed.

Which issue(s) this PR fixes:
Fixes #90982

Special notes for your reviewer:
Some older iscsi package may not support node.session.scan - it was added here: open-iscsi/open-iscsi#40

This will not negatively impact anyone and would log a warning letting them know.

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 11, 2020
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @mrobson. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 11, 2020
@k8s-ci-robot k8s-ci-robot requested review from jsafrane and msau42 May 11, 2020 16:18
@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 11, 2020

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 11, 2020
@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 11, 2020

/assign humblec

@jsafrane
Copy link
Copy Markdown
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 11, 2020
@jsafrane
Copy link
Copy Markdown
Member

/assign @bswartz

Ben, can you please take a look? It's related to your commit 6d23d8e (and its commit message).

@bswartz
Copy link
Copy Markdown
Contributor

bswartz commented May 11, 2020

This approach is something I didn't know was possible. I will definitely do some testing of my own to see if this does what I think it does. If so, then we definitely want this change.

@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 11, 2020

Thanks @bswartz - I did some local testing but would appreciate the verification! You can also do it at the session level like: iscsiadm -m session -r <session_id> -o update -n node.session.scan -v manual but that just seemed overly complicated versus what's currently being done.

@humblec
Copy link
Copy Markdown
Contributor

humblec commented May 12, 2020

/retest

@humblec
Copy link
Copy Markdown
Contributor

humblec commented May 12, 2020

Yeah, the approach looks good to me. But lets wait for verification.

@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 12, 2020

For completeness, here is the basic test I ran with 3 LUNs:

/iscsi/iqn.20...et1/tpg1/luns> cd /
/> ls
o- / ......................................................................................................................... [...]
  o- backstores .............................................................................................................. [...]
  | o- block .................................................................................................. [Storage Objects: 3]
  | | o- scsi_disk1 ....................................................... [/dev/vg_iscsi1/lv_iscsi1 (5.0GiB) write-thru activated]
  | | | o- alua ................................................................................................... [ALUA Groups: 1]
  | | |   o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
  | | o- scsi_disk2 ....................................................... [/dev/vg_iscsi2/lv_iscsi2 (5.0GiB) write-thru activated]
  | | | o- alua ................................................................................................... [ALUA Groups: 1]
  | | |   o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
  | | o- scsi_disk3 ...................................................... [/dev/vg_iscsi3/lv_iscsi3 (10.0GiB) write-thru activated]
  | |   o- alua ................................................................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
  | o- fileio ................................................................................................. [Storage Objects: 0]
  | o- pscsi .................................................................................................. [Storage Objects: 0]
  | o- ramdisk ................................................................................................ [Storage Objects: 0]
  o- iscsi ............................................................................................................ [Targets: 1]
  | o- iqn.2020-05.matt.redhat:target1 ................................................................................... [TPGs: 1]
  |   o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
  |     o- acls .......................................................................................................... [ACLs: 1]
  |     | o- iqn.2020-05.matt.redhat:infra1 ....................................................................... [Mapped LUNs: 3]
  |     |   o- mapped_lun0 ............................................................................ [lun0 block/scsi_disk1 (rw)]
  |     |   o- mapped_lun1 ............................................................................ [lun1 block/scsi_disk2 (rw)]
  |     |   o- mapped_lun2 ............................................................................ [lun2 block/scsi_disk3 (rw)]
  |     o- luns .......................................................................................................... [LUNs: 3]
  |     | o- lun0 ................................................. [block/scsi_disk1 (/dev/vg_iscsi1/lv_iscsi1) (default_tg_pt_gp)]
  |     | o- lun1 ................................................. [block/scsi_disk2 (/dev/vg_iscsi2/lv_iscsi2) (default_tg_pt_gp)]
  |     | o- lun2 ................................................. [block/scsi_disk3 (/dev/vg_iscsi3/lv_iscsi3) (default_tg_pt_gp)]
  |     o- portals .................................................................................................... [Portals: 1]
  |       o- 0.0.0.0:3260 ..................................................................................................... [OK]
  o- loopback ......................................................................................................... [Targets: 0]

Preform current discovery / login steps, minus CHAP:

# iscsiadm -m discoverydb -t sendtargets -p 10.10.100.34:3260 -I default -o new
# iscsiadm -m discoverydb -t sendtargets -p 10.10.100.34:3260 -I default --discover
# iscsiadm -m node -p 10.10.100.34:3260 -T iqn.2020-05.matt.redhat:target1 -I default --login

Logging in to [iface: default, target: iqn.2020-05.matt.redhat:target1, portal: 10.10.100.34,3260] (multiple)
Login to [iface: default, target: iqn.2020-05.matt.redhat:target1, portal: 10.10.100.34,3260] successful.

All 3 LUNs are discovered:

# lsscsi
[5:0:0:0]    disk    LIO-ORG  scsi_disk1       4.0   /dev/sda
[5:0:0:1]    disk    LIO-ORG  scsi_disk2       4.0   /dev/sdb
[5:0:0:2]    disk    LIO-ORG  scsi_disk3       4.0   /dev/sdc
ls /dev/disk/by-path/
fc---lun-0  ip-10.10.100.34:3260-iscsi-iqn.2020-05.matt.redhat:target1-lun-0  pci-0000:00:04.0        pci-0000:00:05.0-part1   virtio-pci-0000:00:04.0-part1  virtio-pci-0000:00:05.0-part2
fc---lun-1  ip-10.10.100.34:3260-iscsi-iqn.2020-05.matt.redhat:target1-lun-1  pci-0000:00:04.0-part1  pci-0000:00:05.0-part2   virtio-pci-0000:00:05.0
fc---lun-2  ip-10.10.100.34:3260-iscsi-iqn.2020-05.matt.redhat:target1-lun-2

Logout

# iscsiadm -m node -T iqn.2020-05.matt.redhat:target1 -p 10.10.100.34:3260 -u

Logging out of session [sid: 4, target: iqn.2020-05.matt.redhat:target1, portal: 10.10.100.34,3260]
Logout of [sid: 4, target: iqn.2020-05.matt.redhat:target1, portal: 10.10.100.34,3260] successful.

All gone

# lsscsi

Preform discovery / login steps, minus CHAP, setting node.session.scan to manual before login :

# iscsiadm -m discoverydb -t sendtargets -p 10.10.100.34:3260 -I default -o new
# iscsiadm -m discoverydb -t sendtargets -p 10.10.100.34:3260 -I default --discover
# iscsiadm -m node -p 10.10.100.34:3260 -T iqn.2020-05.matt.redhat:target1 -I default -o update -n node.session.scan -v manual
# iscsiadm -m node -p 10.10.100.34:3260 -T iqn.2020-05.matt.redhat:target1 -I default --login

Empty

# lsscsi

Discover one LUN

# echo '0 0 0' > /sys/class/scsi_host/host6/scan
# lsscsi
[6:0:0:0]    disk    LIO-ORG  scsi_disk1       4.0   /dev/sda

Discover another LUN

# echo '0 0 2' > /sys/class/scsi_host/host6/scan
lsscsi
[6:0:0:0]    disk    LIO-ORG  scsi_disk1       4.0   /dev/sda
[6:0:0:2]    disk    LIO-ORG  scsi_disk3       4.0   /dev/sdb

@mrobson mrobson force-pushed the iscsi-nodescan-manual branch from f6856c0 to ea0fdca Compare May 12, 2020 17:53
@k8s-ci-robot k8s-ci-robot removed the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label May 12, 2020
@mrobson mrobson force-pushed the iscsi-nodescan-manual branch from ea0fdca to 0389209 Compare May 12, 2020 17:56
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 12, 2020
@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 12, 2020

/retest

1 similar comment
@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 12, 2020

/retest

@bswartz
Copy link
Copy Markdown
Contributor

bswartz commented May 13, 2020

I'm completed my initial tests and this does indeed seem to work as expected. Just need to make sure this is called on the write codepaths inside kubelet.

@bswartz
Copy link
Copy Markdown
Contributor

bswartz commented May 13, 2020

@mrobson wrote:

Thanks @bswartz - I did some local testing but would appreciate the verification! You can also do it at the session level like: iscsiadm -m session -r <session_id> -o update -n node.session.scan -v manual but that just seemed overly complicated versus what's currently being done.

Based on my understanding, there is no session until after you log in, and by the time you log in, it's too late to set this option, so that would not be a viable approach. Setting at the node level does seem to be exactly what we want though.

@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 13, 2020

/retest

@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 13, 2020

I'm completed my initial tests and this does indeed seem to work as expected. Just need to make sure this is called on the write codepaths inside kubelet.

@bswartz From the testing / reproducer we ran, the only time this appears to be an issue is when AttachDisk is called and there is no existing login. Login only occurs again when a server is drained (all pod removed triggering logout), rebooted or when a logout happens naturally when the last mount in use is removed.

@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 13, 2020

Yay, green! Moved the location to before the chap check @jsafrane @bswartz @humblec PTAL

@bswartz
Copy link
Copy Markdown
Contributor

bswartz commented May 13, 2020

The "iscsiadm -o update" command will fail on Ubuntu and Debian because they don't support the "node.session.scan" config option yet. I see that the code handles that case properly by logging a warning.

@bswartz
Copy link
Copy Markdown
Contributor

bswartz commented May 18, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 18, 2020
@jsafrane
Copy link
Copy Markdown
Member

/approve
Thanks a lot of for this PR!

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane, mrobson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 19, 2020
@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 19, 2020

/retest

@fejta-bot
Copy link
Copy Markdown

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

2 similar comments
@fejta-bot
Copy link
Copy Markdown

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link
Copy Markdown

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 20, 2020

/retest

1 similar comment
@mrobson
Copy link
Copy Markdown
Contributor Author

mrobson commented May 20, 2020

/retest

@fejta-bot
Copy link
Copy Markdown

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

2 similar comments
@fejta-bot
Copy link
Copy Markdown

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link
Copy Markdown

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automatic session scanning discovers all iSCSI devices during login and can cause LUN assignment issues

6 participants