
Add pcie-check service to check PCIe devices at boot #4771

Merged: sujinmkang merged 14 commits into sonic-net:master from sujinmkang:pcie-mon on Jul 13, 2020


sujinmkang (Collaborator) commented Jun 13, 2020

- Why I did it
To monitor PCIe device status at boot time and initiate a rescan if the check fails.
- How I did it
Created a systemd service that checks the PCIe devices using pcieutil.
- How to verify it
Check the syslog to confirm that the service is spawned periodically and performs as expected.
- Description for the changelog


sujinmkang marked this pull request as ready for review June 19, 2020 00:08
sujinmkang (Collaborator, Author) commented

retest this please


jleveque changed the title from "PCIe Monitor service" to "Add pcie-check service to check PCIe devices at boot" on Jun 29, 2020
[Service]
Type=simple
ExecStart=/usr/bin/pcie-check.sh

Contributor

Remove the extra space.


VERBOSE="no"
RESULTS="PCIe Device Checking All Test"
PCIE_CHK_CMD=`sudo pcieutil pcie-check |grep "$RESULTS"`
padmanarayana (Contributor) commented Jun 30, 2020

Can this be done through an API like pcie_check_sysfs() rather than through the CLI?
That would also avoid having to check the CLI string output for EXPECTED.
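
To illustrate the idea of checking sysfs directly instead of parsing CLI output, here is a minimal sketch. It assumes the expected devices are known by their PCI addresses and that a device counts as present when it has an entry under /sys/bus/pci/devices. The helper name `pcie_devices_present` and the `SYSFS_PCI_DEVICES` variable are hypothetical and are not part of pcieutil or of this PR.

```shell
# Hypothetical helper, not part of pcieutil: succeeds only if every
# listed PCI address has an entry in sysfs.
# SYSFS_PCI_DEVICES can be overridden to try the sketch on a fake tree.
SYSFS_PCI_DEVICES="${SYSFS_PCI_DEVICES:-/sys/bus/pci/devices}"

pcie_devices_present() {
    local addr missing=0
    for addr in "$@"; do
        if [ ! -e "$SYSFS_PCI_DEVICES/$addr" ]; then
            # Report each missing device on stderr and remember the failure.
            echo "missing PCIe device: $addr" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A caller could then branch on the exit status, for example `pcie_devices_present 0000:01:00.0 || echo "check failed"`, where the address is illustrative only.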

exit
else
debug "PCIe check failed, try pci bus rescan"
echo 1 > /sys/bus/pci/rescan
Contributor

Here the rescan is initiated for all devices (/sys/bus/pci/rescan). The rescan should be device-specific.

Contributor

Discussed offline. For now, we will rescan the entire bus. Rescanning individual devices will be a future enhancement.
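
As a rough sketch of what the future enhancement might look like: Linux exposes a per-device `rescan` attribute under /sys/bus/pci/devices/<address>/ in addition to the bus-wide /sys/bus/pci/rescan. The helper below assumes the address of the upstream bridge of the failing device is known, and falls back to the bus-wide rescan otherwise. The name `rescan_pci` and the `SYSFS_PCI` variable are hypothetical and are not taken from the merged script.

```shell
# Hypothetical helper: rescan below one bridge instead of the whole PCI bus.
# SYSFS_PCI can be overridden to try the sketch on a fake tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

rescan_pci() {
    # $1: PCI address of the upstream bridge (optional, illustrative)
    local bridge="${1:-}"
    local target="$SYSFS_PCI/rescan"
    # Prefer the bridge's own rescan attribute when it exists.
    if [ -n "$bridge" ] && [ -e "$SYSFS_PCI/devices/$bridge/rescan" ]; then
        target="$SYSFS_PCI/devices/$bridge/rescan"
    fi
    echo 1 > "$target"
}
```

Calling `rescan_pci` with no argument keeps the current behavior of rescanning the entire bus.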

exit
fi

for i in $(seq 1 1 $MAX_RESCAN)
Contributor

MAX_RESCAN: this should also be device-specific. Each PCIe peripheral has a different tunable retry timeout. We could add an entry in *.yaml or STATE_DB and import it here.
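
A per-device retry limit could be sketched as follows. For simplicity the limits are hardcoded in a lookup function rather than read from a *.yaml file or STATE_DB as suggested above. The function names, addresses, and values are all hypothetical and only illustrate the shape of the loop.

```shell
# Hypothetical per-device retry limits; addresses and values are illustrative only.
DEFAULT_MAX_RESCAN=3

max_rescan_for() {
    case "$1" in
        0000:01:00.0) echo 5 ;;
        0000:02:00.0) echo 2 ;;
        *)            echo "$DEFAULT_MAX_RESCAN" ;;
    esac
}

# Run a check command up to the device's limit, sleeping between attempts.
# $1: device address, $2: delay in seconds, remaining args: check command.
check_with_retries() {
    local dev="$1" delay="$2" max i
    shift 2
    max=$(max_rescan_for "$dev")
    i=1
    while [ "$i" -le "$max" ]; do
        "$@" && return 0
        # Only wait if another attempt will follow.
        if [ "$i" -lt "$max" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

In a real implementation, `max_rescan_for` would be the place to read the tunable from the platform configuration instead of a fixed table.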

sujinmkang (Collaborator, Author) commented

retest mellanox please

sujinmkang merged commit bf45e11 into sonic-net:master Jul 13, 2020
jleveque added a commit that referenced this pull request Aug 29, 2020
The pcie-check.sh script was added in #4771, but was not given executable permission. Therefore, we would see messages like:

```
Aug 26 22:54:05.536248 sonic ERR systemd[664]: pcie-check.service: Failed to execute command: Permission denied
Aug 26 22:54:05.536386 sonic ERR systemd[664]: pcie-check.service: Failed at step EXEC spawning /usr/bin/pcie-check.sh: Permission denied
Aug 26 22:54:05.536600 sonic WARNING systemd[1]: pcie-check.service: Failed with result 'exit-code'.
```
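
The log lines above are what systemd reports when the file named in ExecStart lacks the executable bit. A minimal sketch of a guard for this, assuming it runs at build or install time, is shown below. The helper name `ensure_executable` is hypothetical, and the referenced commit may well have fixed the file mode in a different way.

```shell
# Hypothetical helper: make sure a script is executable before systemd tries to run it.
ensure_executable() {
    local path="$1"
    # Fail clearly if the file is missing rather than silently doing nothing.
    [ -f "$path" ] || { echo "no such file: $path" >&2; return 1; }
    # Set the executable bit only when it is not already set.
    [ -x "$path" ] || chmod +x "$path"
}
```

For example, `ensure_executable /usr/bin/pcie-check.sh` would add the missing permission for the script named in the service file.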
santhosh-kt pushed a commit to santhosh-kt/sonic-buildimage that referenced this pull request Feb 25, 2021