Add pcie-check service to check PCIe devices at boot #4771
sujinmkang merged 14 commits into sonic-net:master from sujinmkang:pcie-mon
retest this please
```
[Service]
Type=simple
ExecStart=/usr/bin/pcie-check.sh
```
```
VERBOSE="no"
RESULTS="PCIe Device Checking All Test"
PCIE_CHK_CMD=`sudo pcieutil pcie-check | grep "$RESULTS"`
```
Can this be done through an API such as pcie_check_sysfs() rather than through the CLI?
That would also avoid having to match the CLI's string output against EXPECTED.
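A minimal sketch of the reviewer's suggestion, assuming that presence in sysfs is an adequate test: instead of grepping the text printed by `pcieutil pcie-check`, check each expected device's entry under `/sys/bus/pci/devices` directly. The function name `check_pcie_devices`, the `SYSFS_PCI_ROOT` override, and passing the expected addresses as arguments are all hypothetical; this is not the `pcie_check_sysfs()` API mentioned above, whose interface is not shown here.

```bash
# Sketch only: verify expected PCIe devices by looking them up in sysfs
# rather than parsing CLI output. SYSFS_PCI_ROOT is a hypothetical
# override (useful for testing); it defaults to the real sysfs path.
SYSFS_PCI_ROOT="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}"

# Usage: check_pcie_devices <addr>...
# Each <addr> is in domain:bus:device.function form, e.g. 0000:01:00.0.
# Prints every missing address and returns 1 if any are missing, else 0.
check_pcie_devices() {
    local bdf missing=0
    for bdf in "$@"; do
        if [ ! -e "$SYSFS_PCI_ROOT/$bdf" ]; then
            echo "missing: $bdf"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_pcie_devices 0000:00:00.0 0000:01:00.0 || echo "PCIe check failed"` reports which addresses are absent, so no expected-output string has to be kept in sync with the CLI.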
```
exit
else
debug "PCIe check failed, try pci bus rescan"
echo 1 > /sys/bus/pci/rescan
```
Here the rescan is triggered for all devices (`/sys/bus/pci/rescan`). The rescan should be device-specific.
Discussed offline. For now, we will rescan the entire bus. Rescanning individual devices will be a future enhancement.
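To make the bus-wide vs. device-specific trade-off concrete, here is a sketch that writes either to `/sys/bus/pci/rescan` (what the script does now) or to a single device's own `rescan` attribute. According to the kernel's sysfs PCI ABI documentation, writing to a device's `rescan` attribute rescans that device's parent bus and all child buses, so for a missing endpoint one would likely pass the address of its upstream bridge. The helper name `rescan_pci` and the `SYSFS_PCI_BUS` override are hypothetical.

```bash
# Sketch only: trigger a PCI rescan bus-wide (current behavior) or via one
# device's sysfs "rescan" attribute (the future enhancement discussed).
# SYSFS_PCI_BUS is a hypothetical override so this can run without root.
SYSFS_PCI_BUS="${SYSFS_PCI_BUS:-/sys/bus/pci}"

# Usage: rescan_pci            -> rescan the whole PCI bus
#        rescan_pci <address>  -> rescan starting from that device's parent bus
rescan_pci() {
    local target="$SYSFS_PCI_BUS/rescan"
    if [ -n "${1:-}" ]; then
        target="$SYSFS_PCI_BUS/devices/$1/rescan"
    fi
    # Refuse to write if the attribute is missing or not writable.
    if [ ! -w "$target" ]; then
        echo "cannot write $target" >&2
        return 1
    fi
    echo 1 > "$target"
}
```

Keeping the bus-wide path as the default matches the decision above, while the optional address argument leaves room for the per-device variant.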
```
exit
fi

for i in $(seq 1 1 $MAX_RESCAN)
```
MAX_RESCAN should also be device-specific: each PCIe peripheral has its own tunable retry timeout. We could keep the value in a *.yaml file or in STATE_DB and import it here.
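A sketch of the bounded check/rescan loop with the attempt count and delay taken from variables rather than hard-coded, so a per-device value could be substituted later. The defaults, the function name `retry_pcie_check`, and passing the check and rescan steps as command names are assumptions; where per-device values would live (a *.yaml file or STATE_DB) is left open, as in the review.

```bash
# Sketch only: retry "rescan, wait, re-check" up to MAX_RESCAN times.
# MAX_RESCAN and RESCAN_DELAY defaults are hypothetical placeholders.
MAX_RESCAN="${MAX_RESCAN:-5}"
RESCAN_DELAY="${RESCAN_DELAY:-1}"

# Usage: retry_pcie_check <check-cmd> <rescan-cmd>
# Assumes the initial check has already failed. Returns 0 as soon as
# <check-cmd> succeeds after a rescan, or 1 once all attempts are used.
retry_pcie_check() {
    local check="$1" rescan="$2" i
    for i in $(seq 1 "$MAX_RESCAN"); do
        "$rescan"
        sleep "$RESCAN_DELAY"
        if "$check"; then
            echo "PCIe check passed after $i rescan(s)"
            return 0
        fi
    done
    echo "PCIe check still failing after $MAX_RESCAN rescan(s)" >&2
    return 1
}
```

Passing the steps in as commands keeps the loop independent of how the check and rescan are implemented, so either the CLI-based or a sysfs-based check could be plugged in.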
retest mellanox please
The pcie-check.sh script was added in #4771, but was not given executable permission. Therefore, we would see messages like:

```
Aug 26 22:54:05.536248 sonic ERR systemd[664]: pcie-check.service: Failed to execute command: Permission denied
Aug 26 22:54:05.536386 sonic ERR systemd[664]: pcie-check.service: Failed at step EXEC spawning /usr/bin/pcie-check.sh: Permission denied
Aug 26 22:54:05.536600 sonic WARNING systemd[1]: pcie-check.service: Failed with result 'exit-code'.
```
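The missing execute bit behind these messages can be detected and corrected on a running device with a simple test. This is only a sketch with a hypothetical helper name, `ensure_executable`; it is not necessarily how the follow-up change fixed the file's mode in the build.

```bash
# Sketch only: make sure the script referenced by ExecStart= is executable.
# The default path matches the ExecStart= line in the unit file.
ensure_executable() {
    local script="${1:-/usr/bin/pcie-check.sh}"
    if [ ! -x "$script" ]; then
        # rwxr-xr-x, the usual mode for scripts installed in /usr/bin
        chmod 755 "$script" || return 1
    fi
    [ -x "$script" ]
}
```

After fixing the mode, restarting the unit (`sudo systemctl restart pcie-check.service`) should no longer log `Permission denied`.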
- Why I did it
To monitor PCIe device status at boot time and trigger a PCI bus rescan if the check fails.
- How I did it
Created a systemd service that checks the PCIe devices using pcieutil.
- How to verify it
Check the syslog to confirm that the service runs periodically and behaves as expected.
- Description for the changelog
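For the "How to verify it" step, a sketch that counts `pcie-check.service: Failed` lines (the format shown in the log excerpt above) in a syslog file. The function name `pcie_check_failures` and the default `/var/log/syslog` path are assumptions; on a systemd host, `journalctl -u pcie-check.service` shows the same unit's messages.

```bash
# Sketch only: count failed runs of pcie-check.service in a syslog file.
# Prints the count; returns 0 if there were no failures, 1 if there were,
# and 2 if the log cannot be read. The default path is an assumption.
pcie_check_failures() {
    local log="${1:-/var/log/syslog}"
    local count
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # grep -c prints 0 (and exits 1) when nothing matches; that is fine here.
    count=$(grep -c 'pcie-check\.service: Failed' "$log")
    echo "$count"
    [ "$count" -eq 0 ]
}
```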