pmon docker - Enable config of thermalctd polling interval #23139
Merged
rlhui merged 1 commit into sonic-net:master on Oct 29, 2025
Collaborator
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
louis-nexthop approved these changes Oct 1, 2025
Contributor
Author
cc @judyjoseph
judyjoseph reviewed Oct 28, 2025
{% endif -%}

{% if thermalctld.thermal_monitor_update_elapsed_threshold is defined and thermalctld.thermal_monitor_update_elapsed_threshold is not none %}
{%- set options = options + " --thermal-monitor-update-elapsed-threshold " + thermalctld.thermal_monitor_update_elapsed_threshold|string %}
Contributor
LGTM, but please add details to the PR description on the new intervals and which daemon/thread in thermalctld they will affect.
Contributor
Author
Thanks @judyjoseph, I have updated the PR description to document this. Please take a look.
judyjoseph approved these changes Oct 28, 2025
spilkey-cisco approved these changes Oct 28, 2025
Contributor
/azpw ms_conflict
Contributor
@rlhui could you help merge this PR
FengPan-Frank pushed a commit to FengPan-Frank/sonic-buildimage that referenced this pull request Dec 4, 2025
Platforms can now configure thermal monitor intervals in their pmon_daemon_control.json:
# example
{
    "thermalctld": {
        "thermal_monitor_initial_interval": 5,
        "thermal_monitor_update_interval": 30,
        "thermal_monitor_update_elapsed_threshold": 25
    }
}
Signed-off-by: Feng Pan <fenpan@microsoft.com>
xwjiang-ms pushed a commit to xwjiang-ms/sonic-buildimage that referenced this pull request Dec 22, 2025
Signed-off-by: xiaweijiang <xiaweijiang@microsoft.com>
Collaborator
Cherry-pick PR to 202505: #26422
Platforms can now configure thermal monitor intervals in their pmon_daemon_control.json:
# example
{
    "thermalctld": {
        "thermal_monitor_initial_interval": 5,
        "thermal_monitor_update_interval": 30,
        "thermal_monitor_update_elapsed_threshold": 25
    }
}
Note that this only affects the ThermalMonitor thread in the thermalctld daemon. ThermalMonitor's role is to poll fan and temperature sensors from hardware and publish the information to redis.
These redis values are used by show platform temperature and show platform fan, for example.

Parameter | Details
thermal_monitor_initial_interval | Interval used by ThermalMonitor on thermalctld startup.
thermal_monitor_update_interval | The hardware is polled every thermal_monitor_update_interval seconds.
thermal_monitor_update_elapsed_threshold | If it takes longer than thermal_monitor_update_elapsed_threshold seconds to poll hardware (collect information from all fans and temperature sensors), a warning is logged.

Why I did it
The default polling interval of 60s is quite high and feels unresponsive (e.g. an operator can remove a fan and wait nearly a minute for show plat fan to update).

How I did it
In sonic-net/sonic-platform-daemons#635 we made these intervals configurable.
This PR updates the jinja template to handle these new configuration options.
It decreases the update interval from 60s -> 10s for NH-4010. I'm aiming for a balance of responsiveness without polling excessively.
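The template logic described above can be approximated in plain Python. This is only a sketch, not the actual jinja template: the helper name build_thermalctld_options and the THERMALCTLD_FLAGS table are hypothetical, and of the three flags only --thermal-monitor-update-elapsed-threshold appears in the reviewed diff. The other two flag names are assumed to follow the same naming pattern. The sample input uses the 10s update interval mentioned for NH-4010.

```python
# Hypothetical mapping from pmon_daemon_control.json keys to thermalctld flags.
# Only --thermal-monitor-update-elapsed-threshold is confirmed by the reviewed
# diff; the other two flag names are assumptions based on the same pattern.
THERMALCTLD_FLAGS = {
    "thermal_monitor_initial_interval": "--thermal-monitor-initial-interval",
    "thermal_monitor_update_interval": "--thermal-monitor-update-interval",
    "thermal_monitor_update_elapsed_threshold": "--thermal-monitor-update-elapsed-threshold",
}


def build_thermalctld_options(daemon_control):
    """Return the extra option string for thermalctld, skipping unset or null keys."""
    settings = daemon_control.get("thermalctld") or {}
    options = ""
    for key, flag in THERMALCTLD_FLAGS.items():
        value = settings.get(key)
        # Mirrors the template's "is defined and ... is not none" guard.
        if value is not None:
            options += " " + flag + " " + str(value)
    return options


# Sample input: only the update interval is set, as described for NH-4010.
nh4010 = {"thermalctld": {"thermal_monitor_update_interval": 10}}
print(build_thermalctld_options(nh4010))
```

Keys that are absent or null add nothing to the option string, so a platform that sets none of them leaves the thermalctld command line unchanged.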
Example usage of this feature:
https://github.com/nexthop-ai/private-sonic-buildimage/blob/master/device/nexthop/common/pmon_daemon_control.json
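To illustrate how the three parameters interact, here is a rough sketch of a polling loop. It is not the ThermalMonitor implementation from sonic-platform-daemons: run_monitor and its arguments are hypothetical, and it assumes that the initial interval is the wait before the first poll and that the update interval is the wait before every later poll. A fake clock stands in for real time so the example finishes instantly.

```python
import time


def run_monitor(poll, initial_interval, update_interval, elapsed_threshold,
                iterations, sleep=time.sleep, clock=time.monotonic, warn=print):
    """Poll `iterations` times and return how many slow-poll warnings were raised."""
    warnings = 0
    wait = initial_interval  # assumption: used only before the first poll
    for _ in range(iterations):
        sleep(wait)
        start = clock()
        poll()  # stands in for reading all fans and temperature sensors
        elapsed = clock() - start
        if elapsed > elapsed_threshold:
            warn("poll took %.1fs, threshold is %ss" % (elapsed, elapsed_threshold))
            warnings += 1
        wait = update_interval  # assumption: used for every later poll
    return warnings


# Fake clock: sleeping and polling only advance a counter.
fake_now = [0.0]


def fake_sleep(seconds):
    fake_now[0] += seconds


def slow_poll():
    fake_now[0] += 30  # pretend one hardware poll takes 30 seconds


slow_warnings = run_monitor(slow_poll, initial_interval=5, update_interval=30,
                            elapsed_threshold=25, iterations=2,
                            sleep=fake_sleep, clock=lambda: fake_now[0])
```

With the example values from the description (5, 30, 25), a poll that takes 30 seconds exceeds the 25 second threshold, so each iteration logs a warning.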
How to verify it
Verified on NH-4010 that thermalctld is being run with the expected options.
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)