
[action] [PR:23139] pmon docker - Enable config of thermalctd polling interval #26422

Merged
mssonicbld merged 1 commit into sonic-net:202505 from mssonicbld:cherry/202505/23139 on Mar 26, 2026

@mssonicbld (Collaborator)

Platforms can now configure the ThermalMonitor polling intervals in their pmon_daemon_control.json, for example:

{
    "thermalctld": {
        "thermal_monitor_initial_interval": 5,
        "thermal_monitor_update_interval": 30,
        "thermal_monitor_update_elapsed_threshold": 25
    }
}
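A minimal sketch of how a consumer might read these keys, assuming the documented defaults (5, 60 and 30 seconds) apply to any key a platform leaves out. The helper `load_thermal_intervals` is hypothetical; it is not the actual Jinja template or thermalctld code.

```python
import json

# Defaults documented in this PR (seconds); assumed to apply when a key is absent.
DEFAULTS = {
    "thermal_monitor_initial_interval": 5,
    "thermal_monitor_update_interval": 60,
    "thermal_monitor_update_elapsed_threshold": 30,
}


def load_thermal_intervals(text):
    """Hypothetical helper: return the ThermalMonitor intervals from a
    pmon_daemon_control.json document, falling back to the defaults."""
    config = json.loads(text)  # plain JSON, so comments are not allowed
    section = config.get("thermalctld", {})
    return {key: section.get(key, default) for key, default in DEFAULTS.items()}


example = """
{
    "thermalctld": {
        "thermal_monitor_initial_interval": 5,
        "thermal_monitor_update_interval": 30,
        "thermal_monitor_update_elapsed_threshold": 25
    }
}
"""

print(load_thermal_intervals(example)["thermal_monitor_update_interval"])  # 30
print(load_thermal_intervals("{}")["thermal_monitor_update_interval"])  # 60
```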

Note that this only affects the ThermalMonitor thread in the thermalctld daemon.
ThermalMonitor polls the fan and temperature sensors in hardware and publishes their readings to Redis.
These Redis values are used by commands such as show platform temperature and show platform fan.
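To illustrate this data flow, here is a hedged sketch in which a plain dict stands in for Redis. The key layout (`FAN_INFO|<name>`, `TEMPERATURE_INFO|<name>`), the field names and the readings are assumptions for illustration; `publish` and `show_temperature` are hypothetical helpers, not thermalctld or CLI code.

```python
# In-memory stand-in for the Redis database: key -> dict of fields.
fake_redis = {}


def publish(kind, name, fields):
    """Hypothetical: store one sensor's latest reading as a hash-like entry."""
    fake_redis.setdefault(f"{kind}|{name}", {}).update(fields)


def show_temperature():
    """Hypothetical reader in the spirit of 'show platform temperature':
    it only reads what the monitor last published, never the hardware."""
    rows = []
    for key in sorted(fake_redis):
        kind, name = key.split("|", 1)
        if kind == "TEMPERATURE_INFO":
            rows.append((name, fake_redis[key]["temperature"]))
    return rows


# One poll cycle publishing made-up readings.
publish("FAN_INFO", "fan1", {"presence": "True", "speed": "55"})
publish("TEMPERATURE_INFO", "cpu", {"temperature": "48.5"})
publish("TEMPERATURE_INFO", "asic", {"temperature": "61.0"})

print(show_temperature())  # [('asic', '61.0'), ('cpu', '48.5')]
```

Because the CLI reads stored values, what it shows can be up to one update interval old, which is the staleness this PR aims to reduce.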

Parameter Details

thermal_monitor_initial_interval

  • Purpose: The time to wait after thermalctld starts before ThermalMonitor's first poll.
  • Default: 5 seconds

thermal_monitor_update_interval

  • Purpose: The hardware is polled every thermal_monitor_update_interval seconds.
  • Default: 60 seconds

thermal_monitor_update_elapsed_threshold

  • Purpose: If polling the hardware (i.e. collecting information from all fans and temperature sensors) takes longer than thermal_monitor_update_elapsed_threshold seconds, a warning is logged.
  • Default: 30 seconds
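A minimal sketch of how the three parameters could interact in a polling loop. It assumes the next wait is shortened by the time the poll itself took (so polls start roughly every update interval) and that a stop event ends the loop; `monitor_loop`, `poll` and `warn` are hypothetical names, not the thermalctld implementation.

```python
import threading
import time


def monitor_loop(poll, stop_event, initial_interval=5, update_interval=60,
                 elapsed_threshold=30, warn=print):
    """Hypothetical ThermalMonitor-style loop driven by the three parameters."""
    wait_time = initial_interval  # delay before the very first poll
    # Event.wait() returns True once stop_event is set, which ends the loop.
    while not stop_event.wait(wait_time):
        begin = time.monotonic()
        poll()  # e.g. read every fan and temperature sensor, then publish
        elapsed = time.monotonic() - begin
        if elapsed > elapsed_threshold:
            warn(f"polling took {elapsed:.2f}s, above the {elapsed_threshold}s threshold")
        # Assumption: subtract the poll duration so polls start roughly every
        # update_interval seconds; if a poll overran, poll again right away.
        wait_time = max(update_interval - elapsed, 0)


# Usage: a fake poll that takes 0.02 s (above a 0.01 s threshold) and asks
# the loop to stop after its third call.
stop = threading.Event()
polls, warned = [], []


def fake_poll():
    polls.append(time.monotonic())
    time.sleep(0.02)
    if len(polls) == 3:
        stop.set()


monitor_loop(fake_poll, stop, initial_interval=0.01, update_interval=0.05,
             elapsed_threshold=0.01, warn=warned.append)
print(len(polls), len(warned))  # 3 3
```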

Why I did it

The default polling interval of 60 s is long enough to feel unresponsive: for example, an operator who removes a fan may wait nearly a minute for show platform fan to reflect it.

How I did it

In sonic-net/sonic-platform-daemons#635 we made these intervals configurable.

This PR updates the Jinja template to handle these new configuration options.
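The template itself is not shown here. As a rough Python analogue of what such a template might do, the sketch below turns any thermalctld interval keys present in the JSON into command-line options and emits nothing for absent keys, so the daemon keeps its defaults. The option spelling (`--thermal-monitor-...`), the program path and the helper `thermalctld_command` are all assumptions, not the real thermalctld flags or template.

```python
import shlex

# Keys handled by this PR; the CLI option names derived from them are hypothetical.
INTERVAL_KEYS = (
    "thermal_monitor_initial_interval",
    "thermal_monitor_update_interval",
    "thermal_monitor_update_elapsed_threshold",
)


def thermalctld_command(daemon_control, program="/usr/local/bin/thermalctld"):
    """Build a thermalctld command line from a parsed pmon_daemon_control.json.

    Only keys that are present become options; absent keys are left out so
    the daemon falls back to its built-in defaults.
    """
    section = daemon_control.get("thermalctld", {})
    argv = [program]
    for key in INTERVAL_KEYS:
        if key in section:
            # Hypothetical spelling: underscores become dashes, prefixed with '--'.
            argv += ["--" + key.replace("_", "-"), str(section[key])]
    return shlex.join(argv)


print(thermalctld_command({"thermalctld": {"thermal_monitor_update_interval": 10}}))
# /usr/local/bin/thermalctld --thermal-monitor-update-interval 10
```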

It also reduces the update interval for NH-4010 from 60 s to 10 s, aiming for a balance between responsiveness and excessive polling.
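Back-of-the-envelope arithmetic for this trade-off, assuming a hardware change becomes visible at the next poll and that poll duration is negligible: the worst-case delay is about one update interval, while the number of hardware polls per day is 86400 divided by the interval. The helper `polling_tradeoff` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polling_tradeoff(update_interval):
    """Return (worst-case staleness in s, hardware polls per day), assuming a
    change becomes visible at the next poll and poll duration is negligible."""
    return update_interval, SECONDS_PER_DAY // update_interval


for interval in (60, 10):
    staleness, polls_per_day = polling_tradeoff(interval)
    print(f"{interval}s interval: up to ~{staleness}s stale, {polls_per_day} polls/day")
# 60s interval: up to ~60s stale, 1440 polls/day
# 10s interval: up to ~10s stale, 8640 polls/day
```

In other words, the 10 s setting cuts the worst-case wait by a factor of six at the cost of six times as many polls.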

Example usage of this feature:
https://github.com/nexthop-ai/private-sonic-buildimage/blob/master/device/nexthop/common/pmon_daemon_control.json

How to verify it

Verified on NH-4010 that thermalctld is being run with the expected options.
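One way to script this check, as a sketch: on Linux, /proc/<pid>/cmdline holds a process's arguments separated by NUL bytes. The helpers below are hypothetical, as is the option name in the usage; they parse that content and report which expected option/value pairs are missing. Finding the thermalctld PID inside the pmon container is left out.

```python
def parse_cmdline(raw):
    """Split the NUL-separated contents of /proc/<pid>/cmdline into argv.
    Empty fields (e.g. from the trailing NUL) are dropped."""
    return [arg.decode() for arg in raw.split(b"\0") if arg]


def missing_options(argv, expected):
    """Return the (option, value) pairs from `expected` that do not appear
    as adjacent arguments in argv."""
    present = set(zip(argv, argv[1:]))
    return [(opt, val) for opt, val in expected.items() if (opt, val) not in present]


def read_cmdline(pid):
    """Read a running process's argv (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return parse_cmdline(f.read())


# Usage with a made-up cmdline; the option name is hypothetical.
raw = b"\0".join([b"/usr/bin/python3", b"/usr/local/bin/thermalctld",
                  b"--thermal-monitor-update-interval", b"10"]) + b"\0"
argv = parse_cmdline(raw)
print(missing_options(argv, {"--thermal-monitor-update-interval": "10"}))  # []
print(missing_options(argv, {"--thermal-monitor-initial-interval": "5"}))
# [('--thermal-monitor-initial-interval', '5')]
```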

Which release branch to backport (provide reason below if selected)

  • [ ] 202205
  • [ ] 202211
  • [ ] 202305
  • [ ] 202311
  • [ ] 202405
  • [ ] 202411
  • [ ] 202505

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

Signed-off-by: Sonic Build Admin sonicbld@microsoft.com

A picture of a cute animal (not mandatory but encouraged)

@mssonicbld (Collaborator, Author)

Original PR: #23139

@mssonicbld (Collaborator, Author)

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld mssonicbld merged commit 5092337 into sonic-net:202505 Mar 26, 2026
22 checks passed