@HynoR (Contributor) commented Nov 26, 2025

What this PR does / why we need it?

#11077 #11078

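  • The current method for reading CPU usage sleeps 100 ms on every call, which is slow.
  • A lot of data that almost never changes is fetched through IO on every call.
  • The systemproxy value never changes.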

Summary of your change

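  • Implement our own CPU library that caches the previous cumulative CPU time. Each call computes CPU usage from the difference between the current sample and the cached one, instead of having gopsutil sleep and sample from scratch.
  • Wrap the gopsutil package in a psutil package that caches slowly changing information, so later reads come from memory with no IO.
  • Fix the bug where systemproxy never changes.

Time per call after optimization: 100 ms -> about 1 μs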
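The cached-delta approach described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `cpuTimes` struct, the `usageSampler` type, and the `Percent` method are hypothetical names, and the counters are passed in by the caller rather than read from the OS.

```go
package main

import (
	"fmt"
	"sync"
)

// cpuTimes is a hypothetical snapshot of cumulative CPU counters
// (for example, values derived from /proc/stat on Linux).
type cpuTimes struct {
	Busy  float64 // cumulative non-idle time
	Total float64 // cumulative busy + idle time
}

// usageSampler remembers the previous snapshot so that a call never
// has to sleep between two reads.
type usageSampler struct {
	mu   sync.Mutex
	last cpuTimes
	has  bool
}

// Percent returns CPU usage between the previous call and cur.
// The first call has nothing to compare against and returns 0.
func (s *usageSampler) Percent(cur cpuTimes) float64 {
	s.mu.Lock()
	defer s.mu.Unlock()

	prev, had := s.last, s.has
	s.last, s.has = cur, true
	if !had {
		return 0
	}
	dTotal := cur.Total - prev.Total
	if dTotal <= 0 {
		return 0 // no time elapsed, or counters were reset
	}
	pct := (cur.Busy - prev.Busy) / dTotal * 100
	if pct < 0 {
		return 0
	}
	if pct > 100 {
		return 100
	}
	return pct
}

func main() {
	var s usageSampler
	s.Percent(cpuTimes{Busy: 100, Total: 1000}) // primes the cache
	fmt.Printf("%.1f%%\n", s.Percent(cpuTimes{Busy: 150, Total: 1200}))
}
```

The trade-off in a design like this is that the reported value covers the interval since the previous call, so the measurement window depends on how often callers poll.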

Please indicate you've done the following:

  • Made sure tests are passing and test coverage is added if needed.
  • Made sure commit messages follow the Conventional Commits specification.
  • Considered the docs impact and opened a new docs issue or PR with docs changes if needed.

copilot summary

This pull request refactors system resource monitoring in the agent by introducing a new psutil utility package that provides cached and optimized access to CPU, disk, and host information. The changes replace direct usage of gopsutil functions throughout the service layer with calls to the new psutil abstractions, improving performance and consistency in resource data retrieval. Key updates include changes to CPU and disk usage calculations, host info loading, and dashboard data aggregation.

System resource access refactoring:

  • Introduced the new agent/utils/psutil package, which provides thread-safe, cached access to CPU, disk, and host information through global state objects (CPU, CPUInfo, DISK, HOST).
  • Refactored all usages of gopsutil functions in agent/app/service/dashboard.go, agent/app/service/alert_helper.go, and agent/app/service/monitor.go to use the new psutil methods for CPU core counts, CPU usage, disk usage, and host info.
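For readers unfamiliar with the pattern, thread-safe cached access of the kind summarized above might look like the sketch below. It assumes a simple time-to-live policy; the `cached` type, `newCached`, and `defaultTTL` are hypothetical names for illustration and are not taken from agent/utils/psutil.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// defaultTTL is an assumed refresh interval for slowly changing data.
const defaultTTL = time.Minute

// cached guards a value with a mutex and reloads it only after ttl expires.
type cached[T any] struct {
	mu      sync.Mutex
	val     T
	expires time.Time
	ttl     time.Duration
	load    func() (T, error)
}

func newCached[T any](ttl time.Duration, load func() (T, error)) *cached[T] {
	return &cached[T]{ttl: ttl, load: load}
}

// Get returns the cached value, calling load only when the cache is
// empty or stale. A failed load leaves the previous state untouched.
func (c *cached[T]) Get() (T, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if time.Now().Before(c.expires) {
		return c.val, nil
	}
	v, err := c.load()
	if err != nil {
		var zero T
		return zero, err
	}
	c.val, c.expires = v, time.Now().Add(c.ttl)
	return v, nil
}

func main() {
	loads := 0
	host := newCached(defaultTTL, func() (string, error) {
		loads++ // stands in for an expensive IO call
		return "node-1", nil
	})
	for i := 0; i < 3; i++ {
		name, _ := host.Get()
		fmt.Println(name)
	}
	fmt.Println("loads:", loads)
}
```

Holding the lock across the load keeps concurrent callers from triggering duplicate IO, at the cost of making them wait while a refresh is in progress.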

Dashboard and monitoring logic improvements:

  • Updated dashboard and node info loading logic to use cached resource values and improved CPU usage sampling, including per-core statistics.
  • Enhanced disk info loading to perform usage sampling in a goroutine with timeout and cache results, increasing reliability and responsiveness.
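Sampling in a goroutine with a timeout, as mentioned for disk info, can be illustrated with the sketch below. It is an assumption about the general shape only: `sampleDisk`, `diskUsage`, `sampleTimeout`, and `errSampleTimeout` are hypothetical names, and the sampling function is injected by the caller instead of touching a real filesystem.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// sampleTimeout is an assumed upper bound for one disk sample.
const sampleTimeout = 50 * time.Millisecond

var errSampleTimeout = errors.New("disk sample timed out")

// diskUsage is a hypothetical result type for one mount point.
type diskUsage struct {
	Path  string
	Total uint64
	Used  uint64
}

type sampleResult struct {
	usage diskUsage
	err   error
}

// sampleDisk runs sample in its own goroutine and gives up after timeout.
func sampleDisk(timeout time.Duration, sample func() (diskUsage, error)) (diskUsage, error) {
	// Buffered so the goroutine can still send and exit after a timeout.
	ch := make(chan sampleResult, 1)
	go func() {
		u, err := sample()
		ch <- sampleResult{usage: u, err: err}
	}()

	select {
	case r := <-ch:
		return r.usage, r.err
	case <-time.After(timeout):
		return diskUsage{}, errSampleTimeout
	}
}

func main() {
	u, err := sampleDisk(sampleTimeout, func() (diskUsage, error) {
		return diskUsage{Path: "/", Total: 100, Used: 40}, nil
	})
	fmt.Println(u, err)
}
```

A caller could store the last successful result and fall back to it when `errSampleTimeout` is returned, which is one way a hung mount would not block the dashboard.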

Other enhancements:

  • Improved system proxy detection logic in dashboard service using the cmp.Or function (available since Go 1.22) for environment variable selection.
  • Ensured consistent error handling and logging for resource sampling failures across services.
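As a rough illustration of environment variable selection with `cmp.Or`, which returns its first non-zero argument, consider the sketch below. The function name `detectProxy` and the specific variable names and their order are assumptions for the example, not necessarily what the dashboard service checks.

```go
package main

import (
	"cmp"
	"fmt"
	"os"
)

// detectProxy returns the first non-empty proxy setting, or "" if none
// is set. The lookup function is injected so the logic can be exercised
// without modifying the real environment.
func detectProxy(getenv func(string) string) string {
	return cmp.Or(
		getenv("HTTPS_PROXY"),
		getenv("https_proxy"),
		getenv("HTTP_PROXY"),
		getenv("http_proxy"),
	)
}

func main() {
	if proxy := detectProxy(os.Getenv); proxy != "" {
		fmt.Println("proxy:", proxy)
	} else {
		fmt.Println("no proxy configured")
	}
}
```

Because the environment is read on each call rather than once at startup, a value computed this way reflects later changes to the variables.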

@f2c-ci-robot (bot) commented Nov 26, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@f2c-ci-robot (bot) commented Nov 26, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign wanghe-fit2cloud for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lan-yonghui (Member) commented

/lgtm

@f2c-ci-robot f2c-ci-robot bot added the lgtm label Nov 26, 2025
@wanghe-fit2cloud wanghe-fit2cloud merged commit 0a42d49 into 1Panel-dev:dev-v2 Nov 26, 2025
0 of 3 checks passed