Skip to content

In metrics, only show cumulative number of page faults#7482

Merged
timvisee merged 15 commits intodevfrom
metrics-aggregate-page-faults
Nov 11, 2025
Merged

In metrics, only show cumulative number of page faults#7482
timvisee merged 15 commits intodevfrom
metrics-aggregate-page-faults

Conversation

@timvisee
Copy link
Member

@timvisee timvisee commented Oct 30, 2025

In metrics, only expose a cumulative number of minor and major page faults. Remove the metrics for page faults in joined children:

# HELP qdrant_procfs_minor_page_faults_total count of minor page faults which didn't cause a 
# TYPE qdrant_procfs_minor_page_faults_total counter
qdrant_procfs_minor_page_faults_total 180664
# HELP qdrant_procfs_major_page_faults_total count of disk accesses caused by a mmap page 
# TYPE qdrant_procfs_major_page_faults_total counter
qdrant_procfs_major_page_faults_total 0

In my opinion, we should only expose the above two. I see no point in reporting separate values for children. In fact, it is currently incorrectly implemented, more on that later.

In practice we're only interested in the cumulative number of page faults that happen during the life time of the Qdrant process, in the process itself and in all its children. If the number grows faster than we expect, the cluster may need attention.

The minor/major metric needs to be a sum of:

  1. page faults in the current PID
  2. plus, page faults in all current child processes
  3. plus, historic page faults from all joined child processes

Actually, Qdrant currently does not even spawn any child processes at all. It means that the child metrics will always be 0 at this time. (Might be wrong of this, but the fact remains that we currently don't report it correctly)

But, to make our reporting more future proof I still recommend to implement it properly. For point two we'd have to iterate over all child processes recursively and fetch its page fault stats (left a TODO). Point three is a different number and we've already implemented this.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

@timvisee timvisee added the release:1.16.0 Pull requests that should be merged for the Qdrant 1.16.0 release. label Oct 30, 2025
@timvisee timvisee changed the title In metrics, aggregate page faults In metrics, only show cumulative number of page faults Oct 30, 2025
@timvisee timvisee marked this pull request as ready for review October 30, 2025 15:40
@qdrant qdrant deleted a comment from coderabbitai bot Oct 30, 2025
@timvisee timvisee force-pushed the metrics-aggregate-page-faults branch from 5d28877 to 7e160bb Compare October 31, 2025 09:11
@qdrant qdrant deleted a comment from coderabbitai bot Oct 31, 2025
@timvisee timvisee force-pushed the metrics-aggregate-page-faults branch from 7e160bb to 413b91a Compare October 31, 2025 12:33
@qdrant qdrant deleted a comment from coderabbitai bot Oct 31, 2025
@qdrant qdrant deleted a comment from coderabbitai bot Oct 31, 2025
coderabbitai[bot]

This comment was marked as outdated.

@qdrant qdrant deleted a comment from coderabbitai bot Oct 31, 2025
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Oct 31, 2025
@qdrant qdrant deleted a comment from coderabbitai bot Oct 31, 2025
coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@timvisee timvisee force-pushed the metrics-aggregate-page-faults branch from f77cc9a to faadcc3 Compare October 31, 2025 17:08
@qdrant qdrant deleted a comment from coderabbitai bot Oct 31, 2025
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Oct 31, 2025
@timvisee timvisee force-pushed the metrics-aggregate-page-faults branch from b1b7ce1 to faa7132 Compare October 31, 2025 18:59
@qdrant qdrant deleted a comment from coderabbitai bot Oct 31, 2025
coderabbitai[bot]

This comment was marked as resolved.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@qdrant qdrant deleted a comment from coderabbitai bot Oct 31, 2025
@qdrant qdrant deleted a comment from coderabbitai bot Oct 31, 2025
@timvisee timvisee requested a review from xzfc November 2, 2025 10:30
Comment on lines +1004 to +1006
.tasks()?
.filter_map(|task| {
let children = task.and_then(|task| task.children());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL we can get a list of children PIDs directly via /proc/*/task/*/children.

The classic approach was to read /proc/*/stat for every process and rebuild the process tree by looking at parent PIDs.

The /children approach requires kernel with CONFIG_PROC_CHILDREN. I think it should be enabled practically everywhere on servers, but not sure.

@qdrant qdrant deleted a comment from coderabbitai bot Nov 11, 2025
@timvisee timvisee merged commit 84f9dc2 into dev Nov 11, 2025
15 checks passed
@timvisee timvisee deleted the metrics-aggregate-page-faults branch November 11, 2025 12:38
timvisee added a commit that referenced this pull request Nov 14, 2025
* In metrics, aggregate page faults

* Include page fault of active children (recursively)

* type alias Pid: i32

* Code improvements

* Ignore already terminated processes

* Clippy

* Also count faults for joined descendant threads in children

* Import std::cmp::min

* Rework child fault recursion, remove second hash set

* Add simple test for iterating child processes

* Fix typo

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Use limit values directly

* We don't include the parent PID

* Don't allocate hashmap when listing process children

* Extend process child PIDs test, fork recursively

---------

Co-authored-by: jojii <jojii@gmx.net>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@timvisee timvisee mentioned this pull request Nov 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release:1.16.0 Pull requests that should be merged for the Qdrant 1.16.0 release.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants