[release/0.9] Query stats directly in-shim#1403
Merged
dcantah merged 5 commits intomicrosoft:release/0.9from May 20, 2022
Merged
[release/0.9] Query stats directly in-shim#1403dcantah merged 5 commits intomicrosoft:release/0.9from
dcantah merged 5 commits intomicrosoft:release/0.9from
Conversation
The `AllPids` method on JOBOBJECT_BASIC_PROCESS_ID_LIST would allocate a very large array to store any pids in the job object, cast the memory to this array, and then slice down to what elements it actually took up in the array based off the value of what was in the NumberOfProcessIdsInList field. The checkptr compile option doesn't like when slices don't have an explicit length and capacity so this change updates the slicing to use a three-index slice to set the capacity to the same as the length. Before for fmt.Println(len(arr), cap(arr)) -> 2, 134217727 After: -> 2, 2 This change additionally adds a test in internal/jobobject and modifies the TestExecWithJob test in internal/exec to verify that checkptr doesn't get angry when we hit a codepath that performs the cast Signed-off-by: Daniel Canter <dcanter@microsoft.com> (cherry picked from commit d082725) Signed-off-by: Daniel Canter <dcanter@microsoft.com>
* Swap to fmt.Errorf in jobobject package This change swaps to fmt.Errorf for wrapped errors in the jobobject package from errors.Wrap. We'd had talks of moving this code to winio where we've removed our pkg/errors dependency so this would make the port easier. Signed-off-by: Daniel Canter <dcanter@microsoft.com> (cherry picked from commit ccec73f) Signed-off-by: Daniel Canter <dcanter@microsoft.com>
…rosoft#1307) Previously we had our own definition for GetQueuedCompletionStatus as x/sys/windows had an incorrect definition for it. This was remedied a bit ago in this change golang/sys@683adc9 so we're alright to remove our own at this point. Signed-off-by: Daniel Canter <dcanter@microsoft.com> (cherry picked from commit 4721411) Signed-off-by: Daniel Canter <dcanter@microsoft.com>
This change adds in functionality to query statistics directly in the shim instead of reaching out to HCS. One of the main motivators behind this was poor performance for tallying up the private working set total for the container in HCS. HCS calls NtQuerySystemInformation with the class SystemProcessInformation which returns an array containing system information for every process running on the machine. They then grab the pids that are running in the container and filter down the entries in the array to only what's running in that silo and start tallying up the total. This doesn't work well as performance should get worse if more processes are running on the machine in general and not just in the container. All of the additional information besides the WorkingSetPrivateSize field is ignored as well which isn't great and is wasted work to fetch. HCS only let's you grab statistics in an all or nothing fashion, so we can't just grab the private working set ourselves and ask for everything else separately. We can open the silo ourselves and do the same queries for the rest of the info, as well as calculating the private working set in a more efficient manner by: 1. Find the pids running in the silo 2. Get a process handle for every process (only need PROCESS_QUERY_LIMITED_INFORMATION access) 3. Call NtQueryInformationProcess on each process with the class ProcessVmCounters 4. Tally up the total using the field PrivateWorkingSetSize in VM_COUNTERS_EX2. This change additionally: - Changes the jobcontainers package to use this new way to calculate the private working set. - Change the query the StorageStats method in the jobobject package uses to grab IO counters to match what HCS queries. Signed-off-by: Daniel Canter <dcanter@microsoft.com> (cherry picked from commit 4a1216a) Signed-off-by: Daniel Canter <dcanter@microsoft.com>
Previous cherry-picked commits brought in changes from other commits that were not cherry-picked and these changes end up in /test/vendor because of our 'go mod vendor' flow. This just ensures our verify-test-vendor CI step will pass by re-vendoring what is actually present in the repo. Signed-off-by: Daniel Canter <dcanter@microsoft.com>
helsaawy
approved these changes
May 19, 2022
Contributor
|
Why backports? Why not just new release and update containerd? |
Contributor
Author
|
@jterry75 Wanna keep containerd 1.6 on the same tagline of hcsshim to not bring in too much new stuff |
Contributor
Author
|
For 1.7/main branch currently, we plan to start a 0.10.x line soon (probably a week or so). |
Contributor
|
Oh I seeeeeeeeee. You are going to backport CD as well. Ok thanks |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This cherry-picks in the work to query for statistics for Windows containers in-process as well as some additional
commits that fixed some issues for our job object code/was easier to include than work around the merge conflicts.
Original PR description for the query work:
This change adds in functionality to query statistics directly in the shim instead of reaching out to HCS. One of the main motivators behind this was poor performance for tallying up the private working set total for the container in HCS.
HCS calls NtQuerySystemInformation with the class SystemProcessInformation which returns an array containing system information for every process running on the machine. They then grab the pids that are running in the container and filter down the entries in the array to only what's running in that silo and start tallying up the total. This doesn't work well as performance should get worse if more processes are running on the machine in general and not just in the container. All of the additional information besides the WorkingSetPrivateSize field is ignored as well which isn't great and is wasted work to fetch.
HCS only let's you grab statistics in an all or nothing fashion, so we can't just grab the private working set ourselves and ask for everything else separately. We can open the silo ourselves and do the same queries for the rest of the info, as well as calculating the private working set in a more efficient manner by:
PROCESS_QUERY_LIMITED_INFORMATION access)
ProcessVmCounters
VM_COUNTERS_EX2.
This change additionally:
private working set.
to grab IO counters to match what HCS queries.