
Add memory wipe on allocation/deallocation#1530

Merged
brianmcgillion merged 1 commit into tiiuae:main from vadika:memory-wipe-shutdown
Nov 17, 2025

Conversation

@vadika
Contributor

@vadika vadika commented Nov 4, 2025

Summary

This PR implements secure memory wiping on allocation and deallocation by enabling the corresponding kernel hardening options.

Usage

ghaf.host.memory-wipe.enable = true;
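Once the option is enabled and the system rebuilt, one way to confirm the hardening is active is to inspect the running kernel (a hedged sketch: the config names match those listed in this PR's commit message, but `/proc/config.gz` is only present when the kernel is built with `CONFIG_IKCONFIG_PROC`):

```shell
# Check the build-time kernel config, if exposed at runtime.
zcat /proc/config.gz 2>/dev/null \
  | grep -E 'CONFIG_(PAGE_POISONING|INIT_ON_ALLOC_DEFAULT_ON|INIT_ON_FREE_DEFAULT_ON)='

# The defaults can also be confirmed (or overridden) via boot parameters.
grep -oE 'init_on_(alloc|free)=[01]' /proc/cmdline
```

If neither `/proc/config.gz` nor a boot-parameter override is present, a distribution-installed `/boot/config-$(uname -r)` file is another common place to look.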

@vadika vadika force-pushed the memory-wipe-shutdown branch from 8a35409 to ceff12d Compare November 4, 2025 11:12
@vadika vadika marked this pull request as draft November 4, 2025 11:12
@vadika vadika force-pushed the memory-wipe-shutdown branch from ceff12d to b2eda36 Compare November 4, 2025 11:16
@vadika vadika force-pushed the memory-wipe-shutdown branch from b2eda36 to 71f5937 Compare November 4, 2025 11:17
@vadika vadika requested a review from brianmcgillion November 4, 2025 11:18
@vadika vadika force-pushed the memory-wipe-shutdown branch from 71f5937 to 243fdd2 Compare November 4, 2025 11:50
@vadika vadika force-pushed the memory-wipe-shutdown branch from 243fdd2 to 5ab4c4b Compare November 4, 2025 12:08
@vadika vadika force-pushed the memory-wipe-shutdown branch from 5ab4c4b to 087a062 Compare November 4, 2025 14:21
@brianmcgillion
Collaborator

We may have an issue with this. kexec is usually one of those features that should be restricted, so we will have to make sure the kexec target is itself signed, and use the kexec commands that force loading of a signed image. All doable, but it has to be considered.

The next question: is this just on shutdown, or do we also want it on the shutdown of a particular VM? That is, is there something in the VMM that will cause the memory of a VM to be wiped? Do we need sdmem, swap clearing (swapoff/swapon), a forced sync, and maybe even dropping caches, just to flush any residuals out of the memory pipeline of an individual VM?
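The host-side flush described above could be sketched roughly as follows (a hypothetical illustration only, not something this PR implements; it requires root, and `sdmem` comes from the secure-delete package):

```shell
#!/usr/bin/env sh
# Hypothetical host-side memory flush after a VM shuts down.
# Requires root; sdmem is provided by the secure-delete package.

swapoff -a                            # pull swapped pages back into RAM
swapon -a                             # re-enable swap afterwards

sync                                  # flush dirty pages to disk
echo 3 > /proc/sys/vm/drop_caches     # drop page cache, dentries and inodes

sdmem -ll                             # overwrite free memory (quick mode, fewer passes)
```

Note that even this sequence only covers host-visible memory; guest pages held by the VMM would still need to be zeroed by the VMM itself or by the kernel's init-on-free path.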

@vadika
Contributor Author

vadika commented Nov 4, 2025

> we may have an issue with this. kexec is usually one of those features that should be restricted. […]
> the next question is it just on shutdown? or do we want it on the shutdown of a particular vm? […]

well, the requirement says "zero on shutdown" :-)
IMO this is not solvable from user-level scope, because of memory mapping, shadowing, swapping...

We can just enable init_on_free in the host kernel, and it will solve this task completely at the price of a 5-10% performance degradation: every memory deallocation at the host kernel level will trigger zeroing of the deallocated block.

For security reasons this option can't be controlled at runtime, so kexec is the only way to trigger it when needed.

PS: the best way to solve this is to enable memory encryption, so there would be no need for memory zeroing at all, but the Intel processors in the ThinkPads and System76 machines unfortunately don't support it.
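The kexec-based toggle could look roughly like this (a hedged sketch; the kernel and initrd paths are illustrative, but `kexec -s` genuinely uses the kexec_file_load syscall, which is the variant that can enforce signature verification of the loaded image):

```shell
# Load a signed kernel via kexec_file_load (-s), appending init_on_free=1
# to the current command line. Paths are illustrative.
kexec -s -l /boot/vmlinuz \
    --initrd=/boot/initrd \
    --append="$(cat /proc/cmdline) init_on_free=1"

# Reboot into the loaded kernel without going through firmware.
systemctl kexec
```

With kernel lockdown enabled, the unsigned `kexec_load` path is blocked, which addresses the restriction concern raised above.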

@vadika vadika force-pushed the memory-wipe-shutdown branch from 087a062 to 28dd04d Compare November 12, 2025 11:20
@vadika vadika marked this pull request as ready for review November 12, 2025 11:22
@vadika vadika force-pushed the memory-wipe-shutdown branch from 28dd04d to 724d4a8 Compare November 12, 2025 11:33
@leivos-unikie
Contributor

Images from the automated pre-merge were already cleaned away.

I managed to launch performance tests for Dell-7330 and Darter-Pro early enough (yesterday).
https://ci-dev.vedenemo.dev/job/ghaf-hw-test-manual/2652/
https://ci-dev.vedenemo.dev/job/ghaf-hw-test-manual/2654/

Results didn't show clear regression when comparing to the last measurements in
https://ci-dev.vedenemo.dev/job/ghaf-hw-test/599/

@vadika vadika changed the title Add memory wipe on shutdown via kexec Add memory wipe on allocation/deallocation Nov 13, 2025
@leivos-unikie
Contributor

I ran performance tests locally on Lenovo-X1 too. Didn't find significant regression compared to current mainline.

A few apps failed the first boot-time tests, but afterwards, in a manual check, they launched fast.

@leivos-unikie leivos-unikie added the labels "Tested on Lenovo X1 Carbon", "Tested on rugged laptop", and "Tested on System76", and removed "Needs Testing" and "CI Team to pre-verify" Nov 14, 2025
@vadika vadika force-pushed the memory-wipe-shutdown branch from a0fe82e to a13b916 Compare November 14, 2025 09:26
@vunnyso
Collaborator

vunnyso commented Nov 14, 2025

PAGE_POISONING and INIT_ON_ALLOC_DEFAULT_ON are already enabled by default on the latest kernel v6.17.7, so it is only INIT_ON_FREE_DEFAULT_ON that makes the difference.
So if we focus on test cases where memory pages are freed more often, we hit the right target.

Tested on System76 with PR

[ghaf@ghaf-host:/tmp]$ stress-ng --vm 4 --vm-bytes 1G --vm-method all  --verify -t 30s --metrics-brief
stress-ng: info:  [4354] setting to a 30 secs run per stressor
stress-ng: info:  [4354] dispatching hogs: 4 vm
stress-ng: info:  [4354] note: 16 cpus have scaling governors set to powersave and this may impact performance; setting /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor to 'performance' may improve performance
stress-ng: info:  [4355] vm: using 256M per stressor instance (total 1G of 69.40G available memory)
stress-ng: metrc: [4354] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [4354]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [4354] vm              5846934     30.00     84.94     13.20    194888.92       59579.51
stress-ng: info:  [4354] skipped: 0
stress-ng: info:  [4354] passed: 4: vm (4)
stress-ng: info:  [4354] failed: 0
stress-ng: info:  [4354] metrics untrustworthy: 0
stress-ng: info:  [4354] successful run completed in 30.00 secs

Tested on System76 with mainline

[ghaf@ghaf-host:/tmp]$  stress-ng --vm 4 --vm-bytes 1G --vm-method all  --verify -t 30s --metrics-brief
stress-ng: info:  [4738] setting to a 30 secs run per stressor
stress-ng: info:  [4738] dispatching hogs: 4 vm
stress-ng: info:  [4738] note: 16 cpus have scaling governors set to powersave and this may impact performance; setting /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor to 'performance' may improve performance
stress-ng: info:  [4739] vm: using 256M per stressor instance (total 1G of 68.71G available memory)
stress-ng: metrc: [4738] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [4738]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [4738] vm              5847321     30.00     87.61     13.47    194904.28       57850.66
stress-ng: info:  [4738] skipped: 0
stress-ng: info:  [4738] passed: 4: vm (4)
stress-ng: info:  [4738] failed: 0
stress-ng: info:  [4738] metrics untrustworthy: 0
stress-ng: info:  [4738] successful run completed in 30.00 secs

There is no big difference in bogo ops/s (usr+sys time); maybe it's worth exploring more with the stress-ng tool as well. (I haven't done much testing.)
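Since the `--vm` stressor mostly rewrites already-allocated pages, stressors that churn through allocate/free cycles might expose the INIT_ON_FREE_DEFAULT_ON cost better (a suggestion only, not something that was run in this PR):

```shell
# malloc/free churn: many allocations that are freed again quickly,
# exercising the init-on-free path directly.
stress-ng --malloc 4 --malloc-bytes 1G -t 30s --metrics-brief

# process churn: every child exit frees its whole address space.
stress-ng --fork 8 -t 30s --metrics-brief
```

Comparing bogo ops/s for these stressors between this PR and mainline should isolate the free-path overhead more cleanly than the `--vm` workload.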

@vadika
Contributor Author

vadika commented Nov 14, 2025

> By default PAGE_POISONING and INIT_ON_ALLOC_DEFAULT_ON will enabled by default on latest kernel v6.17.7, so its only INIT_ON_FREE_DEFAULT_ON which is making difference. […]
> There is not big difference in bogo ops/s (usr+sys time) operation, maybe its worth to explore more with stress-ng tool as well. […]

So the impact is visible but very low, IMO?

Talking about the options: we need to define them explicitly; defaults may differ between configurations.

@vunnyso
Collaborator

vunnyso commented Nov 17, 2025

> So impact is visible but very low IMO?
> Talking about options -- need to define explicitly, defaults may be different in different configurations.

In my minimal testing, I haven't observed any notable differences so far.

- Enable PAGE_POISONING, INIT_ON_ALLOC_DEFAULT_ON, INIT_ON_FREE_DEFAULT_ON
- Enable by default for all x86_64 platforms via hardware module
- Update documentation to reflect build-time kernel configuration approach

This provides runtime memory protection against information disclosure
throughout system operation.

Signed-off-by: vadik likholetov <vadikas@gmail.com>
@brianmcgillion brianmcgillion merged commit 00bb4e6 into tiiuae:main Nov 17, 2025
28 checks passed
@vadika vadika deleted the memory-wipe-shutdown branch November 24, 2025 09:22