Skip to content

Pre-allocate space in shared memory files#2267

Merged
sporksmith merged 1 commit intoshadow:mainfrom
sporksmith:detect-ooshm
Jul 10, 2022
Merged

Pre-allocate space in shared memory files#2267
sporksmith merged 1 commit intoshadow:mainfrom
sporksmith:detect-ooshm

Conversation

@sporksmith
Copy link
Copy Markdown
Contributor

Replace usage of ftruncate with posix_fallocate. ftruncate sets
the size of a file without actually allocating the backing storage,
which is allocated on-demand. While this allows the true allocation to
be "lazy", and thereby allows us to do large allocations without wasting
resources if the whole allocation isn't used, it can cause unpredictable
and confusing failure modes.

With posix_fallocate, the space is eagerly allocated. If the call
succeeds, we are guaranteed that the storage is actually available.
When we run out of storage, the call fails and we can fail with a clear
error message.

We were previously allocating quite-large chunks of memory at a time;
about 132 MB. While this probably wouldn't actually cause problems for a
single shadow simulation (which typically only needs one such
allocation), it ends up using a lot of space when running the shadow
tests in parallel. This changes the default allocation to 1 MB, instead.

Fixes #2266

Replace usage of `ftruncate` with `posix_fallocate`. `ftruncate` sets
the size of a file without actually allocating the backing storage,
which is allocated on-demand. While this allows the true allocation to
be "lazy", and thereby allows us to do large allocations without wasting
resources if the whole allocation isn't used, it can cause unpredictable
and confusing failure modes.

With `posix_fallocate`, the space is eagerly allocated. If the call
succeeds, we are guaranteed that the storage is actually available.
When we run out of storage, the call fails and we can fail with a clear
error message.

We were previously allocating quite-large chunks of memory at a time;
about 132 MB. While this probably wouldn't actually cause problems for a
single shadow simulation (which typically only needs one such
allocation), it ends up using a lot of space when running the shadow
tests in parallel. This changes the default allocation to 1 MB, instead.

Fixes shadow#2266
@github-actions github-actions bot added the Component: Main Composing the core Shadow executable label Jul 8, 2022
@sporksmith sporksmith added this to the Code health and maintenance milestone Jul 8, 2022
@sporksmith
Copy link
Copy Markdown
Contributor Author

@sporksmith sporksmith requested a review from rwails July 8, 2022 21:22
@sporksmith sporksmith marked this pull request as ready for review July 8, 2022 21:23
Copy link
Copy Markdown
Collaborator

@rwails rwails left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'm glad the default mem block file size is being reduced; I remember setting it to a high value when we were prototyping, but I didn't realize that the 100 MB value stuck!

I spot-checked to make sure that most of the allocations being made are under 500 k, and I believe this is the case. (Otherwise, we'd lose small-object allocation optimizations and the shmem allocator would make a shmem syscall for every allocation, which is how it handles large objects).

Thanks Jim!

@sporksmith
Copy link
Copy Markdown
Contributor Author

Looks good. I'm glad the default mem block file size is being reduced; I remember setting it to a high value when we were prototyping, but I didn't realize that the 100 MB value stuck!

It was completely reasonable when using ftruncate, since the actual allocation happened on-demand anyway, but yeah is a bit too large now :).

I spot-checked to make sure that most of the allocations being made are under 500 k, and I believe this is the case.

Thanks! Yeah the benchmark results looks ok too; run-time and memory usage didn't change compared to the previous nightly run.

@sporksmith sporksmith merged commit ea80705 into shadow:main Jul 10, 2022
@sporksmith sporksmith deleted the detect-ooshm branch July 10, 2022 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Main Composing the core Shadow executable

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve failure mode when /dev/shm is full

2 participants