-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Mini-malloc option #6249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mini-malloc option #6249
Conversation
| // In general, if you don't need one of those special modes, and if you don't | ||
| // allocate very many small objects, you should use emmalloc since it's | ||
| // smaller. Otherwise, if you do allocate many small objects, dlmalloc | ||
| // is usually worth the extra size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if an option "none" will be useful, to force user to supply implementation of malloc and free themselves? Not to say that should be implemented in this PR, but just wondering if that'll be feasible in this arch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think regardless of what this setting is, the linking of malloc should behave as it does on native platforms; i.e. if you put your own implementation of malloc and free in an object file in the link, then that will be used instead of the emscripten-provided one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's right, and that's something we do already test to work. This replacement however can be a bit of magic, since if something changes, e.g. one forgets to link to own malloc, there is generally no error message, and one silently gets the built-in malloc, so thought this kind of option would be nice to explicitly kill the original malloc from being considered, in which case one would get a nice missing symbol error instead if e.g. after build system refactoring one lacks linking in their own malloc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we could add a "none" option that means it would be a link error to not provide your own malloc. I wonder if there isn't a more general way to do this. Not opposed, though.
| // All allocations are aligned to this value. | ||
| static const size_t ALIGNMENT = 8; | ||
|
|
||
| // Even allocating 1 byte incurs this much actual payload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will allocating 0 bytes just allocate the 8 byte metadata, or also 8 bytes of payload?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocating 0 just returns NULL, actually, which is allowed by the docs
If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().
|
|
||
| Region*& prev() { return _prev; } | ||
| // The next region is not, as we compute it on the fly | ||
| Region* next() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this function and all the above up to getTotalSize be static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh nm, first read that these were global scope functions, but they are instead members of Region.
tools/ports/binaryen.py
Outdated
| import os, shutil, logging | ||
|
|
||
| TAG = 'version_42' | ||
| TAG = 'version_43' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change intentional to coincide with this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it depending on some fixes there. Meanwhile time has passed and a later version has been updated anyhow ;) So this will vanish in the merge.
tools/system_libs.py
Outdated
| if shared.Settings.USE_PTHREADS: | ||
| ret += '_threadsafe' | ||
| extra += '_threadsafe' | ||
| base = 'dlmalloc' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of validation ends up silently ignoring -s MALLOC=emmalloc directive when -s USE_PTHREADS=1 and similar are set. Perhaps emcc.py could instead validate that -s MALLOC=emmalloc and -s USE_PTHREADS=1 are mutually exclusive and so forth, and then here we could just pass-through base = shared.Settings.MALLOC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll make it show a clear error in those cases, instead of silently "fixing" things.
|
Very nice, looks like you went much farther in writing an allocator rather than doing the minimum. What kind of multithreading approaches can you think? Top level mutex guard, or guard each pool separately by a mutex? Linked list inserts and removes can be made lock free, though did not look really close to details if there's some data that inherently will need the mutexes. lgtm, though probably may want to squash when merging? |
dschuff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a super deep dive but a few things I noticed.
| * - 32-bit system. | ||
| * - Single-threaded. | ||
| * - sbrk() is used, and nothing else. | ||
| * - sbrk() will not be accessed by anyone else. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really only an assumption that sbrk will never be called, or an assumption that there is nothing else trying to use or allocate any address space other than the static/stack/etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does assume sbrk() is not called by anyone else. Reading the docs, I don't see an official statement on it, but I see statements like "malloc will call sbrk(), and you shouldn't", so it seems already frowned upon. So many pointing this out here is redundant.
system/lib/emmalloc.cpp
Outdated
|
|
||
| // Align a pointer, increasing it upwards as necessary | ||
| static size_t alignUp(size_t ptr) { | ||
| return (size_t(ptr) + ALIGNMENT - 1) & -ALIGNMENT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the cast to size_t is unnecessary, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks, fixing.
system/lib/emmalloc.cpp
Outdated
| * | ||
| * Assumptions: | ||
| * | ||
| * - 32-bit system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really an assumption about 32-bitness, or about type sizes (e.g. size_t is the same as uintptr_t)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't realize at first that this is a cpp file, we should totally add some static asserts to check these assumptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll clarify, and add asserts - this about pointers being 32 bit.
system/lib/emmalloc.cpp
Outdated
| if (x == 0) return 1; | ||
| // e.g. 5 is 0..0101, so clz is 29, and we want | ||
| // 4 which is 1 << 2, so the result should be 2 | ||
| return 31 - __builtin_clz(x); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 31 could be a define/const or some sizeof instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it could be sizeof(size_t) * 8 - 1 but that seems less nice...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do something like #define SIZE_BIT sizeof(size_t) * CHAR_BIT above where we check all of the assumptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, done.
system/lib/emmalloc.cpp
Outdated
| static void* getPayload(Region* region) { | ||
| assert(((char*)®ion->freeInfo()) - ((char*)region) == METADATA_SIZE); | ||
| assert(region->getUsed()); | ||
| assert(sizeof(FreeInfo) == ALLOC_UNIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can be static asserts and moved up to where these values are defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing.
| * Assumptions: | ||
| * | ||
| * - 32-bit system. | ||
| * - Single-threaded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought there might be a way at compile-time to get the value of the compiler's thread-model (e.g. "single" or "posix") so we could assert on that too, but I didn't find one. It might be worth adding that to clang for wasm, since we expect users to care more about the distinction that on other platforms.
system/lib/emmalloc.cpp
Outdated
| static const size_t MAX_FREELIST_INDEX = 32; // uint32_t | ||
|
|
||
| static FreeInfo* freeLists[MAX_FREELIST_INDEX] = { | ||
| NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nulls could all be nullptrs.
system/lib/emmalloc.cpp
Outdated
| region = newAllocation(size); | ||
| if (!region) { | ||
| // We failed to allocate, sadly. | ||
| return NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this allocator should respect ABORTING_MALLOC too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABORTING_MALLOC should just work, that is handled by sbrk (it will abort when it can't grow memory).
|
@juj "What kind of multithreading approaches can you think?" I don't have much thoughts there yet. Maybe this will end up mostly interesting for the single-threaded case anyhow. |
|
Merged current incoming and addressed review comments, thanks! |
|
Sweet! lgtm |
|
I think this broke the wasm backend: |
This benchmark was added during emmalloc development but was never part of any emscripten test AFAICT. See #6249. Moving this is file as part of a cleanup of the tests/ directory.
This benchmark was added during emmalloc development but was never part of any emscripten test AFAICT. See #6249. Moving this is file as part of a cleanup of the tests/ directory.
This adds a MALLOC option, where the user can pick which malloc/free implementation to use. The options are the existing dlmalloc, still the default, and the new emmalloc, which is about 1/3 the size. In small programs this can be very significant.
Performance-wise, emmalloc is small because it is very simple: all allocations are 8-byte, and there is no pooling, both of which are necessary to do well on huge amounts of tiny allocations. Because of that, is 35% slower on the havlak benchmark (which allocates tons of 12-24 size areas, an AST of small nodes) and 10% slower on lua-binarytrees (apparently each lua object is a separate allocation, and there's an AST of small nodes there too...). But on all other benchmarks the only noticeable difference is smaller code size.
In theory maybe we'd want to use the small allocator by default, or maybe just in say
-Os. But for now it seems safest to stick with dlmalloc, and have the new allocator as an option.This PR does two other noticeable changes: