Skip to content

Conversation

@kripken
Copy link
Member

@kripken kripken commented Feb 15, 2018

This adds a MALLOC option, where the user can pick which malloc/free implementation to use. The options are the existing dlmalloc, still the default, and the new emmalloc, which is about 1/3 the size. In small programs this can be very significant.

Performance-wise, emmalloc is small because it is very simple: all allocations are 8-byte, and there is no pooling, both of which are necessary to do well on huge amounts of tiny allocations. Because of that, is 35% slower on the havlak benchmark (which allocates tons of 12-24 size areas, an AST of small nodes) and 10% slower on lua-binarytrees (apparently each lua object is a separate allocation, and there's an AST of small nodes there too...). But on all other benchmarks the only noticeable difference is smaller code size.

In theory maybe we'd want to use the small allocator by default, or maybe just in say -Os. But for now it seems safest to stick with dlmalloc, and have the new allocator as an option.

This PR does two other noticeable changes:

  • Our sbrk used to align. The spec doesn't say it needs to, and it's redundant as dlmalloc and emmalloc check the alignment anyhow. So this PR removes that.
  • I noticed that building dlmalloc with NDEBUG for opt builds is better. So this PR makes dlmalloc a little smaller as well.

// In general, if you don't need one of those special modes, and if you don't
// allocate very many small objects, you should use emmalloc since it's
// smaller. Otherwise, if you do allocate many small objects, dlmalloc
// is usually worth the extra size.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if an option "none" will be useful, to force user to supply implementation of malloc and free themselves? Not to say that should be implemented in this PR, but just wondering if that'll be feasible in this arch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think regardless of what this setting is, the linking of malloc should behave as it does on native platforms; i.e. if you put your own implementation of malloc and free in an object file in the link, then that will be used instead of the emscripten-provided one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's right, and that's something we do already test to work. This replacement however can be a bit of magic, since if something changes, e.g. one forgets to link to own malloc, there is generally no error message, and one silently gets the built-in malloc, so thought this kind of option would be nice to explicitly kill the original malloc from being considered, in which case one would get a nice missing symbol error instead if e.g. after build system refactoring one lacks linking in their own malloc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could add a "none" option that means it would be a link error to not provide your own malloc. I wonder if there isn't a more general way to do this. Not opposed, though.

// All allocations are aligned to this value.
static const size_t ALIGNMENT = 8;

// Even allocating 1 byte incurs this much actual payload
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will allocating 0 bytes just allocate the 8 byte metadata, or also 8 bytes of payload?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocating 0 just returns NULL, actually, which is allowed by the docs

 If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free(). 


Region*& prev() { return _prev; }
// The next region is not, as we compute it on the fly
Region* next() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this function and all the above up to getTotalSize be static?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nm, first read that these were global scope functions, but they are instead members of Region.

import os, shutil, logging

TAG = 'version_42'
TAG = 'version_43'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change intentional to coincide with this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it depending on some fixes there. Meanwhile time has passed and a later version has been updated anyhow ;) So this will vanish in the merge.

if shared.Settings.USE_PTHREADS:
ret += '_threadsafe'
extra += '_threadsafe'
base = 'dlmalloc'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of validation ends up silently ignoring -s MALLOC=emmalloc directive when -s USE_PTHREADS=1 and similar are set. Perhaps emcc.py could instead validate that -s MALLOC=emmalloc and -s USE_PTHREADS=1 are mutually exclusive and so forth, and then here we could just pass-through base = shared.Settings.MALLOC?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll make it show a clear error in those cases, instead of silently "fixing" things.

@juj
Copy link
Collaborator

juj commented Feb 15, 2018

Very nice, looks like you went much farther in writing an allocator rather than doing the minimum. What kind of multithreading approaches can you think? Top level mutex guard, or guard each pool separately by a mutex? Linked list inserts and removes can be made lock free, though did not look really close to details if there's some data that inherently will need the mutexes.

lgtm, though probably may want to squash when merging?

Copy link
Member

@dschuff dschuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a super deep dive but a few things I noticed.

* - 32-bit system.
* - Single-threaded.
* - sbrk() is used, and nothing else.
* - sbrk() will not be accessed by anyone else.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really only an assumption that sbrk will never be called, or an assumption that there is nothing else trying to use or allocate any address space other than the static/stack/etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does assume sbrk() is not called by anyone else. Reading the docs, I don't see an official statement on it, but I see statements like "malloc will call sbrk(), and you shouldn't", so it seems already frowned upon. So many pointing this out here is redundant.


// Align a pointer, increasing it upwards as necessary
static size_t alignUp(size_t ptr) {
return (size_t(ptr) + ALIGNMENT - 1) & -ALIGNMENT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cast to size_t is unnecessary, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks, fixing.

*
* Assumptions:
*
* - 32-bit system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really an assumption about 32-bitness, or about type sizes (e.g. size_t is the same as uintptr_t)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't realize at first that this is a cpp file, we should totally add some static asserts to check these assumptions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll clarify, and add asserts - this about pointers being 32 bit.

if (x == 0) return 1;
// e.g. 5 is 0..0101, so clz is 29, and we want
// 4 which is 1 << 2, so the result should be 2
return 31 - __builtin_clz(x);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 31 could be a define/const or some sizeof instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it could be sizeof(size_t) * 8 - 1 but that seems less nice...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do something like #define SIZE_BIT sizeof(size_t) * CHAR_BIT above where we check all of the assumptions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, done.

static void* getPayload(Region* region) {
assert(((char*)&region->freeInfo()) - ((char*)region) == METADATA_SIZE);
assert(region->getUsed());
assert(sizeof(FreeInfo) == ALLOC_UNIT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be static asserts and moved up to where these values are defined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing.

* Assumptions:
*
* - 32-bit system.
* - Single-threaded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought there might be a way at compile-time to get the value of the compiler's thread-model (e.g. "single" or "posix") so we could assert on that too, but I didn't find one. It might be worth adding that to clang for wasm, since we expect users to care more about the distinction that on other platforms.

static const size_t MAX_FREELIST_INDEX = 32; // uint32_t

static FreeInfo* freeLists[MAX_FREELIST_INDEX] = {
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nulls could all be nullptrs.

region = newAllocation(size);
if (!region) {
// We failed to allocate, sadly.
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this allocator should respect ABORTING_MALLOC too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ABORTING_MALLOC should just work, that is handled by sbrk (it will abort when it can't grow memory).

@kripken
Copy link
Member Author

kripken commented Mar 5, 2018

@juj "What kind of multithreading approaches can you think?"

I don't have much thoughts there yet. Maybe this will end up mostly interesting for the single-threaded case anyhow.

@kripken
Copy link
Member Author

kripken commented Mar 5, 2018

Merged current incoming and addressed review comments, thanks!

@juj
Copy link
Collaborator

juj commented Mar 7, 2018

Sweet! lgtm

@kripken kripken merged commit 78d3f20 into incoming Mar 8, 2018
@kripken kripken deleted the minimalloc_8 branch March 8, 2018 00:58
@sbc100
Copy link
Collaborator

sbc100 commented Mar 8, 2018

I think this broke the wasm backend:

$ EM_CONFIG=$PWD/../waterfall/src/work/wasm-install/emscripten_config_vanilla  ./tests/runner.py binaryen2.test_emmalloc
WARNING:root:use EM_ALL_ENGINES=1 in the env to run against all JS engines, which is slower but provides more coverage
Test suites:
['test_core']
Running test_core: (1 tests)
test_emmalloc (test_core.binaryen2) ... (checking sanity from test runner)
INFO:root:(Emscripten: Running sanity checks)
normal
(test did not pass in JS engine: ['/usr/local/google/home/sbc/bin/node-v8.4.0-linux-x64/bin/node'])
ERROR

======================================================================
ERROR: test_emmalloc (test_core.binaryen2)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/google/home/sbc/dev/wasm/emscripten/tests/test_core.py", line 731, in test_emmalloc
    test()
  File "/usr/local/google/home/sbc/dev/wasm/emscripten/tests/test_core.py", line 729, in test
    open(path_from_root('tests', 'core', 'test_emmalloc.txt')).read())
  File "/usr/local/google/home/sbc/dev/wasm/emscripten/tests/runner.py", line 701, in do_run
    raise e
Exception: Expected to find '>> the_end
' in '
>> beginning


>> basics


>> allocate 0

Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics
abort("Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics"). Build with -s ASSERTIONS=1 for more info.
Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics
exception thrown: abort("Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics"). Build with -s ASSERTIONS=1 for more info.
failed to asynchronously prepare wasm: abort("Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics"). Build with -s ASSERTIONS=1 for more info.
abort("Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics"). Build with -s ASSERTIONS=1 for more info.
', diff:

--- expected
+++ actual
@@ -1,2 +1,16 @@
->> the_end

+>> beginning
+
+
+>> basics
+
+
+>> allocate 0
+
+Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics
+abort("Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics"). Build with -s ASSERTIONS=1 for more info.
+Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics
+exception thrown: abort("Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics"). Build with -s ASSERTIONS=1 for more info.
+failed to asynchronously prepare wasm: abort("Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics"). Build with -s ASSERTIONS=1 for more info.
+abort("Assertion failed: ptr == 0, at: /tmp/emscripten_test_binaryen2_Ndqk7U/src.cpp,1242,basics"). Build with -s ASSERTIONS=1 for more info.
+



----------------------------------------------------------------------
Ran 1 test in 3.407s

@kripken
Copy link
Member Author

kripken commented Mar 8, 2018

@sbc100 sorry about that, turns out it's easy to fix though, see #6316

sbc100 added a commit that referenced this pull request Jan 11, 2022
This benchmark was added during emmalloc development but was never part
of any emscripten test AFAICT.  See #6249.

Moving this is file as part of a cleanup of the tests/ directory.
sbc100 added a commit that referenced this pull request Jan 11, 2022
This benchmark was added during emmalloc development but was never part
of any emscripten test AFAICT.  See #6249.

Moving this is file as part of a cleanup of the tests/ directory.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants