Currently, the profiler will sometimes miss memory allocated in global variables.
This is because each thread's allocations and deallocations are recorded in a thread_local object that belongs to that thread.
Once these thread-local objects are cleaned up, profiling is disabled. Because this happens before global objects are cleaned up, the profiler misses some allocations.
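
The ordering problem can be reproduced in a small sketch, assuming the profiler is written in C++. This is not the profiler's actual code: ThreadRecorder, record_allocation, run_demo and the g_* counters are hypothetical names chosen for illustration. The sketch uses a worker thread so that the effect can be observed while the program is still running. At program exit the same ordering applies to the main thread, because C++ destroys objects with thread storage duration before objects with static storage duration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical stand-ins for the profiler's internals (all names are illustrative).
std::atomic<bool> g_profiling_enabled{true};
std::atomic<std::size_t> g_recorded{0};
std::atomic<std::size_t> g_missed{0};

// Per-thread bookkeeping object. Destroying it switches profiling off,
// mirroring the behaviour described in the text.
struct ThreadRecorder {
    std::size_t allocations = 0;
    ~ThreadRecorder() { g_profiling_enabled.store(false); }
};

// Called for every allocation the profiler would like to see.
void record_allocation() {
    if (!g_profiling_enabled.load()) {
        g_missed.fetch_add(1);  // profiling is already off: this one goes unseen
        return;
    }
    thread_local ThreadRecorder recorder;  // created on first use in each thread
    ++recorder.allocations;
    g_recorded.fetch_add(1);
}

// Runs the scenario once and returns how many allocations were missed.
std::size_t run_demo() {
    g_profiling_enabled.store(true);
    g_recorded.store(0);
    g_missed.store(0);

    // The worker records one allocation, which creates its thread-local recorder.
    std::thread worker([] { record_allocation(); });
    // The recorder is destroyed when the worker exits, which has happened
    // by the time join() returns.
    worker.join();
    assert(!g_profiling_enabled.load());

    // Stands in for an allocation made later, for example during global cleanup.
    record_allocation();
    return g_missed.load();
}
```

Calling run_demo() reports one missed allocation: the worker's allocation is recorded, but the later one arrives after the recorder's destructor has already turned profiling off.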