Stats memory usage can probably be improved quite a bit #3585

@jmarantz

#3508 is about not having to allocate stats memory as an NxM block. But the much bigger prize here is to use a lot less stats memory by not storing gigabytes of repeated strings. Why do we want the fully elaborated stat name in memory at all?

If we represented stats structurally with a format string with named variable substitutions, e.g.
prefix.$var1.keyword.$var2
and variable assignments:
var1="xxxx"
var2="yyyy"

The static strings there ("prefix.$var1.keyword.$var2", "var1", and "var2") would not be needed in dynamic memory at all; they could live in the code text as a static const char[] or in one of those lazy-static-initialized structs of strings. All we'd need to keep in dynamic memory are the substitutions, xxxx and yyyy in this case. And I think many stats would typically have same-valued substitutions, which could share the same memory. This would be a little complex to do in a shared-memory block, depending on whether we need to free and reallocate within the shm block.
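To make the sharing idea concrete, here is a minimal sketch, assuming an ordinary heap-backed pool rather than a shared-memory block. All names (`SubstitutionPool`, `StructuredStat`, `kPattern`, `valueAt`) are hypothetical and not existing Envoy types, and freeing unused values is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical pattern living in static storage, not on the heap.
constexpr char kPattern[] = "prefix.$var1.keyword.$var2";

// Hypothetical pool: each distinct substitution value is stored once.
// Pointers to unordered_set elements stay valid across later inserts.
class SubstitutionPool {
public:
  const std::string* intern(const std::string& value) {
    return &*values_.insert(value).first;
  }
  std::size_t size() const { return values_.size(); }

private:
  std::unordered_set<std::string> values_;
};

// Hypothetical stat record: a pointer to the static pattern plus pointers
// to pooled values, listed in the order the variables appear in the pattern.
struct StructuredStat {
  const char* pattern;
  std::vector<const std::string*> values;
};

// Returns the i-th substitution value of a stat.
inline const std::string& valueAt(const StructuredStat& stat, std::size_t i) {
  assert(i < stat.values.size());
  return *stat.values[i];
}
```

Two stats created with `pool.intern("xxxx")` as their first substitution end up holding the same pointer, so the string "xxxx" is stored once however many stats use it.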

I was thinking about this because we are starting to count stats memory in the gigabytes (reference earlier bug (#3463) where uint32 wasn't enough for byte offsets for @ggreenway). Most of this memory is for strings, most of which are really variations on common patterns. It'd be nice to ultimately use that memory for data cache.

I think this would speed things up too. I was going in that direction in my earlier optimizations (I got a working prototype based on structured substitutions instead of regexes), but I found I could get enough of the startup speed improvement with hacks to skip regex lookups. It would be faster still if we used a better representation in the first place. But the main goal here is scalability.

Of course RawStatData could still have a name() method that elaborates a string for printing, debugging, or whatever, but I think many structured stat sinks would benefit from having the structure be explicit in the representation. We'd also have to make the maps of stats aware of this structure, to avoid hanging onto the elaborated string data.
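As a rough illustration of elaborating a name on demand, the sketch below expands a pattern against a set of variable assignments. The function `elaborateName` and the `Assignments` alias are hypothetical, and the rule that a variable name runs from `$` to the next `.` is an assumption based on the example pattern above, not a defined syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical alias: variable name -> substitution value.
using Assignments = std::unordered_map<std::string, std::string>;

// Builds the fully elaborated stat name only when someone asks for it.
// Assumes a variable is written as '$' followed by a name that ends at the
// next '.' or at the end of the pattern.
std::string elaborateName(const std::string& pattern, const Assignments& vars) {
  std::string out;
  std::size_t i = 0;
  while (i < pattern.size()) {
    if (pattern[i] != '$') {
      out += pattern[i++];
      continue;
    }
    std::size_t end = pattern.find('.', i);
    if (end == std::string::npos) {
      end = pattern.size();
    }
    assert(end > i);  // the scan always advances past the '$'
    const std::string name = pattern.substr(i + 1, end - i - 1);
    const auto it = vars.find(name);
    // Unknown variables are copied through verbatim rather than dropped.
    out += (it != vars.end()) ? it->second : pattern.substr(i, end - i);
    i = end;
  }
  return out;
}
```

A structured sink could skip this step entirely and read the pattern and assignments directly; only debug printing or a flat-name sink would need the expanded string.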

A simpler variation on the above, suggested by @htuch, is to store arrays of "."-separated tokens, each of which could be an entry in a symbol table. This would be simpler to integrate but would still require complex regex-based tag-token extraction.
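The token-array variation might look roughly like the following sketch. The `SymbolTable` class, the `Symbol` alias, and the 32-bit id width are illustrative assumptions, not an existing interface; tag extraction and freeing of symbols are out of scope here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical symbol id; the width is an arbitrary choice for the sketch.
using Symbol = std::uint32_t;

// Hypothetical table mapping each distinct "."-separated token to a small id.
class SymbolTable {
public:
  // Splits a stat name on '.' and returns one symbol per token.
  std::vector<Symbol> encode(const std::string& name) {
    std::vector<Symbol> symbols;
    std::size_t start = 0;
    while (true) {
      const std::size_t dot = name.find('.', start);
      const std::size_t len =
          (dot == std::string::npos) ? std::string::npos : dot - start;
      symbols.push_back(toSymbol(name.substr(start, len)));
      if (dot == std::string::npos) {
        break;
      }
      start = dot + 1;
    }
    return symbols;
  }

  // Rebuilds the dotted name from its symbols.
  std::string decode(const std::vector<Symbol>& symbols) const {
    std::string name;
    for (std::size_t i = 0; i < symbols.size(); ++i) {
      assert(symbols[i] < tokens_.size());
      if (i > 0) {
        name += '.';
      }
      name += tokens_[symbols[i]];
    }
    return name;
  }

  // Number of distinct tokens stored so far.
  std::size_t size() const { return tokens_.size(); }

private:
  // Returns the existing id for a token, or assigns the next free one.
  Symbol toSymbol(const std::string& token) {
    const auto it = ids_.find(token);
    if (it != ids_.end()) {
      return it->second;
    }
    const Symbol id = static_cast<Symbol>(tokens_.size());
    tokens_.push_back(token);
    ids_.emplace(token, id);
    return id;
  }

  std::unordered_map<std::string, Symbol> ids_;
  std::vector<std::string> tokens_;
};
```

With this layout, two names that differ in only one token share the symbols for every other token, so each stat costs a few integers instead of a full copy of its name. For simplicity the sketch stores each token twice (once as the map key, once in the vector); a real design would presumably avoid that.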

A few questions remain about how to do this, especially in the context of hot restart, but I'm opening this issue now to collect discussion.

Labels: enhancement, help wanted
