Request
The current kv-ir deserialization speed is not ideal. Our profiling experiments show that around 30-40% of the time is spent constructing and destructing the variable dictionary list (which is parsed and stored here: https://github.com/y-scope/clp/blob/main/components/core/src/clp/ffi/ir_stream/decoding_methods.cpp#L388).
We should come up with a workaround to avoid this memory allocation overhead.
Possible implementation
We should improve how EncodedTextAst is implemented. Instead of storing var strings as a vector of strings, we can store all var strings and the logtype in a single concatenated string. For example, given:
logtype: "id=%s, passwd=%s"
vars: ["x", "y"]
We can store it as:
string_buffer (as a string): "xyid=%s, passwd=%s"
with a position vector that tracks the offsets at which to partition the buffer into substrings.
This way, we avoid allocating a separate string for each var string and for the logtype, and we improve spatial locality. At the same time, we preserve the ability to randomly access any var string or the logtype in the string buffer.
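A minimal C++ sketch of this layout follows, under stated assumptions: the class name `ConcatenatedTextAst` and its methods are hypothetical (they are not the existing `EncodedTextAst` API), vars are appended before the logtype as in the example, and the position vector stores the end offset of each var. Var `i` therefore spans from the previous var's end offset (or 0) to its own end offset, and the logtype occupies the remainder of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch (not the actual clp API): all var strings, followed by
// the logtype, are stored back to back in one contiguous string buffer.
class ConcatenatedTextAst {
public:
    // Appends one var string. All vars must be added before the logtype.
    void add_var(std::string_view var) {
        assert(!m_has_logtype);
        m_buffer.append(var);
        // Record where this var ends; it starts where the previous var ended (or at 0).
        m_var_end_offsets.push_back(m_buffer.size());
    }

    // Appends the logtype; it occupies the rest of the buffer after the vars.
    void set_logtype(std::string_view logtype) {
        assert(!m_has_logtype);
        m_buffer.append(logtype);
        m_has_logtype = true;
    }

    [[nodiscard]] auto get_num_vars() const -> std::size_t { return m_var_end_offsets.size(); }

    // Random access to the idx-th var without allocating. The returned view is
    // invalidated if the buffer is modified or cleared.
    [[nodiscard]] auto get_var(std::size_t idx) const -> std::string_view {
        std::size_t const end{m_var_end_offsets.at(idx)};
        std::size_t const begin{0 == idx ? 0 : m_var_end_offsets[idx - 1]};
        return std::string_view{m_buffer}.substr(begin, end - begin);
    }

    // The logtype starts where the last var ends (or at 0 if there are no vars).
    [[nodiscard]] auto get_logtype() const -> std::string_view {
        std::size_t const begin{m_var_end_offsets.empty() ? 0 : m_var_end_offsets.back()};
        return std::string_view{m_buffer}.substr(begin);
    }

    [[nodiscard]] auto get_buffer() const -> std::string_view { return m_buffer; }

    // Resets the object so it can be reused for the next record. Existing
    // std::string/std::vector implementations keep their capacity on clear(),
    // so reuse typically avoids new allocations.
    void clear() {
        m_buffer.clear();
        m_var_end_offsets.clear();
        m_has_logtype = false;
    }

private:
    std::string m_buffer;
    std::vector<std::size_t> m_var_end_offsets;
    bool m_has_logtype{false};
};
```

For the example, adding `x` and `y` and then the logtype produces the buffer `xyid=%s, passwd=%s` with end offsets `{1, 2}`; `get_var(1)` returns `y` and `get_logtype()` returns the view starting at offset 2. Because the accessors return `std::string_view`s into the shared buffer, callers must not hold them across `clear()` or further appends.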
An early-stage experiment on two datasets shows that this implementation yields a 1.67x speedup.
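As a rough sanity check, an Amdahl's-law style estimate relates this number to the profiling result. This assumes the new layout removes the construct/destruct cost entirely and leaves the rest of deserialization unchanged; under that assumption, eliminating a fraction f of the runtime bounds the speedup at:

$$
S_{\max} = \frac{1}{1 - f}, \qquad f = 0.3 \Rightarrow S_{\max} \approx 1.43, \qquad f = 0.4 \Rightarrow S_{\max} \approx 1.67
$$

The measured 1.67x sits at the upper end of this range. Better spatial locality elsewhere in decoding could also contribute, so this estimate is not a breakdown of the measured result.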
Milestones: