The utility function get_corpus reads the entire file into memory, which does not work for very large JSON data sets. Such data sets are often stored as newline-delimited JSON (ndjson). A utility function or code example that loops over the lines of a .ndjson file and parses each line would be nice. My rusty knowledge of C++ and memory management is not enough to judge whether simply using getline is the best approach: the documentation recommends the padded_string class, but getline uses realloc internally anyway (?)