TorchScript models are compressed binary files whose size depends on their purpose, but for neural network models sizes larger than 1GB are not unusual. Storing such a large model as a single document in Elasticsearch would be sub-optimal as it requires contiguous memory. Additionally, uploading a document larger than the HTTP max content length limit is infeasible.
Better performance will be found by splitting the model into chunks and streaming those chunks to the native process, where they are reassembled on use. ML already uses this pattern for Anomaly Detection job state. Anomaly Detection uses 16MB chunks, but it is worth benchmarking smaller chunk sizes. Because the model is binary data it must be base64 encoded; it can then be stored in a Binary field or in an index with mappings disabled.
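The chunk-and-encode step described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name and the default chunk size (taken from the 16MB Anomaly Detection figure mentioned above) are assumptions.

```python
import base64

def chunk_and_encode(model_bytes: bytes, chunk_size: int = 16 * 1024 * 1024):
    """Split the raw model into fixed-size chunks and base64 encode each one.

    chunk_size defaults to the 16MB used for Anomaly Detection job state;
    smaller sizes are worth benchmarking. Each encoded chunk would be stored
    in its own document, e.g. in a Binary field.
    """
    for offset in range(0, len(model_bytes), chunk_size):
        yield base64.b64encode(model_bytes[offset:offset + chunk_size]).decode("ascii")
```

Reassembly is the inverse: decode each chunk in order and concatenate, which is what the native process would do as the chunks are streamed to it.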
Once the model is split into its constituent chunks, a meta-document will track the number of chunks; the IDs of the documents containing model chunks will follow a predictable naming convention. One vital piece of information required by the native process is the size of the model in bytes. This must be sent before any chunks are streamed, so the value should be stored in the meta-document: reading all the chunks into a buffer just to calculate the decoded (un-base64) size would defeat the purpose of streaming.
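To make the meta-document scheme concrete, here is a sketch of building the meta-document alongside the chunk documents. The field names (`doc_count`, `model_size_bytes`, `definition`) and the `<model_id>_chunk_<n>` ID convention are illustrative assumptions, not the final mapping.

```python
import base64

def build_docs(model_id: str, model_bytes: bytes, chunk_size: int):
    # Hypothetical predictable naming convention: "<model_id>_chunk_<n>"
    chunk_docs = []
    for n, offset in enumerate(range(0, len(model_bytes), chunk_size)):
        chunk_docs.append({
            "_id": f"{model_id}_chunk_{n}",
            "definition": base64.b64encode(
                model_bytes[offset:offset + chunk_size]
            ).decode("ascii"),
        })
    meta_doc = {
        "_id": model_id,
        # number of chunk documents to fetch when streaming
        "doc_count": len(chunk_docs),
        # raw (un-encoded) size, sent to the native process before any chunks
        "model_size_bytes": len(model_bytes),
    }
    return meta_doc, chunk_docs
```

Storing `model_size_bytes` at upload time means the size is known from a single meta-document read, so streaming can begin immediately without touching the chunk documents first.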
TODO
byte []?