[ML] Store compressed model definitions in ByteReferences #71679
davidkyle merged 9 commits into elastic:feature/pytorch-inference
Conversation
Pinging @elastic/ml-core (Team:ML)
7714b2e to 6dd2c1e
Total definition length is now tracked again, as we need to know the size of PyTorch models up front.
It might be nice to have a private method BytesArray base64Encode(String) and use it throughout
OK, so this will write out the raw bytes of the GZIP. Is this what we want or do we want to run the Base64 encoder?
I thought the guarantees around base64 character sizes were one of the reasons we could skip transforming into a string?
Binary data is stored in Lucene base64-encoded.
Ah, so since the mapping is binary we get that for free.
It's handled by the Jackson JSON generator, which is used by the various XContentBuilder::value(byte[] value) methods to write bytes.
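In plain JDK terms (a sketch, not Elasticsearch or Jackson code; the class and variable names here are invented for illustration), the effect is a base64 round trip at serialization time that leaves the raw GZIP bytes intact:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

// Sketch only: the generator's byte[] handling is roughly equivalent to
// base64-encoding on write and decoding on read, so the raw compressed
// bytes survive the round trip unchanged.
public class Base64RoundTrip {
    public static void main(String[] args) throws IOException {
        byte[] raw = "model definition".getBytes(java.nio.charset.StandardCharsets.UTF_8);

        // Compress, as the model definition is before persistence
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        byte[] compressed = bos.toByteArray();

        // What happens on write: raw bytes -> base64 ASCII text
        String persisted = Base64.getEncoder().encodeToString(compressed);

        // And on read: base64 ASCII text -> the same raw bytes
        byte[] decoded = Base64.getDecoder().decode(persisted);
        System.out.println(Arrays.equals(compressed, decoded)); // true
    }
}
```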
Might be good to indicate that the length is in UTF-8 bytes.
EDIT: Well, maybe not utf-8 bytes...but bytes or something
I added the fact the size is in bytes to the message in 7a59661
The feature branch contains changes to configure PyTorch models with a TrainedModelConfig and defines a format to store the binary models. The _start and _stop deployment actions control the model lifecycle, and the model can be evaluated directly with the _infer endpoint. Two types of NLP task are supported: Named Entity Recognition and Fill Mask. The feature branch consists of these PRs: #73523, #72218, #71679, #71323, #71035, #71177, #70713.
Binary data is stored in Lucene base64-encoded, but the same data held in a Java String uses 2 bytes (UTF-16) for each base64 character, consuming twice the memory required. This change uses ByteReferences to hold the binary data, which is not base64-encoded; the encoding must take place before the bytes can be persisted, and this is performed by the XContent classes.
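The arithmetic behind that memory claim can be sketched with plain JDK classes (the 1 MB figure and class name below are illustrative, not from the PR):

```java
import java.util.Base64;

// Sketch of the memory argument: a base64 character is always ASCII, but
// the PR's point is that a Java String stores UTF-16, so holding the
// encoded form in a String costs ~2 bytes per character, while a
// ByteReference over the raw compressed bytes pays 1 byte each.
public class Base64MemoryCost {
    public static void main(String[] args) {
        byte[] compressed = new byte[1_000_000]; // pretend 1 MB compressed model

        String base64 = Base64.getEncoder().encodeToString(compressed);

        long rawBytes = compressed.length;      // 1,000,000 bytes as raw bytes
        long stringChars = base64.length();     // 1,333,336 base64 characters
        long stringBytes = stringChars * 2;     // 2,666,672 bytes as UTF-16 chars

        // Base64 expansion (4/3) times UTF-16 (2x) exceeds double the raw size
        System.out.println(stringBytes > 2 * rawBytes); // true
    }
}
```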
For BWC I've added a new field mapping, binary_definition, to .ml-inference-*, which means the index version has to be incremented. Compatibility for HLRC and REST API users is preserved as the compressed_definition field still contains the base64-encoded representation.
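For illustration only, a minimal sketch of what the new mapping entry could look like, assuming the standard Elasticsearch binary field type (the actual mapping in the PR may name and nest things differently):

```json
{
  "properties": {
    "binary_definition": {
      "type": "binary"
    }
  }
}
```

Because binary fields are stored base64-encoded in Lucene, clients reading compressed_definition still see the base64 string, while the server-side code can work with the raw bytes.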