Edit: the path forward is described in #288 (comment)
We should change the message definition to simply wrap the JSON-serialized event that would be written to Elasticsearch. We will use a oneof type to allow for alternate serialization formats in the future. In particular, the vtprotobuf implementation appears to be more efficient than JSON and is the first candidate for an alternate format.
message Event {
  // JSON serialized event to be written to Elasticsearch.
  oneof event {
    bytes json = 1;
  }
}
Original Description
This uses a oneof type because we may find or decide that we could benefit from a more specialized protobuf event in the future (particularly given the results above showing vtprotobuf serialization is faster than JSON), but we can start from where we are today.
We will have to document the structure of the JSON event in the comments. For example, the data stream fields are mandatory for the shipper to be able to construct the index name and apply processors.
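To illustrate why the data stream fields have to be mandatory, here is a minimal Go sketch of how a consumer of the JSON bytes might derive the index name. It assumes the event carries a nested `data_stream` object with `type`, `dataset`, and `namespace` keys, and that the index name follows the `{type}-{dataset}-{namespace}` data stream naming scheme. The `indexName` helper and the struct names are hypothetical and not part of the shipper code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// dataStream mirrors the assumed data_stream object of an event.
type dataStream struct {
	Type      string `json:"type"`
	Dataset   string `json:"dataset"`
	Namespace string `json:"namespace"`
}

// eventHeader decodes only the part of the event needed for routing;
// all other fields in the JSON document are ignored.
type eventHeader struct {
	DataStream *dataStream `json:"data_stream"`
}

// indexName is a hypothetical helper: it extracts the data stream fields
// from a JSON-serialized event and builds "{type}-{dataset}-{namespace}".
// It returns an error if the JSON is invalid or any field is missing.
func indexName(raw []byte) (string, error) {
	var h eventHeader
	if err := json.Unmarshal(raw, &h); err != nil {
		return "", fmt.Errorf("invalid JSON event: %w", err)
	}
	ds := h.DataStream
	if ds == nil || ds.Type == "" || ds.Dataset == "" || ds.Namespace == "" {
		return "", errors.New("event is missing mandatory data_stream fields")
	}
	return ds.Type + "-" + ds.Dataset + "-" + ds.Namespace, nil
}

func main() {
	// Example event; the field values are made up for illustration.
	event := []byte(`{"@timestamp":"2023-01-01T00:00:00Z","data_stream":{"type":"logs","dataset":"nginx.access","namespace":"default"},"message":"GET / 200"}`)
	name, err := indexName(event)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name)
}
```

Decoding only a small header struct means the payload does not have to be fully unmarshalled just to route it. Processors that touch arbitrary fields would still need a full decode.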
The shipper protocol currently only accepts events serialized to a customized version of the google.protobuf.Struct type. The intent is to allow frequently used types like timestamps to be serialized more efficiently, and to make processors easier to write and validate, given that the complete set of types they must operate on is known ahead of time.
We currently define both the message metadata and fields as our messages.Struct type: https://github.com/elastic/elastic-agent-shipper-client/blob/1fbbb05f0b174053a5b160cdd5836eaed430cdbd/api/messages/publish.proto#L39-L42
// Metadata JSON object (map[string]google.protobuf.Value)
messages.Struct metadata = 4;
// Field JSON object (map[string]google.protobuf.Value)
messages.Struct fields = 5;
Given that most processes that will use the shipper are currently designed to serialize their internal event representations to JSON for direct ingestion by Elasticsearch, we should evaluate whether introducing the conversion to messages.Struct for the shipper causes a noticeable performance hit. Most processes using the shipper are highly optimized for serializing to JSON and may be noticeably less performant when serializing to messages.Struct instead.
Specifically, we should benchmark the performance of Filebeat ingesting events using the shipper with the messages.Struct type and compare it to the performance of the same setup modified to transport the event as JSON bytes directly:
// Metadata JSON object.
bytes metadata = 4;
// Fields JSON object.
bytes fields = 5;