This repository was archived by the owner on Sep 8, 2018. It is now read-only.
Merged
Conversation
The /stream endpoint now accepts a &window=W parameter, default 3s.
Also, found a bug where the btree in deduplicate will blow the stack if you try to push too many records into it, by combining a broad query e.g. &q=A with a short enough window. Haha ;(
When you do a bufio.Scanner.Scan loop that emits the tokens onto a chan, it's important to copy the token out of the scanner's internal buffer. Especially if the channel is buffered.
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR adds streaming queries. But how do it work?? 🤔
The flow of data
Stores regularly poll ingesters for their oldest segment. Stores aggregate ingester segments in memory until they reach an age or size threshold. Then, they replicate those aggregate segments to other store nodes. Only replicated segments are available for query. Once replication is successful, the original ingester segments are confirmed and deleted. See Design · Consume segments.
This PR adds a few things to the replication handler. First, it tees the replicated segment to disk and into memory, and then throws the in-memory segment over to a matching function. The matching function sends matching records to registered queries via a channel.
How do queries get registered?
Stream query lifecycle
Users register a stream query by making a GET to /store/stream. This builds a stream.Execute and stream.Deduplicate pipeline, and feeds the results back to the client via a long-lived HTTP connection (HTTP/1.1 with Transfer-Encoding: Chunked).
stream.Execute is responsible for making connections to each store node's internal stream endpoint. When the topology of the cluster changes, stream.Execute will detect it, and either create new connections or terminate old ones. If there is a network interruption between the initial store node and any other store node, stream.Execute will detect the error and retry until success.
Recall that each log record will be duplicated on replication_factor store nodes. stream.Deduplicate does a best-effort deduplication of log records based on their ULID over a given time window. Note this produces memory pressure commensurate with throughput; see caveats and risks, below.
The context from the user request is used to control the lifecycle of all components.
Internal stream endpoint
We've described how a user stream query fans out to N internal stream queries. The internal stream endpoint is the one that actually registers the stream query in the query registry and writes the matching records back to the originating node.
Caveats and risks
The following caveats and risks apply, and should be addressed in future work.