Conversation
…te the segment record to a json file Signed-off-by: njingco <jingco.nicole@gmail.com>
…e, updated the file as well Signed-off-by: njingco <jingco.nicole@gmail.com>
…segment records from multiple endpoints Signed-off-by: njingco <jingco.nicole@gmail.com>
cstyan
left a comment
Good progress, made some comments. Let me know if anything is unclear.
tsdb/wal/watcher.go
Outdated
```go
}

// Sets the current Segment file's name
w.SegmentFile = SegmentName("", currentSegment)
```
Would it be possible for the watcher to have progressed to a new segment, without sending all samples from the previous segment? I think if you had a large enough buffer capacity in the queues that it could happen.
…tly update the json file every set time, only made queue_manager have an update function for the CheckpointRecord Signed-off-by: njingco <jingco.nicole@gmail.com>
cstyan
left a comment
made some comments, good progress
```go
w.recordsReadMetric.WithLabelValues(recordType(dec.Type(rec))).Inc()

// Sets the current Segment file it's reading from
w.CurrSegmentFile = SegmentName("", segmentNum)
```
this should be set in the same location as the metric is being set
would this not go in readSegment(), since this is where the current segment is being read?
It's cleaner to set it where the value of currentSegment is controlled. The Watcher's Run function controls the local currentSegment, both passing the value to watch (and subsequently readSegment) and updating it to the next segment number when watch returns. It's also easier for the next person who comes along to read the code if these two related variables are set in the same location.
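A minimal sketch of the pattern the comment describes: Run owns the local currentSegment, so the exported segment-name field is updated at the same place the value is advanced, not inside readSegment. The loop bounds, SegmentName formatting, and watch body here are simplified assumptions, not the real watcher.

```go
package main

import "fmt"

// Hypothetical, stripped-down Watcher for illustration only.
type Watcher struct {
	CurrSegmentFile string
}

// SegmentName mimics the zero-padded segment file naming; the real helper
// also joins a directory path.
func SegmentName(dir string, i int) string {
	return fmt.Sprintf("%s%08d", dir, i)
}

func (w *Watcher) watch(segmentNum int) {
	// ... read records from the segment (elided) ...
}

// Run advances currentSegment and records the file name at the same spot,
// keeping the two related updates in one location.
func (w *Watcher) Run(firstSegment, lastSegment int) {
	for currentSegment := firstSegment; currentSegment <= lastSegment; currentSegment++ {
		w.CurrSegmentFile = SegmentName("", currentSegment) // set where the value is controlled
		w.watch(currentSegment)
	}
}

func main() {
	w := &Watcher{}
	w.Run(3, 5)
	fmt.Println(w.CurrSegmentFile) // last segment processed
}
```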
…reating the checkpoint file, added the context the timed checkpointing go thread Signed-off-by: njingco <jingco.nicole@gmail.com>
```go
// UpdateEndpointRecord gets the current segment number read from the watcher and stores it to
func (t *QueueManager) UpdateEndpointRecord() {
	t.endpointRecord.Segment = t.watcher.CurrSegmentFile
```
watcher should maybe have a CurrentSegment() function rather than exposing one of its fields
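A sketch of what that accessor could look like, assuming the field becomes unexported and is guarded by a mutex since the queue manager reads it from a different goroutine than the one updating it; the names here are illustrative, not from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical watcher with the current segment name kept unexported.
type Watcher struct {
	mtx             sync.Mutex
	currSegmentFile string
}

// setCurrentSegment is called internally as the watcher advances.
func (w *Watcher) setCurrentSegment(name string) {
	w.mtx.Lock()
	defer w.mtx.Unlock()
	w.currSegmentFile = name
}

// CurrentSegment returns the name of the segment the watcher is reading,
// giving callers read access without exposing the field itself.
func (w *Watcher) CurrentSegment() string {
	w.mtx.Lock()
	defer w.mtx.Unlock()
	return w.currSegmentFile
}

func main() {
	w := &Watcher{}
	w.setCurrentSegment("00000007")
	fmt.Println(w.CurrentSegment())
}
```

The queue manager would then call `t.watcher.CurrentSegment()` instead of reading `t.watcher.CurrSegmentFile` directly.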
```go
MaxSegment:      -1,
CurrSegmentFile: "",
```
empty string is the default value for a string, no need to set it here
```go
// before.
var castagnoliTable = crc32.MakeTable(crc32.Castagnoli)
var (
	castagnoliTable = crc32.MakeTable(crc32.Castagnoli)
```
are you using this for something, maybe forgot to add a file to a commit?
```go
// EndpointRecord Structure holds the segment the endpoint url
type EndpointRecord struct {
	Segment string
```
the fields on this and checkpoint record don't need to be exported if you don't have a need to access those fields directly outside of the storage/remote package.
```go
}

// Segment number out of bounds
if segmentNumber < wal.MinSegmentNumber || segmentNumber > wal.MaxSegmentNumber {
```
We should maybe have the watcher have a ValidSegment function that uses wal.Segments to check if segmentNumber exists in the current WAL directory. Thoughts @csmarchbanks?
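A possible shape for that helper, as a self-contained sketch: `segmentsRange` below stands in for the WAL's function that reports the first and last segment numbers present in the directory, so the example runs without a real WAL on disk.

```go
package main

import "fmt"

// segmentsRange is a stand-in for the WAL helper that returns the first
// and last segment numbers found in dir; fixed values keep this runnable.
func segmentsRange(dir string) (first, last int, err error) {
	return 3, 9, nil
}

// ValidSegment reports whether segmentNumber currently exists in the WAL
// directory, rather than comparing against cached min/max fields.
func ValidSegment(dir string, segmentNumber int) (bool, error) {
	first, last, err := segmentsRange(dir)
	if err != nil {
		return false, err
	}
	return segmentNumber >= first && segmentNumber <= last, nil
}

func main() {
	ok, _ := ValidSegment("data/wal", 5)
	fmt.Println(ok)
}
```

Checking the directory each time means a record that points at a segment already deleted by checkpointing is rejected instead of silently accepted.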
storage/remote to cache these segment offsets to disk. Deprecates prometheus#7710. Deprecates prometheus#8918. Closes prometheus#8809. Signed-off-by: Robert Fratto <robertfratto@gmail.com>
We have looked at this pull request during our bug scrub. @csmarchbanks @cstyan, we don't have much context on our side. What is the status of this pull request? Is it still valid/needed? Thank you for your contribution anyway, @njingco.
We can close this implementation. We'd still like this feature, but no one has worked on this implementation path in quite some time.
Currently remote write sends data via the Prometheus WAL, but the pointer for each remote write queue in the WAL does not persist across restarts, as per #6333. This PR attempts to solve this issue by recording an identifier that indicates which segment file to retransmit on Prometheus restart. Retransmitting the whole segment file can potentially achieve zero data loss. Please refer to my proposal document for more information: Proposal Doc
This PR is currently in progress, but includes the base of the proposal: recording the identifier during shutdown and at a set interval, identifying the segment number for each endpoint, and error/corruption checks for the record JSON file.
It currently has the ability to record the identifier into a JSON file. A sample of the record file is in SegmentRecord.json. It also performs corruption checks of the JSON file at the start of queue_manager.go. What's left to do in this PR: add the timed recording from the start of queue_manager, and implement the time customization in the configuration file.
This PR will not include the implementation of setting the segment file for retransmission; that will be in a future PR. This PR only manages the recording of the identifiers.