fix: Add support for compressed files for tail package#5363
Conversation
ptodev
left a comment
There was a problem hiding this comment.
LGTM. I only added a few minor comments.
One issue we have is that the previous implementation tracked position by line numbers but this implementation will track offset on uncompressed data
Is this going to be a problem for users who have a partially consumed archive? Is there a chance of logs being lost during Alloy upgrades?
if we had data that did not include a newline it would never be consumed.
It'd be good to note this with a fix: in the changelog.
Co-authored-by: Paulin Todev <paulin.todev@gmail.com>
Co-authored-by: Paulin Todev <paulin.todev@gmail.com>
It would be the other way around, we would most likely read lines that have already been consumed again |
|
I want to add integration tests for compression usage. If it's okay I will leave that to a followup |
Yeah this is probably good because now we get the same support for BOM detection when reading compressed data. |
loki.source.file
loki.source.file### Brief description of Pull Request When `decompression` is configured for `loki.soure.file` we now use the same code internally as we do when it's not configured. This aligns the feature between them like BOM detection. (cherry picked from commit 2347c1b)
) ## Backport of #5363 This PR backports #5363 to release/v1.13. ### Original PR Author @kalleep ### Description ### Brief description of Pull Request When `decompression` is configured for `loki.soure.file` we now use the same code internally as we do when it's not configured. This aligns the feature between them like BOM detection. ### Pull Request Details Before this pr we had two different implementation that used when tailing file, tailer and decompressor. The latter was used any time decompression was configured. Ever since my major refactors to tail package I was certain that we could add support for compressed files and reuse the same implementation. That is what I have done here so most code are now shared. When compression is configured for `tail.File` it will not wait if EOF is returned. It will then check if we have any remaining data to flush and the return EOF. This will stop the tailer. One issue we have is that the previous implementation tracked position by line numbers but this implementation will track offset on uncompressed data, not sure how we can handle that. In addition to decompression support I fixed an issue that I noticed by adding `flush` to reader, if we hit EOF we drain all remaining data. But if we had data that did not include a newline it would never be consumed. ### Issue(s) fixed by this Pull Request ### Notes to the Reviewer If we consume a compressed file we will exit and remove the stored position, while alloy is running this is fine. But if alloy is restarted we will ingest the files again. This is true with both this implementation and the previous one and we should fix that in another pr. ### PR Checklist <!-- Remove items that do not apply. For completed items, change [ ] to [x]. --> - [ ] Documentation added - [x] Tests updated - [ ] Config converters updated --- *This backport was created automatically.* Co-authored-by: Karl Persson <23356117+kalleep@users.noreply.github.com>
Brief description of Pull Request
When
decompressionis configured forloki.soure.filewe now use the same code internally as we do when it's not configured. This aligns the feature between them like BOM detection.Pull Request Details
Before this pr we had two different implementation that used when tailing file, tailer and decompressor. The latter was used any time decompression was configured.
Ever since my major refactors to tail package I was certain that we could add support for compressed files and reuse the same implementation.
That is what I have done here so most code are now shared. When compression is configured for
tail.Fileit will not wait if EOF is returned. It will then check if we have any remaining data to flush and the return EOF. This will stop the tailer.One issue we have is that the previous implementation tracked position by line numbers but this implementation will track offset on uncompressed data, not sure how we can handle that.
In addition to decompression support I fixed an issue that I noticed by adding
flushto reader, if we hit EOF we drain all remaining data. But if we had data that did not include a newline it would never be consumed.Issue(s) fixed by this Pull Request
Notes to the Reviewer
If we consume a compressed file we will exit and remove the stored position, while alloy is running this is fine. But if alloy is restarted we will ingest the files again. This is true with both this implementation and the previous one and we should fix that in another pr.
PR Checklist