feat: loki.source.file file_match block#4743
Conversation
|
💻 Deploy preview available (Kalleep/loki source file match): |
There was a problem hiding this comment.
Pull Request Overview
This PR introduces built-in file discovery capabilities to loki.source.file component through a new file_match block. The feature allows glob pattern-based file discovery directly within the component as an alternative to using the separate local.file_match component.
Key changes:
- Added
file_matchconfiguration block with glob pattern support using the doublestar library - Implemented a resolver abstraction (static vs. glob) to handle different path resolution strategies
- Added a watcher mechanism to periodically sync filesystem state when file matching is enabled
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
internal/component/loki/source/file/watcher.go |
New watcher component for periodic filesystem synchronization |
internal/component/loki/source/file/resolver.go |
New resolver abstraction with static and glob implementations |
internal/component/loki/source/file/resolver_test.go |
Unit tests for resolver implementations |
internal/component/loki/source/file/file.go |
Core component changes to integrate file matching and resolver system |
internal/component/loki/source/file/file_test.go |
Test infrastructure updates to validate both file match modes |
internal/component/loki/source/file/*_test.go |
Updated test utilities to use new logging API |
docs/sources/reference/components/loki/loki.source.file.md |
Documentation for new file_match block with examples |
docs/sources/reference/components/local/local.file_match.md |
Updated to recommend built-in file_match for simple cases |
CHANGELOG.md |
Added feature entry |
|
💻 Deploy preview available (faet: loki.source.file file_match block): |
| `local.file_match` discovers files on the local filesystem using glob patterns and the [doublestar][] library. | ||
|
|
||
| {{< admonition type="note" >}} | ||
| For simple use cases where you only need to discover files for a single `loki.source.file` component, consider using the built-in [`file_match`](../../loki/loki.source.file/#file_match) block in `loki.source.file` instead. |
There was a problem hiding this comment.
I think it's also better option for very large numbers of files that are frequently changing?
Perhaps we should recommend to always use the built-in block and only use local.file_match for things that are not possible with the built-in block. That way we ensure people get maximum correctness & performance.
Should we try create a list of use-cases that are not possible with built-in block?
There was a problem hiding this comment.
Yes there is examples with file globing using either builtin or local.file_match. There I added this header
### File globbing with file_match (recommended)
Do you think we should make it clearer here?
There was a problem hiding this comment.
Yes, I think we need to make it more clear when to use which option. And probably the logic should be to use the built-in file_match whenever you can and a separate component for cases that are not possible with built-in (we need to list a few of them)
| - Better performance: File discovery is integrated directly into `loki.source.file`, eliminating the overhead of component-to-component communication | ||
| - Same functionality: Supports the same glob patterns and features, including `__path_exclude__` for excluding files | ||
|
|
||
| When `enabled` is set to `true`, you can use glob patterns (e.g., `/tmp/*.log` or `/var/log/**/*.log`) directly in the `targets` argument's `__path__` label. |
There was a problem hiding this comment.
What happens when enabled is false?
There was a problem hiding this comment.
Then we require absolute file paths, like this component behaved before this pr
There was a problem hiding this comment.
Can we write that in docs?
There was a problem hiding this comment.
We already have it here https://github.com/grafana/alloy/pull/4743/files#diff-4b8c6b16f7e932bea249731a9f49d947024699eb129a37f0846aed9c1086e945R21-R23.
Do you think we need it in both places?
There was a problem hiding this comment.
Changed this in 61c3e8c
I listed the use cases I could think of for local.file_match
| The `file_match` block enables built-in file discovery directly within `loki.source.file`. | ||
| When enabled, the component discovers files on the local filesystem using glob patterns and the [doublestar][] library. |
There was a problem hiding this comment.
What happens when someone provides file_match and also sends targets in using the previous pattern?
I think that should ideally be an error, unless there's a simple way to make both work.
There was a problem hiding this comment.
It will work but you get some overhead because now you are globing and syncing twice. Because all paths will match directly to one file and that is fine to pass to Glob
|
💻 Deploy preview deleted (feat: loki.source.file file_match block). |
internal/converter/internal/promtailconvert/testdata/azure.alloy
Outdated
Show resolved
Hide resolved
| |---------------------|---------------------|-------------------------------------------------------------|---------|----------| | ||
| | `enabled` | `bool` | Enable file discovery using glob patterns. | `false` | no | | ||
| | `ignore_older_than` | `duration` | Ignores files with modification times before this duration. | `"0s"` | no | | ||
| | `sync_period` | `duration` | How often to sync filesystem and targets. | `"10s"` | no | |
There was a problem hiding this comment.
is that a good default? we have a chance to change it
There was a problem hiding this comment.
I think the defaults are good, this is the same defaults that are using in promtail too.
0e6b45a to
ba28b05
Compare
thampiotr
left a comment
There was a problem hiding this comment.
Thanks, excited for this to get merged! :)
Just a few comments so far.
There was a problem hiding this comment.
There is a //go:build !race on top of this file.
- Are we sure this test runs in CI? it's not obvious to me right now
- Could we get rid of this and fix the race conditions, since we done some improvements recently?
| c.resolver = newStaticResolver() | ||
| } | ||
|
|
||
| if newArgs.FileMatch.SyncPeriod != c.args.FileMatch.SyncPeriod && newArgs.FileMatch.SyncPeriod < 0 { |
There was a problem hiding this comment.
| if newArgs.FileMatch.SyncPeriod != c.args.FileMatch.SyncPeriod && newArgs.FileMatch.SyncPeriod < 0 { | |
| if newArgs.FileMatch.SyncPeriod != c.args.FileMatch.SyncPeriod && newArgs.FileMatch.SyncPeriod > 0 { |
Although I'm not sure if that can even happen? Because default is not 0, right?
maybe we should think what to do with c.watcher when we disable FileMatch separately.
| if !c.args.FileMatch.Enabled { | ||
| return | ||
| } |
There was a problem hiding this comment.
So if we disable FileMatch and then enable FileMatch, I think the goroutine will not be restarted... can we add a test for this and fix the issue?
We have two choices: always keep the goroutines and watcher ticker on and check for FileMatch enabled - a bit more expensive but a simpler code. Or we schedule them only when FileMatch is enabled - a bit more complex, but fewer resources used. I don't have a strong opinion, I would like this code to be easier to follow though...
There was a problem hiding this comment.
Good catch! My intention was to keep it running regardless, so we should not return here.
And only because it's way more simple to do that then having to manage it in between Updates
| `local.file_match` discovers files on the local filesystem using glob patterns and the [doublestar][] library. | ||
|
|
||
| {{< admonition type="note" >}} | ||
| Use the built-in [`file_match`](../../loki/loki.source.file/#file_match) block in `loki.source.file` whenever possible. |
There was a problem hiding this comment.
We might want to reach out to k8s helm charts maintainers to change their defaults if they have them.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Piotr <17101802+thampiotr@users.noreply.github.com>
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
b56ee4b to
28814a9
Compare
PR Description
This PR adds a new
file_matchblock toloki.source.filethat enables built-in file discovery using glob patterns. This allows users to discover files directly withinloki.source.filewithout needing a separatelocal.file_matchcomponent.If
file_matchblock is not enabled component behaves like it did before.If this block enabled
local.file_matchis no longer needed and we avoid the overhead of waiting for alloy graph updates before we can schedule new files to tail.Which issue(s) this PR fixes
Notes to the Reviewer
I added "resolver" interface that will change depending on configuration. The static one behaves exactly like before, requiring absolute paths and the glob implementation will perform globing with doublestar.
The watcher will do nothing when file_match is not configured.
I also changed to use nop logger in test, logging in tests adds additional noise that makes it hard to see test output. Maybe we should be able to control logging for test with an env variable instead?
PR Checklist