Support CloseWrite event such that we don't reload on every Write event#434
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull Request Overview
This PR adds support for the CloseWrite fsnotify event so that the telemetry server only reloads certificates once the writer has finished, and includes a test for slow writes to verify the new behavior.
- Patch fsnotify to expose a
CloseWriteevent and updatetelemetry.goto listen for it instead of rawWriteevents. - Introduce
TestINotifyCertMonitoringSlowWritesintelemetry_test.gowith helper functions to simulate a slow writer. - Update
Makefileto apply the new fsnotify patch during the build.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| telemetry/telemetry.go | Switch from fsnotify.Write to fsnotify.CloseWrite events |
| telemetry/telemetry_test.go | Add copyFile, writeSlowKey, and TestINotifyCertMonitoringSlowWrites |
| patches/0004-CloseWrite-event-support.patch | Enable CloseWrite support in vendored fsnotify |
| Makefile | Apply the CloseWrite patch |
Comments suppressed due to low confidence (1)
telemetry/telemetry_test.go:21
- The test uses time.Sleep but the "time" package is not imported; please add
"time"to the imports to ensure the code compiles.
"io"
telemetry/telemetry.go
Outdated
| filepath.Ext(event.Name) == ".key") { | ||
| log.V(1).Infof("Inotify watcher has received event: %v", event) | ||
| if event.Op&fsnotify.Write == fsnotify.Write || event.Op&fsnotify.Create == fsnotify.Create { | ||
| if event.Op&fsnotify.CloseWrite == fsnotify.CloseWrite || event.Op&fsnotify.Create == fsnotify.Create { |
There was a problem hiding this comment.
if we trigger a reload when create signal is received wouldn't we have a race condition where the file is still being rewritten and we start reloading? it seems that when CREATE is received there will be a CLOSE_WRITE event later on as well so we may just wait for CLOSE_WRITE?
also if the file is moved from another location to the location that we're monitoring we should check IN_MOVED_TO event as well, right? #Closed
There was a problem hiding this comment.
-
There is a potential timing issue where if the cert file is created and then written to that we could potentially miss the CLOSE_WRITE event thus reloading with the non-written or partial-written cert/key. I think the one way we can fix this is with a debounce timer such that we check for incoming events in the next couple of seconds and we use the latest event as the event to act on. We can't just use CLOSE_WRITE as the only option as there are certain operations where only CREATE is triggered but not CLOSE_WRITE such as when you execute "mv" from another location to the monitored directory.
-
CREATE/CLOSE_WRITE are triggered when we move /copy/write to the cert files in the monitored directory.
There was a problem hiding this comment.
for mv it's actually rename() syscall invoked which triggers IN_MOVED_TO event:
kema2@KEMA2-SL1:/mnt/c/Users/kema2$ strace mv temp1.csv.bak temp1.csv
renameat(AT_FDCWD, "temp1.csv.bak", AT_FDCWD, "temp1.csv") = 0
https://man7.org/linux/man-pages/man7/inotify.7.html
Suppose an application is watching the directories dir1 and dir2,
and the file dir1/myfile. The following examples show some events
that may be generated.
link("dir1/myfile", "dir2/new");
Generates an IN_ATTRIB event for myfile and an
IN_CREATE event for dir2.
rename("dir1/myfile", "dir2/myfile");
Generates an IN_MOVED_FROM event for dir1, an
IN_MOVED_TO event for dir2, and an IN_MOVE_SELF event
for myfile. The IN_MOVED_FROM and IN_MOVED_TO events
will have the same cookie value.
There was a problem hiding this comment.
cp should trigger CREATE and CLOSE_WRITE IIUC.
There was a problem hiding this comment.
btw I don't think we'll ever miss some notification as long as we consume the event as fast as possible.
There was a problem hiding this comment.
You are right, fsnotify however uses "CREATE" for both CREATE and IN_MOVED_TO calls: https://github.com/fsnotify/fsnotify/blob/c2828203cd70a50dcccfb2761f8b1f8ceef9a8e9/inotify.go#L321
if mask&unix.IN_CREATE == unix.IN_CREATE || mask&unix.IN_MOVED_TO == unix.IN_MOVED_TO {
e.Op |= Create
}
Which explains why when I move a file to the monitored directory we still only just see the create event,
There was a problem hiding this comment.
Added code to distinguish between between CREATE and IN_MOVED_TO events such that we support reloading on IN_MOVED_TO events and CLOSE_WRITE events.
Added appropriate UT as wel..
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azpw run |
|
/AzurePipelines run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Cherry-pick PR to 202505: #441 |
Why I did it
MS ADO: 33377547
If we have a slow writer to a cert file that is monitored such that they truncate the file, and write bytes at a time, we will see excessive amount of WRITE events such that our server will try to reload on each one. With CloseWrite supported, once a writer has finished writing to the file and closed the file we will then try the reload. We also add logic to distinguish between CREATE and IN_MOVED_TO events such that we will reload on IN_MOVED_TO but not CREATE events.
How I did it
Patched is based off fsnotify/fsnotify#252
How to verify it
Manual test, UT
Which release branch to backport (provide reason below if selected)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)