Skip to content

Support CloseWrite event such that we don't reload on every Write event#434

Merged
zbud-msft merged 4 commits intosonic-net:masterfrom
zbud-msft:closewrite_fsnotify
Jul 7, 2025
Merged

Support CloseWrite event such that we don't reload on every Write event#434
zbud-msft merged 4 commits intosonic-net:masterfrom
zbud-msft:closewrite_fsnotify

Conversation

@zbud-msft
Copy link
Copy Markdown
Contributor

@zbud-msft zbud-msft commented Jul 2, 2025

Why I did it

MS ADO: 33377547

If we have a slow writer to a cert file that is monitored such that they truncate the file, and write bytes at a time, we will see excessive amount of WRITE events such that our server will try to reload on each one. With CloseWrite supported, once a writer has finished writing to the file and closed the file we will then try the reload. We also add logic to distinguish between CREATE and IN_MOVED_TO events such that we will reload on IN_MOVED_TO but not CREATE events.

How I did it

  1. Added a patch for fsnotify to support CLOSE_WRITE events and distinguish between CREATE and IN_MOVED_TO events
  2. Change telemetry server to monitor for CLOSE_WRITE instead of just WRITE events and monitor IN_MOVED_TO instead of CREATE events.

Patched is based off fsnotify/fsnotify#252

How to verify it

Manual test, UT

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@zbud-msft zbud-msft requested a review from Copilot July 2, 2025 22:44
@mssonicbld
Copy link
Copy Markdown
Contributor

/azp run

@zbud-msft zbud-msft marked this pull request as ready for review July 2, 2025 22:44
@zbud-msft zbud-msft requested a review from hdwhdw July 2, 2025 22:44
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

This comment was marked as outdated.

@mssonicbld
Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@zbud-msft zbud-msft requested a review from Copilot July 2, 2025 22:50

This comment was marked as outdated.

@zbud-msft zbud-msft requested review from make1980 and qiluo-msft July 2, 2025 23:09
@mssonicbld
Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@zbud-msft zbud-msft requested a review from Copilot July 2, 2025 23:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for the CloseWrite fsnotify event so that the telemetry server only reloads certificates once the writer has finished, and includes a test for slow writes to verify the new behavior.

  • Patch fsnotify to expose a CloseWrite event and update telemetry.go to listen for it instead of raw Write events.
  • Introduce TestINotifyCertMonitoringSlowWrites in telemetry_test.go with helper functions to simulate a slow writer.
  • Update Makefile to apply the new fsnotify patch during the build.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
telemetry/telemetry.go Switch from fsnotify.Write to fsnotify.CloseWrite events
telemetry/telemetry_test.go Add copyFile, writeSlowKey, and TestINotifyCertMonitoringSlowWrites
patches/0004-CloseWrite-event-support.patch Enable CloseWrite support in vendored fsnotify
Makefile Apply the CloseWrite patch
Comments suppressed due to low confidence (1)

telemetry/telemetry_test.go:21

  • The test uses time.Sleep but the "time" package is not imported; please add "time" to the imports to ensure the code compiles.
	"io"

hdwhdw
hdwhdw previously approved these changes Jul 3, 2025
filepath.Ext(event.Name) == ".key") {
log.V(1).Infof("Inotify watcher has received event: %v", event)
if event.Op&fsnotify.Write == fsnotify.Write || event.Op&fsnotify.Create == fsnotify.Create {
if event.Op&fsnotify.CloseWrite == fsnotify.CloseWrite || event.Op&fsnotify.Create == fsnotify.Create {
Copy link
Copy Markdown

@make1980 make1980 Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event.Op&fsnotify.Create == fsnotify.Create

if we trigger a reload when create signal is received wouldn't we have a race condition where the file is still being rewritten and we start reloading? it seems that when CREATE is received there will be a CLOSE_WRITE event later on as well so we may just wait for CLOSE_WRITE?

also if the file is moved from another location to the location that we're monitoring we should check IN_MOVED_TO event as well, right? #Closed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. There is a potential timing issue where if the cert file is created and then written to that we could potentially miss the CLOSE_WRITE event thus reloading with the non-written or partial-written cert/key. I think the one way we can fix this is with a debounce timer such that we check for incoming events in the next couple of seconds and we use the latest event as the event to act on. We can't just use CLOSE_WRITE as the only option as there are certain operations where only CREATE is triggered but not CLOSE_WRITE such as when you execute "mv" from another location to the monitored directory.

  2. CREATE/CLOSE_WRITE are triggered when we move /copy/write to the cert files in the monitored directory.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for mv it's actually rename() syscall invoked which triggers IN_MOVED_TO event:

kema2@KEMA2-SL1:/mnt/c/Users/kema2$ strace mv temp1.csv.bak temp1.csv
renameat(AT_FDCWD, "temp1.csv.bak", AT_FDCWD, "temp1.csv") = 0

https://man7.org/linux/man-pages/man7/inotify.7.html

Suppose an application is watching the directories dir1 and dir2,
and the file dir1/myfile. The following examples show some events
that may be generated.

       link("dir1/myfile", "dir2/new");
              Generates an IN_ATTRIB event for myfile and an
              IN_CREATE event for dir2.

       rename("dir1/myfile", "dir2/myfile");
              Generates an IN_MOVED_FROM event for dir1, an
              IN_MOVED_TO event for dir2, and an IN_MOVE_SELF event
              for myfile.  The IN_MOVED_FROM and IN_MOVED_TO events
              will have the same cookie value.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cp should trigger CREATE and CLOSE_WRITE IIUC.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I don't think we'll ever miss some notification as long as we consume the event as fast as possible.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, fsnotify however uses "CREATE" for both CREATE and IN_MOVED_TO calls: https://github.com/fsnotify/fsnotify/blob/c2828203cd70a50dcccfb2761f8b1f8ceef9a8e9/inotify.go#L321

if mask&unix.IN_CREATE == unix.IN_CREATE || mask&unix.IN_MOVED_TO == unix.IN_MOVED_TO {
	e.Op |= Create
}

Which explains why when I move a file to the monitored directory we still only just see the create event,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added code to distinguish between between CREATE and IN_MOVED_TO events such that we support reloading on IN_MOVED_TO events and CLOSE_WRITE events.

Added appropriate UT as wel..

@mssonicbld
Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@zbud-msft
Copy link
Copy Markdown
Contributor Author

/azpw run

@mssonicbld
Copy link
Copy Markdown
Contributor

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown

@make1980 make1980 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@mssonicbld
Copy link
Copy Markdown
Contributor

Cherry-pick PR to 202505: #441

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants