Skip to content

FSMonitor#273

Closed
dscho wants to merge 49 commits intofeatures/fsmonitor-2.27.0from
fsmonitor-vfs
Closed

FSMonitor#273
dscho wants to merge 49 commits intofeatures/fsmonitor-2.27.0from
fsmonitor-vfs

Conversation

@dscho
Copy link
Member

@dscho dscho commented Jun 1, 2020

This PR supersedes #256, and is rebased on top of vfs-2.27.0.

On purpose, it merges the FSMonitor branch thicket that is based on Git v2.27.0, to maintain easy "upstreamability".

@derrickstolee derrickstolee changed the base branch from vfs-2.27.0 to features/fsmonitor-2.27.0 June 1, 2020 20:23
dscho and others added 6 commits June 4, 2020 16:24
In 897d68e (Makefile: use curl-config --cflags, 2020-03-26), we
taught the build process to use `curl-config --cflags` to make sure that
it can find cURL's headers.

In the MSVC build, this is completely bogus because we're running in a
Git for Windows SDK whose `curl-config` supports the _GCC_ build.

Let's just ignore each and every `-I<path>` option where `<path>` points
to GCC/Clang specific headers.

Reported by Jeff Hostetler in
#275.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Teach packet_write_gently() to use a stack buffer rather than a static
buffer when composing the packet line message.  This helps get us ready
for threaded operations.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
So far, the (possibly indirect) callers of `get_packet_data()` can ask
that function to return an error instead of `die()`ing upon end-of-file.

However, when we call this function in a long-running daemon, we
absolutely want the daemon to live, still, even if there was a read
error when one random read failed.

So let's introduce an explicit option to tell the packet reader
machinery to please be nice and only return an error.

This change prepares pkt-line for the internal fsmonitor (which is
precisely such a daemon).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…uf()

This function currently has only one caller: `apply_multi_file_filter()`
in `convert.c`. That caller wants a flush packet to be written after
writing the payload.

However, we are about to introduce a user that wants to write many
packets before a final flush packet, so let's extend this function to
prepare for that scenario.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The `read_packetized_to_strbuf()` function reads packets into a strbuf
until a flush packet has been received. So far, it has only one caller:
`apply_multi_file_filter()` in `convert.c`. This caller really only
needs the `PACKET_READ_GENTLE_ON_EOF` option to be passed to
`packet_read()` (which makes sense in the scenario where packets should
be read until a flush packet is received).

We are about to introduce a caller that wants to pass other options
through to `packet_read()`, so let's extend the function signature
accordingly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@sluongng
Copy link

Few questions:

  1. Do you guys have any intention to upstream this to Git Core?

  2. I see that aside from Window and MacOS, there is also Linux support? Does that mean this patch works on Linux? This concern arose from this being a fork from git-for-windows/git.

@dscho
Copy link
Member Author

dscho commented Jun 15, 2020

  • Do you guys have any intention to upstream this to Git Core?

Yes.

See #277 for the first part. We're still working on it.

  • I see that aside from Window and MacOS, there is also Linux support? Does that mean this patch works on Linux? This concern arose from this being a fork from git-for-windows/git.

Let me clarify one big misunderstanding: Git for Windows' main branch compiles on Windows, macOS and Linux, and passes the test suite. This has always been our aim. Almost all of the patches are meant for upstream (with the obvious exceptions as the changed README.md which reflects the nature of the fork).

So yes, there is experimental Linux support (with one known bug: when auto-spawning the daemon, the process is sometimes daemonize()d too fast, so that the caller cannot connect to the just-spawned daemon. This will need to be fixed somehow, but we're tackling other problems first.

@derrickstolee
Copy link

  • I see that aside from Window and MacOS, there is also Linux support? Does that mean this patch works on Linux? This concern arose from this being a fork from git-for-windows/git.

Let me clarify one big misunderstanding: Git for Windows' main branch compiles on Windows, macOS and Linux, and passes the test suite. This has always been our aim. Almost all of the patches are meant for upstream (with the obvious exceptions as the changed README.md which reflects the nature of the fork).

We have also started releasing .deb packages at https://github.com/microsoft/git/releases

dscho and others added 17 commits June 29, 2020 21:58
For the upcoming `fsmonitor--daemon` command, we need some way to send a
command from some Git process to that daemon, and to receive an answer.
In other words, we need to support Inter-Process Communication.

By nature, Inter-Process Communication requires OS-specific
implementations.

Instead of using full-blown TCP/UDP socket communication (which comes
with all kinds of challenges such as: restricting access permissions,
collisions when multiple instances might want to listen on the same
port, etc), we use named pipes on Windows, and plan on implementing a
backend based on Unix sockets for Linux/Unix later.

For now, we use a super-simple protocol: identified by a path into
.git/, the daemon listens for clients, reads one command, then sends the
answer and closes the connection.

To avoid redundant coding patterns, we use the `pkt-line` protocol that
we just libified to be able to use here: we really, really do not want
our simple IPC client to take down the entire Git process with it when
it could not talk to `fsmonitor--daemon`, but would rather instead fall
back to a full scan of all tracked files.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We previously implemented simple Inter-Process Communication (IPC) on
Windows using named pipes. Create an equivalent implementation for
Unix-based platforms using Unix sockets.

Note that there are a variety of differences between Win32 Named Pipes
and Unix sockets, for example:

- a Win32 Named Pipe vanishes once the listener handle is closed. Unix
  sockets do not just disappear, even if the process that created them
  exited.

- Win32 Named Pipes exist in a name space parallel to the actual file
  system. In contrast, Unix sockets live in the same name space, i.e. a
  Unix socket cannot have the same path as an existing file and vice
  versa.

- Win32 Named Pipes have two communication modes: "message" and "byte".
  The former allows senders to define the size of messages to be sent;
  the receiving end will not receive partial messages. Unix sockets only
  operate in the mode equivalent to "byte".

- When a listener accepts a client via a Win32 Named Pipe, subsequent
  communication will use the same handle to the same pipe, i.e. it can
  become a bit tricky to keep the communication to several concurrent
  clients apart from one another. When a listener on a Unix socket
  accepts a new client, it gets handed a new file descriptor specific to
  that client.

The tests in `t0052-simple-ipc.sh` will automatically start running on
Unix platforms with this change due to the response from `test-tool
simple-ipc SUPPORTS_SIMPLE_IPC` changing automatically with the change
here in `simple-ipc.h`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This adds a very rudimentary test, just to make sure that the simple
Inter-Process Communication (where supported) works as intended (and
maybe also serving as an example how the command listener is supposed to
be implemented).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
prevent multiple listeners associated with the same path

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
TODO details here

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Just _starting_ to fix up the Unix Sockets part to match the new
internal simple IPC API.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch implements a framework for a simple Inter-Process
Communication API, using Unix sockets on Linux/Unix/macOS, and Named
Pipes on Windows.

This prepares for the upcoming patch series to add a built-in, Git-aware
FSMonitor.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The idea is to provide an efficient default implementation of the
`fsmonitor` hook for the major platforms, starting on the platform that
benefits from it most: Windows (because there is no native `stat()`, it
has to be emulated, which means that refreshing the Git index is a
rather hefty operation on that platform).

Originally, the intention of the `fsmonitor` extension was quite
obviously to use Facebook's `watchman`. However, it turned out that it
is just too hard to make it work reliably on Windows: none of the core
`watchman` developers has access to a Windows machine, and it is even
non-trivial to make the project build on Windows (which is not helped by
the hierarchy of build dependencies).

Besides: when it comes to Git's `fsmonitor` hook, `watchman` does both
too much (it allows pretty complicated queries, which we do not need in
the `fsmonitor` hook) and too little (`watchman` is totally unaware of
Git's internals, e.g. cannot use the `untracked` extension to initialize
its state efficiently upon start-up, therefore the first `git status` is
still slow, even if `watchman` is used via the `fsmonitor`).

Given all these considerations, it makes sense to "roll our own".

This commit merely adds the command without any functionality; Over the
course of the next commits we will implement it in bits and pieces.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This patch adds a scaffolding for an internal Git-aware FSMonitor, in
preparation for adding platform-specific backends.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
By nature, the backends for the fsmonitor daemon need to be
platform-specific: there is no common API (not even a POSIX one) for
monitoring directories for changes.

This patch adds the first such backend: a Windows one, based on the
Win32 API function `ReadDirectoryChangesW()`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
For the moment, there are only two more command modes (in addition to
the already-existing `--is-supported` one): `--run`. one mode to start
the daemon in the foreground, and another one to query a running daemon.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It is quite useful to have a way to check whether there is an
already-running FSMonitor daemon; This functionality will be used in one
of the next commits to auto-start one when querying via the FSMonitor
hook.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
TODO: test!

It is quite common that the internal fsmonitor receives multiple events
for the same path. There is no need for the fsmonitor to list the path
multiple times, though, as Git just wants to verify via an `lstat()`
whether the file changed or not.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
dscho and others added 25 commits June 29, 2020 22:09
TODO: add a test case

The `core.fsmonitor` setting is supposed to be set to a path pointing to
a script or executable that queries an fsmonitor such as watchman.

We are about to implement our own backend, and do not want to spawn an
extra process just to query it, but want to use Inter-Process
Communication to talk to the daemon (and start it if necessary).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Each worktree has a `.git` in its top-level directory, whether it is a
directory or a file. Once that directory or file is gone, it stops being
a Git worktree. So it is pointless to keep listening in that case.

This also helps the situation on Windows when a user wants to remove a
Git worktree altogether: a running daemon would prevent the deletion of
the worktree because its the latter is the former's working directory
(plus, it has to keep an open handle to that directory for that
`ReadDirectoryChangesW()` function to do its job).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…events

TODO: seq needs to be incremented under lock; see whether the code can
be simplified

Because file system events could come in before the query but not
processed yet, the code needs to write a cookie file and only return
the results of the query once the cookie file event has been seen so
that this will guarantee that the query will see all events that happened
before the query was sent.

With this change, we can finally guarantee that the clock ID that the
FSMonitor query reports is non-zero (previously, it could still be
initialized to zero because the first event might not yet have been
seen).

Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We only need to test the simple IPC and the fsmonitor feature, and not
even everywhere: the two Windows builds, both macOS and one Linux build
are enough to build confidence.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…Events

Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
While it is quite possible to #include Apple's CoreServices.h when
compiling C source code with clang, trying to build it with GCC
currently fails with this error:

In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/Security.framework/Headers/AuthSession.h:32,
                 from /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/Security.framework/Headers/Security.h:42,
                 from /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/CoreServices.framework/Frameworks/OSServices.framework/Headers/CSIdentity.h:43,
                 from /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/CoreServices.framework/Frameworks/OSServices.framework/Headers/OSServices.h:29,
                 from /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/CoreServices.framework/Frameworks/LaunchServices.framework/Headers/IconsCore.h:23,
                 from /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/CoreServices.framework/Frameworks/LaunchServices.framework/Headers/LaunchServices.h:23,
                 from /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/CoreServices.framework/Headers/CoreServices.h:45,
 /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/Security.framework/Headers/Authorization.h:193:7: error: variably modified 'bytes' at file scope
   193 | char bytes[kAuthorizationExternalFormLength];
       |      ^~~~~

The underlying reason is that GCC (rightfully) objects that an `enum`
value such as `kAuthorizationExternalFormLength` is not a constant
(because it is not, the preprocessor has no knowledge of it, only the
actual C compiler does) and can therefore not be used to define the size
of a C array.

This is a known problem and tracked in GCC's bug tracker:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93082

In the meantime, let's not block things and go the slightly ugly route
of declaring/defining the FSEvents constants, data structures and
functions that we need, so that we can avoid above-mentioned issue.

Let's do this _only_ for GCC, though, so that the CI/PR builds (which
build both with clang and with GCC) can guarantee that we _are_ using
the correct data types.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
TODO: we should be able to figure this out on the read-cache.c side
_without_ requiring a trailing `/`.

Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The previous commit is for debugging macOS issues only, and should only
be used if absolutely necessary because it is actually quite awkward to
abuse GitHub Actions' macOS agents for interactive debugging.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In this topic branch, we start implementing a built-in, Git-aware
FSMonitor.

Thus far, only a Windows-specific backend is implemented, and it will
have to be configured via setting `core.fsmonitor` to the full path of
the hard-linked `git-fsmonitor--daemon` executable in
`libexec/git-core/`; Both of these limitations will be lifted in later
patch series.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
These patches are not strictly necessary for the internal FSMonitor to
work correctly, but improve its functionality and speed.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Instead of setting `core.fsmonitor` to the absolute path of the
hard-linked built-in called `git-fsmonitor--daemon`, we teach Git to
accept the special value `:internal:`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In this topic branch, we add a macOS-specific backend to the internal
FSMonitor daemon.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In particular on macOS, our tests indicate that the events might arrive
with a noticable lag. To make sure that we consumed all events we
intended to receive, we now write a "cookie file", i.e. a temporary file
whose only purpose is to trigger a corresponding event; once the event
arrived, we know we're all caught up.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This adds a Linux-specific backend to the internal FSMonitor.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Pseudo topic branch, to keep track of the outstanding TODOs.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This causes merge conflicts with these changes:

- the RUNTIME_PREFIX is written to the GIT-BUILD-OPTIONS file to support
  testing the feature

- Git for Windows moved the config into the expected location; this
  change overlaps with the location in config.mak.uname where the
  path of the FSMONITOR_BACKEND is declared.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@jeffhostetler
Copy link

closing this in favor of #289.

@dscho dscho deleted the fsmonitor-vfs branch December 18, 2020 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants