
Multithreaded Simple IPC mechanism#281

Closed
jeffhostetler wants to merge 788 commits into microsoft:vfs-2.27.0 from jeffhostetler:simple-ipc

Conversation

@jeffhostetler

@jeffhostetler jeffhostetler commented Jul 9, 2020

This topic branch introduces a multi-threaded IPC called "Simple IPC".
It is intended to be a base for a future Git-aware FSMonitor.

This version replaces the original version described in #277.

There are a few TODOs in this draft:
[ ] Finish documentation
[ ] Resolve the TODOs listed in the code.
[ ] Recover and repurpose some of the commit message text @dscho had in the original version.
[ ] Include proper sign-offs and give proper attribution for this merged effort.
[ ] Include some of the commit message suggestions from #277
[ ] The FSMonitor layer appears to be sending a series of NUL terminated lines (raw pathnames rather than quoted).
This potentially conflicts with the documentation for the IPC API which describes the entire request or response as
a string.

[x] I've seen instances where (using t/helper/test-simple-ipc) the server will fail with a SIGPIPE on Linux/Mac. This should not happen since all of my writes are guarded with a thread-local sigmask. It only happens under extreme stress (like 100s or 1000s of concurrent connections or many connections and very long message bodies). --- I have a fixup commit in my FSMonitor branch that blocks SIGPIPE for the life of both the accept- and worker-threads (rather than turning it on and off around each pkt-line write). It appears that under extremely heavy loads a delayed SIGPIPE could be sent after the write() or close() returned (and we had re-enabled the signal). So I'm using a hammer here and just running the threads with it turned off.

@jeffhostetler
Author

/azp run

@azure-pipelines

Azure Pipelines failed to run 1 pipeline(s).

@jeffhostetler
Author

/azp run Microsoft.git

@azure-pipelines

Azure Pipelines failed to run 1 pipeline(s).

bk2204 and others added 22 commits July 30, 2020 09:16
Now that we have a complete SHA-256 implementation in Git, let's enable
it so people can use it.  Remove the ENABLE_SHA256 define constant
everywhere it's used.  Add tests for initializing a repository with
SHA-256.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some tests, we have data files which are written with a particular
hash algorithm. Instead of keeping two copies of the test files, we can
keep one, and translate the value on the fly.

In order to do so, we'll need to read both the source algorithm and the
current algorithm, so add an optional flag to the test_oid helper that
lets us look up a value for a specified hash algorithm. This should
not cause any conflicts with existing tests, since key arguments to
test_oid are allowed to contain only shell identifier characters.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To allow developers to run the testsuite with a different algorithm than
the default, provide an environment variable, GIT_TEST_DEFAULT_HASH, to
specify the algorithm to use. Compute the fixed constants using
test_oid. Move the constant initialization down below the point where
test-lib-functions.sh is loaded so the functions are defined.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the SHA1 prerequisite depends on the output of git
hash-object.  However, in order for that to produce sane behavior, we
must be in a repository.  If we are not, the default will remain SHA-1,
and we'll produce wrong results if we're using SHA-256 for the testsuite
but the test assertion starts when we're not in a repository.

Check the environment variable we use for this purpose, leaving it to
default to SHA-1 if none is specified.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have Git supporting SHA-256, we'd like to make sure that we
don't regress that state.  Unfortunately, it's easy to do so, so to
help, let's add code to run one of our CI jobs with SHA-256 as the
default hash.  This will help us detect any problems that may occur.

We pick the linux-clang job because it's relatively fast and the
linux-gcc job already runs the testsuite twice.  We want our tests to
run as fast as possible, so we wouldn't want to add a third run to the
linux-gcc job.  To make sure we properly exercise the code, let's run
the tests in the default mode (SHA-1) first and then run a second time
with SHA-256.  We explicitly specify SHA-1 for the first run so that if
we change the default in the future, we make sure to test both cases.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document the extensions.objectFormat config setting.  Warn users not to
modify it themselves.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we call test_oid_init in the setup for all test scripts,
there's no point in calling it individually.  Remove all of the places
where we've done so to help keep tests tidy.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 489947c, which
stopped treating merges into the 'master' branch as special when
preparing the default merge message.  As the goal was not to have
any single branch designated as special, it solved it by leaving the
"into <branchname>" at the end of the title of the default merge
message for any and all branches.  An obvious and easy alternative
to treat everybody equally could have been to remove it for every
branch, but that involves loss of information.

We'll introduce a new mechanism to let end-users specify merges into
which branches would omit the "into <branchname>" from the title of
the default merge message, and make the mechanism, when unconfigured,
treat the traditional 'master' special again, so all the changes to
the tests we made earlier will become unnecessary, as these tests
will be run without configuring the said new mechanism.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Git 2.28, we stopped special casing 'master' when producing the
default merge message by just removing the code to squelch "into
'master'" at the end of the message.

Introduce multi-valued merge.suppressDest configuration variable
that gives a set of globs to match against the name of the branch
into which the merge is being made, to let users specify for which
branch fmt-merge-msg's output should be shortened.  When it is not
set, 'master' is used as the sole value of the variable by default.

The above move mostly reverts to the pre-2.28 default in repositories
that have no relevant configuration.

Add a few tests to protect the behaviour with the new configuration
variable from future regression.

Helped-by: Linus Torvalds <torvalds@linux-foundation.org>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The changed-path Bloom filter is improved using ideas from an
independent implementation.

* sg/commit-graph-cleanups:
  commit-graph: simplify write_commit_graph_file() #2
  commit-graph: simplify write_commit_graph_file() #1
  commit-graph: simplify parse_commit_graph() #2
  commit-graph: simplify parse_commit_graph() #1
  commit-graph: clean up #includes
  diff.h: drop diff_tree_oid() & friends' return value
  commit-slab: add a function to deep free entries on the slab
  commit-graph-format.txt: all multi-byte numbers are in network byte order
  commit-graph: fix parsing the Chunk Lookup table
  tree-walk.c: don't match submodule entries for 'submod/anything'
Updates to the changed-paths bloom filter.

* ds/commit-graph-bloom-updates:
  commit-graph: check all leading directories in changed path Bloom filters
  revision: empty pathspecs should not use Bloom filters
  revision.c: fix whitespace
  commit-graph: check chunk sizes after writing
  commit-graph: simplify chunk writes into loop
  commit-graph: unify the signatures of all write_graph_chunk_*() functions
  commit-graph: persist existence of changed-paths
  bloom: fix logic in get_bloom_filter()
  commit-graph: change test to die on parse, not load
  commit-graph: place bloom_settings in context
The test framework has been updated so that most tests will run
with predictable (artificial) timestamps.

* jk/tests-timestamp-fix:
  t9100: stop depending on commit timestamps
  test-lib: set deterministic default author/committer date
  t9100: explicitly unset GIT_COMMITTER_DATE
  t5539: make timestamp requirements more explicit
  t9700: loosen ident timezone regex
  t6000: use test_tick consistently
"git help log" has been enhanced by sharing more material from the
documentation for the underlying "git rev-list" command.

* pb/log-rev-list-doc:
  git-log.txt: include rev-list-description.txt
  git-rev-list.txt: move description to separate file
  git-rev-list.txt: tweak wording in set operations
  git-rev-list.txt: fix Asciidoc syntax
  revisions.txt: describe 'rev1 rev2 ...' meaning for ranges
  git-log.txt: add links to 'rev-list' and 'diff' docs
"git clone --separate-git-dir=$elsewhere" used to stomp on the
contents of the existing directory $elsewhere, which has been
taught to fail when $elsewhere is not an empty directory.

* bw/fail-cloning-into-non-empty:
  git clone: don't clone into non-empty directory
Preliminary clean-up of the refs API in preparation for adding a
new refs backend "reftable".

* hn/reftable:
  reflog: cleanse messages in the refs.c layer
  bisect: treat BISECT_HEAD as a pseudo ref
  t3432: use git-reflog to inspect the reflog for HEAD
  lib-t6000.sh: write tag using git-update-ref
With the base fix to 2.27 regression, any new extensions in a v0
repository would still be silently honored, which is not quite
right.  Instead, complain and die loudly.

* jk/reject-newer-extensions-in-v0:
  verify_repository_format(): complain about new extensions in v0 repo
Dev support to limit the use of test_must_fail to only git commands.

* dl/test-must-fail-fixes-6:
  test-lib-functions: restrict test_must_fail usage
  t9400: don't use test_must_fail with cvs
  t9834: remove use of `test_might_fail p4`
  t7107: don't use test_must_fail()
  t5324: reorder `run_with_limited_open_files test_might_fail`
  t3701: stop using `env` in force_color()
Fetching from a lazily cloned repository resulted at the server
side in attempts to lazy fetch objects that the client side has,
many of which will not be available from the third-party anyway.

* jt/avoid-lazy-fetching-upon-have-check:
  upload-pack: do not lazy-fetch "have" objects
Fix to an ancient bug caused by an over-eager attempt for
optimization.

* rs/add-index-entry-optim-fix:
  read-cache: remove bogus shortcut
"git for-each-ref --format=<>" learned %(contents:size).

* cc/pretty-contents-size:
  ref-filter: add support for %(contents:size)
  t6300: test refs pointing to tree and blob
  Documentation: clarify %(contents:XXXX) doc
Pushing a ref whose name contains non-ASCII character with the
"--force-with-lease" option did not work over smart HTTP protocol,
which has been corrected.

* bc/push-cas-cquoted-refname:
  remote-curl: make --force-with-lease work with non-ASCII ref names
"git mv src dst", when src is an unmerged path, errored out
correctly but with an incorrect error message to claim that src is
not tracked, which has been clarified.

* ct/mv-unmerged-path-error:
  git-mv: improve error message for conflicted file
gitster and others added 7 commits August 24, 2020 14:54
Signed-off-by: Junio C Hamano <gitster@pobox.com>
API update.

* en/mem-pool:
  mem-pool: use consistent pool variable name
  mem-pool: use more standard initialization and finalization
  mem-pool: add convenience functions for strdup and strndup
Code clean-up.

* jk/leakfix:
  submodule--helper: fix leak of core.worktree value
  config: fix leak in git_config_get_expiry_in_days()
  config: drop git_config_get_string_const()
  config: fix leaks from git_config_get_string_const()
  checkout: fix leak of non-existent branch names
  submodule--helper: use strbuf_release() to free strbufs
  clear_pattern_list(): clear embedded hashmaps
Command line completion (in contrib/) usually omits redundant,
deprecated and/or dangerous options from its output; it learned to
optionally include all of them.

* rz/complete-more-options:
  completion: add GIT_COMPLETION_SHOW_ALL env var
  parse-options: add --git-completion-helper-all
The FETCH_HEAD is now always read from the filesystem regardless of
the ref backend in use, as its format is much richer than the
normal refs, and written directly by "git fetch" as a plain file.

* hn/refs-fetch-head-is-special:
  refs: read FETCH_HEAD and MERGE_HEAD generically
  refs: move gitdir into base ref_store
  refs: fix comment about submodule ref_stores
  refs: split off reading loose ref data in separate function
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the missing "e" in "de".  While it is possible in French to omit it,
that only occurs with an apostrophe and only when the next word starts
with a vowel or mute h, which is not the case here.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
jeffhostetler and others added 6 commits August 31, 2020 17:27
Teach packet_write_gently() to use a stack buffer rather than a static
buffer when composing the packet line message.  This helps get us ready
for threaded operations.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
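
A minimal sketch of the idea in this commit, assuming the standard pkt-line framing (four hex digits giving the total length, header included, followed by the payload). The `sketch_*` names and the `SKETCH_PKT_MAX` constant are illustrative and are not the actual `pkt-line.c` code. The point is that the packet is composed in a buffer on the caller's stack, so concurrent threads never share composition state.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_PKT_MAX 65520 /* largest pkt-line, including the 4-byte header */

/*
 * Compose "<4 hex digits of total length><payload>" into `out`.
 * Returns the total number of bytes composed, or -1 if it does not fit.
 */
static int sketch_format_pkt(char *out, size_t out_size,
			     const char *payload, size_t len)
{
	char header[5];
	size_t total;

	assert(out != NULL && payload != NULL);
	if (len > SKETCH_PKT_MAX - 4)
		return -1;
	total = len + 4;
	if (total > out_size)
		return -1;

	snprintf(header, sizeof(header), "%04x", (unsigned int)total);
	memcpy(out, header, 4);
	memcpy(out + 4, payload, len);
	return (int)total;
}

/* Write one pkt-line to fd. Returns 0 on success, -1 on error (never exits). */
static int sketch_packet_write_gently(int fd, const char *payload, size_t len)
{
	char packet[SKETCH_PKT_MAX]; /* on the stack: private to this call */
	int total = sketch_format_pkt(packet, sizeof(packet), payload, len);
	int done = 0;

	if (total < 0)
		return -1;
	while (done < total) {
		ssize_t n = write(fd, packet + done, (size_t)(total - done));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		done += (int)n;
	}
	return 0;
}
```

With a `static` buffer, two threads composing packets at the same time could interleave their bytes. The stack buffer costs about 64 KB per call, which is a trade-off to keep in mind for threads with small stacks.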
So far, the (possibly indirect) callers of `get_packet_data()` can ask
that function to return an error instead of `die()`ing upon end-of-file.

However, when we call this function in a long-running daemon, we
absolutely want the daemon to stay alive even if a single read
fails.

So let's introduce an explicit option to tell the packet reader
machinery to please be nice and only return an error.

This change prepares pkt-line for the internal fsmonitor (which is
precisely such a daemon).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
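
A rough sketch of what such an option could look like, assuming a flags argument on the low-level reader. The flag names and `sketch_get_packet_data()` are hypothetical stand-ins, not the identifiers used in `pkt-line.c`. Without the flags the reader terminates the process, as `die()` would; with them it returns -1 and lets the daemon carry on.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical option bits, named for illustration only. */
#define SKETCH_GENTLE_ON_EOF        (1u << 0)
#define SKETCH_GENTLE_ON_READ_ERROR (1u << 1)

/*
 * Read exactly `size` bytes from fd into dst.
 * Returns `size` on success. On EOF or a read error it returns -1 if the
 * matching "gentle" flag is set, and otherwise exits the process.
 */
static ssize_t sketch_get_packet_data(int fd, char *dst, size_t size,
				      unsigned int flags)
{
	size_t got = 0;

	assert(dst != NULL);
	while (got < size) {
		ssize_t n = read(fd, dst + got, size - got);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			if (flags & SKETCH_GENTLE_ON_READ_ERROR)
				return -1; /* daemon survives a failed read */
			perror("read error");
			exit(128);
		}
		if (n == 0) {
			if (flags & SKETCH_GENTLE_ON_EOF)
				return -1;
			fprintf(stderr, "unexpected end of file\n");
			exit(128);
		}
		got += (size_t)n;
	}
	return (ssize_t)got;
}
```

A daemon would pass both flags for every client connection, so that one misbehaving client can only cause its own request to fail.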
…uf()

This function currently has only one caller: `apply_multi_file_filter()`
in `convert.c`. That caller wants a flush packet to be written after
writing the payload.

However, we are about to introduce a user that wants to write many
packets before a final flush packet, so let's extend this function to
prepare for that scenario.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The `read_packetized_to_strbuf()` function reads packets into a strbuf
until a flush packet has been received. So far, it has only one caller:
`apply_multi_file_filter()` in `convert.c`. This caller really only
needs the `PACKET_READ_GENTLE_ON_EOF` option to be passed to
`packet_read()` (which makes sense in the scenario where packets should
be read until a flush packet is received).

We are about to introduce a caller that wants to pass other options
through to `packet_read()`, so let's extend the function signature
accordingly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Create a gentle version of `unix_stream_listen()`.  This version does
not call `die()` if a socket-fd cannot be created and does not assume
that it is safe to `unlink()` an existing socket-inode.

`unix_stream_listen()` uses `unix_stream_socket()` helper function to
create the socket-fd.  Avoid that helper because it calls `die()` on
errors.

`unix_stream_listen()` always tries to `unlink()` the socket-path before
calling `bind()`.  If there is an existing server/daemon already bound
and listening on that socket-path, our `unlink()` would have the effect
of disassociating the existing server's bound-socket-fd from the socket-path
without notifying the existing server.  The existing server could continue
to service existing connections (accepted-socket-fd's), but would not
receive any further new connections (since clients rendezvous via the
socket-path).  The existing server would effectively be offline but yet
appear to be active.

Furthermore, `unix_stream_listen()` creates an opportunity for a brief
race condition for connecting clients if they try to connect in the
interval between the forced `unlink()` and the subsequent `bind()` (which
recreates the socket-path that is bound to a new socket-fd in the current
process).

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
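
To illustrate the behavior described in this commit message, here is a minimal sketch of a "gentle" listen that never exits and never unlinks a socket path it did not create. `sketch_listen_gently()` is a hypothetical name, and the real `unix_stream_listen_gently()` may take options and handle more cases. If another server is already bound to the path, `bind()` fails (typically with `EADDRINUSE`) and the existing server is left untouched.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Create a listening Unix domain socket at `path`.
 * Returns the listening fd, or -1 with errno set. Never exits and never
 * unlinks a pre-existing socket path.
 */
static int sketch_listen_gently(const char *path, int backlog)
{
	struct sockaddr_un sa;
	size_t len;
	int fd, saved_errno;

	assert(path != NULL);
	len = strlen(path);
	if (len >= sizeof(sa.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memset(&sa, 0, sizeof(sa));
	sa.sun_family = AF_UNIX;
	memcpy(sa.sun_path, path, len + 1);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* No unlink() first: an existing path makes bind() fail instead. */
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		saved_errno = errno;
		close(fd);
		errno = saved_errno;
		return -1;
	}
	if (listen(fd, backlog) < 0) {
		saved_errno = errno;
		close(fd);
		unlink(path); /* we created this inode just above */
		errno = saved_errno;
		return -1;
	}
	return fd;
}
```

The caller then decides what a failed `bind()` means, for example by trying to connect to the path to see whether a live server answers before removing a stale socket.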
Optionally prevent `unix_stream_listen_gently()` from calling `chdir()`.

Calls to `chdir()` are dangerous in a multi-threaded context.  If
`unix_stream_listen()` is given a socket pathname that is too big to
fit in a `sockaddr_un` structure, it will `chdir()` to the parent
directory of the requested socket pathname, create the socket using a
relative pathname, and then `chdir()` back.  This is not thread-safe.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
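
A small sketch of the check this commit implies, assuming an options structure with a flag to forbid the `chdir()` workaround. The struct and its field names are hypothetical and are not taken from the patch. A path that fits in `sockaddr_un.sun_path` needs no workaround. One that does not fit is either rejected (thread-safe) or left to the single-threaded `chdir()` fallback.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical options; the field names are illustrative only. */
struct sketch_listen_opts {
	int backlog;
	unsigned int disallow_chdir : 1;
};

/*
 * Decide how a socket path can be bound:
 *   0  the path fits in sun_path and can be used directly
 *   1  too long, the caller may fall back to the chdir() workaround
 *  -1  too long and chdir() is disallowed (errno = ENAMETOOLONG)
 */
static int sketch_check_socket_path(const char *path,
				    const struct sketch_listen_opts *opts)
{
	struct sockaddr_un sa;

	assert(path != NULL && opts != NULL);
	if (strlen(path) < sizeof(sa.sun_path))
		return 0;
	if (opts->disallow_chdir) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return 1;
}
```

The current working directory is process-wide state, so a multi-threaded server would set `disallow_chdir` and treat an over-long path as a plain error.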
@jeffhostetler jeffhostetler force-pushed the simple-ipc branch 2 times, most recently from 7ef30c3 to df53599 Compare August 31, 2020 21:55
Brief design documentation for new IPC mechanism allowing
foreground Git client to talk with an existing daemon process
at a known location using a named pipe or unix domain socket.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Define client-side prototypes for simple-ipc API.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Define server-side prototypes for simple-ipc API.  This is a simplified
and synchronous interface within the server.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Define server-side prototypes for simple-ipc API.  This extends the API
with an asynchronous interface.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
@jeffhostetler
Author

I'm going to close this in favor of gitgitgadget#717.
That version is based upon upstream master rather than a vfs branch.
