[DO NOT MERGE] Rebase onto v2.29.0-rc2.windows.1 #295

Closed
derrickstolee wants to merge 133 commits into vfs-2.29.0 from tentative/vfs-2.29.0

Conversation

derrickstolee commented Oct 6, 2020

Here is the range-diff. Note that we took a number of commits from upstream master quite early for the maintenance builtin (and strvec), so I had to create a new merge commit that combined v2.28.0.windows.1, jk/strvec, ds/maintenance-part-1, and a few other topics in order to get a clean rebase.
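For reference, the merge-then-rebase approach described above can be sketched in a scratch repository. All ref names below are hypothetical stand-ins for the real tags and topics named in the paragraph (`v-old` ~ v2.28.0.windows.1, `v-new` ~ v2.29.0-rc2.windows.1, `topic/*` ~ jk/strvec and ds/maintenance-part-1, `vfs-old` ~ vfs-2.28.0):

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo

# Old upstream tag, plus two topics that were taken downstream early,
# plus the downstream branch carrying its own patches:
echo base >file && git add file && git commit -qm 'base'
git tag v-old
git checkout -qb topic/strvec && echo s >strvec && git add strvec && git commit -qm 'strvec'
git checkout -q v-old && git checkout -qb topic/maintenance && echo m >maint && git add maint && git commit -qm 'maintenance'
git checkout -q v-old && git checkout -qb vfs-old && echo v >vfs && git add vfs && git commit -qm 'vfs patch'
# New upstream tag:
git checkout -q v-old && git checkout -qb upstream && echo n >new && git add new && git commit -qm '2.29 work'
git tag v-new

# 1) Build a merge commit that combines the new upstream tag with the
#    topics that were already merged downstream:
git checkout -qb tentative-base v-new
git merge -q --no-edit topic/strvec topic/maintenance

# 2) Replay only the downstream-specific commits onto that combined
#    base, so the early-taken topics no longer conflict:
git rebase -q --onto tentative-base v-old vfs-old

git log --oneline -3
```

The rebased branch then carries the downstream patches on top of a base that already contains everything taken early, which is what makes the rebase clean; `git range-diff` of the old and new series produces output like the listing below.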

Full range-diff from vfs-2.28.0:

  1:  c0be5644c8 =   1:  cde69e6ac4 reset --stdin: trim carriage return from the paths
  2:  7a8325d7fe !   2:  ecd93619a4 gvfs: start by adding the -gvfs suffix to the version
    @@ GIT-VERSION-GEN
      #!/bin/sh
      
      GVF=GIT-VERSION-FILE
    --DEF_VER=v2.28.0
    -+DEF_VER=v2.28.0.vfs.0.0
    +-DEF_VER=v2.29.0-rc2
    ++DEF_VER=v2.29.0.vfs.0.0
      
      LF='
      '
  3:  7880d330c7 =   3:  3bc2ce6ca7 gvfs: ensure that the version is based on a GVFS tag
  4:  7d31b66a28 =   4:  dd37042125 gvfs: add a GVFS-specific header file
  5:  54df4145f4 =   5:  f8993187bd gvfs: add the core.gvfs config setting
  6:  750ae83873 =   6:  13e59955a2 gvfs: add the feature to skip writing the index' SHA-1
  7:  50bc807083 =   7:  d0a1de0949 gvfs: add the feature that blobs may be missing
  8:  1658889db0 =   8:  d28d0b6721 gvfs: prevent files to be deleted outside the sparse checkout
  9:  12a83a4d1b !   9:  3d74f55ce6 gvfs: optionally skip reachability checks/upload pack during fetch
    @@ connected.c
      /*
       * If we feed all the commits we want to verify to this command
     @@ connected.c: int check_connected(oid_iterate_fn fn, void *cb_data,
    + 	struct transport *transport;
      	size_t base_len;
    - 	const unsigned hexsz = the_hash_algo->hexsz;
      
     +	/*
     +	 * Running a virtual file system there will be objects that are
    @@ gvfs.h
      static inline int gvfs_config_is_set(int mask) {
      	return (core_gvfs & mask) == mask;
     
    - ## t/t5582-vfs.sh (new) ##
    + ## t/t5583-vfs.sh (new) ##
     @@
     +#!/bin/sh
     +
 10:  33d43ceb88 =  10:  14427db28a gvfs: ensure all filters and EOL conversions are blocked
 11:  ae794d1bb8 !  11:  0b3a52c589 Add a new run_hook_argv() function
    @@ Metadata
     Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
     
      ## Commit message ##
    -    Add a new run_hook_argv() function
    +    Add a new run_hook_strvec() function
     
         The two existing members of the run_hook*() family, run_hook_ve() and
         run_hook_le(), are good for callers that know the precise number of
    -    parameters already. Let's introduce a new sibling that takes an argv
    -    array for callers that want to pass a variable number of parameters.
    +    parameters already. Let's introduce a new sibling that takes a strvec
    +    for callers that want to pass a variable number of parameters.
     
         Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
     
    @@ run-command.c: const char *find_hook(const char *name)
      }
      
     -int run_hook_ve(const char *const *env, const char *name, va_list args)
    -+int run_hook_argv(const char *const *env, const char *name,
    -+		  const char **argv)
    ++int run_hook_strvec(const char *const *env, const char *name,
    ++		    struct strvec *argv)
      {
      	struct child_process hook = CHILD_PROCESS_INIT;
      	const char *p;
     @@ run-command.c: int run_hook_ve(const char *const *env, const char *name, va_list args)
      		return 0;
      
    - 	argv_array_push(&hook.args, p);
    + 	strvec_push(&hook.args, p);
     -	while ((p = va_arg(args, const char *)))
    --		argv_array_push(&hook.args, p);
    -+	argv_array_pushv(&hook.args, argv);
    +-		strvec_push(&hook.args, p);
    ++	strvec_pushv(&hook.args, argv->v);
      	hook.env = env;
      	hook.no_stdin = 1;
      	hook.stdout_to_stderr = 1;
    @@ run-command.c: int run_hook_ve(const char *const *env, const char *name, va_list
      
     +int run_hook_ve(const char *const *env, const char *name, va_list args)
     +{
    -+	struct argv_array argv = ARGV_ARRAY_INIT;
    ++	struct strvec argv = STRVEC_INIT;
     +	const char *p;
     +	int ret;
     +
     +	while ((p = va_arg(args, const char *)))
    -+		argv_array_push(&argv, p);
    ++		strvec_push(&argv, p);
     +
    -+	ret = run_hook_argv(env, name, argv.argv);
    -+	argv_array_clear(&argv);
    ++	ret = run_hook_strvec(env, name, &argv);
    ++	strvec_clear(&argv);
     +	return ret;
     +}
     +
    @@ run-command.h: const char *find_hook(const char *name);
      LAST_ARG_MUST_BE_NULL
      int run_hook_le(const char *const *env, const char *name, ...);
      int run_hook_ve(const char *const *env, const char *name, va_list args);
    -+int run_hook_argv(const char *const *env, const char *name, const char **argv);
    ++int run_hook_strvec(const char *const *env, const char *name,
    ++		    struct strvec *argv);
      
      /*
       * Trigger an auto-gc
 12:  e357bf9ece !  12:  a1d0128e90 gvfs: allow "virtualizing" objects
    @@ sha1-file.c: void disable_obj_read_lock(void)
      
     +static int run_read_object_hook(const struct object_id *oid)
     +{
    -+	struct argv_array args = ARGV_ARRAY_INIT;
    ++	struct strvec args = STRVEC_INIT;
     +	int ret;
     +	uint64_t start;
     +
     +	start = getnanotime();
    -+	argv_array_push(&args, oid_to_hex(oid));
    -+	ret = run_hook_argv(NULL, "read-object", args.argv);
    -+	argv_array_clear(&args);
    ++	strvec_push(&args, oid_to_hex(oid));
    ++	ret = run_hook_strvec(NULL, "read-object", &args);
    ++	strvec_clear(&args);
     +	trace_performance_since(start, "run_read_object_hook");
     +
     +	return ret;
 13:  13f412f613 !  13:  b07fa566fa Hydrate missing loose objects in check_and_freshen()
    @@ sha1-file.c: void prepare_alt_odb(struct repository *r)
     +	if (!p)
     +		return 1;
     +
    -+	argv_array_push(&hook.args, p);
    -+	argv_array_push(&hook.args, oid_to_hex(oid));
    ++	strvec_push(&hook.args, p);
    ++	strvec_push(&hook.args, oid_to_hex(oid));
     +	hook.env = NULL;
     +	hook.no_stdin = 1;
     +	hook.stdout_to_stderr = 1;
    @@ sha1-file.c: void disable_obj_read_lock(void)
      
     -static int run_read_object_hook(const struct object_id *oid)
     -{
    --	struct argv_array args = ARGV_ARRAY_INIT;
    +-	struct strvec args = STRVEC_INIT;
     -	int ret;
     -	uint64_t start;
     -
     -	start = getnanotime();
    --	argv_array_push(&args, oid_to_hex(oid));
    --	ret = run_hook_argv(NULL, "read-object", args.argv);
    --	argv_array_clear(&args);
    +-	strvec_push(&args, oid_to_hex(oid));
    +-	ret = run_hook_strvec(NULL, "read-object", &args);
    +-	strvec_clear(&args);
     -	trace_performance_since(start, "run_read_object_hook");
     -
     -	return ret;
 14:  01c01f4266 !  14:  9af634a595 Add support for read-object as a background process to retrieve missing objects
    @@ Commit message
         same sub-process module when implementing the read-object background
         process.
     
    +    The read-object hook feature was designed before the SHA-256 support was
    +    even close to be started. As a consequence, its protocol hard-codes the
    +    key `sha1`, even if we now also support SHA-256 object IDs.
    +
    +    Technically, this is wrong, and probably the best way forward would be
    +    to rename the key to `oid` (or `sha256`, but that is less future-proof).
    +
    +    However, there are existing setups out there, with existing read-object
    +    hooks that most likely have no idea what to do with `oid` requests. So
    +    let's leave the key as `sha1` for the time being, even if it will be
    +    technically incorrect in SHA-256 repositories.
    +
         Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
     
      ## Documentation/technical/read-object-protocol.txt (new) ##
    @@ sha1-file.c: void prepare_alt_odb(struct repository *r)
     +		}
     +	}
      
    --	argv_array_push(&hook.args, p);
    --	argv_array_push(&hook.args, oid_to_hex(oid));
    +-	strvec_push(&hook.args, p);
    +-	strvec_push(&hook.args, oid_to_hex(oid));
     -	hook.env = NULL;
     -	hook.no_stdin = 1;
     -	hook.stdout_to_stderr = 1;
    @@ t/t0410/read-object (new)
     +	my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
     +
     +	if ( $command eq "get" ) {
    -+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
    ++		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40,64})$/;
     +		packet_bin_read();
     +
     +		system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | git -c core.virtualizeobjects=false hash-object -w --stdin >/dev/null 2>&1');
 15:  e0060c6baa =  15:  87e39d2dcc sha1_file: when writing objects, skip the read_object_hook
 16:  c59f1010ba !  16:  8f6a453b49 gvfs: add global command pre and post hook procs
    @@ git.c: static int handle_alias(int *argcp, const char ***argv)
      }
      
     +/* Runs pre/post-command hook */
    -+static struct argv_array sargv = ARGV_ARRAY_INIT;
    ++static struct strvec sargv = STRVEC_INIT;
     +static int run_post_hook = 0;
     +static int exit_code = -1;
     +
    @@ git.c: static int handle_alias(int *argcp, const char ***argv)
     +	setenv("COMMAND_HOOK_LOCK", "true", 1);
     +
     +	/* call the hook proc */
    -+	argv_array_pushv(&sargv, argv);
    -+	ret = run_hook_argv(NULL, "pre-command", sargv.argv);
    ++	strvec_pushv(&sargv, argv);
    ++	ret = run_hook_strvec(NULL, "pre-command", &sargv);
     +
     +	if (!ret)
     +		run_post_hook = 1;
    @@ git.c: static int handle_alias(int *argcp, const char ***argv)
     +	if (!lock || strcmp(lock, "true"))
     +		return 0;
     +
    -+	argv_array_pushf(&sargv, "--exit_code=%u", exit_code);
    -+	ret = run_hook_argv(NULL, "post-command", sargv.argv);
    ++	strvec_pushf(&sargv, "--exit_code=%u", exit_code);
    ++	ret = run_hook_strvec(NULL, "post-command", &sargv);
     +
     +	run_post_hook = 0;
    -+	argv_array_clear(&sargv);
    ++	strvec_clear(&sargv);
     +	setenv("COMMAND_HOOK_LOCK", "false", 1);
     +	return ret;
     +}
    @@ git.c: static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
      		return 0;
     @@ git.c: static void execv_dashed_external(const char **argv)
      	 */
    - 	trace_argv_printf(cmd.args.argv, "trace: exec:");
    + 	trace_argv_printf(cmd.args.v, "trace: exec:");
      
    -+	if (run_pre_command_hook(cmd.args.argv))
    ++	if (run_pre_command_hook(cmd.args.v))
     +		die("pre-command hook aborted command");
     +
      	/*
 17:  88f99deb29 =  17:  0c972e5a61 t0400: verify that the hook is called correctly from a subdirectory
 18:  1fc7cf205a !  18:  3d76205f7d Pass PID of git process to hooks.
    @@ git.c
     @@ git.c: static int run_pre_command_hook(const char **argv)
      
      	/* call the hook proc */
    - 	argv_array_pushv(&sargv, argv);
    -+	argv_array_pushf(&sargv, "--git-pid=%"PRIuMAX, (uintmax_t)getpid());
    - 	ret = run_hook_argv(NULL, "pre-command", sargv.argv);
    + 	strvec_pushv(&sargv, argv);
    ++	strvec_pushf(&sargv, "--git-pid=%"PRIuMAX, (uintmax_t)getpid());
    + 	ret = run_hook_strvec(NULL, "pre-command", &sargv);
      
      	if (!ret)
     
 19:  b6336d0c6d =  19:  d3bd1e7a75 pre-command: always respect core.hooksPath
 20:  6181cdfbf7 =  20:  844a3a1aec sparse-checkout: update files with a modify/delete conflict
 21:  3480fbd0b6 =  21:  d9d85e9ec9 sparse-checkout: avoid writing entries with the skip-worktree bit
 22:  3471176fef =  22:  f325fc411e Fix reset when using the sparse-checkout feature.
 23:  bd786bd14e =  23:  8d440670e5 Do not remove files outside the sparse-checkout
 24:  3ff93abafe =  24:  140b16db5c gvfs: refactor loading the core.gvfs config value
 25:  54a3307fc0 =  25:  944d732a94 send-pack: do not check for sha1 file when GVFS_MISSING_OK set
 26:  203fdb1863 =  26:  40c3aeeb58 cache-tree: remove use of strbuf_addf in update_one
 27:  3e8904de6e !  27:  e487e15e5c gvfs: block unsupported commands when running in a GVFS repo
    @@ Commit message
     
      ## builtin/gc.c ##
     @@
    - #include "blob.h"
    - #include "tree.h"
    - #include "promisor-remote.h"
    -+#include "gvfs.h"
    - 
    - #define FAILED_RUN "failed to run %s"
      
    + #include "builtin.h"
    + #include "repository.h"
    ++#include "gvfs.h"
    + #include "config.h"
    + #include "tempfile.h"
    + #include "lockfile.h"
     @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
      	if (quiet)
    - 		argv_array_push(&repack, "-q");
    + 		strvec_push(&repack, "-q");
      
     +	if ((!auto_gc || (auto_gc && gc_auto_threshold > 0)) && gvfs_config_is_set(GVFS_BLOCK_COMMANDS))
     +		die(_("'git gc' is not supported on a GVFS repo"));
 28:  700ddef4a0 =  28:  aa21b54b07 gvfs: allow overriding core.gvfs
 29:  c887e49f89 =  29:  a5157c44af BRANCHES.md: Add explanation of branches and using forks
 30:  e000c61525 =  30:  dfacd6d956 vfs: disable `git update-git-for-windows`
 31:  966df06c31 =  31:  94735adb6d Add virtual file system settings and hook proc
 32:  1834f9bcc0 !  32:  841a3e549d Update the virtualfilesystem support
    @@ virtualfilesystem.c: static int vfs_hashmap_cmp(const void *unused_cmp_data,
     -	argv[1] = ver;
     -	argv[2] = NULL;
     -	cp.argv = argv;
    -+	argv_array_push(&cp.args, core_virtualfilesystem);
    -+	argv_array_pushf(&cp.args, "%d", HOOK_INTERFACE_VERSION);
    ++	strvec_push(&cp.args, core_virtualfilesystem);
    ++	strvec_pushf(&cp.args, "%d", HOOK_INTERFACE_VERSION);
      	cp.use_shell = 1;
     +	cp.dir = get_git_work_tree();
      
 33:  38a6649737 =  33:  8fd323ee56 virtualfilesystem: don't run the virtual file system hook if the index has been redirected
 34:  f589568321 =  34:  2d861c7493 virtualfilesystem: fix bug with symlinks being ignored
 35:  2e0326c00b =  35:  930174dab9 virtualfilesystem: check if directory is included
 36:  14ca8f9d9e =  36:  b75c0b3676 vfs: fix case where directories not handled correctly
 37:  f40d31c1b3 !  37:  e844478c96 backwards-compatibility: support the post-indexchanged hook
    @@ Commit message
         allow any `post-indexchanged` hook to run instead (if it exists).
     
      ## run-command.c ##
    -@@ run-command.c: int run_hook_argv(const char *const *env, const char *name,
    +@@ run-command.c: int run_hook_strvec(const char *const *env, const char *name,
      	const char *p;
      
      	p = find_hook(name);
 38:  166e7f1c61 =  38:  cee9328782 status: add status serialization mechanism
 39:  20f5f90e18 !  39:  6bd501315d Teach ahead-behind and serialized status to play nicely together
    @@ Commit message
         Teach ahead-behind and serialized status to play nicely together
     
      ## t/t7524-serialized-status.sh ##
    +@@ t/t7524-serialized-status.sh: test_expect_success 'setup' '
    + 	git commit -m"Adding original file." &&
    + 	mkdir untracked &&
    + 	touch ignored.ign ignored_dir/ignored_2.txt \
    +-	      untracked_1.txt untracked/untracked_2.txt untracked/untracked_3.txt
    ++	      untracked_1.txt untracked/untracked_2.txt untracked/untracked_3.txt &&
    ++
    ++	test_oid_cache <<-EOF
    ++	branch_oid sha1:68d4a437ea4c2de65800f48c053d4d543b55c410
    ++
    ++	branch_oid sha256:6b95e4b1ea911dad213f2020840f5e92d3066cf9e38cf35f79412ec58d409ce4
    ++	EOF
    + '
    + 
    + test_expect_success 'verify untracked-files=complete with no conversion' '
     @@ t/t7524-serialized-status.sh: test_expect_success 'verify serialized status handles path scopes' '
      	test_i18ncmp expect output
      '
      
     +test_expect_success 'verify no-ahead-behind and serialized status integration' '
     +	test_when_finished "rm serialized_status.dat new_change.txt output" &&
    -+	cat >expect <<-\EOF &&
    -+	# branch.oid 68d4a437ea4c2de65800f48c053d4d543b55c410
    ++	cat >expect <<-EOF &&
    ++	# branch.oid $(test_oid branch_oid)
     +	# branch.head alt_branch
     +	# branch.upstream master
     +	# branch.ab +1 -0
 40:  87314cc6f8 =  40:  f3c8a3550d status: serialize to path
 41:  6866f99320 !  41:  aa1997743a status: reject deserialize in V2 and conflicts
    @@ Commit message
         Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
     
      ## t/t7524-serialized-status.sh ##
    +@@ t/t7524-serialized-status.sh: test_expect_success 'setup' '
    + 
    + 	test_oid_cache <<-EOF
    + 	branch_oid sha1:68d4a437ea4c2de65800f48c053d4d543b55c410
    ++	x_base sha1:587be6b4c3f93f93c489c0111bba5596147a26cb
    ++	x_ours sha1:b68025345d5301abad4d9ec9166f455243a0d746
    ++	x_theirs sha1:975fbec8256d3e8a3797e7a3611380f27c49f4ac
    + 
    + 	branch_oid sha256:6b95e4b1ea911dad213f2020840f5e92d3066cf9e38cf35f79412ec58d409ce4
    ++	x_base sha256:14f5162e2fe3d240d0d37aaab0f90e4af9a7cfa79639f3bab005b5bfb4174d9f
    ++	x_ours sha256:3a404ba030a4afa912155c476a48a253d4b3a43d0098431b6d6ca6e554bd78fb
    ++	x_theirs sha256:44dc634218adec09e34f37839b3840bad8c6103693e9216626b32d00e093fa35
    + 	EOF
    + '
    + 
     @@ t/t7524-serialized-status.sh: test_expect_success 'verify no-ahead-behind and serialized status integration' '
      '
      
    @@ t/t7524-serialized-status.sh: test_expect_success 'verify new --serialize=path m
     +	# in each format.
     +
     +	cat >expect.v2 <<EOF &&
    -+u UU N... 100644 100644 100644 100644 587be6b4c3f93f93c489c0111bba5596147a26cb b68025345d5301abad4d9ec9166f455243a0d746 975fbec8256d3e8a3797e7a3611380f27c49f4ac x.txt
    ++u UU N... 100644 100644 100644 100644 $(test_oid x_base) $(test_oid x_ours) $(test_oid x_theirs) x.txt
     +EOF
     +	git -C conflicts status --porcelain=v2 >observed.v2 &&
     +	test_cmp expect.v2 observed.v2 &&
    @@ t/t7524-serialized-status.sh: test_expect_success 'verify new --serialize=path m
     +	# the cached data when there is an unresolved conflict.
     +
     +	cat >expect.v2.dirty <<EOF &&
    -+u UU N... 100644 100644 100644 100644 587be6b4c3f93f93c489c0111bba5596147a26cb b68025345d5301abad4d9ec9166f455243a0d746 975fbec8256d3e8a3797e7a3611380f27c49f4ac x.txt
    ++u UU N... 100644 100644 100644 100644 $(test_oid x_base) $(test_oid x_ours) $(test_oid x_theirs) x.txt
     +? dirt.txt
     +EOF
     +	git -C conflicts status --porcelain=v2 --deserialize=../serialized >observed.v2 &&
 42:  98a0407658 =  42:  f9c0db0ffb status: fix rename reporting when using serialization cache
 43:  44ede18171 =  43:  9c16bf4778 serialize-status: serialize global and repo-local exclude file metadata
 44:  b81e5a337f =  44:  3f379c07f8 status: deserialization wait
 45:  22a012e4df =  45:  a349de4e3f merge-recursive: avoid confusing logic in was_dirty()
 46:  8dbe120b93 =  46:  f8383908a1 merge-recursive: add some defensive coding to was_dirty()
 47:  61afbcab83 =  47:  057b91fff2 merge-recursive: teach was_dirty() about the virtualfilesystem
 48:  0d53140261 =  48:  7037da147d status: deserialize with -uno does not print correct hint
 49:  f83d0d98a0 =  49:  d5e0e56edc wt-status-deserialize: fix crash when -v is used
 50:  ec32a0fa12 =  50:  557ae48950 fsmonitor: check CE_FSMONITOR_VALID in ce_uptodate
 51:  164e9274a0 =  51:  c88b9fea13 fsmonitor: add script for debugging and update script for tests
 52:  4fd9dbc588 =  52:  b570113bdd status: disable deserialize when verbose output requested.
 53:  e1e68bcef0 =  53:  d8b1781609 t7524: add test for verbose status deserialzation
 54:  245a021880 =  54:  b38fc4ac12 deserialize-status: silently fallback if we cannot read cache file
 55:  56ac38a9dc =  55:  97820235ea gvfs:trace2:data: add trace2 tracing around read_object_process
 56:  e96864276f =  56:  537eb13b30 gvfs:trace2:data: status deserialization information
 57:  09095c7686 =  57:  29c462a51e gvfs:trace2:data: status serialization
 58:  74ecc7ddcd =  58:  232479980b gvfs:trace2:data: add vfs stats
 59:  5013470618 =  59:  1dc0b00e0e trace2: refactor setting process starting time
 60:  e1cf5c4ee4 =  60:  c508c88448 trace2:gvfs:experiment: clear_ce_flags_1
 61:  fb358b8404 =  61:  6fcc645db0 trace2:gvfs:experiment: traverse_trees
 62:  11e6ea1965 =  62:  dd8ed01e2b trace2:gvfs:experiment: report_tracking
 63:  ea803b407a =  63:  5257cbcd05 trace2:gvfs:experiment: read_cache: annotate thread usage in read-cache
 64:  df14cc90a4 =  64:  78c29ac4c7 trace2:gvfs:experiment: read-cache: time read/write of cache-tree extension
 65:  e1804e1562 =  65:  dd84577252 trace2:gvfs:experiment: add prime_cache_tree region
 66:  f3467cc586 =  66:  2abb9da12d trace2:gvfs:experiment: add region to apply_virtualfilesystem()
 67:  677ffaddb9 =  67:  d70d9daba2 trace2:gvfs:experiment: add region around unpack_trees()
 68:  3f12b7cb76 =  68:  a0bc7ce015 trace2:gvfs:experiment: add region to cache_tree_fully_valid()
 69:  d3e6a92335 =  69:  21fad845af trace2:gvfs:experiment: add unpack_entry() counter to unpack_trees() and report_tracking()
 70:  c543625b1d =  70:  eb4525d03f trace2:gvfs:experiment: increase default event depth for unpack-tree data
 71:  977e91b925 =  71:  f5a7e46a51 trace2:gvfs:experiment: add data for check_updates() in unpack_trees()
 72:  eb71066ac9 =  72:  80713221a2 Trace2:gvfs:experiment: capture more 'tracking' details
 73:  9a65d9e549 =  73:  702e7026ac credential: set trace2_child_class for credential manager children
 74:  4e5e86e3fd !  74:  3f732a50c1 sub-process: do not borrow cmd pointer from caller
    @@ sub-process.c: int subprocess_start(struct hashmap *hashmap, struct subprocess_e
      	process->clean_on_exit_handler = subprocess_exit_handler;
      	process->trace2_child_class = "subprocess";
      
    -+	entry->cmd = process->args.argv[0];
    ++	entry->cmd = process->args.v[0];
     +
      	err = start_command(process);
      	if (err) {
 75:  44b800a727 !  75:  f9b396e1fc sub-process: add subprocess_start_argv()
    @@ sub-process.c: int subprocess_start(struct hashmap *hashmap, struct subprocess_e
      	return 0;
      }
      
    -+int subprocess_start_argv(struct hashmap *hashmap,
    ++int subprocess_start_strvec(struct hashmap *hashmap,
     +			  struct subprocess_entry *entry,
     +			  int is_git_cmd,
    -+			  const struct argv_array *argv,
    ++			  const struct strvec *argv,
     +			  subprocess_start_fn startfn)
     +{
     +	int err;
    @@ sub-process.c: int subprocess_start(struct hashmap *hashmap, struct subprocess_e
     +	process = &entry->process;
     +
     +	child_process_init(process);
    -+	for (k = 0; k < argv->argc; k++)
    -+		argv_array_push(&process->args, argv->argv[k]);
    ++	for (k = 0; k < argv->nr; k++)
    ++		strvec_push(&process->args, argv->v[k]);
     +	process->use_shell = 1;
     +	process->in = -1;
     +	process->out = -1;
    @@ sub-process.c: int subprocess_start(struct hashmap *hashmap, struct subprocess_e
     +	process->clean_on_exit_handler = subprocess_exit_handler;
     +	process->trace2_child_class = "subprocess";
     +
    -+	sq_quote_argv_pretty(&quoted, argv->argv);
    ++	sq_quote_argv_pretty(&quoted, argv->v);
     +	entry->cmd = strbuf_detach(&quoted, 0);
     +
     +	err = start_command(process);
    @@ sub-process.h: typedef int(*subprocess_start_fn)(struct subprocess_entry *entry)
      int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,
      		subprocess_start_fn startfn);
      
    -+int subprocess_start_argv(struct hashmap *hashmap,
    ++int subprocess_start_strvec(struct hashmap *hashmap,
     +			  struct subprocess_entry *entry,
     +			  int is_git_cmd,
    -+			  const struct argv_array *argv,
    ++			  const struct strvec *argv,
     +			  subprocess_start_fn startfn);
     +
      /* Kill a subprocess and remove it from the subprocess hashmap. */
 76:  e5dc71be6b =  76:  d0768bcb33 sha1-file: add function to update existing loose object cache
 77:  d75a48266f =  77:  c43a255850 packfile: add install_packed_git_and_mru()
 78:  e494b96af3 =  78:  96c21ab8f6 index-pack: avoid immediate object fetch while parsing packfile
 79:  5030214b6a !  79:  69d212fd9a gvfs-helper: create tool to fetch objects using the GVFS Protocol
    @@ config.c: int git_default_config(const char *var, const char *value, void *cb)
      	return 0;
      }
     
    + ## contrib/buildsystems/CMakeLists.txt ##
    +@@ contrib/buildsystems/CMakeLists.txt: if(NOT CURL_FOUND)
    + 	add_compile_definitions(NO_CURL)
    + 	message(WARNING "git-http-push and git-http-fetch will not be built")
    + else()
    +-	list(APPEND PROGRAMS_BUILT git-http-fetch git-http-push git-imap-send git-remote-http)
    ++	list(APPEND PROGRAMS_BUILT git-http-fetch git-http-push git-imap-send git-remote-http git-gvfs-helper)
    + 	if(CURL_VERSION_STRING VERSION_GREATER_EQUAL 7.34.0)
    + 		add_compile_definitions(USE_CURL_FOR_IMAP_SEND)
    + 	endif()
    +@@ contrib/buildsystems/CMakeLists.txt: if(CURL_FOUND)
    + 		add_executable(git-http-push ${CMAKE_SOURCE_DIR}/http-push.c)
    + 		target_link_libraries(git-http-push http_obj common-main ${CURL_LIBRARIES} ${EXPAT_LIBRARIES})
    + 	endif()
    ++
    ++	add_executable(git-gvfs-helper ${CMAKE_SOURCE_DIR}/gvfs-helper.c)
    ++	target_link_libraries(git-gvfs-helper http_obj common-main ${CURL_LIBRARIES} )
    + endif()
    + 
    + set(git_builtin_extra
    +
      ## environment.c ##
     @@ environment.c: int protect_hfs = PROTECT_HFS_DEFAULT;
      #endif
    @@ environment.c: int protect_hfs = PROTECT_HFS_DEFAULT;
      ## gvfs-helper-client.c (new) ##
     @@
     +#include "cache.h"
    -+#include "argv-array.h"
    ++#include "strvec.h"
     +#include "trace2.h"
     +#include "oidset.h"
     +#include "object.h"
    @@ gvfs-helper-client.c (new)
     +/*
     + * We expect:
     + *
    -+ *    <odb> 
    ++ *    <odb>
     + *    <data>*
     + *    <status>
     + *    <flush>
    @@ gvfs-helper-client.c (new)
     +{
     +	struct gh_server__process *entry;
     +	struct child_process *process;
    -+	struct argv_array argv = ARGV_ARRAY_INIT;
    ++	struct strvec argv = STRVEC_INIT;
     +	struct strbuf quoted = STRBUF_INIT;
     +	int nr_loose = 0;
     +	int nr_packfile = 0;
    @@ gvfs-helper-client.c (new)
     +	/*
     +	 * TODO decide what defaults we want.
     +	 */
    -+	argv_array_push(&argv, "gvfs-helper");
    -+	argv_array_push(&argv, "--fallback");
    -+	argv_array_push(&argv, "--cache-server=trust");
    -+	argv_array_pushf(&argv, "--shared-cache=%s",
    ++	strvec_push(&argv, "gvfs-helper");
    ++	strvec_push(&argv, "--fallback");
    ++	strvec_push(&argv, "--cache-server=trust");
    ++	strvec_pushf(&argv, "--shared-cache=%s",
     +			 gh_client__chosen_odb->path);
    -+	argv_array_push(&argv, "server");
    ++	strvec_push(&argv, "server");
     +
    -+	sq_quote_argv_pretty(&quoted, argv.argv);
    ++	sq_quote_argv_pretty(&quoted, argv.v);
     +
     +	if (!gh_server__subprocess_map_initialized) {
     +		gh_server__subprocess_map_initialized = 1;
    @@ gvfs-helper-client.c (new)
     +		entry = xmalloc(sizeof(*entry));
     +		entry->supported_capabilities = 0;
     +
    -+		err = subprocess_start_argv(
    ++		err = subprocess_start_strvec(
     +			&gh_server__subprocess_map, &entry->subprocess, 1,
     +			&argv, gh_client__start_fn);
     +		if (err) {
    @@ gvfs-helper-client.c (new)
     +	}
     +
     +leave_region:
    -+	argv_array_clear(&argv);
    ++	strvec_clear(&argv);
     +	strbuf_release(&quoted);
     +
     +	trace2_data_intmax("gh-client", the_repository,
    @@ gvfs-helper.c (new)
     +#include "pkt-line.h"
     +#include "string-list.h"
     +#include "sideband.h"
    -+#include "argv-array.h"
    ++#include "strvec.h"
     +#include "credential.h"
     +#include "oid-array.h"
     +#include "send-pack.h"
    @@ gvfs-helper.c (new)
     +	 * If we pass zero for the total to the "struct progress" API, we
     +	 * get simple numbers rather than percentages.  So our progress
     +	 * output format may vary depending.
    -+	 *     
    ++	 *
     +	 * It is unclear if CURL will give us a final callback after
     +	 * everything is finished, so we leave the progress handle open
     +	 * and let the caller issue the final stop_progress().
    @@ gvfs-helper.c (new)
     +		goto cleanup;
     +	}
     +
    -+	argv_array_push(&ip.args, "index-pack");
    ++	strvec_push(&ip.args, "index-pack");
     +	if (gh__cmd_opts.show_progress)
    -+		argv_array_push(&ip.args, "-v");
    -+	argv_array_pushl(&ip.args, "-o", idx_name_tmp.buf, NULL);
    -+	argv_array_push(&ip.args, pack_name_tmp.buf);
    ++		strvec_push(&ip.args, "-v");
    ++	strvec_pushl(&ip.args, "-o", idx_name_tmp.buf, NULL);
    ++	strvec_push(&ip.args, pack_name_tmp.buf);
     +	ip.git_cmd = 1;
     +	ip.no_stdin = 1;
     +	ip.no_stdout = 1;
    @@ gvfs-helper.c (new)
     +		ec = GH__ERROR_CODE__SUBPROCESS_SYNTAX;
     +		goto cleanup;
     +	}
    -+	
    ++
     +	for (k = 0; k < result_list.nr; k++)
     +		if (packet_write_fmt_gently(1, "%s\n",
     +					    result_list.items[k].string))
    @@ gvfs-helper.c (new)
     
      ## promisor-remote.c ##
     @@
    + #include "cache.h"
    + #include "object-store.h"
    ++#include "gvfs-helper-client.h"
      #include "promisor-remote.h"
      #include "config.h"
      #include "transport.h"
    -+#include "gvfs-helper-client.h"
    - 
    - static char *repository_format_partial_clone;
    - static const char *core_partial_clone_filter_default;
     @@ promisor-remote.c: static int fetch_objects(const char *remote_name,
    - 	struct ref *ref = NULL;
    - 	int i;
    + 		die(_("promisor-remote: unable to fork off fetch subprocess"));
    + 	child_in = xfdopen(child.in, "w");
      
     +
      	for (i = 0; i < oid_nr; i++) {
    - 		struct ref *new_ref = alloc_ref(oid_to_hex(&oids[i]));
    - 		oidcpy(&new_ref->old_oid, &oids[i]);
    + 		if (fputs(oid_to_hex(&oids[i]), child_in) < 0)
    + 			die_errno(_("promisor-remote: could not write to fetch subprocess"));
     @@ promisor-remote.c: struct promisor_remote *promisor_remote_find(const char *remote_name)
      
      int has_promisor_remote(void)
    @@ sha1-file.c: static int do_oid_object_info_extended(struct repository *r,
     
      ## t/helper/.gitignore ##
     @@
    ++/test-gvfs-protocol
      /test-tool
      /test-fake-ssh
    -+/test-gvfs-protocol
    - /test-line-buffer
    - /test-svn-fe
 80:  f327f3982b =  80:  4655c42c6a gvfs-helper: fix race condition when creating loose object dirs
 81:  c86d36c5f5 =  81:  6c339cdb87 sha1-file: create shared-cache directory if it doesn't exist
 82:  b26036885b =  82:  ef924bab4a gvfs-helper: better handling of network errors
 83:  477728b3db =  83:  307905144a gvfs-helper-client: properly update loose cache with fetched OID
 84:  6b7ed3453d !  84:  5a13ed0c76 gvfs-helper: V2 robust retry and throttling
    @@ gvfs-helper.c: static void create_tempfile_for_loose(
      		goto cleanup;
      	}
     @@ gvfs-helper.c: static void install_packfile(struct gh__response_status *status,
    - 	argv_array_push(&ip.args, "index-pack");
    + 	strvec_push(&ip.args, "index-pack");
      	if (gh__cmd_opts.show_progress)
    - 		argv_array_push(&ip.args, "-v");
    --	argv_array_pushl(&ip.args, "-o", idx_name_tmp.buf, NULL);
    --	argv_array_push(&ip.args, pack_name_tmp.buf);
    -+	argv_array_pushl(&ip.args, "-o", params->temp_path_idx.buf, NULL);
    -+	argv_array_push(&ip.args, params->temp_path_pack.buf);
    + 		strvec_push(&ip.args, "-v");
    +-	strvec_pushl(&ip.args, "-o", idx_name_tmp.buf, NULL);
    +-	strvec_push(&ip.args, pack_name_tmp.buf);
    ++	strvec_pushl(&ip.args, "-o", params->temp_path_idx.buf, NULL);
    ++	strvec_push(&ip.args, params->temp_path_pack.buf);
      	ip.git_cmd = 1;
      	ip.no_stdin = 1;
      	ip.no_stdout = 1;
 85:  12c6b929bb !  85:  7c25512c3b gvfs-helper: expose gvfs/objects GET and POST semantics
    @@ gvfs-helper-client.c: static void gh_client__update_packed_git(const char *line)
     - * We expect:
     + * Both CAP_OBJECTS verbs return the same format response:
       *
    -  *    <odb> 
    +  *    <odb>
       *    <data>*
     @@ gvfs-helper-client.c: static void gh_client__update_packed_git(const char *line)
       * grouped with a queued request for a blob.  The tree-walk *might* be
    @@ gvfs-helper-client.c: static void gh_client__choose_odb(void)
      {
      	struct gh_server__process *entry;
     -	struct child_process *process;
    - 	struct argv_array argv = ARGV_ARRAY_INIT;
    + 	struct strvec argv = STRVEC_INIT;
      	struct strbuf quoted = STRBUF_INIT;
     -	int nr_loose = 0;
     -	int nr_packfile = 0;
    @@ gvfs-helper-client.c: static void gh_client__choose_odb(void)
      
     @@ gvfs-helper-client.c: static int gh_client__get(enum gh_client__created *p_ghc)
      
    - 	sq_quote_argv_pretty(&quoted, argv.argv);
    + 	sq_quote_argv_pretty(&quoted, argv.v);
      
     +	/*
     +	 * Find an existing long-running process with the above command
    @@ gvfs-helper-client.c: static int gh_client__get(enum gh_client__created *p_ghc)
      		entry = xmalloc(sizeof(*entry));
      		entry->supported_capabilities = 0;
      
    --		err = subprocess_start_argv(
    +-		err = subprocess_start_strvec(
     -			&gh_server__subprocess_map, &entry->subprocess, 1,
     -			&argv, gh_client__start_fn);
     -		if (err) {
     -			free(entry);
     -			goto leave_region;
     -		}
    -+		if (subprocess_start_argv(&gh_server__subprocess_map,
    ++		if (subprocess_start_strvec(&gh_server__subprocess_map,
     +					  &entry->subprocess, 1,
     +					  &argv, gh_client__start_fn))
     +			FREE_AND_NULL(entry);
    @@ gvfs-helper-client.c: static int gh_client__get(enum gh_client__created *p_ghc)
      	}
      
     -leave_region:
    - 	argv_array_clear(&argv);
    + 	strvec_clear(&argv);
      	strbuf_release(&quoted);
      
     -	trace2_data_intmax("gh-client", the_repository,
 86:  8bca98d861 =  86:  a293db30da gvfs-helper: dramatically reduce progress noise
 87:  3bae69916c =  87:  f574a9d2b1 gvfs-helper-client.h: define struct object_id
 88:  8125dab1f8 =  88:  36bb9e5c51 gvfs-helper: handle pack-file after single POST request
 89:  bfe4852bd6 !  89:  cf21d2e69f test-gvfs-prococol, t5799: tests for gvfs-helper
    @@ Makefile: else
      	REMOTE_CURL_PRIMARY = git-remote-http$X
      	REMOTE_CURL_ALIASES = git-remote-https$X git-remote-ftp$X git-remote-ftps$X
     
    + ## contrib/buildsystems/CMakeLists.txt ##
    +@@ contrib/buildsystems/CMakeLists.txt: set(wrapper_scripts
    + set(wrapper_test_scripts
    + 	test-fake-ssh test-tool)
    + 
    ++if(CURL_FOUND)
    ++	list(APPEND wrapper_test_scripts test-gvfs-protocol)
    ++
    ++	add_executable(test-gvfs-protocol ${CMAKE_SOURCE_DIR}/t/helper/test-gvfs-protocol.c)
    ++	target_link_libraries(test-gvfs-protocol common-main)
    ++
    ++	if(MSVC)
    ++		set_target_properties(test-gvfs-protocol
    ++					PROPERTIES RUNTIME_OUTPUT_DIRECTORY_DEBUG ${CMAKE_BINARY_DIR}/t/helper)
    ++		set_target_properties(test-gvfs-protocol
    ++					PROPERTIES RUNTIME_OUTPUT_DIRECTORY_RELEASE ${CMAKE_BINARY_DIR}/t/helper)
    ++	endif()
    ++endif()
    ++
    + 
    + foreach(script ${wrapper_scripts})
    + 	file(STRINGS ${CMAKE_SOURCE_DIR}/wrap-for-bin.sh content NEWLINE_CONSUME)
    +
      ## gvfs-helper.c ##
     @@ gvfs-helper.c: static void install_loose(struct gh__request_params *params,
      	/*
    @@ t/helper/test-gvfs-protocol.c (new)
     +	enum worker_result wr;
     +	int result;
     +
    -+	argv_array_push(&pack_objects.args, "git");
    -+	argv_array_push(&pack_objects.args, "pack-objects");
    -+	argv_array_push(&pack_objects.args, "-q");
    -+	argv_array_push(&pack_objects.args, "--revs");
    -+	argv_array_push(&pack_objects.args, "--delta-base-offset");
    -+	argv_array_push(&pack_objects.args, "--window=0");
    -+	argv_array_push(&pack_objects.args, "--depth=4095");
    -+	argv_array_push(&pack_objects.args, "--compression=1");
    -+	argv_array_push(&pack_objects.args, "--stdout");
    ++	strvec_push(&pack_objects.args, "git");
    ++	strvec_push(&pack_objects.args, "pack-objects");
    ++	strvec_push(&pack_objects.args, "-q");
    ++	strvec_push(&pack_objects.args, "--revs");
    ++	strvec_push(&pack_objects.args, "--delta-base-offset");
    ++	strvec_push(&pack_objects.args, "--window=0");
    ++	strvec_push(&pack_objects.args, "--depth=4095");
    ++	strvec_push(&pack_objects.args, "--compression=1");
    ++	strvec_push(&pack_objects.args, "--stdout");
     +
     +	pack_objects.in = -1;
     +	pack_objects.out = -1;
    @@ t/helper/test-gvfs-protocol.c (new)
     +			cradle = &blanket->next;
     +}
     +
    -+static struct argv_array cld_argv = ARGV_ARRAY_INIT;
    ++static struct strvec cld_argv = STRVEC_INIT;
     +static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
     +{
     +	struct child_process cld = CHILD_PROCESS_INIT;
    @@ t/helper/test-gvfs-protocol.c (new)
     +		char buf[128] = "";
     +		struct sockaddr_in *sin_addr = (void *) addr;
     +		inet_ntop(addr->sa_family, &sin_addr->sin_addr, buf, sizeof(buf));
    -+		argv_array_pushf(&cld.env_array, "REMOTE_ADDR=%s", buf);
    -+		argv_array_pushf(&cld.env_array, "REMOTE_PORT=%d",
    ++		strvec_pushf(&cld.env_array, "REMOTE_ADDR=%s", buf);
    ++		strvec_pushf(&cld.env_array, "REMOTE_PORT=%d",
     +				 ntohs(sin_addr->sin_port));
     +#ifndef NO_IPV6
     +	} else if (addr->sa_family == AF_INET6) {
     +		char buf[128] = "";
     +		struct sockaddr_in6 *sin6_addr = (void *) addr;
     +		inet_ntop(AF_INET6, &sin6_addr->sin6_addr, buf, sizeof(buf));
    -+		argv_array_pushf(&cld.env_array, "REMOTE_ADDR=[%s]", buf);
    -+		argv_array_pushf(&cld.env_array, "REMOTE_PORT=%d",
    ++		strvec_pushf(&cld.env_array, "REMOTE_ADDR=[%s]", buf);
    ++		strvec_pushf(&cld.env_array, "REMOTE_PORT=%d",
     +				 ntohs(sin6_addr->sin6_port));
     +#endif
     +	}
     +
     +	if (mayhem_list.nr) {
    -+		argv_array_pushf(&cld.env_array, "MAYHEM_CHILD=%d",
    ++		strvec_pushf(&cld.env_array, "MAYHEM_CHILD=%d",
     +				 mayhem_child++);
     +	}
     +
    -+	cld.argv = cld_argv.argv;
    ++	cld.argv = cld_argv.v;
     +	cld.in = incoming;
     +	cld.out = dup(incoming);
     +
    @@ t/helper/test-gvfs-protocol.c (new)
     +	 * The magic here is made possible because `cld_argv` is static
     +	 * and handle() (called by service_loop()) knows about it.
     +	 */
    -+	argv_array_push(&cld_argv, argv[0]);
    -+	argv_array_push(&cld_argv, "--worker");
    ++	strvec_push(&cld_argv, argv[0]);
    ++	strvec_push(&cld_argv, "--worker");
     +	for (i = 1; i < argc; ++i)
    -+		argv_array_push(&cld_argv, argv[i]);
    ++		strvec_push(&cld_argv, argv[i]);
     +
     +	/*
     +	 * Setup primary instance to listen for connections.
 90:  1063cc311f =  90:  f5efc95691 gvfs-helper: move result-list construction into install functions
 91:  161e878890 =  91:  214742cbc0 t5799: add support for POST to return either a loose object or packfile
 92:  bac306aadd =  92:  fa55f00e59 t5799: cleanup wc-l and grep-c lines
 93:  33f78d3ee0 !  93:  b8e45f3f37 gvfs-helper: add prefetch support
    @@ gvfs-helper-client.c: static void gh_client__update_packed_git(const char *line)
     - * Both CAP_OBJECTS verbs return the same format response:
     + * CAP_OBJECTS verbs return the same format response:
       *
    -  *    <odb> 
    +  *    <odb>
       *    <data>*
     @@ gvfs-helper-client.c: static int gh_client__objects__receive_response(
      	const char *v1;
    @@ gvfs-helper.c: static int create_loose_pathname_in_odb(struct strbuf *buf_path,
     +	struct strbuf ip_stdout = STRBUF_INIT;
      
     -	gh__response_status__zero(status);
    -+	argv_array_push(&ip.args, "git");
    -+	argv_array_push(&ip.args, "index-pack");
    ++	strvec_push(&ip.args, "git");
    ++	strvec_push(&ip.args, "index-pack");
     +	if (gh__cmd_opts.show_progress)
    -+		argv_array_push(&ip.args, "-v");
    -+	argv_array_pushl(&ip.args, "-o", temp_path_idx->buf, NULL);
    -+	argv_array_push(&ip.args, temp_path_pack->buf);
    ++		strvec_push(&ip.args, "-v");
    ++	strvec_pushl(&ip.args, "-o", temp_path_idx->buf, NULL);
    ++	strvec_push(&ip.args, temp_path_pack->buf);
     +	ip.no_stdin = 1;
     +	ip.out = -1;
     +	ip.err = -1;
    @@ gvfs-helper.c: static void create_tempfile_for_loose(
      		goto cleanup;
      	}
      
    --	argv_array_push(&ip.args, "index-pack");
    +-	strvec_push(&ip.args, "index-pack");
     -	if (gh__cmd_opts.show_progress)
    --		argv_array_push(&ip.args, "-v");
    --	argv_array_pushl(&ip.args, "-o", params->temp_path_idx.buf, NULL);
    --	argv_array_push(&ip.args, params->temp_path_pack.buf);
    +-		strvec_push(&ip.args, "-v");
    +-	strvec_pushl(&ip.args, "-o", params->temp_path_idx.buf, NULL);
    +-	strvec_push(&ip.args, params->temp_path_pack.buf);
     -	ip.git_cmd = 1;
     -	ip.no_stdin = 1;
     -	ip.no_stdout = 1;
 94:  c2c0bfe4d7 =  94:  af9e66a8cd gvfs-helper: add prefetch .keep file for last packfile
 95:  223673646e =  95:  fb0cc1d394 gvfs-helper: do one read in my_copy_fd_len_tail()
 96:  e3598bd97f =  96:  1494685925 gvfs-helper: move content-type warning for prefetch packs
 97:  7cb2f2109a =  97:  2dfd860eae fetch: use gvfs-helper prefetch under config
 98:  53c5dd7aa2 =  98:  e7c478c21b gvfs-helper: better support for concurrent packfile fetches
 99:  deaf2adb5a <   -:  ---------- remote-curl: do not call fetch-pack when using gvfs-helper
  -:  ---------- >  99:  c77c4dac01 remote-curl: do not call fetch-pack when using gvfs-helper
100:  447e94175b = 100:  fa2da1f774 fetch: reprepare packs before checking connectivity
101:  9b8df979ee = 101:  7feb4df9ee gvfs-helper: retry when creating temp files
102:  1d10404613 = 102:  b53665ddfd gvfs-helper: fix support for NTLM
103:  11c105960a = 103:  8e2d96f4e6 gvfs-helper: cleanup NTLM fix
104:  c7093e88fd = 104:  71163d6242 upload-pack: fix race condition in error messages
120:  7a62e224cf = 105:  2ef221f28c maintenance: add prefetch task
121:  f3a16fd324 = 106:  4365e78bf7 maintenance: add loose-objects task
122:  b3e577f587 = 107:  024389f79f maintenance: create auto condition for loose-objects
123:  5d8f40c2b5 = 108:  fb4e3df31c midx: enable core.multiPackIndex by default
124:  c705a2699f ! 109:  3b64a1c680 midx: use start_delayed_progress()
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
      	for (i = 0; i < num_chunks; i++) {
      		if (written != chunk_offsets[i])
     @@ midx.c: int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
    - 		return 0;
    + 	}
      
      	if (flags & MIDX_PROGRESS)
     -		progress = start_progress(_("Looking for referenced packfiles"),
125:  33677a1107 ! 110:  16aaecd97c maintenance: add incremental-repack task
    @@ t/t5319-multi-pack-index.sh
     +GIT_TEST_MULTI_PACK_INDEX=0
      objdir=.git/objects
      
    - midx_read_expect () {
    + HASH_LEN=$(test_oid rawsz)
     
      ## t/t7900-maintenance.sh ##
     @@ t/t7900-maintenance.sh: test_description='git maintenance builtin'
126:  46c2216b25 = 111:  c99e296edd maintenance: auto-size incremental-repack batch
127:  9fc7c0e246 = 112:  3e82179846 maintenance: add incremental-repack auto condition
128:  8feb6f4180 = 113:  ba653c48dd maintenance: optionally skip --auto process
129:  9cce5768c2 = 114:  58c639a1d0 maintenance: add --schedule option and config
130:  acfa269be7 = 115:  3915fc66a6 for-each-repo: run subcommands on configured repos
131:  6b9822fc43 = 116:  4526cc41e0 maintenance: add [un]register subcommands
132:  db861329ef ! 117:  348d22d6a3 maintenance: add start/stop subcommands
    @@ Documentation/git-maintenance.txt: run::
      	only removes the repository from the configured list. It does not
     
      ## Makefile ##
    -@@ Makefile: TEST_BUILTINS_OBJS += test-advise.o
    - TEST_BUILTINS_OBJS += test-bloom.o
    +@@ Makefile: TEST_BUILTINS_OBJS += test-bloom.o
      TEST_BUILTINS_OBJS += test-chmtime.o
    + TEST_BUILTINS_OBJS += test-cmp.o
      TEST_BUILTINS_OBJS += test-config.o
     +TEST_BUILTINS_OBJS += test-crontab.o
      TEST_BUILTINS_OBJS += test-ctype.o
    @@ t/helper/test-crontab.c (new)
     
      ## t/helper/test-tool.c ##
     @@ t/helper/test-tool.c: static struct test_cmd cmds[] = {
    - 	{ "bloom", cmd__bloom },
      	{ "chmtime", cmd__chmtime },
    + 	{ "cmp", cmd__cmp },
      	{ "config", cmd__config },
     +	{ "crontab", cmd__crontab },
      	{ "ctype", cmd__ctype },
    @@ t/helper/test-tool.c: static struct test_cmd cmds[] = {
      	{ "delta", cmd__delta },
     
      ## t/helper/test-tool.h ##
    -@@ t/helper/test-tool.h: int cmd__advise_if_enabled(int argc, const char **argv);
    - int cmd__bloom(int argc, const char **argv);
    +@@ t/helper/test-tool.h: int cmd__bloom(int argc, const char **argv);
      int cmd__chmtime(int argc, const char **argv);
    + int cmd__cmp(int argc, const char **argv);
      int cmd__config(int argc, const char **argv);
     +int cmd__crontab(int argc, const char **argv);
      int cmd__ctype(int argc, const char **argv);
133:  7c7698da17 = 118:  870d43ac78 maintenance: recommended schedule in register/start
134:  a7030467d6 = 119:  7e2b9051c5 maintenance: add troubleshooting guide to docs
105:  f496cd02d5 = 120:  545edad6fe homebrew: add GitHub workflow to release Cask
106:  4d78edbff9 <   -:  ---------- fixup! Add a new run_hook_argv() function
107:  6555aa7d5c <   -:  ---------- fixup! remote-curl: do not call fetch-pack when using gvfs-helper
108:  4105190eef <   -:  ---------- gvfs-helper: update to use strvec
109:  5a64304ed5 <   -:  ---------- vcbuild: fix library name for expat with make MSVC=1
110:  5cf2176398 <   -:  ---------- vcbuild: fix batch file name in README
111:  2cfaf3f901 <   -:  ---------- contrib/buildsystems: fix expat library name for generated vcxproj
112:  6a8142ef22 <   -:  ---------- squash! Add support for read-object as a background process to retrieve missing objects
113:  938c973906 <   -:  ---------- fixup! Teach ahead-behind and serialized status to play nicely together
114:  bfecfd05f5 <   -:  ---------- fixup! status: reject deserialize in V2 and conflicts
115:  9910a2ec06 <   -:  ---------- fixup! pack-objects (mingw): demonstrate a segmentation fault with large deltas
116:  b179cbbafc <   -:  ---------- fixup! mingw: add a cache below mingw's lstat and dirent implementations
117:  e5890e143f <   -:  ---------- cmake: ignore generated files
118:  96a717b914 <   -:  ---------- fixup! gvfs-helper: create tool to fetch objects using the GVFS Protocol
119:  60b30069eb <   -:  ---------- fixup! test-gvfs-prococol, t5799: tests for gvfs-helper

@dscho
Member

dscho commented Oct 6, 2020

 11:  ae794d1bb8 !  11:  1326949d9f Add a new run_hook_argv() function

We probably want to adjust the commit message, too (it still talks about _argv instead of _strvec).

 99:  deaf2adb5a <   -:  ---------- remote-curl: do not call fetch-pack when using gvfs-helper
  -:  ---------- >  99:  2970512f3b remote-curl: do not call fetch-pack when using gvfs-helper

With --creation-factor=99, I get this (which looks good):

1:  deaf2adb5a51 ! 1:  2970512f3b0e remote-curl: do not call fetch-pack when using gvfs-helper
    @@ Commit message
     
      ## remote-curl.c ##
     @@ remote-curl.c: static int fetch_git(struct discovery *heads,
    - 	struct argv_array args = ARGV_ARRAY_INIT;
    + 	struct strvec args = STRVEC_INIT;
      	struct strbuf rpc_result = STRBUF_INIT;
      
     +	if (core_use_gvfs_helper)
     +		return 0;
     +
    - 	argv_array_pushl(&args, "fetch-pack", "--stateless-rpc",
    - 			 "--stdin", "--lock-pack", NULL);
    + 	strvec_pushl(&args, "fetch-pack", "--stateless-rpc",
    + 		     "--stdin", "--lock-pack", NULL);
      	if (options.followtags)
108:  4105190eef <   -:  ---------- gvfs-helper: update to use strvec
[...]
  -:  ---------- > 106:  4ceb624e77 gvfs-helper: update to use strvec

Again using --creation-factor=99, I get this:

1:  4105190eeff2 ! 1:  4ceb624e7787 gvfs-helper: update to use strvec
    @@ Commit message
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     
      ## gvfs-helper-client.c ##
    -@@
    - #include "cache.h"
    --#include "argv-array.h"
    -+#include "strvec.h"
    - #include "trace2.h"
    - #include "oidset.h"
    - #include "object.h"
     @@ gvfs-helper-client.c: static void gh_client__update_packed_git(const char *line)
      /*
       * CAP_OBJECTS verbs return the same format response:
    @@ gvfs-helper-client.c: static void gh_client__update_packed_git(const char *line)
       *    <data>*
       *    <status>
       *    <flush>
    -@@ gvfs-helper-client.c: static struct gh_server__process *gh_client__find_long_running_process(
    - 	unsigned int cap_needed)
    - {
    - 	struct gh_server__process *entry;
    --	struct argv_array argv = ARGV_ARRAY_INIT;
    -+	struct strvec argv = STRVEC_INIT;
    - 	struct strbuf quoted = STRBUF_INIT;
    - 
    - 	gh_client__choose_odb();
    -@@ gvfs-helper-client.c: static struct gh_server__process *gh_client__find_long_running_process(
    - 	/*
    - 	 * TODO decide what defaults we want.
    - 	 */
    --	argv_array_push(&argv, "gvfs-helper");
    --	argv_array_push(&argv, "--fallback");
    --	argv_array_push(&argv, "--cache-server=trust");
    --	argv_array_pushf(&argv, "--shared-cache=%s",
    -+	strvec_push(&argv, "gvfs-helper");
    -+	strvec_push(&argv, "--fallback");
    -+	strvec_push(&argv, "--cache-server=trust");
    -+	strvec_pushf(&argv, "--shared-cache=%s",
    - 			 gh_client__chosen_odb->path);
    --	argv_array_push(&argv, "server");
    -+	strvec_push(&argv, "server");
    - 
    --	sq_quote_argv_pretty(&quoted, argv.argv);
    -+	sq_quote_argv_pretty(&quoted, argv.v);
    - 
    - 	/*
    - 	 * Find an existing long-running process with the above command
     @@ gvfs-helper-client.c: static struct gh_server__process *gh_client__find_long_running_process(
      		entry = xmalloc(sizeof(*entry));
      		entry->supported_capabilities = 0;
    @@ gvfs-helper-client.c: static struct gh_server__process *gh_client__find_long_run
      					  &entry->subprocess, 1,
      					  &argv, gh_client__start_fn))
      			FREE_AND_NULL(entry);
    -@@ gvfs-helper-client.c: static struct gh_server__process *gh_client__find_long_running_process(
    - 		FREE_AND_NULL(entry);
    - 	}
    - 
    --	argv_array_clear(&argv);
    -+	strvec_clear(&argv);
    - 	strbuf_release(&quoted);
    - 
    - 	return entry;
     
      ## gvfs-helper.c ##
    -@@
    - #include "pkt-line.h"
    - #include "string-list.h"
    - #include "sideband.h"
    --#include "argv-array.h"
    -+#include "strvec.h"
    - #include "credential.h"
    - #include "oid-array.h"
    - #include "send-pack.h"
     @@ gvfs-helper.c: static int gh__curl_progress_cb(void *clientp,
      	 * If we pass zero for the total to the "struct progress" API, we
      	 * get simple numbers rather than percentages.  So our progress
    @@ gvfs-helper.c: static int gh__curl_progress_cb(void *clientp,
      	 * It is unclear if CURL will give us a final callback after
      	 * everything is finished, so we leave the progress handle open
      	 * and let the caller issue the final stop_progress().
    -@@ gvfs-helper.c: static void my_run_index_pack(struct gh__request_params *params,
    - 	struct child_process ip = CHILD_PROCESS_INIT;
    - 	struct strbuf ip_stdout = STRBUF_INIT;
    - 
    --	argv_array_push(&ip.args, "git");
    --	argv_array_push(&ip.args, "index-pack");
    -+	strvec_push(&ip.args, "git");
    -+	strvec_push(&ip.args, "index-pack");
    - 	if (gh__cmd_opts.show_progress)
    --		argv_array_push(&ip.args, "-v");
    --	argv_array_pushl(&ip.args, "-o", temp_path_idx->buf, NULL);
    --	argv_array_push(&ip.args, temp_path_pack->buf);
    -+		strvec_push(&ip.args, "-v");
    -+	strvec_pushl(&ip.args, "-o", temp_path_idx->buf, NULL);
    -+	strvec_push(&ip.args, temp_path_pack->buf);
    - 	ip.no_stdin = 1;
    - 	ip.out = -1;
    - 	ip.err = -1;
     @@ gvfs-helper.c: static enum gh__error_code do_server_subprocess__objects(const char *verb_line)
      		ec = GH__ERROR_CODE__SUBPROCESS_SYNTAX;
      		goto cleanup;
    @@ sub-process.c: int subprocess_start(struct hashmap *hashmap, struct subprocess_e
     +int subprocess_start_strvec(struct hashmap *hashmap,
      			  struct subprocess_entry *entry,
      			  int is_git_cmd,
    - 			  const struct argv_array *argv,
    + 			  const struct strvec *argv,
     
      ## sub-process.h ##
     @@ sub-process.h: typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
    @@ sub-process.h: typedef int(*subprocess_start_fn)(struct subprocess_entry *entry)
     +int subprocess_start_strvec(struct hashmap *hashmap,
      			  struct subprocess_entry *entry,
      			  int is_git_cmd,
    --			  const struct argv_array *argv,
    -+			  const struct strvec *argv,
    - 			  subprocess_start_fn startfn);
    - 
    - /* Kill a subprocess and remove it from the subprocess hashmap. */
    + 			  const struct strvec *argv,

Which means that most (but not all) of 4ceb624 was basically squashed into earlier commits. Looking at the remainder, I suspect that we want to squash even the rest of those fixups?

Finally, it appears that the build is failing because of a duplicate test number, t5582. "Our t5582" should probably be renamed to t5583.

To put my money where my mouth is, I added a couple of commits on top, most of them fixup!/squash! ones (intended to be autosquashed before finalizing this PR).

@dscho
Member

dscho commented Oct 6, 2020

I added a couple of commits on top, most of them fixup!/squash! ones (intended to be autosquashed before finalizing this PR).

I did the autosquashing rebase, and the (treesame) result is here: https://github.com/microsoft/git/commits/vfs-2.29.0-rc0.

@derrickstolee
Author

I added a couple of commits on top, most of them fixup!/squash! ones (intended to be autosquashed before finalizing this PR).

I did the autosquashing rebase, and the (treesame) result is here: https://github.com/microsoft/git/commits/vfs-2.29.0-rc0.

Thank you, @dscho! I have reset to your branch and pushed to this PR. I appreciate your attention to detail (and rebase skills).

@derrickstolee derrickstolee force-pushed the tentative/vfs-2.29.0 branch 2 times, most recently from c071a5b to 4135e3a Compare October 9, 2020 18:36
@derrickstolee derrickstolee changed the title [DO NOT MERGE] Rebase onto v2.29.0-rc0.windows.1 [DO NOT MERGE] Rebase onto v2.29.0-rc1.windows.1 Oct 12, 2020
@derrickstolee derrickstolee changed the title [DO NOT MERGE] Rebase onto v2.29.0-rc1.windows.1 [DO NOT MERGE] Rebase onto v2.29.0-rc2.windows.1 Oct 16, 2020
Kevin Willford and others added 19 commits October 19, 2020 15:21
While using the reset --stdin feature on Windows, an added path may have a
trailing \r that wasn't being removed, so it didn't match the path in the
index and wasn't reset.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Saeed Noursalehi <sanoursa@microsoft.com>
Signed-off-by: Johannes Schindelin <johasc@microsoft.com>
This header file will accumulate GVFS-specific definitions.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
This does not do anything yet. The next patches will add various values
for that config setting that correspond to the various features
offered/required by GVFS.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
This takes a substantial amount of time, and if the user is reasonably
sure that the files' integrity is not compromised, that time can be saved.

Git no longer verifies the SHA-1 by default, anyway.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Prevent the sparse checkout from deleting files that were marked with the
skip-worktree bit and are not in the sparse-checkout file.

This is because everything with the skip-worktree bit turned on is being
virtualized and will be removed with the change of HEAD.

Running with these changes, only one test failed: it checked that the
worktree narrows on checkout, which was expected since we would no
longer be narrowing the worktree.
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
While performing a fetch with a virtual file system we know that there
will be missing objects and we don't want to download them just because
of the reachability of the commits.  We also don't want to download a
pack file with commits, trees, and blobs since these will be downloaded
on demand.

This flag will skip the first connectivity check and by returning zero
will skip the upload pack. It will also skip the second connectivity
check but continue to update the branches to the latest commit ids.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Ensure all filters and EOL conversions are blocked when running under
GVFS so that our projected file sizes will match the actual file size
when it is hydrated on the local machine.

Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
The two existing members of the run_hook*() family, run_hook_ve() and
run_hook_le(), are good for callers that know the precise number of
parameters already. Let's introduce a new sibling that takes a strvec
for callers that want to pass a variable number of parameters.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The idea is to allow blob objects to be missing from the local repository,
and to load them lazily on demand.

After discussing this idea on the mailing list, we will rename the feature
to "lazy clone" and work more on this.

Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
Hydrate missing loose objects in check_and_freshen() when running
virtualized. Add test cases to verify the read-object hook works when
running virtualized.

This hook is called in check_and_freshen() rather than
check_and_freshen_local() to make the hook work also with alternates.

Helped-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
…ng objects

This commit converts the existing read_object hook proc model for
downloading missing blobs to use a background process that is started
the first time git encounters a missing blob and stays running until git
exits.  Git and the read-object process communicate via stdin/stdout and
a versioned, capability negotiated interface as documented in
Documentation/technical/read-object-protocol.txt.  The advantage of this
over the previous hook proc is that it saves the overhead of spawning a
new hook process for every missing blob.

The model for the background process was refactored from the recent git
LFS work.  I refactored that code into a shared module (sub-process.c/h)
and then updated convert.c to consume the new library.  I then used the
same sub-process module when implementing the read-object background
process.

The read-object hook feature was designed before the SHA-256 support was
even close to being started. As a consequence, its protocol hard-codes the
key `sha1`, even if we now also support SHA-256 object IDs.

Technically, this is wrong, and probably the best way forward would be
to rename the key to `oid` (or `sha256`, but that is less future-proof).

However, there are existing setups out there, with existing read-object
hooks that most likely have no idea what to do with `oid` requests. So
let's leave the key as `sha1` for the time being, even if it will be
technically incorrect in SHA-256 repositories.

Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
If we are going to write an object, there is no use in calling the
read-object hook to get an object from a potentially remote source.
We would rather just write out the object and avoid the potential
round trip for an object that doesn't exist.

This change adds a flag to the check_and_freshen() and
freshen_loose_object() functions' signatures so that the hook
is bypassed when the functions are called before writing loose
objects. The check for a local object is still performed so we
don't overwrite something that has already been written to one
of the objects directories.

Based on a patch by Kevin Willford.

Signed-off-by: Johannes Schindelin <johasc@microsoft.com>
These were done in private, before microsoft/git.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
This adds a hard-coded call to GVFS.hooks.exe before and after each Git
command runs.

To make sure that this is only called on repositories cloned with GVFS, we
test for the tell-tale .gvfs.

Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
Suggested by Ben Peart.

Signed-off-by: Johannes Schindelin <johasc@microsoft.com>
Signed-off-by: Alejandro Pauly <alpauly@microsoft.com>
derrickstolee and others added 19 commits October 19, 2020 15:29
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.

Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.

The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.

However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.

When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:

1. --no-tags prevents getting a new tag when a user wants to see
   the new tags appear in their foreground fetches.

2. --refmap= removes the configured refspec which usually updates
   refs/remotes/<remote>/* with the refs advertised by the remote.
   While this looks confusing, this was documented and tested by
   b40a502 (fetch: document and test --refmap="", 2020-01-21),
   including this sentence in the documentation:

	Providing an empty `<refspec>` to the `--refmap` option
	causes Git to ignore the configured refspecs and rely
	entirely on the refspecs supplied as command-line arguments.

3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
   we can ensure that we actually load the new values somewhere in
   our refspace while not updating refs/heads or refs/remotes. By
   storing these refs here, the commit-graph job will update the
   commit-graph with the commits from these hidden refs.

4. --prune will delete the refs/prefetch/<remote> refs that no
   longer appear on the remote.

5. --no-write-fetch-head prevents updating FETCH_HEAD.

We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paired with --no-show-forced-updates (through
fetch.showForcedUpdates=false).

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.

Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:

1. Run 'git prune-packed' to delete any loose objects that exist
   in a pack-file. Concurrent commands will prefer the packed
   version of the object to the loose version. (Of course, there
   are exceptions for commands that specifically care about the
   location of an object. These are rare for a user to run on
   purpose, and we hope a user that has selected background
   maintenance will not be trying to do foreground maintenance.)

2. Run 'git pack-objects' on a batch of loose objects. These
   objects are grouped by scanning the loose object directories in
   lexicographic order until listing all loose objects -or-
   reaching 50,000 objects. This is more than enough if the loose
   objects are created only by a user doing normal development.
   We noticed users with _millions_ of loose objects because VFS
   for Git downloads blobs on-demand when a file read operation
   requires populating a virtual file.
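
The batching in step 2 can be sketched as follows. This is a simplified
model, assuming the standard two-hex-digit fan-out layout of the objects
directory; the function name and constant are illustrative, not the real
implementation:

```python
import os

BATCH_LIMIT = 50_000  # the 50,000-object cap described above

def batch_loose_objects(objects_dir):
    # Scan the two-hex-digit fan-out directories in lexicographic
    # order, collecting loose object ids until the cap is reached.
    batch = []
    for prefix in sorted(os.listdir(objects_dir)):
        if len(prefix) != 2:
            continue  # skip 'pack', 'info', etc.
        subdir = os.path.join(objects_dir, prefix)
        for name in sorted(os.listdir(subdir)):
            batch.append(prefix + name)
            if len(batch) >= BATCH_LIMIT:
                return batch
    return batch
```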

This step is based on a similar step in Scalar [1] and VFS for Git.

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The loose-objects task deletes loose objects that already exist in a
pack-file, then places the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, require a
minimum number of objects to justify the task.

The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.
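
The --auto condition can be modeled as a small predicate. This is a
sketch of the semantics described above, not the real implementation
(which estimates the loose-object count rather than taking it as input):

```python
def loose_objects_auto_condition(loose_count, auto_limit=100):
    # auto_limit mirrors maintenance.loose-objects.auto (default 100):
    # zero disables the task under --auto, a negative value forces it.
    if auto_limit == 0:
        return False
    if auto_limit < 0:
        return True
    return loose_count >= auto_limit
```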

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The core.multiPackIndex setting has been around since c4d2522
(config: create core.multiPackIndex setting, 2018-07-12), but has been
disabled by default. If a user wishes to use the multi-pack-index
feature, then they must enable this config and run 'git multi-pack-index
write'.

The multi-pack-index feature is relatively stable now, so make the
config option true by default. For users that do not use a
multi-pack-index, the only extra cost will be a file lookup to see if a
multi-pack-index file exists (once per process, per object directory).

Also, this config option will be referenced by an upcoming
"incremental-repack" task in the maintenance builtin, so move the config
option into the repository settings struct. Note that if
GIT_TEST_MULTI_PACK_INDEX=1, then we want to ignore the config option
and treat core.multiPackIndex as enabled.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Now that the multi-pack-index may be written as part of auto maintenance
at the end of a command, reduce the progress output when the operations
are quick. Use start_delayed_progress() instead of start_progress().

Update t5319-multi-pack-index.sh to use GIT_PROGRESS_DELAY=0 now that
the progress indicators are conditional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The previous change cleaned up loose objects using the
'loose-objects' task, which can be run safely in the background. Add
a similar job that performs similar cleanups for pack-files.

One issue with running 'git repack' is that it is designed to
repack all pack-files into a single pack-file. While this is the
most space-efficient way to store object data, it is not time or
memory efficient. This becomes extremely important if the repo is
so large that a user struggles to store two copies of the pack on
their disk.

Instead, perform an "incremental" repack by collecting a few small
pack-files into a new pack-file. The multi-pack-index facilitates
this process ever since 'git multi-pack-index expire' was added in
19575c7 (multi-pack-index: implement 'expire' subcommand,
2019-06-10) and 'git multi-pack-index repack' was added in ce1e4a1
(midx: implement midx_repack(), 2019-06-10).

The 'incremental-repack' task runs the following steps:

1. 'git multi-pack-index write' creates a multi-pack-index file if
   one did not exist, and otherwise will update the multi-pack-index
   with any new pack-files that appeared since the last write. This
   is particularly relevant with the background fetch job.

   When the multi-pack-index sees two copies of the same object, it
   stores the offset data into the newer pack-file. This means that
   some old pack-files could become "unreferenced" which I will use
   to mean "a pack-file that is in the pack-file list of the
   multi-pack-index but none of the objects in the multi-pack-index
   reference a location inside that pack-file."

2. 'git multi-pack-index expire' deletes any unreferenced pack-files
   and updates the multi-pack-index to drop those pack-files from the
   list. This is safe to do as concurrent Git processes will see the
   multi-pack-index and not open those packs when looking for object
   contents. (Similar to the 'loose-objects' job, there are some Git
   commands that open pack-files regardless of the multi-pack-index,
   but they are rarely used. Further, a user that self-selects to
   use background operations would likely refrain from using those
   commands.)

3. 'git multi-pack-index repack --batch-size=<size>' collects a set
   of pack-files that are listed in the multi-pack-index and creates
   a new pack-file containing the objects whose offsets are listed
   by the multi-pack-index to be in those pack-files. The set of
   pack-files is selected greedily by sorting the pack-files by modified
   time and adding a pack-file to the set if its "expected size" is
   smaller than the batch size until the total expected size of the
   selected pack-files is at least the batch size. The "expected
   size" is calculated by taking the size of the pack-file divided
   by the number of objects in the pack-file and multiplied by the
   number of objects from the multi-pack-index with offset in that
   pack-file. The expected size approximates how much data from that
   pack-file will contribute to the resulting pack-file size. The
   intention is that the resulting pack-file will be close in size
   to the provided batch size.

   The next run of the incremental-repack task will delete these
   repacked pack-files during the 'expire' step.

   In this version, the batch size is set to "0" which ignores the
   size restrictions when selecting the pack-files. It instead
   selects all pack-files and repacks all packed objects into a
   single pack-file. This will be updated in the next change, but
   it requires doing some calculations that are better isolated to
   a separate change.
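
The "expected size" formula and greedy selection in step 3 can be
sketched numerically. This is a simplified model; the tuple layout and
function names are illustrative:

```python
def expected_size(pack_size, objects_in_pack, referenced_objects):
    # Approximate how much data from this pack-file will contribute
    # to the new pack-file: size-per-object times referenced objects.
    return pack_size // objects_in_pack * referenced_objects

def select_packs(packs, batch_size):
    # packs: list of (mtime, size, objects_in_pack, referenced_objects).
    # Consider packs from oldest to newest; take those whose expected
    # size is under batch_size until the selected total reaches it.
    selected, total = [], 0
    for mtime, size, nr, referenced in sorted(packs):
        exp = expected_size(size, nr, referenced)
        if exp < batch_size:
            selected.append((mtime, size))
            total += exp
            if total >= batch_size:
                break
    return selected
```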

These steps are based on a similar background maintenance step in
Scalar (and VFS for Git) [1]. This was incredibly effective for
users of the Windows OS repository. After using the same VFS for Git
repository for over a year, some users had _thousands_ of pack-files
that combined to up to 250 GB of data. We noticed a few users were
running into the open file descriptor limits (due in part to a bug
in the multi-pack-index fixed by af96fe3 (midx: add packs to
packed_git linked list, 2019-04-29)).

These pack-files were mostly small since they contained the commits
and trees that were pushed to the origin in a given hour. The GVFS
protocol includes a "prefetch" step that asks for pre-computed pack-
files containing commits and trees by timestamp. These pack-files
were grouped into "daily" pack-files once a day for up to 30 days.
If a user did not request prefetch packs for over 30 days, then they
would get the entire history of commits and trees in a new, large
pack-file. This led to a large number of pack-files that had poor
delta compression.

By running this pack-file maintenance step once per day, these repos
with thousands of packs spanning 200+ GB dropped to dozens of pack-
files spanning 30-50 GB. This was done all without removing objects
from the system and using a constant batch size of two gigabytes.
Once the work was done to reduce the pack-files to small sizes, the
batch size of two gigabytes means that not every run triggers a
repack operation, so the following run will not expire a pack-file.
This has kept these repos in a "clean" state.

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.

Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.
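
Under those rules, the batch-size computation can be sketched as
follows. This is a hypothetical simplification of the heuristic
described above, not the actual maintenance code:

```python
MAX_BATCH_SIZE = 2 * 1024 ** 3  # the two-gigabyte ceiling

def compute_batch_size(pack_sizes):
    # Target one byte more than the second-largest pack, so the largest
    # ("clone") pack is left alone but at least two smaller packs can
    # combine; cap the result at two gigabytes.
    if len(pack_sizes) < 2:
        return 0  # nothing worth combining
    second_largest = sorted(pack_sizes)[-2]
    return min(second_largest + 1, MAX_BATCH_SIZE)
```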

Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.

Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The incremental-repack task updates the multi-pack-index by deleting pack-
files that have been replaced with new packs, then repacking a batch of
small pack-files into a larger pack-file. This incremental repack is faster
than rewriting all object data, but is slower than some other
maintenance activities.

The 'maintenance.incremental-repack.auto' config option specifies how many
pack-files should exist outside of the multi-pack-index before running
the step. These pack-files could be created by 'git fetch' commands or
by the loose-objects task. The default value is 10.

Setting the option to zero disables the task with the '--auto' option,
and a negative value makes the task run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Some commands run 'git maintenance run --auto --[no-]quiet' after doing
their normal work, as a way to keep repositories clean as they are used.
Currently, users who do not want this maintenance to occur would set the
'gc.auto' config option to 0 to avoid the 'gc' task from running.
However, this does not stop the extra process invocation. On Windows,
this extra process invocation can be more expensive than necessary.

Allow users to drop this extra process by setting 'maintenance.auto' to
'false'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Maintenance currently triggers when certain data-size thresholds are
met, such as number of pack-files or loose objects. Users may want to
run certain maintenance tasks based on frequency instead. For example,
a user may want to perform a 'prefetch' task every hour, or 'gc' task
every day. To help these users, update the 'git maintenance run' command
to include a '--schedule=<frequency>' option. The allowed frequencies
are 'hourly', 'daily', and 'weekly'. These values are also allowed in a
new config value 'maintenance.<task>.schedule'.

The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:

	'hourly' > 'daily' > 'weekly'

Use new 'enum schedule_priority' to track these values numerically.

The following cron table would run the scheduled tasks with the correct
frequencies:

  0 1-23 * * *    git -C <repo> maintenance run --schedule=hourly
  0 0    * * 1-6  git -C <repo> maintenance run --schedule=daily
  0 0    * * 0    git -C <repo> maintenance run --schedule=weekly

This cron schedule will run --schedule=hourly every hour except at
midnight. This avoids a concurrent run with the --schedule=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --schedule=weekly that runs at midnight on
the first day of the week. Since --schedule=daily also runs the
'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.
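
The frequency ordering can be modeled with a small enum mirroring the
'enum schedule_priority' idea; the Python names here are illustrative:

```python
from enum import IntEnum

class SchedulePriority(IntEnum):
    # Lower value = more frequent: 'hourly' > 'daily' > 'weekly'.
    HOURLY = 1
    DAILY = 2
    WEEKLY = 3

def task_runs(task_schedule, run_schedule):
    # A task runs when its configured frequency is at least as
    # frequent as the --schedule argument.
    return task_schedule <= run_schedule
```

So a --schedule=daily run picks up hourly and daily tasks, but leaves
weekly tasks for the weekly run.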

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.

The test is very simple, but does highlight that the "--" argument is
optional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repo' config option in the global
config so the background maintenance job knows which repositories to
maintain.

These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.

For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.

The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.

The schedule is laid out as follows:

  0 1-23 * * *   $cmd maintenance run --schedule=hourly
  0 0    * * 1-6 $cmd maintenance run --schedule=daily
  0 0    * * 0   $cmd maintenance run --schedule=weekly

where $cmd is a properly-qualified 'git for-each-repo' execution:

$cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo

where $path points to the location of the Git executable running 'git
maintenance start'. This is critical for systems with multiple versions
of Git. Specifically, macOS has a system version at '/usr/bin/git' while
the version that users can install resides at '/usr/local/bin/git'
(symlinked to '/usr/local/libexec/git-core/git'). This will also use
your locally-built version if you build and run this in your development
environment without installing first.

This conditional schedule avoids having cron launch multiple 'git
for-each-repo' commands in parallel. Such parallel commands would likely
lead to the 'hourly' and 'daily' tasks competing over the object
database lock. This could lead to some tasks never being run! Since
the --schedule=<frequency> argument will run all tasks with _at least_
the given frequency, the daily runs will also run the hourly tasks.
Similarly, the weekly runs will also run the daily and hourly tasks.

The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to avoid a future test from accidentally
running anything with the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. It does not specify what maintenance should occur or how
often.

If a user sets any 'maintenance.<task>.schedule' config value, then
they have chosen a specific schedule for themselves and Git should
respect that.

To make this process extremely simple for users, assume a default
schedule when no 'maintenance.<task>.schedule' or '...enabled' config
settings are concretely set. This is only an in-process assumption, so
future versions of Git could adjust this expected schedule.

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The 'git maintenance run' subcommand takes a lock on the object database
to prevent concurrent processes from competing for resources. This is an
important safety measure to prevent possible repository corruption and
data loss.

This feature can lead to confusing behavior if a user is not aware of
it. Add a TROUBLESHOOTING section to the 'git maintenance' builtin
documentation that discusses these tradeoffs. The short version of this
section is that Git will not corrupt your repository, but if the list of
scheduled tasks takes longer than an hour then some scheduled tasks may
be dropped due to this object database collision. For example, a
long-running "daily" task at midnight might prevent an "hourly" task
from running at 1AM.

The opposite is also possible, but less likely as long as the "hourly"
tasks are much faster than the "daily" and "weekly" tasks.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Includes commits from these pull requests:

	#188

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Add a GitHub workflow that is triggered on the `release` event to
automatically update the `microsoft-git` Homebrew Cask on the
`microsoft/git` Tap.

A secret `HOMEBREW_TOKEN` with push permissions to the
`microsoft/homebrew-git` repository must exist. A pull request will be
created at the moment to allow for last minute manual verification.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
This includes all changes from #292, but then also `ds/maintenance-part-3` from upstream. This is _not_ the final maintenance builtin, but is very close. It's time to start making full updates in Scalar that depend on them.
It is possible that a loose object that is written from a GVFS protocol
"get object" request does not match the expected hash. Error out in this
case.
@dscho commented Oct 19, 2020

@derrickstolee if you want, you can already rebase onto git-for-windows@add3ceb (i.e. https://github.com/dscho/git/tree/rebase-to-v2.29.0). This will become v2.29.0.windows.1.

@derrickstolee replied:
@derrickstolee if you want, you can already rebase onto git-for-windows@add3ceb (i.e. https://github.com/dscho/git/tree/rebase-to-v2.29.0). This will become v2.29.0.windows.1.

I already did! See https://github.com/microsoft/git/commits/vfs-2.29.0.

Teach helper/test-gvfs-protocol to be able to send corrupted
loose blobs.

Add unit test for gvfs-helper to detect receipt of a corrupted loose blob.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
derrickstolee added a commit to microsoft/VFSForGit that referenced this pull request Oct 20, 2020
See microsoft/git#295.

Scalar already merged these changes to the PackfileMaintenanceStepTests.

The other changes were reviewed privately.
derrickstolee added a commit to microsoft/scalar that referenced this pull request Oct 20, 2020
@dscho dscho deleted the tentative/vfs-2.29.0 branch October 20, 2020 21:28