Skip to content

Conversation

@dscho
Copy link

@dscho dscho commented Mar 6, 2018

A couple of suggested touchups.

Signed-off-by: Johannes Schindelin johannes.schindelin@gmx.de

A couple of suggested touchups.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@derrickstolee derrickstolee merged commit ef77b24 into derrickstolee:contributing Mar 6, 2018
derrickstolee pushed a commit that referenced this pull request Apr 2, 2018
The function ce_write_entry() uses a 'self-initialised' variable
construct, for the symbol 'saved_namelen', to suppress a gcc
'-Wmaybe-uninitialized' warning, given that the warning is a false
positive.

For the purposes of this discussion, the ce_write_entry() function has
three code blocks of interest, that look like so:

        /* block #1 */
        if (ce->ce_flags & CE_STRIP_NAME) {
                saved_namelen = ce_namelen(ce);
                ce->ce_namelen = 0;
        }

        /* block #2 */
        /*
	 * several code blocks that contain, among others, calls
         * to copy_cache_entry_to_ondisk(ondisk, ce);
         */

        /* block #3 */
        if (ce->ce_flags & CE_STRIP_NAME) {
                ce->ce_namelen = saved_namelen;
                ce->ce_flags &= ~CE_STRIP_NAME;
        }

The warning implies that gcc thinks it is possible that the first
block is not entered, the calls to copy_cache_entry_to_ondisk()
could toggle the CE_STRIP_NAME flag on, thereby entering block #3
with saved_namelen unset. However, the copy_cache_entry_to_ondisk()
function does not write to ce->ce_flags (it only reads). gcc could
easily determine this, since that function is local to this file,
but it obviously doesn't.

In order to suppress this warning, we make it clear to the reader
(human and compiler), that block #3 will only be entered when the
first block has been entered, by introducing a new 'stripped_name'
boolean variable. We also take the opportunity to change the type
of 'saved_namelen' to 'unsigned int' to match ce->ce_namelen.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Dec 26, 2019
…ev()

In 'builtin/name-rev.c' in the name_rev() function there is a loop
iterating over all parents of the given commit, and the loop body
looks like this:

  if (parent_number > 1) {
      if (generation > 0)
          // branch #1
          new_name = ...
      else
          // branch #2
          new_name = ...
      name_rev(parent, new_name, ...);
  } else {
      // branch #3
      name_rev(...);
  }

These conditions are not covered properly in the test suite.  As far
as purely test coverage goes, they are all executed several times over
in 't6120-describe.sh'.  However, they don't directly influence the
command's output, because the repository used in that test script
contains several branches and tags pointing somewhere into the middle
of the commit DAG, and thus result in a better name for the
to-be-named commit.  This can hide bugs: e.g. by replacing the
'new_name' parameter of the first recursive name_rev() call with
'tip_name' (effectively making both branch #1 and #2 a noop) 'git
name-rev --all' shows thousands of bogus names in the Git repository,
but the whole test suite still passes successfully.  In an early
version of a later patch in this series I managed to mess up all three
branches (at once!), but the test suite still passed.

So add a new test case that operates on the following history:

  A--------------master
   \            /
    \----------M2
     \        /
      \---M1-C
       \ /
        B

and names the commit 'B' to make sure that all three branches are
crucial to determine 'B's name:

  - There is only a single ref, so all names are based on 'master',
    without any undesired interference from other refs.

  - Each time name_rev() follows the second parent of a merge commit,
    it appends "^2" to the name.  Following 'master's second parent
    right at the start ensures that all commits on the ancestry path
    from 'master' to 'B' have a different base name from the original
    'tip_name' of the very first name_rev() invocation.  Currently,
    while name_rev() is recursive, it doesn't matter, but it will be
    necessary to properly cover all three branches after the recursion
    is eliminated later in this series.

  - Following 'M2's second parent makes sure that branch #2 (i.e. when
    'generation = 0') affects 'B's name.

  - Following the only parent of the non-merge commit 'C' ensures that
    branch #3 affects 'B's name, and that it increments 'generation'.

  - Coming from 'C' 'generation' is 1, thus following 'M1's second
    parent makes sure that branch #1 affects 'B's name.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Feb 7, 2020
Recent versions of the gcc and clang Address Sanitizer produce test
failures related to regexec(). This triggers with gcc-10 and clang-8
(but not gcc-9 nor clang-7). Running:

  make CC=gcc-10 SANITIZE=address test

results in failures in t4018, t3206, and t4062.

The cause seems to be that when built with ASan, we use a different
version of regexec() than normal. And this version doesn't understand
the REG_STARTEND flag. Here's my evidence supporting that.

The failure in t4062 is an ASan warning:

  expecting success of 4062.2 '-G matches':
  	git diff --name-only -G "^(0{64}){64}$" HEAD^ >out &&
  	test 4096-zeroes.txt = "$(cat out)"

  =================================================================
  ==672994==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa76f672000 at pc 0x7fa7726f75b6 bp 0x7ffe41bdda70 sp 0x7ffe41bdd220
  READ of size 4097 at 0x7fa76f672000 thread T0
      #0 0x7fa7726f75b5  (/lib/x86_64-linux-gnu/libasan.so.6+0x4f5b5)
      #1 0x562ae0c9c40e in regexec_buf /home/peff/compile/git/git-compat-util.h:1117
      #2 0x562ae0c9c40e in diff_grep /home/peff/compile/git/diffcore-pickaxe.c:52
      #3 0x562ae0c9cc28 in pickaxe_match /home/peff/compile/git/diffcore-pickaxe.c:166
      [...]

In this case we're looking in a buffer which was mmap'd via
reuse_worktree_file(), and whose size is 4096 bytes. But libasan's
regex tries to look at byte 4097 anyway! If we tweak Git like this:

  diff --git a/diff.c b/diff.c
  index 8e2914c..cfae60c120 100644
  --- a/diff.c
  +++ b/diff.c
  @@ -3880,7 +3880,7 @@ static int reuse_worktree_file(struct index_state *istate,
           */
          if (ce_uptodate(ce) ||
              (!lstat(name, &st) && !ie_match_stat(istate, ce, &st, 0)))
  -               return 1;
  +               return 0;

          return 0;
   }

to use a regular buffer (with a trailing NUL) instead of an mmap, then
the complaint goes away.

The other failures are actually diff output with an incorrect funcname
header. If I instrument xdiff to show the funcname matching like so:

  diff --git a/xdiff-interface.c b/xdiff-interface.c
  index 8509f9e..f6c3dc1986 100644
  --- a/xdiff-interface.c
  +++ b/xdiff-interface.c
  @@ -197,6 +197,7 @@ struct ff_regs {
   	struct ff_reg {
   		regex_t re;
   		int negate;
  +		char *printable;
   	} *array;
   };

  @@ -218,7 +219,12 @@ static long ff_regexp(const char *line, long len,

   	for (i = 0; i < regs->nr; i++) {
   		struct ff_reg *reg = regs->array + i;
  -		if (!regexec_buf(&reg->re, line, len, 2, pmatch, 0)) {
  +		int ret = regexec_buf(&reg->re, line, len, 2, pmatch, 0);
  +		warning("regexec %s:\n  regex: %s\n  buf: %.*s",
  +			ret == 0 ? "matched" : "did not match",
  +			reg->printable,
  +			(int)len, line);
  +		if (!ret) {
   			if (reg->negate)
   				return -1;
   			break;
  @@ -264,6 +270,7 @@ void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value, int cflags)
   			expression = value;
   		if (regcomp(&reg->re, expression, cflags))
   			die("Invalid regexp to look for hunk header: %s", expression);
  +		reg->printable = xstrdup(expression);
   		free(buffer);
   		value = ep + 1;
   	}

then when compiling with ASan and gcc-10, running the diff from t4018.66
produces this:

  $ git diff -U1 cpp-skip-access-specifiers
  warning: regexec did not match:
    regex: ^[     ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
    buf: private:
  warning: regexec matched:
    regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
    buf: private:
  diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
  index 4d4a9db..ebd6f42 100644
  --- a/cpp-skip-access-specifiers
  +++ b/cpp-skip-access-specifiers
  @@ -6,3 +6,3 @@ private:
          void DoSomething();
          int ChangeMe;
  };
          void DoSomething();
  -       int ChangeMe;
  +       int IWasChanged;
   };

That first regex should match (and is negated, so it should be telling
us _not_ to match "private:"). But it wouldn't if regexec() is looking
at the whole buffer, and not just the length-limited line we've fed to
regexec_buf(). So this is consistent again with REG_STARTEND being
ignored.

The correct output (compiling without ASan, or gcc-9 with Asan) looks
like this:

  warning: regexec matched:
    regex: ^[     ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
    buf: private:
  [...more lines that we end up not using...]
  warning: regexec matched:
    regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
    buf: class RIGHT : public Baseclass
  diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
  index 4d4a9db..ebd6f42 100644
  --- a/cpp-skip-access-specifiers
  +++ b/cpp-skip-access-specifiers
  @@ -6,3 +6,3 @@ class RIGHT : public Baseclass
          void DoSomething();
  -       int ChangeMe;
  +       int IWasChanged;
   };

So it really does seem like libasan's regex engine is ignoring
REG_STARTEND. We should be able to work around it by compiling with
NO_REGEX, which would use our local regexec(). But to make matters even
more interesting, this isn't enough by itself.

Because ASan has support from the compiler, it doesn't seem to intercept
our call to regexec() at the dynamic library level. It actually
recognizes when we are compiling a call to regexec() and replaces it
with ASan-specific code at that point. And unlike most of our other
compat code, where we might have git_mmap() or similar, the actual
symbol name in the compiled compat/regex code is regexec(). So just
compiling with NO_REGEX isn't enough; we still end up in libasan!

We can work around that by having the preprocessor replace regexec with
git_regexec (both in the callers and in the actual implementation), and
we truly end up with a call to our custom regex code, even when
compiling with ASan. That's probably a good thing to do anyway, as it
means anybody looking at the symbols later (e.g., in a debugger) would
have a better indication of which function is which. So we'll do the
same for the other common regex functions (even though just regexec() is
enough to fix this ASan problem).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jun 5, 2020
In write_commit_graph_file() we now have one block of code filling the
array of 'struct chunk_info' with the IDs and sizes of chunks to be
written, and an other block of code calling the functions responsible
for writing individual chunks.  In case of optional chunks like Extra
Edge List an Base Graphs List there is also a condition checking
whether that chunk is necessary/desired, and that same condition is
repeated in both blocks of code.  This patch series is about to add
more optional chunks, so there would be even more repeated conditions.

Eliminate these repeated conditions by storing the function pointers
responsible for writing individual chunks in the 'struct chunk_info'
array as well, and calling them in a loop to write the commit-graph
file.  This will open up the possibility for a bit of foolproofing in
the following patch.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
derrickstolee pushed a commit that referenced this pull request Oct 8, 2020
The OFFSETOF_VAR(var, member) macro is implemented in terms of
offsetof(typeof(*var), member) with compilers that know typeof(),
but its fallback implemenation compares &(var->member) and (var) and
count the distance in bytes, i.e.

    ((uintptr_t)&(var)->member - (uintptr_t)(var))

MSVC's runtime check, when fed an uninitialized 'var', flags this as
a use of an uninitialized variable (and that is legit---uninitialized
contents of 'var' is subtracted) in a debug build.

After auditing all 6 uses of OFFSETOF_VAR(), 1 of them does feed a
potentially uninitialized 'var' to the macro in the beginning of the
for() loop:

    #define hashmap_for_each_entry(map, iter, var, member) \
            for (var = hashmap_iter_first_entry_offset(map, iter, \
                                                    OFFSETOF_VAR(var, member)); \
                    var; \
                    var = hashmap_iter_next_entry_offset(iter, \
                                                    OFFSETOF_VAR(var, member)))

We can work around this by making sure that var has _some_ value
when OFFSETOF_VAR() is called.  Strictly speaking, it invites
undefined behaviour to use NULL here if we end up with pointer
comparison, but MSVC runtime seems to be happy with it, and most
other systems have typeof() and don't even need pointer comparison
fallback code.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Oct 8, 2020
Using the CMake support we added some time ago for real with Visual
Studio build revealed there were lot of usability improvements
possible, which have been carried out.

* js/cmake-vs:
  hashmap_for_each_entry(): workaround MSVC's runtime check failure #3
  cmake (Windows): recommend using Visual Studio's built-in CMake support
  cmake (Windows): initialize vcpkg/build dependencies automatically
  cmake (Windows): complain when encountering an unknown compiler
  cmake (Windows): let the `.dll` files be found when running the tests
  cmake: quote the path accurately when editing `test-lib.sh`
  cmake: fall back to using `vcpkg`'s `msgfmt.exe` on Windows
  cmake: ensure that the `vcpkg` packages are found on Windows
  cmake: do find Git for Windows' shell interpreter
  cmake: ignore files generated by CMake as run in Visual Studio
@dscho dscho deleted the contributing branch October 21, 2020 11:39
derrickstolee pushed a commit that referenced this pull request Dec 23, 2020
Test 5572.63 ("branch has no merge base with remote-tracking
counterpart") was introduced in 4d36f88 (submodule: do not pass null
OID to setup_revisions, 2018-05-24), as a regression test for the bug
this commit was fixing (preventing a 'fatal: bad object' error when the
current branch and the remote-tracking branch we are pulling have no
merge-base).

However, the commit message for 4d36f88 does not describe in which
real-life situation this bug was encountered. The brief discussion on the
mailing list [1] does not either.

The regression test is not really representative of a real-life
scenario: both the local repository and its upstream have only a single
commit, and the "no merge-base" scenario is simulated by recreating this
root commit in the local repository using 'git commit-tree' before
calling 'git pull --rebase --recurse-submodules'. The rebase succeeds
and results in the local branch being reset to the same root commit as
the upstream branch.

The fix in 4d36f88 modifies 'submodule.c::submodule_touches_in_range'
so that if 'excl_oid' is null, which is the case when the 'git merge-base
--fork-point' invocation in 'builtin/pull.c::get_rebase_fork_point'
errors (no fork-point), then instead of 'incl_oid --not excl_oid' being
passed to setup_revisions, only 'incl_oid' is passed, and
'submodule_touches_in_range' examines 'incl_oid' and all its ancestors
to verify that they do not touch the submodule.

In test 5572.63, the recreated lone root commit in the local repository is
thus the only commit being examined by 'submodule_touches_in_range', and
this commit *adds* the submodule. However, 'submodule_touches_in_range'
*succeeds* because 'combine-diff.c::diff_tree_combined' (see the
backtrace below) returns early since this commit is the root commit
and has no parents.

  #0  diff_tree_combined at combine-diff.c:1494
  #1  0x0000000100150cbe in diff_tree_combined_merge at combine-diff.c:1649
  #2  0x00000001002c7147 in collect_changed_submodules at submodule.c:869
  #3  0x00000001002c7d6f in submodule_touches_in_range at submodule.c:1268
  #4  0x00000001000ad58b in cmd_pull at builtin/pull.c:1040

In light of all this, add a note in t5572 documenting this peculiar
test.

[1] https://lore.kernel.org/git/20180524204729.19896-1-jonathantanmy@google.com/t/#u

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee added a commit that referenced this pull request Jan 4, 2021
The previous change reduced time spent in strlen() while comparing
consecutive paths in verify_cache(), but we can do better. The
conditional checks the existence of a directory separator at the correct
location, but only after doing a string comparison. Swap the order to be
logically equivalent but perform fewer string comparisons.

To test the effect on performance, I used a repository with over three
million paths in the index. I then ran the following command on repeat:

  git -c index.threads=1 commit --amend --allow-empty --no-edit

Here are the measurements over 10 runs after a 5-run warmup:

  Benchmark #1: v2.30.0
    Time (mean ± σ):     854.5 ms ±  18.2 ms
    Range (min … max):   825.0 ms … 892.8 ms

  Benchmark #2: Previous change
    Time (mean ± σ):     833.2 ms ±  10.3 ms
    Range (min … max):   815.8 ms … 849.7 ms

  Benchmark #3: This change
    Time (mean ± σ):     815.5 ms ±  18.1 ms
    Range (min … max):   795.4 ms … 849.5 ms

This change is 2% faster than the previous change and 5% faster than
v2.30.0.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Jan 4, 2021
The previous change reduced time spent in strlen() while comparing
consecutive paths in verify_cache(), but we can do better. The
conditional checks the existence of a directory separator at the correct
location, but only after doing a string comparison. Swap the order to be
logically equivalent but perform fewer string comparisons.

To test the effect on performance, I used a repository with over three
million paths in the index. I then ran the following command on repeat:

  git -c index.threads=1 commit --amend --allow-empty --no-edit

Here are the measurements over 10 runs after a 5-run warmup:

  Benchmark #1: v2.30.0
    Time (mean ± σ):     854.5 ms ±  18.2 ms
    Range (min … max):   825.0 ms … 892.8 ms

  Benchmark #2: Previous change
    Time (mean ± σ):     833.2 ms ±  10.3 ms
    Range (min … max):   815.8 ms … 849.7 ms

  Benchmark #3: This change
    Time (mean ± σ):     815.5 ms ±  18.1 ms
    Range (min … max):   795.4 ms … 849.5 ms

This change is 2% faster than the previous change and 5% faster than
v2.30.0.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Jan 7, 2021
The previous change reduced time spent in strlen() while comparing
consecutive paths in verify_cache(), but we can do better. The
conditional checks the existence of a directory separator at the correct
location, but only after doing a string comparison. Swap the order to be
logically equivalent but perform fewer string comparisons.

To test the effect on performance, I used a repository with over three
million paths in the index. I then ran the following command on repeat:

  git -c index.threads=1 commit --amend --allow-empty --no-edit

Here are the measurements over 10 runs after a 5-run warmup:

  Benchmark #1: v2.30.0
    Time (mean ± σ):     854.5 ms ±  18.2 ms
    Range (min … max):   825.0 ms … 892.8 ms

  Benchmark #2: Previous change
    Time (mean ± σ):     833.2 ms ±  10.3 ms
    Range (min … max):   815.8 ms … 849.7 ms

  Benchmark #3: This change
    Time (mean ± σ):     815.5 ms ±  18.1 ms
    Range (min … max):   795.4 ms … 849.5 ms

This change is 2% faster than the previous change and 5% faster than
v2.30.0.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Jan 20, 2021
The previous change reduced time spent in strlen() while comparing
consecutive paths in verify_cache(), but we can do better. The
conditional checks the existence of a directory separator at the correct
location, but only after doing a string comparison. Swap the order to be
logically equivalent but perform fewer string comparisons.

To test the effect on performance, I used a repository with over three
million paths in the index. I then ran the following command on repeat:

  git -c index.threads=1 commit --amend --allow-empty --no-edit

Here are the measurements over 10 runs after a 5-run warmup:

  Benchmark #1: v2.30.0
    Time (mean ± σ):     854.5 ms ±  18.2 ms
    Range (min … max):   825.0 ms … 892.8 ms

  Benchmark #2: Previous change
    Time (mean ± σ):     833.2 ms ±  10.3 ms
    Range (min … max):   815.8 ms … 849.7 ms

  Benchmark #3: This change
    Time (mean ± σ):     815.5 ms ±  18.1 ms
    Range (min … max):   795.4 ms … 849.5 ms

This change is 2% faster than the previous change and 5% faster than
v2.30.0.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee added a commit that referenced this pull request Jan 25, 2021
By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

By running the debugger for 'git status -uno' after that change, we find
two instances of ensure_full_index() that were added for extra safety,
but can be removed without issue.

In refresh_index(), we loop through the index entries. The
refresh_cache_ent() method copies the sparse directories into the
refreshed index without issue.

The loop within run_diff_files() skips things that are in stage 0 and
have skip-worktree enabled, so seems safe to disable ensure_full_index()
here.

TODO: performance numbers at this point are confusing:

Benchmark #1: full index (git -c core.fsmonitor= status -uno)
  Time (mean ± σ):      2.522 s ±  0.079 s    [User: 1.511 s, System: 0.833 s]
  Range (min … max):    2.454 s …  2.715 s    10 runs

Benchmark #2: sparse index, old (git -c core.fsmonitor= status -uno)
  Time (mean ± σ):      3.370 s ±  0.036 s    [User: 3.166 s, System: 0.294 s]
  Range (min … max):    3.318 s …  3.428 s    10 runs

Benchmark #3: sparse index (git -c core.fsmonitor= status -uno)
  Time (mean ± σ):      5.196 s ±  0.056 s    [User: 5.189 s, System: 0.185 s]
  Range (min … max):    5.138 s …  5.269 s    10 runs

This shows that the previous change (Benchmark #2) had some overhead with
ensure_full_index() compared to a full index (Benchmark #1) but the
current change got much slower for some reason! Note that
ensure_full_index() is not called anywhere in this process! What is
going on?

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Jan 25, 2021
By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

By running the debugger for 'git status -uno' after that change, we find
two instances of ensure_full_index() that were added for extra safety,
but can be removed without issue.

In refresh_index(), we loop through the index entries. The
refresh_cache_ent() method copies the sparse directories into the
refreshed index without issue.

The loop within run_diff_files() skips things that are in stage 0 and
have skip-worktree enabled, so seems safe to disable ensure_full_index()
here.

TODO: performance numbers at this point are confusing:

Benchmark #1: full index (git -c core.fsmonitor= status -uno)
  Time (mean ± σ):      2.522 s ±  0.079 s    [User: 1.511 s, System: 0.833 s]
  Range (min … max):    2.454 s …  2.715 s    10 runs

Benchmark #2: sparse index, old (git -c core.fsmonitor= status -uno)
  Time (mean ± σ):      3.370 s ±  0.036 s    [User: 3.166 s, System: 0.294 s]
  Range (min … max):    3.318 s …  3.428 s    10 runs

Benchmark #3: sparse index (git -c core.fsmonitor= status -uno)
  Time (mean ± σ):      5.196 s ±  0.056 s    [User: 5.189 s, System: 0.185 s]
  Range (min … max):    5.138 s …  5.269 s    10 runs

This shows that the previous change (Benchmark #2) had some overhead with
ensure_full_index() compared to a full index (Benchmark #1) but the
current change got much slower for some reason! Note that
ensure_full_index() is not called anywhere in this process! What is
going on?

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Jan 25, 2021
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.

Once the cache-tree exists within a sparse index, we finally get
improved performance. I test the sparse index performance using a
private monorepo with over 2.1 million files at HEAD, but with a
sparse-checkout definition that has only 68,000 paths in the populated
cone. The sparse index has about 2,000 sparse directory entries. I
compare three scenarios:

 1. Use the full index. The index size is ~186 MB.
 2. Use the sparse index. The index size is ~5.5 MB.
 3. Use a commit where HEAD matches the populated set. The full index
    size is ~5.3MB.

The third benchmark is included as a theoretical optimium for a
repository of the same object database.

First, a clean 'git status' improves from 3.1s to 240ms.

Benchmark #1: full index (git status)
  Time (mean ± σ):      3.167 s ±  0.036 s    [User: 2.006 s, System: 1.078 s]
  Range (min … max):    3.100 s …  3.208 s    10 runs

Benchmark #2: sparse index (git status)
  Time (mean ± σ):     239.5 ms ±   8.1 ms    [User: 189.4 ms, System: 226.8 ms]
  Range (min … max):   226.0 ms … 251.9 ms    13 runs

Benchmark #3: small tree (git status)
  Time (mean ± σ):     195.3 ms ±   4.5 ms    [User: 116.5 ms, System: 84.4 ms]
  Range (min … max):   188.8 ms … 202.8 ms    15 runs

The optimimum is still 45ms faster. This is due in part to the 2,000+
sparse directory entries, but there might be other optimizations to make
in the sparse-index case. In particular, I find that this performance
difference disappears when I disable FS Monitor, which is somewhat
disabled in the sparse-index case, but might still be adding overhead.

The performance numbers for 'git add .' are much closer to optimal:

Benchmark #1: full index (git add .)
  Time (mean ± σ):      3.076 s ±  0.022 s    [User: 2.065 s, System: 0.943 s]
  Range (min … max):    3.044 s …  3.116 s    10 runs

Benchmark #2: sparse index (git add .)
  Time (mean ± σ):     218.0 ms ±   6.6 ms    [User: 195.7 ms, System: 206.6 ms]
  Range (min … max):   209.8 ms … 228.2 ms    13 runs

Benchmark #3: small tree (git add .)
  Time (mean ± σ):     217.6 ms ±   5.4 ms    [User: 131.9 ms, System: 86.7 ms]
  Range (min … max):   212.1 ms … 228.4 ms    14 runs

In this test, I also used "echo >>README.md" to append a line to the
README.md file, so the 'git add .' command is doing _something_ other
than a no-op. Without this edit (and FS Monitor enabled) the small
tree case again gains about 30ms on the sparse index case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Jan 25, 2021
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.

Once the cache-tree exists within a sparse index, we finally get
improved performance. I test the sparse index performance using a
private monorepo with over 2.1 million files at HEAD, but with a
sparse-checkout definition that has only 68,000 paths in the populated
cone. The sparse index has about 2,000 sparse directory entries. I
compare three scenarios:

 1. Use the full index. The index size is ~186 MB.
 2. Use the sparse index. The index size is ~5.5 MB.
 3. Use a commit where HEAD matches the populated set. The full index
    size is ~5.3MB.

The third benchmark is included as a theoretical optimium for a
repository of the same object database.

First, a clean 'git status' improves from 3.1s to 240ms.

Benchmark #1: full index (git status)
  Time (mean ± σ):      3.167 s ±  0.036 s    [User: 2.006 s, System: 1.078 s]
  Range (min … max):    3.100 s …  3.208 s    10 runs

Benchmark #2: sparse index (git status)
  Time (mean ± σ):     239.5 ms ±   8.1 ms    [User: 189.4 ms, System: 226.8 ms]
  Range (min … max):   226.0 ms … 251.9 ms    13 runs

Benchmark #3: small tree (git status)
  Time (mean ± σ):     195.3 ms ±   4.5 ms    [User: 116.5 ms, System: 84.4 ms]
  Range (min … max):   188.8 ms … 202.8 ms    15 runs

The optimimum is still 45ms faster. This is due in part to the 2,000+
sparse directory entries, but there might be other optimizations to make
in the sparse-index case. In particular, I find that this performance
difference disappears when I disable FS Monitor, which is somewhat
disabled in the sparse-index case, but might still be adding overhead.

The performance numbers for 'git add .' are much closer to optimal:

Benchmark #1: full index (git add .)
  Time (mean ± σ):      3.076 s ±  0.022 s    [User: 2.065 s, System: 0.943 s]
  Range (min … max):    3.044 s …  3.116 s    10 runs

Benchmark #2: sparse index (git add .)
  Time (mean ± σ):     218.0 ms ±   6.6 ms    [User: 195.7 ms, System: 206.6 ms]
  Range (min … max):   209.8 ms … 228.2 ms    13 runs

Benchmark #3: small tree (git add .)
  Time (mean ± σ):     217.6 ms ±   5.4 ms    [User: 131.9 ms, System: 86.7 ms]
  Range (min … max):   212.1 ms … 228.4 ms    14 runs

In this test, I also used "echo >>README.md" to append a line to the
README.md file, so the 'git add .' command is doing _something_ other
than a no-op. Without this edit (and FS Monitor enabled) the small
tree case again gains about 30ms on the sparse index case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Jan 25, 2021
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.

Once the cache-tree exists within a sparse index, we finally get
improved performance. I test the sparse index performance using a
private monorepo with over 2.1 million files at HEAD, but with a
sparse-checkout definition that has only 68,000 paths in the populated
cone. The sparse index has about 2,000 sparse directory entries. I
compare three scenarios:

 1. Use the full index. The index size is ~186 MB.
 2. Use the sparse index. The index size is ~5.5 MB.
 3. Use a commit where HEAD matches the populated set. The full index
    size is ~5.3MB.

The third benchmark is included as a theoretical optimium for a
repository of the same object database.

First, a clean 'git status' improves from 3.1s to 240ms.

Benchmark #1: full index (git status)
  Time (mean ± σ):      3.167 s ±  0.036 s    [User: 2.006 s, System: 1.078 s]
  Range (min … max):    3.100 s …  3.208 s    10 runs

Benchmark #2: sparse index (git status)
  Time (mean ± σ):     239.5 ms ±   8.1 ms    [User: 189.4 ms, System: 226.8 ms]
  Range (min … max):   226.0 ms … 251.9 ms    13 runs

Benchmark #3: small tree (git status)
  Time (mean ± σ):     195.3 ms ±   4.5 ms    [User: 116.5 ms, System: 84.4 ms]
  Range (min … max):   188.8 ms … 202.8 ms    15 runs

The optimimum is still 45ms faster. This is due in part to the 2,000+
sparse directory entries, but there might be other optimizations to make
in the sparse-index case. In particular, I find that this performance
difference disappears when I disable FS Monitor, which is somewhat
disabled in the sparse-index case, but might still be adding overhead.

The performance numbers for 'git add .' are much closer to optimal:

Benchmark #1: full index (git add .)
  Time (mean ± σ):      3.076 s ±  0.022 s    [User: 2.065 s, System: 0.943 s]
  Range (min … max):    3.044 s …  3.116 s    10 runs

Benchmark #2: sparse index (git add .)
  Time (mean ± σ):     218.0 ms ±   6.6 ms    [User: 195.7 ms, System: 206.6 ms]
  Range (min … max):   209.8 ms … 228.2 ms    13 runs

Benchmark #3: small tree (git add .)
  Time (mean ± σ):     217.6 ms ±   5.4 ms    [User: 131.9 ms, System: 86.7 ms]
  Range (min … max):   212.1 ms … 228.4 ms    14 runs

In this test, I also used "echo >>README.md" to append a line to the
README.md file, so the 'git add .' command is doing _something_ other
than a no-op. Without this edit (and FS Monitor enabled) the small
tree case again gains about 30ms on the sparse index case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Jan 25, 2021
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.

Once the cache-tree exists within a sparse index, we finally get
improved performance. I test the sparse index performance using a
private monorepo with over 2.1 million files at HEAD, but with a
sparse-checkout definition that has only 68,000 paths in the populated
cone. The sparse index has about 2,000 sparse directory entries. I
compare three scenarios:

 1. Use the full index. The index size is ~186 MB.
 2. Use the sparse index. The index size is ~5.5 MB.
 3. Use a commit where HEAD matches the populated set. The full index
    size is ~5.3MB.

The third benchmark is included as a theoretical optimium for a
repository of the same object database.

First, a clean 'git status' improves from 3.1s to 240ms.

Benchmark #1: full index (git status)
  Time (mean ± σ):      3.167 s ±  0.036 s    [User: 2.006 s, System: 1.078 s]
  Range (min … max):    3.100 s …  3.208 s    10 runs

Benchmark #2: sparse index (git status)
  Time (mean ± σ):     239.5 ms ±   8.1 ms    [User: 189.4 ms, System: 226.8 ms]
  Range (min … max):   226.0 ms … 251.9 ms    13 runs

Benchmark #3: small tree (git status)
  Time (mean ± σ):     195.3 ms ±   4.5 ms    [User: 116.5 ms, System: 84.4 ms]
  Range (min … max):   188.8 ms … 202.8 ms    15 runs

The optimimum is still 45ms faster. This is due in part to the 2,000+
sparse directory entries, but there might be other optimizations to make
in the sparse-index case. In particular, I find that this performance
difference disappears when I disable FS Monitor, which is somewhat
disabled in the sparse-index case, but might still be adding overhead.

The performance numbers for 'git add .' are much closer to optimal:

Benchmark #1: full index (git add .)
  Time (mean ± σ):      3.076 s ±  0.022 s    [User: 2.065 s, System: 0.943 s]
  Range (min … max):    3.044 s …  3.116 s    10 runs

Benchmark #2: sparse index (git add .)
  Time (mean ± σ):     218.0 ms ±   6.6 ms    [User: 195.7 ms, System: 206.6 ms]
  Range (min … max):   209.8 ms … 228.2 ms    13 runs

Benchmark #3: small tree (git add .)
  Time (mean ± σ):     217.6 ms ±   5.4 ms    [User: 131.9 ms, System: 86.7 ms]
  Range (min … max):   212.1 ms … 228.4 ms    14 runs

In this test, I also used "echo >>README.md" to append a line to the
README.md file, so the 'git add .' command is doing _something_ other
than a no-op. Without this edit (and FS Monitor enabled) the small
tree case again gains about 30ms on the sparse index case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Jan 25, 2021
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.

Once the cache-tree exists within a sparse index, we finally get
improved performance. I test the sparse index performance using a
private monorepo with over 2.1 million files at HEAD, but with a
sparse-checkout definition that has only 68,000 paths in the populated
cone. The sparse index has about 2,000 sparse directory entries. I
compare three scenarios:

 1. Use the full index. The index size is ~186 MB.
 2. Use the sparse index. The index size is ~5.5 MB.
 3. Use a commit where HEAD matches the populated set. The full index
    size is ~5.3MB.

The third benchmark is included as a theoretical optimium for a
repository of the same object database.

First, a clean 'git status' improves from 3.1s to 240ms.

Benchmark #1: full index (git status)
  Time (mean ± σ):      3.167 s ±  0.036 s    [User: 2.006 s, System: 1.078 s]
  Range (min … max):    3.100 s …  3.208 s    10 runs

Benchmark #2: sparse index (git status)
  Time (mean ± σ):     239.5 ms ±   8.1 ms    [User: 189.4 ms, System: 226.8 ms]
  Range (min … max):   226.0 ms … 251.9 ms    13 runs

Benchmark #3: small tree (git status)
  Time (mean ± σ):     195.3 ms ±   4.5 ms    [User: 116.5 ms, System: 84.4 ms]
  Range (min … max):   188.8 ms … 202.8 ms    15 runs

The optimimum is still 45ms faster. This is due in part to the 2,000+
sparse directory entries, but there might be other optimizations to make
in the sparse-index case. In particular, I find that this performance
difference disappears when I disable FS Monitor, which is somewhat
disabled in the sparse-index case, but might still be adding overhead.

The performance numbers for 'git add .' are much closer to optimal:

Benchmark #1: full index (git add .)
  Time (mean ± σ):      3.076 s ±  0.022 s    [User: 2.065 s, System: 0.943 s]
  Range (min … max):    3.044 s …  3.116 s    10 runs

Benchmark #2: sparse index (git add .)
  Time (mean ± σ):     218.0 ms ±   6.6 ms    [User: 195.7 ms, System: 206.6 ms]
  Range (min … max):   209.8 ms … 228.2 ms    13 runs

Benchmark #3: small tree (git add .)
  Time (mean ± σ):     217.6 ms ±   5.4 ms    [User: 131.9 ms, System: 86.7 ms]
  Range (min … max):   212.1 ms … 228.4 ms    14 runs

In this test, I also used "echo >>README.md" to append a line to the
README.md file, so the 'git add .' command is doing _something_ other
than a no-op. Without this edit (and FS Monitor enabled) the small
tree case again gains about 30ms on the sparse index case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Jan 25, 2021
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.

Once the cache-tree exists within a sparse index, we finally get
improved performance. I test the sparse index performance using a
private monorepo with over 2.1 million files at HEAD, but with a
sparse-checkout definition that has only 68,000 paths in the populated
cone. The sparse index has about 2,000 sparse directory entries. I
compare three scenarios:

 1. Use the full index. The index size is ~186 MB.
 2. Use the sparse index. The index size is ~5.5 MB.
 3. Use a commit where HEAD matches the populated set. The full index
    size is ~5.3MB.

The third benchmark is included as a theoretical optimium for a
repository of the same object database.

First, a clean 'git status' improves from 3.1s to 240ms.

Benchmark #1: full index (git status)
  Time (mean ± σ):      3.167 s ±  0.036 s    [User: 2.006 s, System: 1.078 s]
  Range (min … max):    3.100 s …  3.208 s    10 runs

Benchmark #2: sparse index (git status)
  Time (mean ± σ):     239.5 ms ±   8.1 ms    [User: 189.4 ms, System: 226.8 ms]
  Range (min … max):   226.0 ms … 251.9 ms    13 runs

Benchmark #3: small tree (git status)
  Time (mean ± σ):     195.3 ms ±   4.5 ms    [User: 116.5 ms, System: 84.4 ms]
  Range (min … max):   188.8 ms … 202.8 ms    15 runs

The optimimum is still 45ms faster. This is due in part to the 2,000+
sparse directory entries, but there might be other optimizations to make
in the sparse-index case. In particular, I find that this performance
difference disappears when I disable FS Monitor, which is somewhat
disabled in the sparse-index case, but might still be adding overhead.

The performance numbers for 'git add .' are much closer to optimal:

Benchmark #1: full index (git add .)
  Time (mean ± σ):      3.076 s ±  0.022 s    [User: 2.065 s, System: 0.943 s]
  Range (min … max):    3.044 s …  3.116 s    10 runs

Benchmark #2: sparse index (git add .)
  Time (mean ± σ):     218.0 ms ±   6.6 ms    [User: 195.7 ms, System: 206.6 ms]
  Range (min … max):   209.8 ms … 228.2 ms    13 runs

Benchmark #3: small tree (git add .)
  Time (mean ± σ):     217.6 ms ±   5.4 ms    [User: 131.9 ms, System: 86.7 ms]
  Range (min … max):   212.1 ms … 228.4 ms    14 runs

In this test, I also used "echo >>README.md" to append a line to the
README.md file, so the 'git add .' command is doing _something_ other
than a no-op. Without this edit (and FS Monitor enabled) the small
tree case again gains about 30ms on the sparse index case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee pushed a commit that referenced this pull request Feb 17, 2021
This is modelled on the version of handle_directory_level_conflicts()
from merge-recursive.c, but is massively simplified due to the following
factors:
  * strmap API provides simplifications over using direct hashmap
  * we have a dirs_removed field in struct rename_info that we have an
    easy way to populate from collect_merge_info(); this was already
    used in compute_rename_counts() and thus we do not need to check
    for condition #2.
  * The removal of condition #2 by handling it earlier in the code also
    obviates the need to check for condition #3 -- if both sides renamed
    a directory, meaning that the directory no longer exists on either
    side, then neither side could have added any new files to that
    directory, and thus there are no files whose locations we need to
    move due to such a directory rename.

In fact, the same logic that makes condition #3 irrelevant means
condition #1 is also irrelevant so we could drop this function.
However, it is cheap to check if both sides rename the same directory,
and doing so can save future computation.  So, simply remove any
directories that both sides renamed from the list of directory renames.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Feb 17, 2021
Add some timing instrumentation for both merge-ort and diffcore-rename;
I used these to measure and optimize performance in both, and several
future patch series will build on these to reduce the timings of some
select testcases.

=== Setup ===

The primary testcase I used involved rebasing a random topic in the
linux kernel (consisting of 35 patches) against an older version.  I
added two variants, one where I rename a toplevel directory, and another
where I only rebase one patch instead of the whole topic.  The setup is
as follows:

  $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e
  $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34
  $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e
  $ git switch -c 5.4-renames v5.4
  $ git mv drivers pilots  # Introduce over 26,000 renames
  $ git commit -m "Rename drivers/ to pilots/"
  $ git config merge.renameLimit 30000
  $ git config merge.directoryRenames true

=== Testcases ===

Now with REBASE standing for either "git rebase [--merge]" (using
merge-recursive) or "test-tool fast-rebase" (using merge-ort), the
testcases are:

Testcase #1: no-renames

  $ git checkout v5.4^0
  $ REBASE --onto HEAD base hwmon-updates

  Note: technically the name is misleading; there are some renames, but
  very few.  Rename detection only takes about half the overall time.

Testcase #2: mega-renames

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-updates

Testcase #3: just-one-mega

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-just-one

=== Timing results ===

Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
10 runs for the other two cases):

                       merge-recursive           merge-ort
    no-renames:       18.912 s ±  0.174 s    14.263 s ±  0.053 s
    mega-renames:   5964.031 s ± 10.459 s  5504.231 s ±  5.150 s
    just-one-mega:   149.583 s ±  0.751 s   158.534 s ±  0.498 s

A single re-run of each with some breakdowns:

                                    ---  no-renames  ---
                              merge-recursive   merge-ort
    overall runtime:              19.302 s        14.257 s
    inexact rename detection:      7.603 s         7.906 s
    everything else:              11.699 s         6.351 s

                                    --- mega-renames ---
                              merge-recursive   merge-ort
    overall runtime:            5950.195 s      5499.672 s
    inexact rename detection:   5746.309 s      5487.120 s
    everything else:             203.886 s        17.552 s

                                    --- just-one-mega ---
                              merge-recursive   merge-ort
    overall runtime:             151.001 s       158.582 s
    inexact rename detection:    143.448 s       157.835 s
    everything else:               7.553 s         0.747 s

=== Timing observations ===

0) Maximum speedup

The "everything else" row represents the maximum speedup we could
achieve if we were to somehow infinitely parallelize inexact rename
detection, but leave everything else alone.  The fact that this is so
much smaller than the real runtime (even in the case with virtually no
renames) makes it clear just how overwhelmingly large the time spent on
rename detection can be.

1) no-renames

1a) merge-ort is faster than merge-recursive, which is nice.  However,
this still should not be considered good enough.  Although the "merge"
backend to rebase (merge-recursive) is sometimes faster than the "apply"
backend, this is one of those cases where it is not.  In fact, even
merge-ort is slower.  The "apply" backend can complete this testcase in
    6.940 s ± 0.485 s
which is about 2x faster than merge-ort and 3x faster than
merge-recursive.  One goal of the merge-ort performance work will be to
make it faster than git-am on this (and similar) testcases.

2) mega-renames

2a) Obviously rename detection is a huge cost; it's where most the time
is spent.  We need to cut that down.  If we could somehow infinitely
parallelize it and drive its time to 0, the merge-recursive time would
drop to about 204s, and the merge-ort time would drop to about 17s.  I
think this particular stat shows I've subtly baked a couple performance
improvements into merge-ort and into fast-rebase already.

3) just-one-mega

3a) not much to say here, it just gives some flavor for how rebasing
only one patch compares to rebasing 35.

=== Goals ===

This patch is obviously just the beginning.  Here are some of my goals
that this measurement will help us achieve:

* Drive the cost of rename detection down considerably for merges
* After the above has been achieved, see if there are other slowness
  factors (which would have previously been overshadowed by rename
  detection costs) which we can then focus on and also optimize.
* Ensure our rebase testcase that requires little rename detection
  is noticeably faster with merge-ort than with apply-based rebase.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <ttaylorr@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee added a commit that referenced this pull request Feb 17, 2021
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.

Once the cache-tree exists within a sparse index, we finally get
improved performance. I test the sparse index performance using a
private monorepo with over 2.1 million files at HEAD, but with a
sparse-checkout definition that has only 68,000 paths in the populated
cone. The sparse index has about 2,000 sparse directory entries. I
compare three scenarios:

 1. Use the full index. The index size is ~186 MB.
 2. Use the sparse index. The index size is ~5.5 MB.
 3. Use a commit where HEAD matches the populated set. The full index
    size is ~5.3MB.

The third benchmark is included as a theoretical optimium for a
repository of the same object database.

First, a clean 'git status' improves from 3.1s to 240ms.

Benchmark #1: full index (git status)
  Time (mean ± σ):      3.167 s ±  0.036 s    [User: 2.006 s, System: 1.078 s]
  Range (min … max):    3.100 s …  3.208 s    10 runs

Benchmark #2: sparse index (git status)
  Time (mean ± σ):     239.5 ms ±   8.1 ms    [User: 189.4 ms, System: 226.8 ms]
  Range (min … max):   226.0 ms … 251.9 ms    13 runs

Benchmark #3: small tree (git status)
  Time (mean ± σ):     195.3 ms ±   4.5 ms    [User: 116.5 ms, System: 84.4 ms]
  Range (min … max):   188.8 ms … 202.8 ms    15 runs

The optimimum is still 45ms faster. This is due in part to the 2,000+
sparse directory entries, but there might be other optimizations to make
in the sparse-index case. In particular, I find that this performance
difference disappears when I disable FS Monitor, which is somewhat
disabled in the sparse-index case, but might still be adding overhead.

The performance numbers for 'git add .' are much closer to optimal:

Benchmark #1: full index (git add .)
  Time (mean ± σ):      3.076 s ±  0.022 s    [User: 2.065 s, System: 0.943 s]
  Range (min … max):    3.044 s …  3.116 s    10 runs

Benchmark #2: sparse index (git add .)
  Time (mean ± σ):     218.0 ms ±   6.6 ms    [User: 195.7 ms, System: 206.6 ms]
  Range (min … max):   209.8 ms … 228.2 ms    13 runs

Benchmark #3: small tree (git add .)
  Time (mean ± σ):     217.6 ms ±   5.4 ms    [User: 131.9 ms, System: 86.7 ms]
  Range (min … max):   212.1 ms … 228.4 ms    14 runs

In this test, I also used "echo >>README.md" to append a line to the
README.md file, so the 'git add .' command is doing _something_ other
than a no-op. Without this edit (and FS Monitor enabled) the small
tree case again gains about 30ms on the sparse index case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee pushed a commit that referenced this pull request Mar 24, 2021
…sponse

query_result can be be an empty strbuf (STRBUF_INIT) - in that case
trying to read 3 bytes triggers a buffer overflow read (as
query_result.buf = '\0').

Therefore we need to check query_result's length before trying to read 3
bytes.

This overflow was introduced in:
  940b94f (fsmonitor: log invocation of FSMonitor hook to trace2, 2021-02-03)
It was found when running the test-suite against ASAN, and can be most
easily reproduced with the following command:

make GIT_TEST_OPTS="-v" DEFAULT_TEST_TARGET="t7519-status-fsmonitor.sh" \
SANITIZE=address DEVELOPER=1 test

==2235==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0000019e6e5e at pc 0x00000043745c bp 0x7fffd382c520 sp 0x7fffd382bcc8
READ of size 3 at 0x0000019e6e5e thread T0
    #0 0x43745b in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:842:7
    #1 0x43786d in bcmp /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:887:10
    #2 0x80b146 in fsmonitor_is_trivial_response /home/ahunt/oss-fuzz/git/fsmonitor.c:192:10
    #3 0x80b146 in query_fsmonitor /home/ahunt/oss-fuzz/git/fsmonitor.c:175:7
    #4 0x80a749 in refresh_fsmonitor /home/ahunt/oss-fuzz/git/fsmonitor.c:267:21
    #5 0x80bad1 in tweak_fsmonitor /home/ahunt/oss-fuzz/git/fsmonitor.c:429:4
    #6 0x90f040 in read_index_from /home/ahunt/oss-fuzz/git/read-cache.c:2321:3
    #7 0x8e5d08 in repo_read_index_preload /home/ahunt/oss-fuzz/git/preload-index.c:164:15
    #8 0x52dd45 in prepare_index /home/ahunt/oss-fuzz/git/builtin/commit.c:363:6
    #9 0x52a188 in cmd_commit /home/ahunt/oss-fuzz/git/builtin/commit.c:1588:15
    #10 0x4ce77e in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
    #11 0x4ccb18 in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
    #12 0x4cb01c in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
    #13 0x4cb01c in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
    #14 0x6aca8d in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
    #15 0x7fb027bf5349 in __libc_start_main (/lib64/libc.so.6+0x24349)
    #16 0x4206b9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120

0x0000019e6e5e is located 2 bytes to the left of global variable 'strbuf_slopbuf' defined in 'strbuf.c:51:6' (0x19e6e60) of size 1
  'strbuf_slopbuf' is ascii string ''
0x0000019e6e5e is located 126 bytes to the right of global variable 'signals' defined in 'sigchain.c:11:31' (0x19e6be0) of size 512
SUMMARY: AddressSanitizer: global-buffer-overflow /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:842:7 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)
Shadow bytes around the buggy address:
  0x000080334d70: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x000080334d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080334d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080334da0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080334db0: 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9
=>0x000080334dc0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9[f9]01 f9 f9 f9
  0x000080334dd0: f9 f9 f9 f9 03 f9 f9 f9 f9 f9 f9 f9 02 f9 f9 f9
  0x000080334de0: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
  0x000080334df0: f9 f9 f9 f9 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x000080334e00: f9 f9 f9 f9 00 00 00 00 f9 f9 f9 f9 01 f9 f9 f9
  0x000080334e10: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc

Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Apr 10, 2021
shorten_unambiguous_ref() returns an allocated string. We have to
track it separately from the const refname.

This leak has existed since:
9ab55da (git symbolic-ref --delete $symref, 2012-10-21)

This leak was found when running t0001 with LSAN, see also LSAN output
below:

Direct leak of 19 byte(s) in 1 object(s) allocated from:
    #0 0x486514 in strdup /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0x9ab048 in xstrdup /home/ahunt/oss-fuzz/git/wrapper.c:29:14
    #2 0x8b452f in refs_shorten_unambiguous_ref /home/ahunt/oss-fuzz/git/refs.c
    #3 0x8b47e8 in shorten_unambiguous_ref /home/ahunt/oss-fuzz/git/refs.c:1287:9
    #4 0x679fce in check_symref /home/ahunt/oss-fuzz/git/builtin/symbolic-ref.c:28:14
    #5 0x679ad8 in cmd_symbolic_ref /home/ahunt/oss-fuzz/git/builtin/symbolic-ref.c:70:9
    #6 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
    #7 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
    #8 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
    #9 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
    #10 0x69cc6e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
    #11 0x7f98388a4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Apr 10, 2021
dwim_ref() allocs a new string into ref. Instead of setting to NULL to
discard it, we can FREE_AND_NULL.

This leak appears to have been introduced in:
4cf76f6 (builtin/reset: compute checkout metadata for reset, 2020-03-16)

This leak was found when running t0001 with LSAN, see also LSAN output below:

Direct leak of 5 byte(s) in 1 object(s) allocated from:
    #0 0x486514 in strdup /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0x9a7108 in xstrdup /home/ahunt/oss-fuzz/git/wrapper.c:29:14
    #2 0x8add6b in expand_ref /home/ahunt/oss-fuzz/git/refs.c:670:12
    #3 0x8ad777 in repo_dwim_ref /home/ahunt/oss-fuzz/git/refs.c:644:22
    #4 0x6394af in dwim_ref /home/ahunt/oss-fuzz/git/./refs.h:162:9
    #5 0x637e5c in cmd_reset /home/ahunt/oss-fuzz/git/builtin/reset.c:426:4
    #6 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
    #7 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
    #8 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
    #9 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
    #10 0x69c5ce in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
    #11 0x7f57ebb9d349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Apr 4, 2024
The t5309 script triggers a racy false positive with SANITIZE=leak on a
multi-core system. Running with "--stress --run=6" usually fails within
10 seconds or so for me, complaining with something like:

    + git index-pack --fix-thin --stdin
    fatal: REF_DELTA at offset 46 already resolved (duplicate base 01d7713666f4de822776c7622c10f1b07de280dc?)

    =================================================================
    ==3904583==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 32 byte(s) in 1 object(s) allocated from:
        #0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
        #1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
        #2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
        #3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
        #4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
        #5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
        #6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
        #7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

    SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).
    Aborted

What happens is this:

  0. We construct a bogus pack with a duplicate object in it and trigger
     index-pack.

  1. We spawn a bunch of worker threads to resolve deltas (on my system
     it is 16 threads).

  2. One of the threads sees the duplicate object and bails by calling
     exit(), taking down all of the threads. This is expected and is the
     point of the test.

  3. At the time exit() is called, we may still be spawning threads from
     the main process via pthread_create(). LSan hooks thread creation
     to update its book-keeping; it has to know where each thread's
     stack is (so it can find entry points for reachable memory). So it
     calls pthread_getattr_np() to get information about the new thread.
     That may allocate memory that must be freed with a matching call to
     pthread_attr_destroy(). Probably LSan does that immediately, but
     if you're unlucky enough, the exit() will happen while it's between
     those two calls, and the allocated pthread_attr_t appears as a
     leak.

This isn't a real leak. It's not even in our code, but rather in the
LSan instrumentation code. So we could just ignore it. But the false
positive can cause people to waste time tracking it down.

It's possibly something that LSan could protect against (e.g., cover the
getattr/destroy pair with a mutex, and then in the final post-exit()
check for leaks try to take the same mutex). But I don't know enough
about LSan to say if that's a reasonable approach or not (or if my
analysis is even completely correct).

In the meantime, it's pretty easy to avoid the race by making creation
of the worker threads "atomic". That is, we'll spawn all of them before
letting any of them start to work. That's easy to do because we already
have a work_lock() mutex for handing out that work. If the main process
takes it, then all of the threads will immediately block until we've
finished spawning and released it.

This shouldn't make any practical difference for non-LSan runs. The
thread spawning is quick, and could happen before any worker thread gets
scheduled anyway.

Probably other spots that use threads are subject to the same issues.
But since we have to manually insert locking (and since this really is
kind of a hack), let's not bother with them unless somebody experiences
a similar racy false-positive in practice.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jul 3, 2024
When performing multi-pack reuse, reuse_partial_packfile_from_bitmap()
is responsible for generating an array of bitmapped_pack structs from
which to perform reuse.

In the multi-pack case, we loop over the MIDXs packs and copy the result
of calling `nth_bitmapped_pack()` to construct the list of reusable
paths.

But we may also want to do pack-reuse over a single pack, either because
we only had one pack to perform reuse over (in the case of single-pack
bitmaps), or because we explicitly asked to do single pack reuse even
with a MIDX[^1].

When this is the case, the array we generate of reusable packs contains
only a single element, which is either (a) the pack attached to the
single-pack bitmap, or (b) the MIDX's preferred pack.

In 795006f (pack-bitmap: gracefully handle missing BTMP chunks,
2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
function and stopped assigning the pack_int_id field when reusing only
the MIDX's preferred pack. This results in an uninitialized read down in
try_partial_reuse() like so:

    ==7474==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8
    #1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8
    #2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3
    #3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3
    #4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27
    #5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3
    #6 0x55c5ccc8fac8 in run_builtin git.c:474:11

which happens when try_partial_reuse() tries to call
midx_pair_to_pack_pos() when it tries to reject cross-pack deltas.

Avoid the uninitialized read by ensuring that the pack_int_id field is
set in the single-pack reuse case by setting it to either the MIDX
preferred pack's pack_int_id, or '-1', in the case of single-pack
bitmaps.  In the latter case, we never read the pack_int_id field, so
the choice of '-1' is intentional as a "garbage in, garbage out"
measure.

Guard against further regressions in this area by adding a test which
ensures that we do not throw out deltas from the preferred pack as
"cross-pack" due to an uninitialized pack_int_id.

[^1]: This can happen for a couple of reasons, either because the
  repository is configured with 'pack.allowPackReuse=(true|single)', or
  because the MIDX was generated prior to the introduction of the BTMP
  chunk, which contains information necessary to perform multi-pack
  reuse.

Reported-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jul 3, 2024
Memory sanitizer (msan) is detecting a use of an uninitialized variable
(`size`) in `read_attr_from_index`:

    ==2268==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5651f3416504 in read_attr_from_index git/attr.c:868:11
    #1 0x5651f3415530 in read_attr git/attr.c
    #2 0x5651f3413d74 in bootstrap_attr_stack git/attr.c:968:6
    #3 0x5651f3413d74 in prepare_attr_stack git/attr.c:1004:2
    #4 0x5651f3413d74 in collect_some_attrs git/attr.c:1199:2
    #5 0x5651f3413144 in git_check_attr git/attr.c:1345:2
    #6 0x5651f34728da in convert_attrs git/convert.c:1320:2
    #7 0x5651f3473425 in would_convert_to_git_filter_fd git/convert.c:1373:2
    #8 0x5651f357a35e in index_fd git/object-file.c:2630:34
    #9 0x5651f357aa15 in index_path git/object-file.c:2657:7
    #10 0x5651f35db9d9 in add_to_index git/read-cache.c:766:7
    #11 0x5651f35dc170 in add_file_to_index git/read-cache.c:799:9
    #12 0x5651f321f9b2 in add_files git/builtin/add.c:346:7
    #13 0x5651f321f9b2 in cmd_add git/builtin/add.c:565:18
    #14 0x5651f321d327 in run_builtin git/git.c:474:11
    #15 0x5651f321bc9e in handle_builtin git/git.c:729:3
    #16 0x5651f321a792 in run_argv git/git.c:793:4
    #17 0x5651f321a792 in cmd_main git/git.c:928:19
    #18 0x5651f33dde1f in main git/common-main.c:62:11

The issue exists because `size` is an output parameter from
`read_blob_data_from_index`, but it's only modified if
`read_blob_data_from_index` returns non-NULL. The read of `size` when
calling `read_attr_from_buf` unconditionally may read from an
uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL
before reading from `size`, but by then it's already too late: the
uninitialized read will have happened already. Furthermore, there's no
guarantee that the compiler won't reorder things so that it checks
`size` before checking `!buf`.

Make the call to `read_attr_from_buf` conditional on `buf` being
non-NULL, ensuring that `size` is not read if it's never set.

Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Aug 12, 2024
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Sep 5, 2024
It was recently reported that concurrent reads and writes may cause the
reftable backend to segfault. The root cause of this is that we do not
properly keep track of reftable readers across reloads.

Suppose that you have a reftable iterator and then decide to reload the
stack while iterating through the iterator. When the stack has been
rewritten since we have created the iterator, then we would end up
discarding a subset of readers that may still be in use by the iterator.
The consequence is that we now try to reference deallocated memory,
which of course segfaults.

One way to trigger this is in t5616, where some background maintenance
jobs have been leaking from one test into another. This leads to stack
traces like the following one:

  + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp
0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
  ==657994==The signal is caused by a READ memory access.
      #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
      #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
      #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
      #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
      #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
      #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
      #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
      #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
      #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
      #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
      #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
      #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
      #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
      #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
      #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
      #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
      #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
      #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
      #18 0x55f23dba7764 in run_builtin git.c:484
      #19 0x55f23dba7764 in handle_builtin git.c:741
      #20 0x55f23dbab61e in run_argv git.c:805
      #21 0x55f23dbab61e in cmd_main git.c:1000
      #22 0x55f23dba4781 in main common-main.c:64
      #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
      #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
      #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

While it is somewhat awkward that the maintenance processes survive
tests in the first place, it is totally expected that reftables should
work alright with concurrent writers. Seemingly they don't.

The only underlying resource that we need to care about in this context
is the reftable reader, which is responsible for reading a single table
from disk. These readers get discarded immediately (unless reused) when
calling `reftable_stack_reload()`, which is wrong. We can only close
them once we know that there are no iterators using them anymore.

Prepare for a fix by converting the reftable readers to be refcounted.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Oct 9, 2024
An internal customer reported a segfault when running `git
sparse-checkout set` with the `index.sparse` config enabled. I was
unable to reproduce it locally, but with their help we debugged into the
failing process and discovered the following stacktrace:

```
#0  0x00007ff6318fb7b0 in rehash (map=0x3dfb00d0440, newsize=1048576) at hashmap.c:125
#1  0x00007ff6318fbc66 in hashmap_add (map=0x3dfb00d0440, entry=0x3dfb5c58bc8) at hashmap.c:247
#2  0x00007ff631937a70 in hash_index_entry (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:122
#3  0x00007ff631938a2f in add_name_hash (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:638
#4  0x00007ff631a064de in set_index_entry (istate=0x3dfb00d0400, nr=8291, ce=0x3dfb5c58bc8) at sparse-index.c:255
#5  0x00007ff631a06692 in add_path_to_index (oid=0x5ff130, base=0x5ff580, path=0x3dfb4b725da "<redacted>", mode=33188, context=0x5ff570)    at sparse-index.c:307
#6  0x00007ff631a3b48c in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41f60, base=0x5ff580, depth=2, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:46
#7  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41e80, base=0x5ff580, depth=1, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#8  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41ac8, base=0x5ff580, depth=0, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#9  0x00007ff631a06a95 in expand_index (istate=0x3dfb00d0100, pl=0x0) at sparse-index.c:422
#10 0x00007ff631a06cbd in ensure_full_index (istate=0x3dfb00d0100) at sparse-index.c:456
#11 0x00007ff631990d08 in index_name_stage_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21, stage=0,    search_mode=EXPAND_SPARSE) at read-cache.c:556
#12 0x00007ff631990d6c in index_name_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21) at read-cache.c:566
#13 0x00007ff63180dbb5 in sanitize_paths (argc=185, argv=0x3dfb0030018, prefix=0x0, skip_checks=0) at builtin/sparse-checkout.c:756
#14 0x00007ff63180de50 in sparse_checkout_set (argc=185, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:860
#15 0x00007ff63180e6c5 in cmd_sparse_checkout (argc=186, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:1063
#16 0x00007ff6317234cb in run_builtin (p=0x7ff631ad9b38 <commands+2808>, argc=187, argv=0x3dfb0030018) at git.c:548
#17 0x00007ff6317239c0 in handle_builtin (argc=187, argv=0x3dfb0030018) at git.c:808
#18 0x00007ff631723c7d in run_argv (argcp=0x5ffdd0, argv=0x5ffd78) at git.c:877
#19 0x00007ff6317241d1 in cmd_main (argc=187, argv=0x3dfb0030018) at git.c:1017
#20 0x00007ff631838b60 in main (argc=190, argv=0x3dfb0030000) at common-main.c:64 
```

The very bottom of the stack being the `rehash()` method from
`hashmap.c` as called within the `name-hash` API made me look at where
these hashmaps were being used in the sparse index logic. These were
being copied across indexes, which seems dangerous. Indeed, clearing
these hashmaps and setting them as not initialized fixes the segfault.

The second commit is a response to a test failure that happens in
`t1092-sparse-checkout-compatibility.sh` where `git stash pop` starts to
fail because the underlying `git checkout-index` process fails due to
colliding files. Passing the `-f` flag appears to work, but it's unclear
why this name-hash change causes that change in behavior.
dscho added a commit that referenced this pull request Dec 10, 2024
An internal customer reported a segfault when running `git
sparse-checkout set` with the `index.sparse` config enabled. I was
unable to reproduce it locally, but with their help we debugged into the
failing process and discovered the following stacktrace:

```
#0  0x00007ff6318fb7b0 in rehash (map=0x3dfb00d0440, newsize=1048576) at hashmap.c:125
#1  0x00007ff6318fbc66 in hashmap_add (map=0x3dfb00d0440, entry=0x3dfb5c58bc8) at hashmap.c:247
#2  0x00007ff631937a70 in hash_index_entry (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:122
#3  0x00007ff631938a2f in add_name_hash (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:638
#4  0x00007ff631a064de in set_index_entry (istate=0x3dfb00d0400, nr=8291, ce=0x3dfb5c58bc8) at sparse-index.c:255
#5  0x00007ff631a06692 in add_path_to_index (oid=0x5ff130, base=0x5ff580, path=0x3dfb4b725da "<redacted>", mode=33188, context=0x5ff570)    at sparse-index.c:307
#6  0x00007ff631a3b48c in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41f60, base=0x5ff580, depth=2, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:46
#7  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41e80, base=0x5ff580, depth=1, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#8  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41ac8, base=0x5ff580, depth=0, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#9  0x00007ff631a06a95 in expand_index (istate=0x3dfb00d0100, pl=0x0) at sparse-index.c:422
#10 0x00007ff631a06cbd in ensure_full_index (istate=0x3dfb00d0100) at sparse-index.c:456
#11 0x00007ff631990d08 in index_name_stage_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21, stage=0,    search_mode=EXPAND_SPARSE) at read-cache.c:556
#12 0x00007ff631990d6c in index_name_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21) at read-cache.c:566
#13 0x00007ff63180dbb5 in sanitize_paths (argc=185, argv=0x3dfb0030018, prefix=0x0, skip_checks=0) at builtin/sparse-checkout.c:756
#14 0x00007ff63180de50 in sparse_checkout_set (argc=185, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:860
#15 0x00007ff63180e6c5 in cmd_sparse_checkout (argc=186, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:1063
#16 0x00007ff6317234cb in run_builtin (p=0x7ff631ad9b38 <commands+2808>, argc=187, argv=0x3dfb0030018) at git.c:548
#17 0x00007ff6317239c0 in handle_builtin (argc=187, argv=0x3dfb0030018) at git.c:808
#18 0x00007ff631723c7d in run_argv (argcp=0x5ffdd0, argv=0x5ffd78) at git.c:877
#19 0x00007ff6317241d1 in cmd_main (argc=187, argv=0x3dfb0030018) at git.c:1017
#20 0x00007ff631838b60 in main (argc=190, argv=0x3dfb0030000) at common-main.c:64 
```

The very bottom of the stack being the `rehash()` method from
`hashmap.c` as called within the `name-hash` API made me look at where
these hashmaps were being used in the sparse index logic. These were
being copied across indexes, which seems dangerous. Indeed, clearing
these hashmaps and setting them as not initialized fixes the segfault.

The second commit is a response to a test failure that happens in
`t1092-sparse-checkout-compatibility.sh` where `git stash pop` starts to
fail because the underlying `git checkout-index` process fails due to
colliding files. Passing the `-f` flag appears to work, but it's unclear
why this name-hash change causes that change in behavior.
derrickstolee pushed a commit that referenced this pull request Jan 6, 2025
This one is a little bit more curious. In t6112, we have a test that
exercises the `git rev-list --filter` option with invalid filters. We
execute git-rev-list(1) via `test_must_fail`, which means that we check
for leaks even though Git exits with an error code. This causes the
following leak:

    Direct leak of 27 byte(s) in 1 object(s) allocated from:
        #0 0x5555555e6946 in realloc.part.0 lsan_interceptors.cpp.o
        #1 0x5555558fb4b6 in xrealloc wrapper.c:137:8
        #2 0x5555558b6e06 in strbuf_grow strbuf.c:112:2
        #3 0x5555558b7550 in strbuf_add strbuf.c:311:2
        #4 0x5555557c1a88 in strbuf_addstr strbuf.h:310:2
        #5 0x5555557c1d4c in parse_list_objects_filter list-objects-filter-options.c:261:3
        #6 0x555555885ead in handle_revision_pseudo_opt revision.c:2899:3
        #7 0x555555884e20 in setup_revisions revision.c:3014:11
        #8 0x5555556c4b42 in cmd_rev_list builtin/rev-list.c:588:9
        #9 0x5555555ec5e3 in run_builtin git.c:483:11
        #10 0x5555555eb1e4 in handle_builtin git.c:749:13
        #11 0x5555555ec001 in run_argv git.c:819:4
        #12 0x5555555eaf94 in cmd_main git.c:954:19
        #13 0x5555556fd569 in main common-main.c:64:11
        #14 0x7ffff7ca714d in __libc_start_call_main (.../lib/libc.so.6+0x2a14d)
        #15 0x7ffff7ca7208 in __libc_start_main@GLIBC_2.2.5 (.../libc.so.6+0x2a208)
        #16 0x5555555ad064 in _start (git+0x59064)

This leak is valid, as we call `die()` and do not clean up the memory at
all. But what's curious is that this is the only leak reported, because
we don't clean up any other allocated memory, either, and I have no idea
why the leak sanitizer treats this buffer specially.

In any case, we can work around the leak by shuffling things around a
bit. Instead of calling `gently_parse_list_objects_filter()` and dying
after we have modified the filter spec, we simply do so beforehand. Like
this we don't allocate the buffer in the error case, which makes the
reported leak go away.

It's not pretty, but it manages to make t6112 leak free.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 6, 2025
When running t5601 with the leak checker enabled we can see a hang in
our CI systems. This hang seems to be system-specific, as I cannot
reproduce it on my own machine.

As it turns out, the issue is in those testcases that exercise cloning
of `~repo`-style paths. All of the testcases that hang eventually end up
interpreting "repo" as the username and will call getpwnam(3p) with that
username. That should of course be fine, and getpwnam(3p) should just
return an error. But instead, the leak sanitizer seems to be recursing
while handling a call to `free()` in the NSS modules:

    #0  0x00007ffff7fd98d5 in _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:720
    #1  0x00007ffff7fd9ac4 in update_get_addr (ti=0x7ffff7a91d80, gen=<optimized out>) at ../elf/dl-tls.c:916
    #2  0x00007ffff7fdc85c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
    #3  0x00007ffff7a27e04 in __lsan::GetAllocatorCache () at ../../../../src/libsanitizer/lsan/lsan_linux.cpp:27
    #4  0x00007ffff7a2b33a in __lsan::Deallocate (p=0x0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:127
    #5  __lsan::lsan_free (p=0x0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:220
    ...
    #261505 0x00007ffff7fd99f2 in free (ptr=<optimized out>) at ../include/rtld-malloc.h:50
    #261506 _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:822
    #261507 0x00007ffff7fd9ac4 in update_get_addr (ti=0x7ffff7a91d80, gen=<optimized out>) at ../elf/dl-tls.c:916
    #261508 0x00007ffff7fdc85c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
    #261509 0x00007ffff7a27e04 in __lsan::GetAllocatorCache () at ../../../../src/libsanitizer/lsan/lsan_linux.cpp:27
    #261510 0x00007ffff7a2b33a in __lsan::Deallocate (p=0x5020000001e0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:127
    #261511 __lsan::lsan_free (p=0x5020000001e0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:220
    #261512 0x00007ffff793da25 in module_load (module=0x515000000280) at ./nss/nss_module.c:188
    #261513 0x00007ffff793dee5 in __nss_module_load (module=0x515000000280) at ./nss/nss_module.c:302
    #261514 __nss_module_get_function (module=0x515000000280, name=name@entry=0x7ffff79b9128 "getpwnam_r") at ./nss/nss_module.c:328
    #261515 0x00007ffff793e741 in __GI___nss_lookup_function (fct_name=<optimized out>, ni=<optimized out>) at ./nss/nsswitch.c:137
    #261516 __GI___nss_next2 (ni=ni@entry=0x7fffffffa458, fct_name=fct_name@entry=0x7ffff79b9128 "getpwnam_r", fct2_name=fct2_name@entry=0x0, fctp=fctp@entry=0x7fffffffa460,
        status=status@entry=0, all_values=all_values@entry=0) at ./nss/nsswitch.c:120
    #261517 0x00007ffff794c6a7 in __getpwnam_r (name=name@entry=0x501000000060 "repo", resbuf=resbuf@entry=0x7ffff79fb320 <resbuf>, buffer=<optimized out>,
        buflen=buflen@entry=1024, result=result@entry=0x7fffffffa4b0) at ../nss/getXXbyYY_r.c:343
    #261518 0x00007ffff794c4d8 in getpwnam (name=0x501000000060 "repo") at ../nss/getXXbyYY.c:140
    #261519 0x00005555557e37ff in getpw_str (username=0x5020000001a1 "repo", len=4) at path.c:613
    #261520 0x00005555557e3937 in interpolate_path (path=0x5020000001a0 "~repo", real_home=0) at path.c:654
    #261521 0x00005555557e3aea in enter_repo (path=0x501000000040 "~repo", strict=0) at path.c:718
    #261522 0x000055555568f0ba in cmd_upload_pack (argc=1, argv=0x502000000100, prefix=0x0, repo=0x0) at builtin/upload-pack.c:57
    #261523 0x0000555555575ba8 in run_builtin (p=0x555555a20c98 <commands+3192>, argc=2, argv=0x502000000100, repo=0x555555a53b20 <the_repo>) at git.c:481
    #261524 0x0000555555576067 in handle_builtin (args=0x7fffffffaab0) at git.c:742
    #261525 0x000055555557678d in cmd_main (argc=2, argv=0x7fffffffac58) at git.c:912
    #261526 0x00005555556963cd in main (argc=2, argv=0x7fffffffac58) at common-main.c:64

Note that this stack is more than 260000 function calls deep. Run under
the debugger this will eventually segfault, but in our CI systems it
seems like this just hangs forever.

I assume that this is a bug either in the leak sanitizer or in glibc, as
I cannot reproduce it on my machine. In any case, let's work around the
bug for now by marking those tests with the "!SANITIZE_LEAK" prereq.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 6, 2025
There's a race with LSan when spawning threads and one of the threads
calls die(). We worked around one such problem with index-pack in the
previous commit, but it exists in git-grep, too. You can see it with:

  make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux
  cd t
  ./t0003-attributes.sh --stress

which fails pretty quickly with:

  ==git==4096424==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
      #0 0x7f906de14556 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
      #1 0x7f906dc9d2c1 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
      #2 0x7f906de2500d in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
      #3 0x7f906de25187 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:614
      #4 0x7f906de17d18 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:53
      #5 0x7f906de143a9 in ThreadStartFunc<false> ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:431
      #6 0x7f906dc9bf51 in start_thread nptl/pthread_create.c:447
      #7 0x7f906dd1a677 in __clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

As with the previous commit, we can fix this by inserting a barrier that
makes sure all threads have finished their setup before continuing. But
there's one twist in this case: the thread which calls die() is not one
of the worker threads, but the main thread itself!

So we need the main thread to wait in the barrier, too, until all
threads have gotten to it. And thus we initialize the barrier for
num_threads+1, to account for all of the worker threads plus the main
one.

If we then test as above, t0003 should run indefinitely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 6, 2025
In 1b9e9be (csum-file.c: use unsafe SHA-1 implementation when
available, 2024-09-26) we have converted our `struct hashfile` to use
the unsafe SHA1 backend, which results in a significant speedup. One
needs to be careful with how to use that structure now though because
callers need to consistently use either the safe or unsafe variants of
SHA1, as otherwise one can easily trigger corruption.

As it turns out, we have one inconsistent usage in our tree because we
directly initialize `struct hashfile_checkpoint::ctx` with the safe
variant of SHA1, but end up writing to that context with the unsafe
ones. This went unnoticed so far because our CI systems do not exercise
different hash functions for these two backends, and consequently safe
and unsafe variants are equivalent. But when using SHA1DC as safe and
OpenSSL as unsafe backend this leads to a crash an t1050:

    ++ git -c core.compression=0 add large1
    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==1367==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x507000000db0 sp 0x7fffffff5690 T0)
    ==1367==The signal is caused by a READ memory access.
    ==1367==Hint: address points to the zero page.
        #0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4)
        #1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2
        #2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2
        #3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2
        #4 0x555555b9905d in deflate_blob_to_pack ../bulk-checkin.c:286:4
        #5 0x555555b98ae9 in index_blob_bulk_checkin ../bulk-checkin.c:362:15
        #6 0x555555ddab62 in index_blob_stream ../object-file.c:2756:9
        #7 0x555555dda420 in index_fd ../object-file.c:2778:9
        #8 0x555555ddad76 in index_path ../object-file.c:2796:7
        #9 0x555555e947f3 in add_to_index ../read-cache.c:771:7
        #10 0x555555e954a4 in add_file_to_index ../read-cache.c:804:9
        #11 0x5555558b5c39 in add_files ../builtin/add.c:355:7
        #12 0x5555558b412e in cmd_add ../builtin/add.c:578:18
        #13 0x555555b1f493 in run_builtin ../git.c:480:11
        #14 0x555555b1bfef in handle_builtin ../git.c:740:9
        #15 0x555555b1e6f4 in run_argv ../git.c:807:4
        #16 0x555555b1b87a in cmd_main ../git.c:947:19
        #17 0x5555561649e6 in main ../common-main.c:64:11
        #18 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
        #19 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
        #20 0x555555772c84 in _start (git+0x21ec84)

    ==1367==Register values:
    rax = 0x0000511000001080  rbx = 0x0000000000000000  rcx = 0x000000000000000c  rdx = 0x0000000000000000
    rdi = 0x0000000000000000  rsi = 0x0000507000000db0  rbp = 0x0000507000000db0  rsp = 0x00007fffffff5690
     r8 = 0x0000000000000000   r9 = 0x0000000000000000  r10 = 0x0000000000000000  r11 = 0x00007ffff7a01a30
    r12 = 0x0000000000000000  r13 = 0x00007fffffff6b38  r14 = 0x00007ffff7ffd000  r15 = 0x00005555563b9910
    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) in EVP_MD_CTX_copy_ex
    ==1367==ABORTING
    ./test-lib.sh: line 1023:  1367 Aborted                 git $config add large1
    error: last command exited with $?=134
    not ok 4 - add with -c core.compression=0

Fix the issue by using the unsafe variant instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 6, 2025
Same as with the preceding commit, git-fast-import(1) is using the safe
variant to initialize a hashfile checkpoint. This leads to a segfault
when passing the checkpoint into the hashfile subsystem because it would
use the unsafe variants instead:

    ++ git --git-dir=R/.git fast-import --big-file-threshold=1
    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==577126==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x5070000009c0 sp 0x7fffffff5b30 T0)
    ==577126==The signal is caused by a READ memory access.
    ==577126==Hint: address points to the zero page.
        #0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4)
        #1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2
        #2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2
        #3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2
        #4 0x5555559647d1 in stream_blob ../builtin/fast-import.c:1110:2
        #5 0x55555596247b in parse_and_store_blob ../builtin/fast-import.c:2031:3
        #6 0x555555967f91 in file_change_m ../builtin/fast-import.c:2408:5
        #7 0x55555595d8a2 in parse_new_commit ../builtin/fast-import.c:2768:4
        #8 0x55555595bb7a in cmd_fast_import ../builtin/fast-import.c:3614:4
        #9 0x555555b1f493 in run_builtin ../git.c:480:11
        #10 0x555555b1bfef in handle_builtin ../git.c:740:9
        #11 0x555555b1e6f4 in run_argv ../git.c:807:4
        #12 0x555555b1b87a in cmd_main ../git.c:947:19
        #13 0x5555561649e6 in main ../common-main.c:64:11
        #14 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
        #15 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
        #16 0x555555772c84 in _start (git+0x21ec84)

    ==577126==Register values:
    rax = 0x0000511000000cc0  rbx = 0x0000000000000000  rcx = 0x000000000000000c  rdx = 0x0000000000000000
    rdi = 0x0000000000000000  rsi = 0x00005070000009c0  rbp = 0x00005070000009c0  rsp = 0x00007fffffff5b30
     r8 = 0x0000000000000000   r9 = 0x0000000000000000  r10 = 0x0000000000000000  r11 = 0x00007ffff7a01a30
    r12 = 0x0000000000000000  r13 = 0x00007fffffff6b60  r14 = 0x00007ffff7ffd000  r15 = 0x00005555563b9910
    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) in EVP_MD_CTX_copy_ex
    ==577126==ABORTING
    ./test-lib.sh: line 1039: 577126 Aborted                 git --git-dir=R/.git fast-import --big-file-threshold=1 < input
    error: last command exited with $?=134
    not ok 167 - R: blob bigger than threshold

The segfault is only exposed in case the unsafe and safe backends are
different from one another.

Fix the issue by initializing the context with the unsafe SHA1 variant.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 6, 2025
Our CI jobs sometimes see false positive leaks like this:

        =================================================================
        ==3904583==ERROR: LeakSanitizer: detected memory leaks

        Direct leak of 32 byte(s) in 1 object(s) allocated from:
            #0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
            #1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
            #2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
            #3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
            #4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
            #5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
            #6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
            #7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

This is not a leak in our code, but appears to be a race between one
thread calling exit() while another one is in LSan's stack setup code.
You can reproduce it easily by running t0003 or t5309 with --stress
(these trigger it because of the threading in git-grep and index-pack
respectively).

This may be a bug in LSan, but regardless of whether it is eventually
fixed, it is useful to work around it so that we stop seeing these false
positives.

We can recognize it by the mention of the sanitizer functions in the
DEDUP_TOKEN line. With this patch, the scripts mentioned above should
run with --stress indefinitely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Feb 22, 2025
When trying to create a Unix socket in a path that exceeds the maximum
socket name length we try to first change the directory into the parent
folder before creating the socket to reduce the length of the name. When
this fails we error out of `unix_sockaddr_init()` with an error code,
which indicates to the caller that the context has not been initialized.
Consequently, they don't release that context.

This leads to a memory leak: when we have already populated the context
with the original directory that we need to chdir(3p) back into, but
then the chdir(3p) into the socket's parent directory fails, then we
won't release the original directory's path. The leak is exposed by
t0301, but only when running tests in a directory hierarchy whose path
is long enough to make the socket name length exceed the maximum socket
name length:

    Direct leak of 129 byte(s) in 1 object(s) allocated from:
        #0 0x5555555e85c6 in realloc.part.0 lsan_interceptors.cpp.o
        #1 0x55555590e3d6 in xrealloc ../wrapper.c:140:8
        #2 0x5555558c8fc6 in strbuf_grow ../strbuf.c:114:2
        #3 0x5555558cacab in strbuf_getcwd ../strbuf.c:605:3
        #4 0x555555923ff6 in unix_sockaddr_init ../unix-socket.c:65:7
        #5 0x555555923e42 in unix_stream_connect ../unix-socket.c:84:6
        #6 0x55555562a984 in send_request ../builtin/credential-cache.c:46:11
        #7 0x55555562a89e in do_cache ../builtin/credential-cache.c:108:6
        #8 0x55555562a655 in cmd_credential_cache ../builtin/credential-cache.c:178:3
        #9 0x555555700547 in run_builtin ../git.c:480:11
        #10 0x5555556ff0e0 in handle_builtin ../git.c:740:9
        #11 0x5555556ffee8 in run_argv ../git.c:807:4
        #12 0x5555556fee6b in cmd_main ../git.c:947:19
        #13 0x55555593f689 in main ../common-main.c:64:11
        #14 0x7ffff7a2a1fb in __libc_start_call_main (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
        #15 0x7ffff7a2a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
        #16 0x5555555ad1d4 in _start (git+0x591d4)

    DEDUP_TOKEN: ___interceptor_realloc.part.0--xrealloc--strbuf_grow--strbuf_getcwd--unix_sockaddr_init--unix_stream_connect--send_request--do_cache--cmd_credential_cache--run_builtin--handle_builtin--run_argv--cmd_main--main--__libc_start_call_main--__libc_start_main@GLIBC_2.2.5--_start
    SUMMARY: LeakSanitizer: 129 byte(s) leaked in 1 allocation(s).

Fix this leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Feb 22, 2025
We don't free the result of `remote_default_branch()`, leading to a
memory leak. This leak is exposed by t9211, but only when run with Meson
with the `-Db_sanitize=leak` option:

    Direct leak of 5 byte(s) in 1 object(s) allocated from:
        #0 0x5555555cfb93 in malloc (scalar+0x7bb93)
        #1 0x5555556b05c2 in do_xmalloc ../wrapper.c:55:8
        #2 0x5555556b06c4 in do_xmallocz ../wrapper.c:89:8
        #3 0x5555556b0656 in xmallocz ../wrapper.c:97:9
        #4 0x5555556b0728 in xmemdupz ../wrapper.c:113:16
        #5 0x5555556b07a7 in xstrndup ../wrapper.c:119:9
        #6 0x5555555d3a4b in remote_default_branch ../scalar.c:338:14
        #7 0x5555555d20e6 in cmd_clone ../scalar.c:493:28
        #8 0x5555555d196b in cmd_main ../scalar.c:992:14
        #9 0x5555557c4059 in main ../common-main.c:64:11
        #10 0x7ffff7a2a1fb in __libc_start_call_main (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
        #11 0x7ffff7a2a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
        #12 0x555555592054 in _start (scalar+0x3e054)

    DEDUP_TOKEN: __interceptor_malloc--do_xmalloc--do_xmallocz--xmallocz--xmemdupz--xstrndup--remote_default_branch--cmd_clone--cmd_main--main--__libc_start_call_main--__libc_start_main@GLIBC_2.2.5--_start
    SUMMARY: LeakSanitizer: 5 byte(s) leaked in 1 allocation(s).

As the `branch` variable may contain a string constant obtained from
parsing command line arguments we cannot free the leaking variable
directly. Instead, introduce a new `branch_to_free` variable that only
ever gets assigned the allocated string and free that one to plug the
leak.

It is unclear why the leak isn't flagged when running the test via our
Makefile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jun 20, 2025
The GitHub's CI workflow uses 'actions/checkout@v4' to checkout the
repository. This action defaults to using the GitHub REST API to obtain
the repository if the `git` executable isn't available.

The step to build Git in the GitHub workflow can be summarized as:

  ...
  - uses: actions/checkout@v4 #1
  - run: ci/install-dependencies.sh #2
  ...
  - run: sudo --preserve-env --set-home --user=builder ci/run-build-and-tests.sh #3
  ...

Step #1, clones the repository, since the `git` executable isn't present
at this step, it uses GitHub's REST API to obtain a tar of the
repository.

Step #2, installs all dependencies, which includes the `git` executable.

Step #3, sets up the build, which includes setting up meson in the meson
job. At this point the `git` executable is present.

This means while the `git` executable is present, the repository doesn't
contain the '.git' folder. To keep both the CI's (GitLab and GitHub)
behavior consistent and to ensure that the build is performed on a
real-world scenario, install `git` before the repository is checked out.
This ensures that 'actions/checkout@v4' will clone the repository
instead of using a tarball. We also update the package cache while
installing `git`, this is because some distros will fail to locate the
package without updating the cache.

Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Aug 5, 2025
An internal customer reported a segfault when running `git
sparse-checkout set` with the `index.sparse` config enabled. I was
unable to reproduce it locally, but with their help we debugged into the
failing process and discovered the following stacktrace:

```
#0  0x00007ff6318fb7b0 in rehash (map=0x3dfb00d0440, newsize=1048576) at hashmap.c:125
#1  0x00007ff6318fbc66 in hashmap_add (map=0x3dfb00d0440, entry=0x3dfb5c58bc8) at hashmap.c:247
#2  0x00007ff631937a70 in hash_index_entry (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:122
#3  0x00007ff631938a2f in add_name_hash (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:638
#4  0x00007ff631a064de in set_index_entry (istate=0x3dfb00d0400, nr=8291, ce=0x3dfb5c58bc8) at sparse-index.c:255
#5  0x00007ff631a06692 in add_path_to_index (oid=0x5ff130, base=0x5ff580, path=0x3dfb4b725da "<redacted>", mode=33188, context=0x5ff570)    at sparse-index.c:307
#6  0x00007ff631a3b48c in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41f60, base=0x5ff580, depth=2, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:46
#7  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41e80, base=0x5ff580, depth=1, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#8  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41ac8, base=0x5ff580, depth=0, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#9  0x00007ff631a06a95 in expand_index (istate=0x3dfb00d0100, pl=0x0) at sparse-index.c:422
#10 0x00007ff631a06cbd in ensure_full_index (istate=0x3dfb00d0100) at sparse-index.c:456
#11 0x00007ff631990d08 in index_name_stage_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21, stage=0,    search_mode=EXPAND_SPARSE) at read-cache.c:556
#12 0x00007ff631990d6c in index_name_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21) at read-cache.c:566
#13 0x00007ff63180dbb5 in sanitize_paths (argc=185, argv=0x3dfb0030018, prefix=0x0, skip_checks=0) at builtin/sparse-checkout.c:756
#14 0x00007ff63180de50 in sparse_checkout_set (argc=185, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:860
#15 0x00007ff63180e6c5 in cmd_sparse_checkout (argc=186, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:1063
#16 0x00007ff6317234cb in run_builtin (p=0x7ff631ad9b38 <commands+2808>, argc=187, argv=0x3dfb0030018) at git.c:548
#17 0x00007ff6317239c0 in handle_builtin (argc=187, argv=0x3dfb0030018) at git.c:808
#18 0x00007ff631723c7d in run_argv (argcp=0x5ffdd0, argv=0x5ffd78) at git.c:877
#19 0x00007ff6317241d1 in cmd_main (argc=187, argv=0x3dfb0030018) at git.c:1017
#20 0x00007ff631838b60 in main (argc=190, argv=0x3dfb0030000) at common-main.c:64 
```

The very bottom of the stack being the `rehash()` method from
`hashmap.c` as called within the `name-hash` API made me look at where
these hashmaps were being used in the sparse index logic. These were
being copied across indexes, which seems dangerous. Indeed, clearing
these hashmaps and setting them as not initialized fixes the segfault.

The second commit is a response to a test failure that happens in
`t1092-sparse-checkout-compatibility.sh` where `git stash pop` starts to
fail because the underlying `git checkout-index` process fails due to
colliding files. Passing the `-f` flag appears to work, but it's unclear
why this name-hash change causes that change in behavior.
derrickstolee pushed a commit that referenced this pull request Aug 15, 2025
find_cfg_ent() allocates a struct reflog_expire_entry_option via
FLEX_ALLOC_MEM and inserts it into a linked list in the
reflog_expire_options structure. The entries in this list are never
freed, resulting in a leak in cmd_reflog_expire and the gc reflog expire
maintenance task:

Direct leak of 39 byte(s) in 1 object(s) allocated from:
    #0 0x7ff975ee6883 in calloc (/lib64/libasan.so.8+0xe6883)
    #1 0x0000010edada in xcalloc ../wrapper.c:154
    #2 0x000000df0898 in find_cfg_ent ../reflog.c:28
    #3 0x000000df0898 in reflog_expire_config ../reflog.c:70
    #4 0x00000095c451 in configset_iter ../config.c:2116
    #5 0x0000006d29e7 in git_config ../config.h:724
    #6 0x0000006d29e7 in cmd_reflog_expire ../builtin/reflog.c:205
    #7 0x0000006d504c in cmd_reflog ../builtin/reflog.c:419
    #8 0x0000007e4054 in run_builtin ../git.c:480
    #9 0x0000007e4054 in handle_builtin ../git.c:746
    #10 0x0000007e8a35 in run_argv ../git.c:813
    #11 0x0000007e8a35 in cmd_main ../git.c:953
    #12 0x000000441e8f in main ../common-main.c:9
    #13 0x7ff9754115f4 in __libc_start_call_main (/lib64/libc.so.6+0x35f4)
    #14 0x7ff9754116a7 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x36a7)
    #15 0x000000444184 in _start (/home/jekeller/libexec/git-core/git+0x444184)

Close this leak by adding a reflog_clear_expire_config() function which
iterates the linked list and frees its elements. Call it upon exit of
cmd_reflog_expire() and reflog_expire_condition().

Add a basic test which covers this leak. While at it, cover the
functionality from commit commit 3cb22b8 (Per-ref reflog expiry
configuration, 2008-06-15). We've had this support for years, but lacked
any tests.

Co-developed-by: Jeff King <peff@peff.net>
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee added a commit that referenced this pull request Aug 28, 2025
The fill_packs_from_midx() method was refactored in fcb2205 (midx:
implement support for writing incremental MIDX chains, 2024-08-06) to
allow for preferred packfiles and incremental multi-pack-indexes.
However, this led to some conditions that can cause improperly
initialized memory in the context's list of packfiles.

The conditions caring about the preferred pack name or the incremental
flag are currently necessary to load a packfile. But the context is
still being populated with pack_info structs based on the packfile array
for the existing multi-pack-index even if prepare_midx_pack() isn't
called.

Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
could not get a failure in most --stress runs.

When it fails under SANITIZE=address, it provides the following error:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
==3263517==The signal is caused by a READ memory access.
==3263517==Hint: address points to the zero page.
    #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
    #1 0x562d5d82d3ab in close_pack packfile.c:354
    #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
    #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
    #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
    ...

This failure is disconnected from the real fix because it the bad
pointers are accessed later when closing the packfiles from the context.

There are a few different aspects to this fix that are worth noting:

 1. We return to the previous behavior of fill_packs_from_midx to not
    rely on the incremental flag or existence of a preferred pack.

 2. The behavior to scan all layers of an incremental midx is kept, so
    this is not a full revert of the change.

 3. We skip allocating more room in the pack_info array if the pack
    fails prepare_midx_pack().

 4. The method has always returned 0 for success and 1 for failure, but
    the condition checking for error added a check for a negative result
    for failure, so that is now updated.

 5. The call to open_pack_index() is removed, but this is needed later
    in the case of a preferred pack. That call is moved to immediately
    before its result is needed (checking for the object count).

Signed-off-by: Derrick Stolee <stolee@gmail.com>
derrickstolee added a commit that referenced this pull request Aug 28, 2025
The fill_packs_from_midx() method was refactored in fcb2205 (midx:
implement support for writing incremental MIDX chains, 2024-08-06) to
allow for preferred packfiles and incremental multi-pack-indexes.
However, this led to some conditions that can cause improperly
initialized memory in the context's list of packfiles.

The conditions caring about the preferred pack name or the incremental
flag are currently necessary to load a packfile. But the context is
still being populated with pack_info structs based on the packfile array
for the existing multi-pack-index even if prepare_midx_pack() isn't
called.

Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
could not get a failure in most --stress runs. This test has a very
minor check at the end confirming only one packfile remaining. The
failing nature of this test actually relies on auto-GC cleaning up some
packfiles during the creation of the commits, as tests setting gc.auto
to zero make the packfile count match the number of added commits but
also avoids hitting the memory issue.

The test case is marked as EXPENSIVE not only because of the number of
packfiles it creates, but because some CI environments were reporting
errors during the test that I could not reproduce, specifically around
being unable to open the packfiles or their pack-indexes.

When it fails under SANITIZE=address, it provides the following error:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
==3263517==The signal is caused by a READ memory access.
==3263517==Hint: address points to the zero page.
    #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
    #1 0x562d5d82d3ab in close_pack packfile.c:354
    #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
    #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
    #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
    ...

This failure stack trace is disconnected from the real fix because it
the bad pointers are accessed later when closing the packfiles from the
context.

There are a few different aspects to this fix that are worth noting:

 1. We return to the previous behavior of fill_packs_from_midx to not
    rely on the incremental flag or existence of a preferred pack.

 2. The behavior to scan all layers of an incremental midx is kept, so
    this is not a full revert of the change.

 3. We skip allocating more room in the pack_info array if the pack
    fails prepare_midx_pack().

 4. The method has always returned 0 for success and 1 for failure, but
    the condition checking for error added a check for a negative result
    for failure, so that is now updated.

 5. The call to open_pack_index() is removed, but this is needed later
    in the case of a preferred pack. That call is moved to immediately
    before its result is needed (checking for the object count).

Signed-off-by: Derrick Stolee <stolee@gmail.com>
derrickstolee added a commit that referenced this pull request Aug 28, 2025
The fill_packs_from_midx() method was refactored in fcb2205 (midx:
implement support for writing incremental MIDX chains, 2024-08-06) to
allow for preferred packfiles and incremental multi-pack-indexes.
However, this led to some conditions that can cause improperly
initialized memory in the context's list of packfiles.

The conditions caring about the preferred pack name or the incremental
flag are currently necessary to load a packfile. But the context is
still being populated with pack_info structs based on the packfile array
for the existing multi-pack-index even if prepare_midx_pack() isn't
called.

Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
could not get a failure in most --stress runs. This test has a very
minor check at the end confirming only one packfile remaining. The
failing nature of this test actually relies on auto-GC cleaning up some
packfiles during the creation of the commits, as tests setting gc.auto
to zero make the packfile count match the number of added commits but
also avoids hitting the memory issue.

The test case is marked as EXPENSIVE not only because of the number of
packfiles it creates, but because some CI environments were reporting
errors during the test that I could not reproduce, specifically around
being unable to open the packfiles or their pack-indexes.

When it fails under SANITIZE=address, it provides the following error:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
==3263517==The signal is caused by a READ memory access.
==3263517==Hint: address points to the zero page.
    #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
    #1 0x562d5d82d3ab in close_pack packfile.c:354
    #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
    #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
    #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
    ...

This failure stack trace is disconnected from the real fix because it
the bad pointers are accessed later when closing the packfiles from the
context.

There are a few different aspects to this fix that are worth noting:

 1. We return to the previous behavior of fill_packs_from_midx to not
    rely on the incremental flag or existence of a preferred pack.

 2. The behavior to scan all layers of an incremental midx is kept, so
    this is not a full revert of the change.

 3. We skip allocating more room in the pack_info array if the pack
    fails prepare_midx_pack().

 4. The method has always returned 0 for success and 1 for failure, but
    the condition checking for error added a check for a negative result
    for failure, so that is now updated.

 5. The call to open_pack_index() is removed, but this is needed later
    in the case of a preferred pack. That call is moved to immediately
    before its result is needed (checking for the object count).

Signed-off-by: Derrick Stolee <stolee@gmail.com>
derrickstolee added a commit that referenced this pull request Aug 30, 2025
The fill_packs_from_midx() method was refactored in fcb2205 (midx:
implement support for writing incremental MIDX chains, 2024-08-06) to
allow for preferred packfiles and incremental multi-pack-indexes.
However, this led to some conditions that can cause improperly
initialized memory in the context's list of packfiles.

The conditions caring about the preferred pack name or the incremental
flag are currently necessary to load a packfile. But the context is
still being populated with pack_info structs based on the packfile array
for the existing multi-pack-index even if prepare_midx_pack() isn't
called.

Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
could not get a failure in most --stress runs.

The test case is marked as EXPENSIVE not only because of the number of
packfiles it creates, but because some CI environments were reporting
errors during the test that I could not reproduce, specifically around
being unable to open the packfiles or their pack-indexes.

When it fails under SANITIZE=address, it provides the following error:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
==3263517==The signal is caused by a READ memory access.
==3263517==Hint: address points to the zero page.
    #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
    #1 0x562d5d82d3ab in close_pack packfile.c:354
    #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
    #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
    #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
    ...

This failure stack trace is disconnected from the real fix because the bad
pointers are accessed later when closing the packfiles from the context.

There are a few different aspects to this fix that are worth noting:

 1. We return to the previous behavior of fill_packs_from_midx to not
    rely on the incremental flag or existence of a preferred pack.

 2. The behavior to scan all layers of an incremental midx is kept, so
    this is not a full revert of the change.

 3. We skip allocating more room in the pack_info array if the pack
    fails prepare_midx_pack().

 4. The method has always returned 0 for success and 1 for failure, but
    the condition checking for error added a check for a negative result
    for failure, so that is now updated.

 5. The call to open_pack_index() is removed, but this is needed later
    in the case of a preferred pack. That call is moved to immediately
    before its result is needed (checking for the object count).

Signed-off-by: Derrick Stolee <stolee@gmail.com>
derrickstolee pushed a commit that referenced this pull request Sep 2, 2025
An internal customer reported a segfault when running `git
sparse-checkout set` with the `index.sparse` config enabled. I was
unable to reproduce it locally, but with their help we debugged into the
failing process and discovered the following stacktrace:

```
#0  0x00007ff6318fb7b0 in rehash (map=0x3dfb00d0440, newsize=1048576) at hashmap.c:125
#1  0x00007ff6318fbc66 in hashmap_add (map=0x3dfb00d0440, entry=0x3dfb5c58bc8) at hashmap.c:247
#2  0x00007ff631937a70 in hash_index_entry (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:122
#3  0x00007ff631938a2f in add_name_hash (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:638
#4  0x00007ff631a064de in set_index_entry (istate=0x3dfb00d0400, nr=8291, ce=0x3dfb5c58bc8) at sparse-index.c:255
#5  0x00007ff631a06692 in add_path_to_index (oid=0x5ff130, base=0x5ff580, path=0x3dfb4b725da "<redacted>", mode=33188, context=0x5ff570)    at sparse-index.c:307
#6  0x00007ff631a3b48c in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41f60, base=0x5ff580, depth=2, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:46
#7  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41e80, base=0x5ff580, depth=1, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#8  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41ac8, base=0x5ff580, depth=0, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#9  0x00007ff631a06a95 in expand_index (istate=0x3dfb00d0100, pl=0x0) at sparse-index.c:422
#10 0x00007ff631a06cbd in ensure_full_index (istate=0x3dfb00d0100) at sparse-index.c:456
#11 0x00007ff631990d08 in index_name_stage_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21, stage=0,    search_mode=EXPAND_SPARSE) at read-cache.c:556
#12 0x00007ff631990d6c in index_name_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21) at read-cache.c:566
#13 0x00007ff63180dbb5 in sanitize_paths (argc=185, argv=0x3dfb0030018, prefix=0x0, skip_checks=0) at builtin/sparse-checkout.c:756
#14 0x00007ff63180de50 in sparse_checkout_set (argc=185, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:860
#15 0x00007ff63180e6c5 in cmd_sparse_checkout (argc=186, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:1063
#16 0x00007ff6317234cb in run_builtin (p=0x7ff631ad9b38 <commands+2808>, argc=187, argv=0x3dfb0030018) at git.c:548
#17 0x00007ff6317239c0 in handle_builtin (argc=187, argv=0x3dfb0030018) at git.c:808
#18 0x00007ff631723c7d in run_argv (argcp=0x5ffdd0, argv=0x5ffd78) at git.c:877
#19 0x00007ff6317241d1 in cmd_main (argc=187, argv=0x3dfb0030018) at git.c:1017
#20 0x00007ff631838b60 in main (argc=190, argv=0x3dfb0030000) at common-main.c:64 
```

The very bottom of the stack being the `rehash()` method from
`hashmap.c` as called within the `name-hash` API made me look at where
these hashmaps were being used in the sparse index logic. These were
being copied across indexes, which seems dangerous. Indeed, clearing
these hashmaps and setting them as not initialized fixes the segfault.

The second commit is a response to a test failure that happens in
`t1092-sparse-checkout-compatibility.sh` where `git stash pop` starts to
fail because the underlying `git checkout-index` process fails due to
colliding files. Passing the `-f` flag appears to work, but it's unclear
why this name-hash change causes that change in behavior.
derrickstolee added a commit that referenced this pull request Sep 2, 2025
The fill_packs_from_midx() method was refactored in fcb2205 (midx:
implement support for writing incremental MIDX chains, 2024-08-06) to
allow for preferred packfiles and incremental multi-pack-indexes.
However, this led to some conditions that can cause improperly
initialized memory in the context's list of packfiles.

The conditions caring about the preferred pack name or the incremental
flag are currently necessary to load a packfile. But the context is
still being populated with pack_info structs based on the packfile array
for the existing multi-pack-index even if prepare_midx_pack() isn't
called.

Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
could not get a failure in most --stress runs. This test has a very
minor check at the end confirming only one packfile remaining. The
failing nature of this test actually relies on auto-GC cleaning up some
packfiles during the creation of the commits, as tests setting gc.auto
to zero make the packfile count match the number of added commits but
also avoids hitting the memory issue.

The test case is marked as EXPENSIVE not only because of the number of
packfiles it creates, but because some CI environments were reporting
errors during the test that I could not reproduce, specifically around
being unable to open the packfiles or their pack-indexes.

When it fails under SANITIZE=address, it provides the following error:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
==3263517==The signal is caused by a READ memory access.
==3263517==Hint: address points to the zero page.
    #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
    #1 0x562d5d82d3ab in close_pack packfile.c:354
    #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
    #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
    #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
    ...

This failure stack trace is disconnected from the real fix because it
the bad pointers are accessed later when closing the packfiles from the
context.

There are a few different aspects to this fix that are worth noting:

 1. We return to the previous behavior of fill_packs_from_midx to not
    rely on the incremental flag or existence of a preferred pack.

 2. The behavior to scan all layers of an incremental midx is kept, so
    this is not a full revert of the change.

 3. We skip allocating more room in the pack_info array if the pack
    fails prepare_midx_pack().

 4. The method has always returned 0 for success and 1 for failure, but
    the condition checking for error added a check for a negative result
    for failure, so that is now updated.

 5. The call to open_pack_index() is removed, but this is needed later
    in the case of a preferred pack. That call is moved to immediately
    before its result is needed (checking for the object count).

Signed-off-by: Derrick Stolee <stolee@gmail.com>
derrickstolee added a commit that referenced this pull request Sep 2, 2025
The fill_packs_from_midx() method was refactored in fcb2205 (midx:
implement support for writing incremental MIDX chains, 2024-08-06) to
allow for preferred packfiles and incremental multi-pack-indexes.
However, this led to some conditions that can cause improperly
initialized memory in the context's list of packfiles.

The conditions caring about the preferred pack name or the incremental
flag are currently necessary to load a packfile. But the context is
still being populated with pack_info structs based on the packfile array
for the existing multi-pack-index even if prepare_midx_pack() isn't
called.

Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
could not get a failure in most --stress runs. This test has a very
minor check at the end confirming only one packfile remaining. The
failing nature of this test actually relies on auto-GC cleaning up some
packfiles during the creation of the commits, as tests setting gc.auto
to zero make the packfile count match the number of added commits but
also avoids hitting the memory issue.

The test case is marked as EXPENSIVE not only because of the number of
packfiles it creates, but because some CI environments were reporting
errors during the test that I could not reproduce, specifically around
being unable to open the packfiles or their pack-indexes.

When it fails under SANITIZE=address, it provides the following error:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
==3263517==The signal is caused by a READ memory access.
==3263517==Hint: address points to the zero page.
    #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
    #1 0x562d5d82d3ab in close_pack packfile.c:354
    #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
    #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
    #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
    ...

This failure stack trace is disconnected from the real fix because it
the bad pointers are accessed later when closing the packfiles from the
context.

There are a few different aspects to this fix that are worth noting:

 1. We return to the previous behavior of fill_packs_from_midx to not
    rely on the incremental flag or existence of a preferred pack.

 2. The behavior to scan all layers of an incremental midx is kept, so
    this is not a full revert of the change.

 3. We skip allocating more room in the pack_info array if the pack
    fails prepare_midx_pack().

 4. The method has always returned 0 for success and 1 for failure, but
    the condition checking for error added a check for a negative result
    for failure, so that is now updated.

 5. The call to open_pack_index() is removed, but this is needed later
    in the case of a preferred pack. That call is moved to immediately
    before its result is needed (checking for the object count).

Signed-off-by: Derrick Stolee <stolee@gmail.com>
derrickstolee added a commit that referenced this pull request Nov 7, 2025
The fill_packs_from_midx() method was refactored in fcb2205 (midx:
implement support for writing incremental MIDX chains, 2024-08-06) to
allow for preferred packfiles and incremental multi-pack-indexes.
However, this led to some conditions that can cause improperly
initialized memory in the context's list of packfiles.

The conditions caring about the preferred pack name or the incremental
flag are currently necessary to load a packfile. But the context is
still being populated with pack_info structs based on the packfile array
for the existing multi-pack-index even if prepare_midx_pack() isn't
called.

Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
could not get a failure in most --stress runs.

The test case is marked as EXPENSIVE not only because of the number of
packfiles it creates, but because some CI environments were reporting
errors during the test that I could not reproduce, specifically around
being unable to open the packfiles or their pack-indexes.

When it fails under SANITIZE=address, it provides the following error:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
==3263517==The signal is caused by a READ memory access.
==3263517==Hint: address points to the zero page.
    #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
    #1 0x562d5d82d3ab in close_pack packfile.c:354
    #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
    #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
    #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
    ...

This failure stack trace is disconnected from the real fix because the bad
pointers are accessed later when closing the packfiles from the context.

There are a few different aspects to this fix that are worth noting:

 1. We return to the previous behavior of fill_packs_from_midx to not
    rely on the incremental flag or existence of a preferred pack.

 2. The behavior to scan all layers of an incremental midx is kept, so
    this is not a full revert of the change.

 3. We skip allocating more room in the pack_info array if the pack
    fails prepare_midx_pack().

 4. The method has always returned 0 for success and 1 for failure, but
    the condition checking for error added a check for a negative result
    for failure, so that is now updated.

 5. The call to open_pack_index() is removed, but this is needed later
    in the case of a preferred pack. That call is moved to immediately
    before its result is needed (checking for the object count).

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Nov 7, 2025
An internal customer reported a segfault when running `git
sparse-checkout set` with the `index.sparse` config enabled. I was
unable to reproduce it locally, but with their help we debugged into the
failing process and discovered the following stacktrace:

```
#0  0x00007ff6318fb7b0 in rehash (map=0x3dfb00d0440, newsize=1048576) at hashmap.c:125
#1  0x00007ff6318fbc66 in hashmap_add (map=0x3dfb00d0440, entry=0x3dfb5c58bc8) at hashmap.c:247
#2  0x00007ff631937a70 in hash_index_entry (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:122
#3  0x00007ff631938a2f in add_name_hash (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:638
#4  0x00007ff631a064de in set_index_entry (istate=0x3dfb00d0400, nr=8291, ce=0x3dfb5c58bc8) at sparse-index.c:255
#5  0x00007ff631a06692 in add_path_to_index (oid=0x5ff130, base=0x5ff580, path=0x3dfb4b725da "<redacted>", mode=33188, context=0x5ff570)    at sparse-index.c:307
#6  0x00007ff631a3b48c in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41f60, base=0x5ff580, depth=2, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:46
#7  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41e80, base=0x5ff580, depth=1, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#8  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41ac8, base=0x5ff580, depth=0, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#9  0x00007ff631a06a95 in expand_index (istate=0x3dfb00d0100, pl=0x0) at sparse-index.c:422
#10 0x00007ff631a06cbd in ensure_full_index (istate=0x3dfb00d0100) at sparse-index.c:456
#11 0x00007ff631990d08 in index_name_stage_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21, stage=0,    search_mode=EXPAND_SPARSE) at read-cache.c:556
#12 0x00007ff631990d6c in index_name_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21) at read-cache.c:566
#13 0x00007ff63180dbb5 in sanitize_paths (argc=185, argv=0x3dfb0030018, prefix=0x0, skip_checks=0) at builtin/sparse-checkout.c:756
#14 0x00007ff63180de50 in sparse_checkout_set (argc=185, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:860
#15 0x00007ff63180e6c5 in cmd_sparse_checkout (argc=186, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:1063
#16 0x00007ff6317234cb in run_builtin (p=0x7ff631ad9b38 <commands+2808>, argc=187, argv=0x3dfb0030018) at git.c:548
#17 0x00007ff6317239c0 in handle_builtin (argc=187, argv=0x3dfb0030018) at git.c:808
#18 0x00007ff631723c7d in run_argv (argcp=0x5ffdd0, argv=0x5ffd78) at git.c:877
#19 0x00007ff6317241d1 in cmd_main (argc=187, argv=0x3dfb0030018) at git.c:1017
#20 0x00007ff631838b60 in main (argc=190, argv=0x3dfb0030000) at common-main.c:64 
```

The very bottom of the stack being the `rehash()` method from
`hashmap.c` as called within the `name-hash` API made me look at where
these hashmaps were being used in the sparse index logic. These were
being copied across indexes, which seems dangerous. Indeed, clearing
these hashmaps and setting them as not initialized fixes the segfault.

The second commit is a response to a test failure that happens in
`t1092-sparse-checkout-compatibility.sh` where `git stash pop` starts to
fail because the underlying `git checkout-index` process fails due to
colliding files. Passing the `-f` flag appears to work, but it's unclear
why this name-hash change causes that change in behavior.
derrickstolee pushed a commit that referenced this pull request Jan 8, 2026
When pushing to a set of remotes using a nickname for the group, the
client initializes the connection to each remote, talks to the
remote and reads and parses capabilities line, and holds the
capabilities in a file-scope static variable server_capabilities_v1.

There are a few other such file-scope static variables, and these
connections cannot be parallelized until they are refactored to a
structure that keeps track of active connections.

Which is *not* the theme of this patch ;-)

For a single connection, the server_capabilities_v1 variable is
initialized to NULL (at the program initialization), populated when
we talk to the other side, used to look up capabilities of the other
side possibly multiple times, and the memory is held by the variable
until program exit, without leaking.  When talking to multiple remotes,
however, the server capabilities from the second connection overwrites
without freeing the one from the first connection, which leaks.

    ==1080970==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 421 byte(s) in 2 object(s) allocated from:
	#0 0x5615305f849e in strdup (/home/gitster/g/git-jch/bin/bin/git+0x2b349e) (BuildId: 54d149994c9e85374831958f694bd0aa3b8b1e26)
	#1 0x561530e76cc4 in xstrdup /home/gitster/w/build/wrapper.c:43:14
	#2 0x5615309cd7fa in process_capabilities /home/gitster/w/build/connect.c:243:27
	#3 0x5615309cd502 in get_remote_heads /home/gitster/w/build/connect.c:366:4
	#4 0x561530e2cb0b in handshake /home/gitster/w/build/transport.c:372:3
	#5 0x561530e29ed7 in get_refs_via_connect /home/gitster/w/build/transport.c:398:9
	#6 0x561530e26464 in transport_push /home/gitster/w/build/transport.c:1421:16
	#7 0x561530800bec in push_with_options /home/gitster/w/build/builtin/push.c:387:8
	#8 0x5615307ffb99 in do_push /home/gitster/w/build/builtin/push.c:442:7
	#9 0x5615307fe926 in cmd_push /home/gitster/w/build/builtin/push.c:664:7
	#10 0x56153065673f in run_builtin /home/gitster/w/build/git.c:506:11
	#11 0x56153065342f in handle_builtin /home/gitster/w/build/git.c:779:9
	#12 0x561530655b89 in run_argv /home/gitster/w/build/git.c:862:4
	#13 0x561530652cba in cmd_main /home/gitster/w/build/git.c:984:19
	#14 0x5615308dda0a in main /home/gitster/w/build/common-main.c:9:11
	#15 0x7f051651bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

    SUMMARY: AddressSanitizer: 421 byte(s) leaked in 2 allocation(s).

Free the capablities data for the previous server before overwriting
it with the next server to plug this leak.

The added test fails without the freeing with SANITIZE=leak; I
somehow couldn't get it fail reliably with SANITIZE=leak,address
though.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 17, 2026
When performing auto-maintenance we check whether commit graphs need to
be generated by counting the number of commits that are reachable by any
reference, but not covered by a commit graph. This search is performed
by iterating through all references and then doing a depth-first search
until we have found enough commits that are not present in the commit
graph.

This logic has a memory leak though:

  Direct leak of 16 byte(s) in 1 object(s) allocated from:
      #0 0x55555562e433 in malloc (git+0xda433)
      #1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
      #2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
      #3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
      #4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
      #5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
      #6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
      #7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
      #8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
      #9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
      #10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
      #11 0x55555575166a in run_builtin ../git.c:506:11
      #12 0x5555557502f0 in handle_builtin ../git.c:779:9
      #13 0x555555751127 in run_argv ../git.c:862:4
      #14 0x55555575007b in cmd_main ../git.c:984:19
      #15 0x5555557523aa in main ../common-main.c:9:11
      #16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
      #17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
      #18 0x5555555f0934 in _start (git+0x9c934)

The root cause of this memory leak is our use of `commit_list_append()`.
This function expects as parameters the item to append and the _tail_ of
the list to append. This tail will then be overwritten with the new tail
of the list so that it can be used in subsequent calls. But we call it
with `commit_list_append(parent->item, &stack)`, so we end up losing
everything but the new item.

This issue only surfaces when counting merge commits. Next to being a
memory leak, it also shows that we're in fact miscounting as we only
respect children of the last parent. All previous parents are discarded,
so their children will be disregarded unless they are hit via another
reference.

While crafting a test case for the issue I was puzzled that I couldn't
establish the proper border at which the auto-condition would be
fulfilled. As it turns out, there's another bug: if an object is at the
tip of any reference we don't mark it as seen. Consequently, if it is
the tip of or reachable via another ref, we'd count that object multiple
times.

Fix both of these bugs so that we properly count objects without leaking
any memory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 17, 2026
It is possible to hit a memory leak when reading data from a submodule
via git-grep(1):

  Direct leak of 192 byte(s) in 1 object(s) allocated from:
    #0 0x55555562e726 in calloc (git+0xda726)
    #1 0x555555964734 in xcalloc ../wrapper.c:154:8
    #2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2
    #3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6
    #4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17
    #5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3
    #6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2
    #7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2
    #8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4
    #9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8
    #10 0x5555558540bd in odb_read_object ../odb.c:920:6
    #11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12
    #12 0x55555580a13a in grep_source_load ../grep.c:1986:10
    #13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7
    #14 0x555555807574 in grep_source_1 ../grep.c:1625:8
    #15 0x555555807322 in grep_source ../grep.c:1837:10
    #16 0x5555556a5c58 in run ../builtin/grep.c:208:10
    #17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o
    #18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
    #19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)

The root caues of this leak is the way we set up and release the
submodule:

  1. We use `repo_submodule_init()` to initialize a new repository. This
     repository is stored in `repos_to_free`.

  2. We now read data from the submodule repository.

  3. We then call `repo_clear()` on the submodule repositories.

  4. `repo_clear()` calls `odb_free()`.

  5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`.

The issue here is the 5th step: we call `odb_free_sources()` _before_ we
call `odb_close()`. But `odb_free_sources()` already frees all sources,
so the logic that closes them in `odb_close()` now becomes a no-op. As a
consequence, we never explicitly close sources at all.

Fix the leak by closing the store before we free the sources.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants