tree -> index diff for status by Byron · Pull Request #1363 · GitoxideLabs/gitoxide

Byron · 2024-05-07T19:38:26Z

Based on #1350

diff-correctness → `gix-status` → gix reset

Improve gix status to the point where it's suitable for use in reset functionality.
Leads to a proper worktree reset implementation, eventually leading to a high-level reset similar to how git supports it.

Architecture

The reason this PR deals quite a bit with gix status is that for a safe implementation of reset() we need to be sure that the files we would want to touch don't don't carry modifications or are untracked files. In order to know what would need to be done, we have to diff the current-index with target-index. The set of files to touch can then be used to lookup information provided by git-status, like worktree modifications, index modifications, and untracked files, to know if we can proceed or not. Here is also where the reset-modes would affect the outcome, i.e. what to change and how.

This is a very modular approach which facilitates testing and understanding of what otherwise would be a very complex algorithm. Having a set of changes as output also allows to one day parallelize applying these changes.

This leaves us in a situation where the current checkout() implementation wants to become a fastpath for situations where the reset involves an empty tree as source (i.e. create everything and overwrite local changes).

Extra Tasks

Out-of-band tasks that just should finally be done, with potential for great impact.

support for hasconfig as part of resolve_includes() without actual lookahead.

Tasks

diff index with index to learn what we would want to do in the worktree, or alternatively,
diff tree with index (with reverse-diff functionality to simulate diff of index with tree), for better performance as it
would avoid having to allocate a whole index even though we are only interested in a diff.
- Must include rename tracking.
how to make diff results available from status with all transformations applied, to allow user to obtain diffs of any kind? Compare to tree-tree-diff which has that functionality already.
update is_dirty() and Submodule::status() to do full status.

Status Enables

cargo package and its use of complete status information.
Add gitoxide backend Byron/built#1
starship native dirty check (but needs diffstats of worktree-vs-index)
built can get fully-functional is-dirty flag for 'describe()'

Inbetween

sort out Oxidize Radicle-Git #1319

Next PR: Reset

reset() that checks if it's allowed to perform a worktree modification is allowed, or if an entry should be skipped. That way we can postpone safety checks like --hard

Postponed

What follows is important for resets, but won't be needed for cargo worktree resets.

a way to expand sparse dirs (but figure out if this is truly always necessary) - probably not, unless sparse dirs can be empty, but even then no expansion is needed
- wire it up in gix index entries to optionally expand sparse entries
gix status with implemented 'porcelain-v2` display mode

Research

Ignored files are considered expandable and can be overwritten on reset
How to integrate submodules - probably easy to answer once gix status can deal a little better with submodules. Even though in this case a lot of submodule-related information is needed for a complete reset, probably only doable by a higher-level caller which orchestrates it.
How to deal with various modes like merge and keep? How to control refresh? Maybe partial (only the files we touch), and full, to also update the files we don't touch as part of status? Maybe it's part of status if that is run before.
Worthwhile to make explicit the difference between git reset and git checkout in terms of HEAD modifications. With the former changing HEADs referent, and the latter changing HEAD itself.
figure out how this relates to the current checkout() method as technically that's a reset --hard with optional overwrite check. Could it be rolled into one, with pathspec support added?
- just keep them separate until it's clear that reset() performs just as well, which is unlikely as there is more overhead. But maybe it's not worth to maintain two versions over it. But if so, one should probably rename it.
for git status: what about rename tracking? It's available for tree-diffs and quite complex on its own. Probably only needs HEAD-vs-index rename tracking. No, also can have worktree rename tracking, even though it's hard to imagine how this can be fast unless it's tightly integrated with untracked-files handling. This screams for a generalization of the tracking code though as the testing and implementation is complex, but should be generalisable.

Re-learn

pathspecs normalize themselves to turn from any kind of specification into repo-root relative patterns.
attribute/ignore file sources are naturally relative to the root of the repo, which remains relative (i.e. can be .. and that root will be always be used to open files like ../.gitignore, which is useful for display to the user)

turns out that Powershell is used by default there, and variable substitutions are different there. What's worse is that it doesn't seem to fail if something doesn't work as one would expect.

EliahKagan · 2024-05-08T07:07:31Z

Regarding 91b6549, is it required to use PowerShell? Otherwise, shell: bash can be used, even on Windows, which gives Git Bash, so anything that is due to PowerShell rather than a POSIX-compatible shell being used may be fixable that way.

Byron · 2024-05-08T07:30:27Z

Thanks for the reminder! Yes, that would have worked too. Overall, I think using the ${{ env.var }} syntax is more portable and should be preferred. There might be other cases where it's definitely better to change the shell though.

EliahKagan · 2024-05-08T07:55:35Z

Yes, there is nothing wrong with ${{ env.var }}. The only reason I mentioned shell: bash was this bit from the commit message:

What's worse is that it doesn't seem to fail if something doesn't work as one
would expect.

Achieving the effect of set -e or set -u is less obvious and automatic in PowerShell than in POSIX-compatible shells, so I figured the ability to use bash might be relevant there.

Byron · 2024-05-08T14:18:20Z

Achieving the effect of set -e or set -u is less obvious and automatic in PowerShell than in POSIX-compatible shells, so I figured the ability to use bash might be relevant there.

And after writing the bit about powershell I also realized that I am so used to set -euo pipefail that maybe I forgot how bash would have done the same in that situation. But I am not sure how the defaults are set on CI, where one would hope a failing command indeed triggers failure by default.

Previousoly it would make an additional syscall to change permissions, which is slower than necessary.

…ng a single index. The motivating example is here: praetorian-inc/noseyparker#179 Previously, it was possible for a trampling herd of threads to consolidate the disk state. Most of them would be 'needs-init' threads which could notice that the initialization already happened, and just use that. But a thread might be late for the party and somehow manages to not get any newly loaded index, and thus tries to consolidate with what's on disk again. Then it would again determine no change, and return nothing, causing the caller to abort and not find objects it should find because it wouldn't see the index that it should have seen. The reason the thread got into this mess is that the 'is-load-ongoing' flagging was racy itself, so it would not wait for ongoing loads and just conclude nothing happened. An extra delay (by yielding) now assures it either seees the loading state and waits for it, sees the newly loaded indices. Note that this issue can be reproduced with: ``` './target/release/gix -r repo-with-one-pack -t10 --trace odb stats --extra-header-lookup' ```

Related to helix-editor/helix#10660 which runs into object types that are unknown. I have looked into this and [couldn't find evidence of a new pack-entry type](https://github.com/git/git/blob/0f3415f1f8478b05e64db11eb8aaa2915e48fef6/packfile.c#L1303-L1358) in the Git codebase. It also looks like that Git [will never write packs that aren't V2](https://github.com/git/git/blob/0f3415f1f8478b05e64db11eb8aaa2915e48fef6/pack-write.c#L352) - initially I thought it might write V3 based on some other criteria. For now, the thesis is that one would have to be able to mark bad objects to be able to handle this more gracefully, but all we can do is try to fail.

…1354) This behaviour is the same as in Git.

Less IO operations, and less that can go wrong.

When the metadata of a symlink's target cannot be obtained, even if the error is something other than `NotFound`, this falls back to creating file symbolic links. This only affects scenarios where either the checkout would fail entirely or where the symlink would have been treated as a collision and skipped (even though it was not really a collision, since only its target had an error). Other cases are not affected, and all exisitng scenarios where directory symlink would be created will still create directory symlinks. This builds on 31d02a8 (GitoxideLabs#1363) by supporting dangling symlinks even when the target filenames are unusual, such as when the name is invalid or reserved. Windows permits such symlinks to be created, and going ahead and creating the matches the Git behavior. This should also support other errors beisdes `NotFound`. For example, some permissions-related errors, in some cases where traversal or acccess (even to access metadata) are not allowed, would fail to create a symlink. This should address that as well. This works by using `Path::is_dir()` in the standard library, which automatically converts all errors (not just `NotFound`) into `false`. The logic here is thus quite similar to what was already present, just more tolerant, even though the code itself is shorter and simpler.

When the metadata of a symlink's target cannot be obtained, even if the error is something other than `NotFound`, this falls back to creating file symbolic links. This only affects scenarios where either the checkout would fail entirely or where the symlink would have been treated as a collision and skipped (even though it was not really a collision, since only its target had an error). Other cases are not affected, and all exisitng scenarios where directory symlink would be created will still create directory symlinks. This builds on 31d02a8 (GitoxideLabs#1363) by supporting dangling symlinks even when the target filenames are unusual, such as when the name is invalid or reserved. Windows permits such symlinks to be created, and going ahead and creating the matches the Git behavior. This should also support other errors beisdes `NotFound`. For example, some permissions-related errors, in some cases where traversal or acccess (even to access metadata) are not allowed, would fail to create a symlink. This should address that as well. This works by using `Path::is_dir()` in the standard library, which automatically converts all errors (not just `NotFound`) into `false`. The logic here is thus quite similar to what was already present, just more tolerant, even though the code itself is shorter and simpler. This fixes GitoxideLabs#1420, and also fixes GitoxideLabs#1421.

EliahKagan · 2026-01-05T20:01:46Z

But I am not sure how the defaults are set on CI, where one would hope a failing command indeed triggers failure by default.

I had meant to replied to this, but it looks like I forgot. This is relevant to #2340 (comment), which reminded me of it, but I think other aspects related to the problem are more central there. So I figured I'd reply here, as I'd meant to. (I'll also reply there, about those aspects I think are more central to what we're discussing there.)

See also #1559, which fixed a bug related to fail-fast behavior inside a script step. (That shouln't be confused with #2342, which fixed a conceptually similar but worse bug that was not caused by script-specific semantics.)

The what

PowerShell does not generally treat an external command that fails with a nonzero exit status as an error. So while it can stop on errors, and it can be made to return the exit status of the last command it runs, as far as I know there is no simple foolproof way to implicitly cause intermediate external commands in a multi-command script to terminate the script (and indicate failure).

The way PowerShell is run by default on CI does not attempt to do that, both back when this PR was merged and today. For example, consider this test workflow:

name: Test if PowerShell fails fast

on: [push, pull_request, workflow_dispatch]

jobs:
  powershell-failfast:
    runs-on: windows-latest

    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: External command reporting failure
        run: |
          echo a >t
          git add t
          echo b >>t
          git diff --exit-code -- t  # Should fail.
          echo $LASTEXITCODE
          git log --oneline -1  # Should succeed, if run.
      - name: Cmdlet reporting an error
        run: |
          cat foo
          echo done

As shown in a test run:

The git diff command has an exit status of 1, but the first script step that contains it continues--and reports success, since the last command it runs has an exit status of 0.
In contrast, since cat in PowerShell on Windows is an alias for the Get-Content cmdlet, when it fails to open a file, this is an error closer to PowerShell. That fails the step that runs it, and does so immediately, not running the subsequent command.

The why

This doesn't entirely answer the question of why PowerShell is not run by default on GitHub Actions runners (with the pwsh or powershell values of the shell key, or behavior on Windows that defaults to them based on availability in that order) with something like -e.

I think the reason is that PowerShell doesn't easily support doing this. My recollection is that it can be done for simple cases, but that this feature in PowerShell deviates from the behavior of POSIX -e in significant ways that developers would find unexpected if it were turned on automatically, relating to the distinction in PowerShell between an error and the result of a test.

Actually, POSIX-compatible shells also have subtleties where the behavior of -e is unexpected a significant fraction fo the time. But my understanding is that making PowerShell approximate the behavior of set -e neither well-approximates the real behavior of set -e nor approximates the effect people tend to assume set -e has.

In practice, I don't usually see problems in GHA steps arising from unexpected mismatch between what set -e does and what one may intuitively think it does. In contrast, the absence of fail-fast behavior in PowerShell is very much unexpected, when PowerShell is used implicitly on CI for script steps that are conceptually "lists of commands to run in any way." Ironically, this seems not to cause problems in big complex PowerShell script steps, where the semantics of PowerShell are front and center.

Byron force-pushed the status branch from ce2a9d4 to 044dd74 Compare May 8, 2024 06:49

fix MSRV check on Windows

91b6549

turns out that Powershell is used by default there, and variable substitutions are different there. What's worse is that it doesn't seem to fail if something doesn't work as one would expect.

Byron force-pushed the status branch from 044dd74 to 91b6549 Compare May 8, 2024 06:52

Byron mentioned this pull request May 8, 2024

Bump the github-actions group with 6 updates #1362

Merged

Byron added 2 commits May 9, 2024 16:27

improve docs to be more approachable from git2

5197b5a

fix: assure writing invalid dates doesn't panic (#1367).

3448fd9

Byron linked an issue May 11, 2024 that may be closed by this pull request

Assertion failure crash in gix_date::time::write::<impl gix_date::Time>::write_to #1367

Closed

fix: use tempfile permissions support to set the correct mode on unix.

7b3dc92

Previousoly it would make an additional syscall to change permissions, which is slower than necessary.

Byron force-pushed the status branch 5 times, most recently from 9ca81df to 7c9186a Compare May 11, 2024 19:39

Byron force-pushed the status branch from 7c9186a to addf446 Compare May 12, 2024 09:24

Byron force-pushed the status branch 5 times, most recently from 25e43ec to db2acf9 Compare May 13, 2024 12:29

Byron linked an issue May 13, 2024 that may be closed by this pull request

Checking out a dangling symlink on Windows is treated as a hard error #1354

Closed

Byron added 3 commits May 13, 2024 15:16

adapt to changes in gix-pack

bad5b48

fix: default to creating file-symlinks if it is dangling on Windows (#…

31d02a8

…1354) This behaviour is the same as in Git.

improve the symlink probing by simplifying it

7a3c583

Less IO operations, and less that can go wrong.

Byron force-pushed the status branch from db2acf9 to 7a3c583 Compare May 14, 2024 07:40

Byron merged commit 04ef31e into main May 14, 2024

Byron mentioned this pull request May 14, 2024

tree -> index diff for status #1368

Merged

9 tasks

EliahKagan mentioned this pull request May 17, 2024

Work around nextest macro lib compatibility bug #1372

Merged

This was referenced Jun 26, 2024

Checkout fails when Windows symlinks have strangely named targets #1420

Closed

PermissionDenied checking Windows symlink target is misinterpreted as collision #1421

Closed

Test suite does not assert directory symlink creation #1422

Closed

EliahKagan mentioned this pull request Jan 5, 2026

Replace dtolnay/rust-toolchain with manual rustup #2340

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

tree -> index diff for status#1363

tree -> index diff for status#1363
Byron merged 9 commits intomainfrom
status

Byron commented May 7, 2024

Uh oh!

EliahKagan commented May 8, 2024

Uh oh!

Byron commented May 8, 2024

Uh oh!

EliahKagan commented May 8, 2024

Uh oh!

Byron commented May 8, 2024

Uh oh!

EliahKagan commented Jan 5, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Byron commented May 7, 2024

diff-correctness → gix-status → gix reset

Architecture

Extra Tasks

Tasks

Status Enables

Inbetween

Next PR: Reset

Postponed

Research

Re-learn

Uh oh!

EliahKagan commented May 8, 2024

Uh oh!

Byron commented May 8, 2024

Uh oh!

EliahKagan commented May 8, 2024

Uh oh!

Byron commented May 8, 2024

Uh oh!

EliahKagan commented Jan 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

The what

The why

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

diff-correctness → `gix-status` → gix reset

EliahKagan commented Jan 5, 2026 •

edited

Loading