Conversation
f71396e to
8432e97
Compare
2f0d55d to
8bb7d66
Compare
|
For the parallelism: Another rating you can consider is a workstealking executed (basically rayon nut I believe you said you already have one in core). That will call the equivalent of rayon_core:Scope::spawn for every dir entry and update any state. You can abstract the join function to switch between single threaded and multi threaded. Then you can just transverse the directories with std and have full manual control. |
|
Thanks for sharing, I am glad you think this can be parallelised! I was thinking to have the following progression:
The above would already satisfy as the integration into the top-level status call will then run a bunch of different status-tasks in parallel and we'll see what that does to the overall performance (after all, there are 3 different status operations needed to get what I already checked the potential for parallelisation on the WebKit repository, and on my machien it was close to 2X faster when having a parallel walk, merely by comparing The simplest way, in theory, would be to see if the algorithm in all its complexity can work with an iterator, as then one could just Of course, it would be more optimal to use parallelise directly, and but at least for now I am so ignorant towards Long story short, I definitely plan for future upgrades and will do my best to have a bullet-proof test-suite that assures everything still works even after massive refactors that might come with these changes. |
|
Just now I realised that the aforementioned 3 steps have interdependencies, as the |
|
Hmm I didn't match the git implementation quite as closely when I worked on the status stuff but it did seems like this was more of a superficial limitation of the implementation rathee than an actual interdependence. Doing serial first makes lots of sense ofcourse. |
It changed quite a bit since then and doesn't have mutable access to the index anymore either, instead it emits change requests (stat updates) specifically. Thus the one orchestrating these calls must know about this flag and set it before calling the dirwalk. Effectively, having this allows for an optimisation as otherwise another At least that's my current understanding, thus far I think it will all be fine and optimisation comes later. Even now I am very optimistic that the implementation here will be en-par with Git, while |
a9ece59 to
b4691eb
Compare
84e1b5f to
4ed507e
Compare
cf311e3 to
109505a
Compare
That way it's easier to use them without adding own dependnecies.
This way it's possible to match partial input against a pathspec to see if this root would have a chance to actually match.
…atch`. With it the caller can learn how or why the pathspec matched, which allows to make decisions based on it that are relevant to the user interface.
…f a pathspec matches. That way it's possible to see if some paths can never match.
That way, the caller can obtain the amount of patterns more easily.
85b30d3 to
f3aea6a
Compare
The new implementation does the same as Git, and initializes an alternative lookup location. All previous implementations with `_icase` suffix were removed without replacement.
…ns were available. These are only relevant when reading, and have to be regenerated after writing. During reading, they can speed up performance by allowing multi-threading.
8d413ba to
fa1eda0
Compare
That way it's possible to obtain the current working directory with which the repository was opened.
That way it's possible to perform arbitrary directory walks, useful for status, clean, and add.
9a11d02 to
9f9b0d7
Compare
Based on #1219
diff-correctness →
gix-status→ gix resetImprove
gix statusto the point where it's suitable for use inresetfunctinoality.Leads to a proper worktree reset implementation, eventually leading to a high-level reset similar to how git supports it.
Architecture
The reason this PR deals quite a bit with
gix statusis that for a safe implementation ofreset()we need to be sure that the files we would want to touch don't don't carry modifications or are untracked files. In order to know what would need to be done, we have to diff thecurrent-index with target-index. The set of files to touch can then be used to lookup information provided bygit-status, like worktree modifications, index modifications, and untracked files, to know if we can proceed or not. Here is also where the reset-modes would affect the outcome, i.e. what to change and how.This is a very modular approach which facilitates testing and understanding of what otherwise would be a very complex algorithm. Having a set of changes as output also allows to one day parallelize applying these changes.
This leaves us in a situation where the current
checkout()implementation wants to become a fastpath for situations where the reset involves an empty tree as source (i.e. create everything and overwrite local changes).On the way to
reset()it's a valid choice to warm up more with the matter by improving on the currentgix statusimplementation and assure correctness of what's there, which currently doesn't seem to be the case in comparison. Further, implementinggix statussimilarly togit statusshould be made possible.Research of Walk
index,untrackedandignoredsupport, as well as empty-dir and.gitdir/file handlinggix addthenWebKitthis optimization is very important, even though we might be able to have some easy wins with parallelization of dirwalk + modification check, as mod-check takes 1.8s at best, and dirwalk takes 1.5s single-threaded.Tasks
dir/*). It's probably about being exact matches and what not, also works with prefix-matches likedir/ordir(but it knows how much matched to know what to return, files or the containing directory).also figure out how pathspecs are used when determining if recursion is allowed - can it force recursion?- postponedgix cleanclean -nxd '*.o'should only list matching files, not collapse directories (in delete mode)generated/*.oclean -nx d/dmatches and forces the directory to be shown even though no-dis given:!*generated*seems to expand to muchgixgix cleanuses walkdir fromgixNext PR
dirwalk()intogix statusto learn more about it (and its performance)gixcrate with index-worktreediff tree with index (with reverse-diff functionality to simulate diff of index with tree), for better performance as it
would avoid having to allocate a whole index even though we are only interested in a diff.
Benchmarks
gixcan hold its own and comes in close toGit, but only if it doesn't do any additional work by means of precomposing unicode or building an ignore-case hash table (which Git always builds). Interestingly, Git doesn't get slowed at all by any of these, which is surprising as precompose unicode costs about 20% of the time in our case.Multithreading the hashtable generation seems to be the way to go.
Data
Status Enables
gitoxidebackend Byron/built#1Next PR: Reset
reset()that checks if it's allowed to perform a worktree modification is allowed, or if an entry should be skipped. That way we can postpone safety checks like --hardPostponed
What follows is important for resets, but won't be needed for
cargoworktree resets.gix index entriesto optionally expand sparse entriesgix statuswith actual submodule support - needsstatusingix(crate) effectivelygix statuswith actual conflict supportResearch
gix statuscan deal a little better with submodules. Even though in this case a lot of submodule-related information is needed for a complete reset, probably only doable by a higher-level caller which orchestrates it.mergeandkeep? How to controlrefresh? Maybe partial (only the files we touch), and full, to also update the files we don't touch as part of status? Maybe it's part of status if that is run before.git resetandgit checkoutin terms ofHEADmodifications. With the former changingHEADs referent, and the latter changingHEADitself.checkout()method as technically that's areset --hardwith optional overwrite check. Could it be rolled into one, with pathspec support added?reset()performs just as well, which is unlikely as there is more overhead. But maybe it's not worth to maintain two versions over it. But if so, one should probably rename it.git status: what about rename tracking? It's available for tree-diffs and quite complex on its own. Probably only needs HEAD-vs-index rename tracking. No, also can have worktree rename tracking, even though it's hard to imagine how this can be fast unless it's tightly integrated with untracked-files handling. This screams for a generalization of the tracking code though as the testing and implementation is complex, but should be generalisable.Re-learn
pathspecsnormalize themselves to turn from any kind of specification into repo-root relative patterns...and that root will be always be used to open files like../.gitignore, which is useful for display to the user)