improve git-diff documentation and A...B handling #804
chris3torek wants to merge 3 commits into git:master
|
/allow |
|
User chris3torek is now allowed to use GitGitGadget. |
6627964 to
8c78277
Compare
|
/preview |
|
Preview email sent as pull.804.git.git.1591650208.gitgitgadget@gmail.com |
|
/preview |
|
Preview email sent as pull.804.git.git.1591652002.gitgitgadget@gmail.com |
2801c1d to
9318365
Compare
|
/submit |
|
Submitted as pull.804.git.git.1591661021.gitgitgadget@gmail.com |
The autosquash-and-exec test used "git diff HEAD^!" to mean "git diff HEAD^ HEAD". Use these directly instead of relying on the undefined but actual-current behavior of "HEAD^!". Signed-off-by: Chris Torek <chris.torek@gmail.com>
|
/preview |
|
Preview email sent as pull.804.v2.git.git.1591724505.gitgitgadget@gmail.com |
|
/submit |
|
This must have been sent to me by mistake.
|
b9b4c6f to
a7da92c
Compare
|
/submit |
When git diff is given a symmetric difference A...B, it chooses
some merge base from the two specified commits (as documented).
This fails, however, if there is *no* merge base: instead, you
see the differences between A and B, which is certainly not what
is expected.
Moreover, if additional revisions are specified on the command
line ("git diff A...B C"), the results get a bit weird:
* If there is a symmetric difference merge base, this is used
as the left side of the diff. The final ref is used as
the right side.
* If there is no merge base, the symmetric status is completely
lost. We will produce a combined diff instead.
Similar weirdness occurs if you use, e.g., "git diff C A...B D".
Likewise, using multiple two-dot ranges, or tossing extra
revision specifiers into the command line with two-dot ranges,
or mixing two and three dot ranges, all produce nonsense.
To avoid all this, add a routine to catch the range cases and
verify that the arguments make sense. As a side effect,
produce a warning showing *which* merge base is being used when
there are multiple choices; die if there is no merge base.
Signed-off-by: Chris Torek <chris.torek@gmail.com>
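The checks described in this commit message can be sketched as a small decision function. This is a minimal illustration, not the code from the patch: the enum, the function name check_range_args, and its parameters are all hypothetical, and the real implementation inspects rev->pending and rev->cmdline rather than plain counts.

```c
#include <assert.h>

/* Hypothetical outcome codes; not part of Git's source. */
enum range_check {
	RANGE_OK,            /* arguments are acceptable as given */
	RANGE_OK_WARN,       /* A...B with several merge bases: warn, then proceed */
	RANGE_USAGE_ERROR,   /* e.g. "A..B C" or "A..B C..D": reject */
	RANGE_NO_MERGE_BASE  /* A...B with no merge base: die */
};

/*
 * ranges:      number of A..B or A...B arguments seen
 * symmetric:   nonzero if the (single) range is A...B
 * others:      number of additional, non-range revisions
 * merge_bases: number of merge bases found for A...B
 */
static enum range_check check_range_args(int ranges, int symmetric,
					 int others, int merge_bases)
{
	assert(ranges >= 0 && others >= 0 && merge_bases >= 0);

	if (ranges > 1)
		return RANGE_USAGE_ERROR;	/* multiple or mixed ranges */
	if (ranges == 1 && others > 0)
		return RANGE_USAGE_ERROR;	/* extra revs next to a range */
	if (ranges == 1 && symmetric) {
		if (merge_bases == 0)
			return RANGE_NO_MERGE_BASE;
		if (merge_bases > 1)
			return RANGE_OK_WARN;
	}
	return RANGE_OK;
}
```

For instance, check_range_args(1, 1, 0, 0) models "git diff A...B" where A and B share no history, and check_range_args(2, 0, 0, 0) models "git diff A..B C..D".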
Document the usage for producing combined commits with "git diff". This includes updating the synopsis section. While here, add the three-dot notation to the synopsis. Make "git diff -h" print the same usage summary as the manual page synopsis, minus the "A..B" form, which is now discouraged. Signed-off-by: Chris Torek <chris.torek@gmail.com>
a7da92c to
d7bc9ac
Compare
|
/submit |
@@ -6,6 +6,7 @@
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Chris Torek via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +struct symdiff {
> + struct bitmap *skip;
> + int warn;
> + const char *base, *left, *right;
> +};
> +
> +/*
> + * Check for symmetric-difference arguments, and if present, arrange
> + * everything we need to know to handle them correctly. As a bonus,
> + * weed out all bogus range-based revision specifications, e.g.,
> + * "git diff A..B C..D" or "git diff A..B C" get rejected.
> + *
> + * For an actual symmetric diff, *symdiff is set this way:
> + *
> + * - its skip is non-NULL and marks *all* rev->pending.objects[i]
> + * indices that the caller should ignore (extra merge bases, of
> + * which there might be many, and A in A...B). Note that the
> + * chosen merge base and right side are NOT marked.
> + * - warn is set if there are multiple merge bases.
> + * - base, left, and right point to the names to use in a
> + * warning about multiple merge bases.
> + *
> + * If there is no symmetric diff argument, sym->skip is NULL and
> + * sym->warn is cleared. The remaining fields are not set.
> + */
OK.
> +static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
> +{
> + int i, is_symdiff = 0, basecount = 0, othercount = 0;
> + int lpos = -1, rpos = -1, basepos = -1;
> + struct bitmap *map = NULL;
> +
> + /*
> + * Use the whence fields to find merge bases and left and
> + * right parts of symmetric difference, so that we do not
> + * depend on the order that revisions are parsed. If there
> + * are any revs that aren't from these sources, we have a
> + * "git diff C A...B" or "git diff A...B C" case. Or we
> + * could even get "git diff A...B C...E", for instance.
> + *
> + * If we don't have just one merge base, we pick one
> + * at random.
> + *
> + * NB: REV_CMD_LEFT, REV_CMD_RIGHT are also used for A..B,
> + * so we must check for SYMMETRIC_LEFT too. The two arrays
> + * rev->pending.objects and rev->cmdline.rev are parallel.
> + */
> + for (i = 0; i < rev->cmdline.nr; i++) {
> + struct object *obj = rev->pending.objects[i].item;
> + switch (rev->cmdline.rev[i].whence) {
> + case REV_CMD_MERGE_BASE:
> + if (basepos < 0)
> + basepos = i;
> + basecount++;
> + break; /* do mark all bases */
We find and use the first found merge base (i.e. "pick at random" as
promised in the comment before the function), but for warning, keep
track of how many merge bases there are.
> + case REV_CMD_LEFT:
> + if (lpos >= 0)
> + usage(builtin_diff_usage);
A range (either A..B or A...B) has already been seen, and we have
another, which is now rejected.
> + lpos = i;
> + if (obj->flags & SYMMETRIC_LEFT) {
> + is_symdiff = 1;
> + break; /* do mark A */
Inside this switch statement, "continue" is a sign that the caller
should use the rev, and "break" is a sign that the rev is to be
ignored. We obviously do not ignore "A" in ...
> + }
> + continue;
... "A..B" notation, so we "continue" here.
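The convention relies on how C treats the two statements when a switch sits inside a loop: "continue" goes straight to the next iteration, while "break" only leaves the switch and lets the code after it run. A minimal standalone sketch, with a hypothetical enum and function name standing in for the whence values and the bitmap marking:

```c
#include <assert.h>

/* Hypothetical tags standing in for the REV_CMD_* whence values. */
enum tag { TAG_KEEP, TAG_SKIP };

/*
 * Set skipped[i] to 1 for every entry tagged TAG_SKIP and return how
 * many entries were marked. "continue" jumps to the next loop
 * iteration and bypasses the marking code after the switch; "break"
 * only leaves the switch, so the marking code runs.
 */
static int mark_skipped(const enum tag *tags, int n, int *skipped)
{
	int i, marked = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		skipped[i] = 0;
		switch (tags[i]) {
		case TAG_KEEP:
			continue;	/* caller should use this entry */
		case TAG_SKIP:
			break;		/* leave the switch to be marked */
		}
		skipped[i] = 1;
		marked++;
	}
	return marked;
}
```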
> + case REV_CMD_RIGHT:
> + if (rpos >= 0)
> + usage(builtin_diff_usage);
Here is the same "we reject having two or more ranges".
I actually suspect that this usage() would become dead code---we
would already have died when we saw the matching left end of the
second range (so this could become BUG(), even though usage() does
not hurt).
> + rpos = i;
> + continue; /* don't mark B */
And of course, whether "A..B" or "A...B", B will be used as the
"result" side of the diff, so won't be marked for skipping.
> + case REV_CMD_PARENTS_ONLY:
> + case REV_CMD_REF:
> + case REV_CMD_REV:
> + othercount++;
> + continue;
I wonder if we want to use "default" instead of these three
individual cases. Pros and cons?
- If we forgot to list a whence REV_CMD_* here, it will be silently
marked to be skipped with this code. With "default", it will be
counted to be diffed (which may trigger "giving too many revs to
be diffed" error from the diff machinery, which is good).
- With "default", when we add a new type of whence to REV_CMD_* and
forget to adjust this code, it will be counted to be diffed.
With the current code, it will be skipped.
We probably could get the best of both worlds by keeping the
above three for "counted in othercount and kept", and then add a
default arm to the switch() that just says
default:
BUG("forgot to handle %d",
rev->cmdline.rev[i].whence);
That way, every time we add a new type of whence, we would be forced
to think what should be done to them.
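A minimal standalone sketch of that shape, assuming a hypothetical enum and a hypothetical bug_unhandled() helper in place of the REV_CMD_* values and Git's BUG() macro:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a few REV_CMD_* whence values. */
enum whence { W_PARENTS_ONLY, W_REF, W_REV, W_MERGE_BASE };

/* Hypothetical stand-in for Git's BUG(): report and abort. */
static void bug_unhandled(int value)
{
	fprintf(stderr, "BUG: forgot to handle %d\n", value);
	abort();
}

/* Return 1 if the entry counts as an "other" rev, 0 if it does not. */
static int counts_as_other(enum whence w)
{
	switch (w) {
	case W_PARENTS_ONLY:
	case W_REF:
	case W_REV:
		return 1;
	case W_MERGE_BASE:
		return 0;
	default:
		/* A newly added enumerator lands here until it is handled. */
		bug_unhandled((int)w);
	}
	return 0; /* not reached */
}
```

One trade-off of this design: once a "default" arm is present, compilers that warn about an enumerator missing from a switch (GCC's -Wswitch, for example) no longer emit that warning, so the check moves from compile time to run time.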
> + }
> + if (map == NULL)
> + map = bitmap_new();
> + bitmap_set(map, i);
> + }
> +
> + /*
> + * Forbid any additional revs for both A...B and A..B.
> + */
> + if (lpos >= 0 && othercount > 0)
> + usage(builtin_diff_usage);
Meaning "git diff A..B C" is bad. Reasonable.
> + if (!is_symdiff) {
> + bitmap_free(map);
It is not wrong per se to free it unconditionally, but wouldn't it
be a bug if (map != NULL) at this point in the flow?
The merge bases would only be stuffed in the revs when A...B is
given, and we are not skipping anything involved in A..B or
non-range revs.
> + sym->warn = 0;
> + sym->skip = NULL;
Clearing these two fields is as promised to the callers in the
comment above, which is good.
> + return;
> + }
> +
> + sym->left = rev->pending.objects[lpos].name;
> + sym->right = rev->pending.objects[rpos].name;
> + sym->base = rev->pending.objects[basepos].name;
> + if (basecount == 0)
> + die(_("%s...%s: no merge base"), sym->left, sym->right);
Good.
> + bitmap_unset(map, basepos); /* unmark the base we want */
Hmph. You could
case REV_CMD_MERGE_BASE:
basecount++;
if (basepos < 0) {
basepos = i;
continue; /* keep this one */
}
break; /* skip all others */
and lose this unset(). I do not think it makes too much of a
difference, but it probably is easier to follow if we avoided this
"do something and then come back to correct" pattern.
> + sym->warn = basecount > 1;
> + sym->skip = map;
> +}
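The variation suggested above for the REV_CMD_MERGE_BASE case, where the first merge base is never marked and so needs no later unset, can be sketched on its own. This is an illustration under stated assumptions: the function name, the is_base input array, and the plain int array used in place of the bitmap are all hypothetical.

```c
#include <assert.h>

/*
 * is_base[i] is nonzero when entry i is a merge base (hypothetical
 * input; Git derives this from the whence field). On return, skip[i]
 * is 1 for every merge base except the first, *basepos is the index
 * of the first merge base (or -1 if there is none), and the return
 * value is the total number of merge bases.
 */
static int keep_first_base(const int *is_base, int n, int *skip, int *basepos)
{
	int i, basecount = 0;

	assert(n >= 0);
	*basepos = -1;
	for (i = 0; i < n; i++) {
		skip[i] = 0;
		if (!is_base[i])
			continue;
		basecount++;
		if (*basepos < 0) {
			*basepos = i;	/* keep this one: never marked */
			continue;
		}
		skip[i] = 1;		/* skip all others */
	}
	return basecount;
}
```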
> +
> int cmd_diff(int argc, const char **argv, const char *prefix)
> {
> int i;
> @@ -263,6 +366,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> struct object_array_entry *blob[2];
> int nongit = 0, no_index = 0;
> int result = 0;
> + struct symdiff sdiff;
>
> /*
> * We could get N tree-ish in the rev.pending_objects list.
> @@ -382,6 +486,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> }
> }
>
> + symdiff_prepare(&rev, &sdiff);
> for (i = 0; i < rev.pending.nr; i++) {
> struct object_array_entry *entry = &rev.pending.objects[i];
> struct object *obj = entry->item;
> @@ -396,6 +501,8 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> obj = &get_commit_tree(((struct commit *)obj))->object;
>
> if (obj->type == OBJ_TREE) {
> + if (sdiff.skip && bitmap_get(sdiff.skip, i))
> + continue;
By the way, I cannot shake the feeling that, given that
rev.pending/cmdline.nr will not be an unreasonably large number,
it is overkill to use the bitmap here. If I were writing this code,
I would have made symdiff_prepare() fill a separate object array
by copying the elements to be used in the final "diff" out of the
rev.pending array and updated this loop to iterate over that array.
I am not saying that such an approach is better than the use of
bitmap code here. It just was a bit unexpected to see the bitmap
code used for a set of objects that is typically fewer than a dozen.
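A rough sketch of the separate-array alternative, for comparison. It does not use Git's object_array API; struct entry, its ignore field, and collect_kept() are hypothetical stand-ins chosen to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry in rev.pending. */
struct entry {
	const char *name;
	int ignore;	/* nonzero if the diff should not use this entry */
};

/*
 * Copy the entries that should take part in the diff from "in" to
 * "out", preserving order, and return how many were copied. "out"
 * must have room for n entries.
 */
static size_t collect_kept(const struct entry *in, size_t n, struct entry *out)
{
	size_t i, kept = 0;

	assert(in && out);
	for (i = 0; i < n; i++) {
		if (in[i].ignore)
			continue;
		out[kept++] = in[i];
	}
	return kept;
}
```

The later loop would then iterate over the copied array only, with no per-index skip test.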
Thanks.
On the Git mailing list, Chris Torek wrote (reply to this):
Ugh, forgot to tweak gmail reply to go to the mailing list. Also
I typed in a wrong word ("commit" should be "comment").
Corrected reply:
On Fri, Jun 12, 2020 at 11:31 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Chris Torek via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +struct symdiff {
> > + struct bitmap *skip;
> > + int warn;
> > + const char *base, *left, *right;
> > +};
> > +
> > +/*
> > + * Check for symmetric-difference arguments, and if present, arrange
> > + * everything we need to know to handle them correctly. As a bonus,
> > + * weed out all bogus range-based revision specifications, e.g.,
> > + * "git diff A..B C..D" or "git diff A..B C" get rejected.
> > + *
> > + * For an actual symmetric diff, *symdiff is set this way:
> > + *
> > + * - its skip is non-NULL and marks *all* rev->pending.objects[i]
> > + * indices that the caller should ignore (extra merge bases, of
> > + * which there might be many, and A in A...B). Note that the
> > + * chosen merge base and right side are NOT marked.
> > + * - warn is set if there are multiple merge bases.
> > + * - base, left, and right point to the names to use in a
> > + * warning about multiple merge bases.
> > + *
> > + * If there is no symmetric diff argument, sym->skip is NULL and
> > + * sym->warn is cleared. The remaining fields are not set.
> > + */
>
> OK.
>
> > +static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
> > +{
> > + int i, is_symdiff = 0, basecount = 0, othercount = 0;
> > + int lpos = -1, rpos = -1, basepos = -1;
> > + struct bitmap *map = NULL;
> > +
> > + /*
> > + * Use the whence fields to find merge bases and left and
> > + * right parts of symmetric difference, so that we do not
> > + * depend on the order that revisions are parsed. If there
> > + * are any revs that aren't from these sources, we have a
> > + * "git diff C A...B" or "git diff A...B C" case. Or we
> > + * could even get "git diff A...B C...E", for instance.
> > + *
> > + * If we don't have just one merge base, we pick one
> > + * at random.
> > + *
> > + * NB: REV_CMD_LEFT, REV_CMD_RIGHT are also used for A..B,
> > + * so we must check for SYMMETRIC_LEFT too. The two arrays
> > + * rev->pending.objects and rev->cmdline.rev are parallel.
> > + */
> > + for (i = 0; i < rev->cmdline.nr; i++) {
> > + struct object *obj = rev->pending.objects[i].item;
> > + switch (rev->cmdline.rev[i].whence) {
> > + case REV_CMD_MERGE_BASE:
> > + if (basepos < 0)
> > + basepos = i;
> > + basecount++;
> > + break; /* do mark all bases */
>
> We find and use the first found merge base (i.e. "pick at random" as
> promised in the comment before the function), but for warning, keep
> track of how many merge bases there are.
>
> > + case REV_CMD_LEFT:
> > + if (lpos >= 0)
> > + usage(builtin_diff_usage);
>
> A range (either A..B or A...B) has already been seen, and we have
> another, which is now rejected.
>
> > + lpos = i;
> > + if (obj->flags & SYMMETRIC_LEFT) {
> > + is_symdiff = 1;
> > + break; /* do mark A */
>
> Inside this switch statement, "continue" is a sign that the caller
> should use the rev, and "break" is a sign that the rev is to be
> ignored. We obviously do not ignore "A" in ...
>
> > + }
> > + continue;
>
> ... "A..B" notation, so we "continue" here.
>
> > + case REV_CMD_RIGHT:
> > + if (rpos >= 0)
> > + usage(builtin_diff_usage);
>
> Here is the same "we reject having two or more ranges".
>
> I actually suspect that this usage() would become dead code---we
> would already have died when we saw the matching left end of the
> second range (so this could become BUG(), even though usage() does
> not hurt).
Right - I considered it both ways and figured a second usage() was
simpler and straightforward, but I'm fine with either method.
> > + rpos = i;
> > + continue; /* don't mark B */
>
> And of course, whether "A..B" or "A...B", B will be used as the
> "result" side of the diff, so won't be marked for skipping.
>
> > + case REV_CMD_PARENTS_ONLY:
> > + case REV_CMD_REF:
> > + case REV_CMD_REV:
> > + othercount++;
> > + continue;
>
> I wonder if we want to use "default" instead of these three
> individual cases. Pros and cons?
>
> - If we forgot to list a whence REV_CMD_* here, it will be silently
> marked to be skipped with this code. With "default", it will be
> counted to be diffed (which may trigger "giving too many revs to
> be diffed" error from the diff machinery, which is good).
>
> - With "default", when we add a new type of whence to REV_CMD_* and
> forget to adjust this code, it will be counted to be diffed.
> With the current code, it will be skipped.
>
> We probably could get the best of both worlds by keeping the
> above three for "counted in othercount and kept", and then add a
> default arm to the switch() that just says
>
> default:
> BUG("forgot to handle %d",
> rev->cmdline.rev[i].whence);
I'm fine with this as well. I did it the way I did because at least some
compilers give a warning or error if you forgot an enum. Using default
(or adding one) defeats this, but the BUG method is reasonable.
> That way, every time we add a new type of whence, we would be forced
> to think what should be done to them.
>
> > + }
> > + if (map == NULL)
> > + map = bitmap_new();
> > + bitmap_set(map, i);
> > + }
> > +
> > + /*
> > + * Forbid any additional revs for both A...B and A..B.
> > + */
> > + if (lpos >= 0 && othercount > 0)
> > + usage(builtin_diff_usage);
>
> Meaning "git diff A..B C" is bad. Reasonable.
>
> > + if (!is_symdiff) {
> > + bitmap_free(map);
>
> It is not wrong per se to free it unconditionally, but wouldn't it
> be a bug if (map != NULL) at this point in the flow?
Yes, because the A in A...B will have been marked. It didn't
seem worth a BUG() call though.
> The merge bases would only be stuffed in the revs when A...B is
> given, and we are not skipping anything involved in A..B or
> non-range revs.
>
> > + sym->warn = 0;
> > + sym->skip = NULL;
>
> Clearing these two fields is as promised to the callers in the
> comment above, which is good.
>
> > + return;
> > + }
> > +
> > + sym->left = rev->pending.objects[lpos].name;
> > + sym->right = rev->pending.objects[rpos].name;
> > + sym->base = rev->pending.objects[basepos].name;
> > + if (basecount == 0)
> > + die(_("%s...%s: no merge base"), sym->left, sym->right);
>
> Good.
>
> > + bitmap_unset(map, basepos); /* unmark the base we want */
>
> Hmph. You could
>
> case REV_CMD_MERGE_BASE:
> basecount++;
> if (basepos < 0) {
> basepos = i;
> continue; /* keep this one */
> }
> break; /* skip all others */
>
> and lose this unset(). I do not think it makes too much of a
> difference, but it probably is easier to follow if we avoided this
> "do something and then come back to correct" pattern.
OK, I'll change it to do this in the re-roll.
> > + sym->warn = basecount > 1;
> > + sym->skip = map;
> > +}
> > +
> > int cmd_diff(int argc, const char **argv, const char *prefix)
> > {
> > int i;
> > @@ -263,6 +366,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> > struct object_array_entry *blob[2];
> > int nongit = 0, no_index = 0;
> > int result = 0;
> > + struct symdiff sdiff;
> >
> > /*
> > * We could get N tree-ish in the rev.pending_objects list.
> > @@ -382,6 +486,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> > }
> > }
> >
> > + symdiff_prepare(&rev, &sdiff);
> > for (i = 0; i < rev.pending.nr; i++) {
> > struct object_array_entry *entry = &rev.pending.objects[i];
> > struct object *obj = entry->item;
> > @@ -396,6 +501,8 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> > obj = &get_commit_tree(((struct commit *)obj))->object;
> >
> > if (obj->type == OBJ_TREE) {
> > + if (sdiff.skip && bitmap_get(sdiff.skip, i))
> > + continue;
>
> By the way, I cannot shake the feeling that, given that
> rev.pending/cmdline.nr will not be an unreasonably large number,
> it is overkill to use the bitmap here. If I were writing this code,
> I would have made symdiff_prepare() fill a separate object array
> by copying the elements to be used in the final "diff" out of the
> rev.pending array and updated this loop to iterate over that array.
>
> I am not saying that such an approach is better than the use of
> bitmap code here. It just was a bit unexpected to see the bitmap
> code used for a set of objects that is typically fewer than a dozen.
I am sure it is overkill -- but at the same time, it's already coded
and cheap enough to use that rolling our own separate array felt
worse. I can add a comment (NOT COMMIT) if you like.
Thanks,
Chris
|
This branch is now known as |
|
This patch series was integrated into pu via 6984f1c. |
|
This patch series was integrated into pu via 2672de6. |
|
This patch series was integrated into pu via 752dee3. |
|
This patch series was integrated into next via e0b54a0. |
|
This patch series was integrated into seen via 15654e7. |
|
This patch series was integrated into seen via 80e98c5. |
|
This patch series was integrated into seen via 1457886. |
|
This patch series was integrated into next via 1457886. |
|
This patch series was integrated into master via 1457886. |
|
Closed via 1457886. |
"git diff -h" help is succinct, but perhaps too much so.

The symmetric-diff syntax, "git diff A...B", is defined by the documentation to compare the merge base of A and B to commit B. It does so just fine when there is a merge base. It compares A and B directly if there is no merge base, and it is overly forgiving of bad arguments after which it can produce nonsensical diffs. It also behaves badly with other odd/incorrect usages, such as "git diff A...B C..D".

The first patch simply adjusts a test that will fail if the second patch is accepted. The second patch adds special handling for the symmetric and range diff syntax so that the option parsing works, plus a small test suite. The third patch updates the documentation, including adding a section for combined commits, and makes the help output more verbose (to match the SYNOPSIS and provide common diff options like git-diff-files, for instance).

Changes since v3:
>/>=goof