Fix GetChanges issue with changes in both sides#530
Fix GetChanges issue with changes in both sides#530dpordomingo wants to merge 3 commits intosrc-d:masterfrom
Conversation
4e47261 to
20be8b5
Compare
|
Should we keep this PR open until we implement a fix? Otherwise CI will always fail. |
|
@carlosms yes, we should not merge till I commit the fix; |
20be8b5 to
94d2814
Compare
|
@carlosms I don't like the idea of using (anyhow, integration tests pass !) |
|
I don't think using The main concern I have with the PR is here, 94d2814#diff-e0472a496fbaf199da26f7eb2280bd86R88 path := path.Join(libRoot, base.Repository().Host, base.Repository().Owner, base.Repository().Name)Because it's duplicating something done by func (l *Library) repositoryPath(url *lookout.RepositoryInfo) string {
return fmt.Sprintf("%s/%s", url.Host, url.FullName)
}And if we modify So I'm thinking it would be better to change For // RepositoryPath returns the absolute path for the given repository
func (l *Library) RepositoryPath(url *lookout.RepositoryInfo) string {
return path.Join(l.fs.Root(), l.relativeRepositoryPath(url))
}
func (l *Library) relativeRepositoryPath(url *lookout.RepositoryInfo) string {
return fmt.Sprintf("%s/%s", url.Host, url.FullName)
} |
|
And other comments that maybe don't apply because this is a WIP, but things that I think would be good to have in the final code:
|
…idate Signed-off-by: David Pordomingo <David.Pordomingo.F@gmail.com>
519ee3f to
e258b22
Compare
|
Thanks for your review! Considering that accessing disc that way seemed a bit ugly, and following @mcuadros recommendation, I reused my work from past OSD and brought here a proper way to get the common ancestor using I left that feature in a separate package TODO:
(last one could be done after merging this, if maintaner agrees) |
65ffeca to
6bf2209
Compare
|
LGTM. I think it's best to finish all work on this PR (including any needed tests), possibly getting a quick reivew from @src-d/data-retrieval for the |
6bf2209 to
b891e16
Compare
36eec42 to
69378e5
Compare
69378e5 to
b2dfaf3
Compare
b2dfaf3 to
1372dd5
Compare
Signed-off-by: David Pordomingo <David.Pordomingo.F@gmail.com>
Signed-off-by: David Pordomingo <David.Pordomingo.F@gmail.com>
1372dd5 to
d39d6b4
Compare
|
Tests are already there.
In case someone from @src-d/data-retrieval or @smola wants to review this PR, I think they could focus only on the new I also pull requested |
| // don't have a common history. That PR won't be able to be created in | ||
| // GitHub, but in other situations (ie. with lookout-sdk), an analyzer | ||
| // could be interested in analyzing the difference between both commints. | ||
| return base, nil |
There was a problem hiding this comment.
why not return an error? If you ask for the common ancestor and it does not exist, maybe returning base makes other things fail later, without any hint of why.
There was a problem hiding this comment.
An analyzer can be interested on reviewing changes between orphan branches. Couldn't it?
There was a problem hiding this comment.
what's the behaviour of git diff base...head if they do not have a common ancestor?
There was a problem hiding this comment.
I tried locally, and git diff base...head behaves as .. if there is no shared history between them?
For curiosity, I checked git/git source code, and it seems that ... behaves that way only if there is at least one "merge-base" between base and head; otherwise it seems to fallback as ... Anyhow I'm not used to git codebase.
| res := commits | ||
| for i := start; i < len(commits); i++ { | ||
| from := commits[i] | ||
| fromHistoryIter := object.NewCommitIterBSF(from, nil, nil) |
There was a problem hiding this comment.
I don't fully understand the algorithm here so this is only a comment for future improvements.
It seems to me that this is walking the commit tree almost fully several times. This may be ok for small repositories but it may have a penalty for bigger ones like kubernetes. Testing it with a PR in kubernetes would be be interesting.
There was a problem hiding this comment.
Thanks @jfontan for your review!!!
Let's see if 👇 this helps you to understand the algorithm.
Yes, the history can potentially be walked as many times as candidates provided for Independents.
But every time a candidate is found as an ancestor of any other, it is removed from the candidates list, so it won't be considered anymore.
The order in which you request the candidates, affects how many times the history is traversed.
In the worst scenario, the history will be traversed as many times as orphan branches are included in the candidates to be analyzed.
given this history:
o----o----B~2---B^---B
/ /
--A~4---o---A~2---o---A
---o---C^---C
then:
- (A)
Independents(B, B^);1step; walks the history only once, because starting byB, it reaches in one stepB^; onceB^is reached, it is removed from the candidates list, and since there only one candidate left (B), it is returned with no more steps. - (B)
Independents(B, B^, B~2);2steps; walks only once (as in the previous example), but it requires two steps to reachB^and thenB~2 - (C)
Independents(B, A~4);8steps; walks the history only once (as in the previous example), but it requires many steps to reachA~4 - (D)
Independents(B^, B);1full + 1step; walks the history twice, because first iteration walking fromB^does not reachBafter a full walk fromB^; then in the second iteration walking fromBit is the same case as (A), so it is returnedAwith only one extra step - (E)
Independents(B, A);2full; walks the history twice; because during first walk fromBit won't be reachedA, and during the second walk fromA, it won't be reachedB - (F)
Independents(B, A, B^);2full; walks the history twice; first time, it will be foundB^as an ancestor, so it will be removed from candidates list; then it will be the same case than E - (G)
Independents(B, B^, B~2, A, A~2, A~4, C, C^);3full; because during the first history walk fromBit will be removedB^,B~2,A~2,A~4from the candidates list; then the second iteration will consider onlyB,A,CandC^, then, walking fromA, it won't find any ancestor in the candidates list; the third iteration, starting fromCwill find (and remove)C^from the list, so the result will be onlyA,BandC.
possible optimizations:
- sort the candidates by
Committer.When descbefore start processing them, so most examples of (D) (Independents(B^, B)) will be converted into (A) cases (Independents(B,B^)); it won't work whenCommitter.Whendoes not reflect the topological order (in such cases, it will cause the problem tried to be solved, but I think it would be a "strange" case) - store all traversed commits and exclude them in the next history walk. It will make that (E)
Independents(B, A)will require only1full+2stepsinstead of2fullbecause the first iteration (fromB) will be used to shortcut the second one (fromA) because sinceA~2,AsharesBhistory. - minor fix: replace
s/commits/resfromline 92to exclude already excluded ancestors.
and thanks to your question, I spotted a bug:
- replace
startfromline 110withfind(from, res) + 1)(pseudocode), because otherwise the iteration starts from the same place as many times as an ancestor is found from thestartposition, what is clearly too much, and does not reflect the desired (and described) behavior.
jfontan
left a comment
There was a problem hiding this comment.
Other than checking of the method works well with bigger repos it LGTM.
|
I have just reviewed code and I propose to focus on merging go-git related stuff into go-git instead of lookout. According to the logs this problem isn't common for lookout. Staging listens to dozen repositories and we didn't have this errors for months so we can wait for the fix without rushing in my opinion. I really appreciate your work David, though merging it into lookout and supporting it here makes me worried. |
|
For sure I agree with you @smacker that this feature should be in |
|
As suggested by @smacker and @smola the core of |
fix #486
depends on #532depends on #526depends on src-d/lookout-test-fixtures#22depends on src-d/lookout-sdk#76
This PR fix #486, so when an analyzer reviews the PR of
get-changes-from-outdated-pr-candidateagainstget-changes-from-outdated-pr, it should not fail (as it's currently happening here)breaking changes
DataService.GetChangesis currently returning the differences betweenbaseandhead, so when this is merged, it will return the differences betweengit merge-base base headandhead.In case any analyzer wants to get the changes between
baseandheadit should configure theChangeRequest.TwoDotsMode, astrueas described by src-d/lookout-sdk#76alternatives:
Implement it inside of
go-gitas proposed by src-d/go-git#1078implementation details:
This PR contains:
service/git/merge_basepackage (that implements it usinggo-git)DataService.GetChangesservice/git/merge_baseAll pieces inside of
service/git/merge_baseare documented in src-d/go-git#1078.