Skip to content

adding vendor as submodule#3348

Merged
ldemailly merged 16 commits intomasterfrom
vendor-sub
Feb 15, 2018
Merged

adding vendor as submodule#3348
ldemailly merged 16 commits intomasterfrom
vendor-sub

Conversation

@ldemailly
Copy link
Copy Markdown
Member

@ldemailly ldemailly commented Feb 9, 2018

Alternative to #2691 with all the benefits and more, without the downsides:

Submodule is only:
Receiving objects: 100% (4145/4145), 8.26 MiB | 930.00 KiB/s, done.

Note: adding pruning instruction to Gopkg.toml makes vendor be 54Mb
instead of 407Mb

vendor checked in as https://github.com/istio-releases/vendor

also fixes #3483

Receiving objects: 100% (4145/4145), 8.26 MiB | 930.00 KiB/s, done.

Note: adding pruning instruction to Gopkg.toml makes vendor be 54Mb
instead of 407Mb
@ldemailly ldemailly requested a review from a team February 9, 2018 07:58
@ldemailly ldemailly mentioned this pull request Feb 9, 2018
@ldemailly ldemailly requested a review from costinm February 9, 2018 08:01
@munnerz
Copy link
Copy Markdown
Member

munnerz commented Feb 9, 2018

After reading the discussion over on #2691, I'm really not sure what benefit using a submodule vs just having the pruned vendor/ directory in this repo is, and can see a number of issues with having a submodule.

  • First, the points @smarterclayton raised still stand true: RFC: Commit vendor #2691 (comment)

    • Go tooling is notorious for playing badly with symlinks - I think dep might be a bit better, but I wouldn't trust go get etc. particularly much.
    • It's an extra barrier of entry for new users to understand
  • When would someone want a copy of istio/istio, but not want a copy of its dependencies? I'm really not sure when someone would want to do this and could not afford to just check out 65MB more sources. In 99% of cases, the user will immediately clone dependencies after cloning the main repo.

  • If there is a reason for users to want to grab istio/istio but without vendor/ (e.g. for samples/), this may be an argument to split that content out from the main repo (or publish those paths as manifests as part of Releases, or to GCS)

The point raised in #2691 by @jwendell also still stands - dep ensure will delete and then create the vendor/ subdirectory, which could cause problems with git submodules? (without more bash hacks) #2691 (comment).

I cloned istio a few days ago, and it took ~10-15m to get into a state where I could go build one of the e2e tests due to the sheer number of dependencies listed, and having to clone their full git tree (I work with an ADSL2+ connection most days 😧). So this 'lightweight repo' isn't really so lightweight at all. Sort of like shipping a car, boasting how light it is, but disclaiming with "engine shipped separately".

We could switch to a world where everything is gated behind scripts, and Istio becomes a 'special' project that cannot be built/worked with with plain ol' tooling, but I don't (personally) think this is a desirable state.

(just my 2c as an outside collaborator who has worked closely with the k/k dep managed pieces, including the staging repo/bot that 'exports' subdirectories from the k/k mono-repo so they can be consumed by 3rd party projects)

@ldemailly
Copy link
Copy Markdown
Member Author

have you tried this branch ? there is no symlink, no issue and it’s smaller and faster without polluting the main repo

@ldemailly
Copy link
Copy Markdown
Member Author

ldemailly commented Feb 9, 2018

let me answer your points one by one:

Go tooling is notorious for playing badly with symlinks - I think dep might be a bit better, but I wouldn't trust go get etc. particularly much.

there is no symlink and go get works fine with submodules

it's an extra barrier of entry for new users to understand

there is nothing special/extra to understand - it's just git. and if you "go get istio.io/istio/" you don't even need any setup. also today you have to grok our overly complex makefiles and init scripts, this would just make it simpler if anything

When would someone want a copy of istio/istio, but not want a copy of its dependencies?

When someone wants to look at istio's source code - maybe a README/doc change... plenty of reasons but that's not actually the driver for this (will come back to that at the end)

If there is a reason for users to want to grab istio/istio but without vendor/

There are - some docs and samples ask to clone istio when the release tar fall behind the documentation (but again this isn't the issue)

also still stands - dep ensure will delete and then create the vendor/ subdirectory, which could cause problems with git submodules? (without more bash hacks)

Did you try ? I did and it just works fine

Slow on home network

I agree and that's in part why I don't want to check in vendor in the main repo but again this option/PR is 8 mbytes download

We could switch to a world where everything is gated behind scripts, and Istio becomes a 'special' project that cannot be built/worked with with plain ol' tooling, but I don't (personally) think this is a desirable state.

I agree with you - this being said this is actually were we are now, and this PR would help simplify

If we do our job right "make" is all you need to do after cloning
(if we do even better, go get istio.io/istio)

So back to why we should not mix millions of code we don't author with code we do author in the same repo/revision history/set of PRs when there is an easy way to seperate the two yet achieve the goals of hermetic and faster start... While I would hope it's self evident, I can add some points:

  • Do you check in your .o and binaries with your source tree ?
  • Do you think robots updating thousands of files and humans making small PRs should compete ? (If seen that mistake made in the past and I can tell you it is not desirable)

@mattdelco
Copy link
Copy Markdown
Contributor

Re: "extra barrier of entry for new users to understand", for any such users besides myself know that you'll need to run a command like "git submodule update --init --recursive" to populate the vendor/ directory. I didn't find any docs that explained how git correlates commit shas across the repos (i.e., for a given commit sha on istio repo which sha of the vendor repo to use).

There's only 1 branch using this scheme so I couldn't readily experiment with how git handles changing branches (I could experiment with this branch vs. master [and back], which isn't a smooth transition but the annoyance would decrease over time). From what I read it seems like people have to explicitly request a submodule update (e.g., "git submodule update"). I suppose this is akin to needing to re-run the dep tool and it could be hookup up to the makefile but it'll probably catch a few people off-guard that vendor/ doesn't update like the rest of the tree.

Copy link
Copy Markdown
Contributor

@costinm costinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see how it works - if problems are found with submodule we can checkin vendor directly.

Thanks Laurent !

/lgtm

@istio-merge-robot
Copy link
Copy Markdown

/test all [submit-queue is verifying that this PR is safe to merge]

@rshriram rshriram added the do-not-merge/hold Block automatic merging of a PR. label Feb 9, 2018
@rshriram
Copy link
Copy Markdown
Member

rshriram commented Feb 9, 2018

/lgtm cancel

Lets please discuss this properly. I only see 3 people out of a repo of 70 people.

@istio-merge-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: costinm

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@rshriram
Copy link
Copy Markdown
Member

rshriram commented Feb 9, 2018

My two cents:
We have a tough time handling cross repo stuff. We all know how painful it was to update a dependency across istio/pilot, istio/mixer, istio/auth and istio/istio repos.. There had to be a breaking change to one or other. So, how will this submodule thing, where deps are hosted in a different repo - help the cause?

If I have to update dependencies (where they are no longer compatible with the existing code in istio/istio), how will this work? I can't send a PR to istio/vendor because it will break ongoing tests/PRs in istio/istio. Nor can I simply send a actual code PR that depends on updated dependencies, as its not going to pass without dependency update. We would then have to force merge one to let the other go through. That becomes even more of a problem when the PR in question requires long review.

Is there a solid technical disadvantage to having a large repo (80M or whatever?). Does it have any impact on the final product we ship or does it impact our day to day development ? The big downside I can see there is that our PRs might be bigger, if we update vendors and code. As such, PRs with Goldens are hard to review. now, a PR with 100s of files will be equally harder.

On the other hand, this proposal of storing dependencies in a separate repo is not really helping either. It just moves the problem to a different state, where we often end up having to co-ordinate across two repos for dependency update. Imagine the consternation when PR one updates a dependency in istio/vendor and another PR undoes the update. None of us know much abotu the code in the dependencies. So, its going to be really hard to differentiate whether one PR is reverting another PRs update or not. This problem gains more importance when we have concurrent PRs in istio/istio, where one PR requires updated dependencies for the code it adds, while the other PR is still using an older version of the same code and cannot pass with updated dependencies.

@jwendell and co - I understand the use case RedHat has. But does it have to involve committing the entire vendor tree in istio/istio? Can't you track istio/istio and have a redhat/istio-vendor repo, that literally commits all the dependencies, such that any machine where you want to compile istio could essentially clone these two git repositories alone. You just run make build (without make depend). Shouldn't that work?

@jwendell
Copy link
Copy Markdown
Member

jwendell commented Feb 9, 2018

@rshriram Thanks for your comments. You summarized well the situation. The biggest cons (repo size and PR's that update dependencies) of having a single repo are small compared to the benefits.

About this PR in particular, like you and @mattdelco well said, I don't see a difference of having this submodule stuff and having to run dep ensure. Worse, it adds unnecessary complexity to the developer workflow, like explained above.

Regarding our downstream building, yep, this is the situation we have right now, having to maintain 2 repos or a fork of istio repo with vendor in. While not the ideal solution, it might work (at the cost of some extra work).

@costinm
Copy link
Copy Markdown
Contributor

costinm commented Feb 9, 2018

My point: we can submit, test how it works with tools like go get and dep update, and if we're happy
tweak the makefile and circleci.

If we're not happy after few days - we can revert, minimal harm done.

This PR shouldn't break anything - it's hard to evaluate how this will work, and wasting
time optimizing it and changing makefiles will actually make it more dangerous.

The main benefit of this PR is that it allows us to play with checked in vendor and evaluate
without having to submit all deps in istio/istio. It's just 2 small files.

@costinm
Copy link
Copy Markdown
Contributor

costinm commented Feb 9, 2018

Re. developer workflow: there should be no change except decrease in time for initial build.
Developer will still use make and init.sh - which will run the git submodule command instead of
dep update --vendoronly.

Git submodules are a standard git feature - it's not a symlink.

If things go well - I would be happy to add istio/api as a submodule as well.. If submodule creates
a problem - we can have another debate about checking in vendor in istio or some other hacks,
but at least we'll not need to debate the submodule option.

It's pretty clear that people have very different and strong opinions, so let's get more data.
If someone can make a prototype with scripts or duct tape - we can try that as well.

I'm of the strong opinion to checkin vendor in istio, but happy to see other options.

@mattdelco
Copy link
Copy Markdown
Contributor

When I was reading up on submodules I came across various various warnings about git's poor handling of transitioning between a submodule and a directory with the same name, so the experiment might not be so lightweight to try and/or revert. Regardless someone would first need to update init.sh to make calls to git rather than dep (and this person or another would need to think about how to allow building from a source tar).

I don't feel like I have enough data to compare how the options behave once there's a fair amount of change history in both, nor do I have enough background on why git might have less transfer overhead with a separate repo (nor do I know how branching or reproducibility is handled). If being lightweight was a high priority then perhaps personal branches on istio/istio should be banned in practice because those also seem to add overhead.

@mattdelco
Copy link
Copy Markdown
Contributor

As requested, I cooked up some changes to hook up the Makefile. I only focused on the common case of a dev trying to build the go binaries (i.e., I punted on tests and how someone updates the vendor repo with newer content):

vendor-sub...mattdelco:vendor-sub

Some of the key points in the Makefile:

  • I have init have an order-only dependency on "vendor/Gopkg.lock" so the absence of this file will cause the 1-time/first-time rule for "git submodule update --init vendor" to be run.
  • There's some disagreement about how aggressively make should run "go build" to pick up any new dev changes. The Makefile was recently changed to be much less aggressive than I'd like. Putting that aside, I did (mostly as a demo) add dependencies for "vendor/Gopkg.lock | depend". What this means is that when trying to build a go binary make will always run bin/init.sh, which now mostly just runs "git submodule update vendor" (plus some stuff for the envoy binary). I'm assuming that if vendor was updated by git (e.g., because istio was just rebased to a newer commit) then vendor/Gopkg.lock will always get updated. If Gopkg.lock gets updated then "go build" will get executed, otherwise make won't do anything for a rebuild. To get a better sense of what's going on in experimentation you make have to request the actual binary target (e.g., "make $(make where-is-out)/istioctl") rather than the alias (e.g., "make istioctl"). This was mostly a demo of a way to try to have make aggressively keep vendor/ up-to-date without necessarily causing a recompile as a side-effect. I wouldn't recommend going with this actual approach as-is because make isn't aggressively rebuilding so it won't rebuild a go binary after making changes to the source code.

Some of the key points in bin/init.sh:

  • I commented out stuff I didn't quite understand or thought was irrelevant for the target use cases I had in mind. I figured removing code was more deterministic than leaving or fixing.
  • "git submodule update vendor" always gets run. It seems to be light-weight. I punted on the building-from-source-tar case.
  • I tried to get the envoy stuff over to the appropriate out directory. I don't know what the story is on pilot/pkg/proxy/envoy/envoy

release/store_artifacts.sh and tools/istio-docker.mk is just bring in a fix from another PR so I could do a test release build.

@costinm
Copy link
Copy Markdown
Contributor

costinm commented Feb 10, 2018 via email

@knrc
Copy link
Copy Markdown

knrc commented Feb 10, 2018

It seems that many are assuming there are no downsides to submodules so I'm going to try an enumerate the ones I'm aware of

1 - git clone no longer just works as expected with vendor support

For many developers used to vendored projects they will be used to the following steps

  • git clone
  • cd to appropriate directory
  • go build
    Unfortunately this no longer works as the vendor submodule will be created and left empty.

In order for this to work developers will now need to replace 'git clone ' with either

  • git clone --recursive
    or
  • git clone
  • git submodule update --init

I'm sure many will argue this is a minor change but either way it is a change in the developer workflow

2 - git pull no longer works as expected

Many developers will expect to update their projects by following a workflow similar to

  • git pull upstream

Developers will expect this to fetch the latest updates from the repo and merge the changes so their copy is up to date, unfortunately when using submodules this doesn't happen as the submodule is left in the prior state (remember this fact, it is relevant in the next issue).

The new workflow for developers becomes

  • git pull upstream
  • git submodule update

Again some may argue this is a minor change but yet again this is another change in the developer workflow

3 - the potential for losing submodule changes

Remember that developer who only ran 'git pull upstream'? He has a local copy of the Istio repository which is up to date with master, after all he issued the git pull before working on this change, has made changes to his local istio repo and now wants to commit them.

The developer adds all the files, unfortunately also including the vendor submodule, pushes this to his own repository (no conflicts raised since he is already up to date) before submitting his PR. The PR gets reviewed, nobody catches the vendor submodule being included by mistake and is accepted.

You now have a commit history which has lost a change, this developer has essentially reverted a previous change and nothing in the git tooling is preventing him from doing this.

To me this is one of the most serious issues with submodules, had the vendor directory been in the same repo then the developer's merge would have updated everything and, if he was behind, the git tooling would have noticed and warned the developer about this.

4 - Managing multiple repos

I think this is an obvious issue so I won't do anything other than raise awareness.

In summary we seem to be choosing between two different approaches to vendoring since I believe we already have consensus that vendoring is an appropriate way forward.

On the one hand you have the option to check the vendor directory into the same repository with the downside of adding to the size (8MB compressed from what Laurent's said earlier) but with a much simpler and, at least IMO, a more reliable workflow for everyone.

On the other side you have the option to use submodules which has the benefit of keeping the vendor directory separate from the main code but comes with a number of issues including one which I believe should be serious enough to question its approach.

There may be other issues lurking in submodules, to be honest I've just spent an hour playing around with the basic workflows looking for issues, but I've already found the ones mentioned above.

I am happy to discuss this with anyone before we try to reach a decision, although I'm travelling again next week I will definitely attend the test/release meeting which I hope will prioritise this issue.

@knrc
Copy link
Copy Markdown

knrc commented Feb 10, 2018

@rshriram We have already discussed alternatives to vendoring within the community however it does involve work on our side and I would expect other companies in similar positions would also need to duplicate this within their environment.

At the moment we are maintaining a fork of istio which includes the vendor directory and, in order to simplify the tracking, we have discussed creating branches off the main codebase, containing just the vendor content, whenever we need to push to our internal git repo.

So yes we can build up tooling to work around this but our obvious preference is to have this supported in the community since it benefits not only ourselves but others. Even with the submodule approach to vendoring we will likely still have a need for tooling since we will have to handle the management of the submodule.

@ldemailly
Copy link
Copy Markdown
Member Author

ldemailly commented Feb 10, 2018

@knrc thanks for your comments/insights !

1&2: how about "go get istio.io/istio" working for initial setup (it'll get the submodule, I tested it with fortio)

or "make" (which is needed anyway for various setup, getting envoy etc...)

I would think people who want to contribute can be comfortable with a fairly standard 1 command to type?

3: I don't understand the issue: go dep is how we manage the content of vendor/, the toml and the lock file is where one would see a merge or conflict (and review)

4: we already have multiple repos: api/ tools/ contrib/ gogo, mixerclient, proxy, website etc... and again the vendor submodule is transparent with 1 line in the makefile

@ldemailly
Copy link
Copy Markdown
Member Author

@knrc istio needs and produces containers, go build doesn't, so we need some minimal "make" targets in any case. also it's really not that hard to notice that you need a recursive clone or go get or pull to get the matching vendor. it also only applies to someone changing the dependencies as the .lock file is there anyway so people don't "have to" get the submodule (which is another reason why it is superior to checking in the full vendor in this mono repo)

- attach_workspace:
at: /go
- run: bin/testEnvRootMinikube.sh start
- restore_cache:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need the cache for envoy and other downloads.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you mentioned that's dealt with through the attach_workspace

@costinm
Copy link
Copy Markdown
Contributor

costinm commented Feb 14, 2018 via email

@ldemailly ldemailly requested a review from a team February 15, 2018 00:25
@ldemailly ldemailly changed the title (Rfc) adding vendor as submodule adding vendor as submodule Feb 15, 2018
@istio-merge-robot
Copy link
Copy Markdown

@ldemailly PR needs rebase

@istio-merge-robot istio-merge-robot added the needs-rebase Indicates a PR needs to be rebased before being merged label Feb 15, 2018
Merge branch 'master' into vendor-sub
matches master ed69c3a and
#3497
@istio-merge-robot istio-merge-robot removed the needs-rebase Indicates a PR needs to be rebased before being merged label Feb 15, 2018
@ldemailly
Copy link
Copy Markdown
Member Author

/test e2e-suite-rbac-auth

@istio-testing
Copy link
Copy Markdown
Collaborator

@ldemailly: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
prow/e2e-suite-rbac-auth.sh c5db73c link /test e2e-suite-rbac-auth
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ldemailly
Copy link
Copy Markdown
Member Author

mixer test failure is unrelated.

@sdake
Copy link
Copy Markdown
Member

sdake commented Feb 15, 2018

@ldemailly I haven't checked out either of these branches to try the code, although I think the main requirement in my mind (air-gapped reproducible builds) are met by either solution. I'm a fan of executing technical work that are two way doors - in other words, things that can easily be reversed if problems are found. A vendor commit is somewhat permanent without a history rewrite (as already covered in this PR).

Cheers
-steve

@ldemailly
Copy link
Copy Markdown
Member Author

Approved by Test&Release WG and Sven

Merging

@ldemailly
Copy link
Copy Markdown
Member Author

writing an FAQ and email before merging actually

@ldemailly ldemailly merged commit 243da06 into master Feb 15, 2018
@ldemailly ldemailly deleted the vendor-sub branch February 15, 2018 22:13
@ldemailly
Copy link
Copy Markdown
Member Author

@ldemailly ldemailly removed the do-not-merge/hold Block automatic merging of a PR. label Feb 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

change to build: fortio "make install" just be go install .