Skip to content

Conversation

@christopherfujino
Copy link
Contributor

@christopherfujino christopherfujino commented Oct 23, 2019

Description

On October 21, 2019 the deployg_gallery-macos test started failing because the fastlane application depended on a version of the Google Cloud that was newer than the one present (#43204). I temporarily worked around this by commenting out the invocation of fastlane (#43207). This PR restores the commented out line and fixes the initial issue by adding Gemfile and Gemfile.lock files to flutter/dev/ci/mac that is then installed during the mac setup stage, rather than relying on the version present in the mac VM (Note: apparently bundler install from a Gemfile is non-deterministic, I was getting both failures and successes in the setup for the same commit, so I had to check in a Gemfile.lock from one of the successful runs).

During investigation I also discovered that: 1) we were not automatically re-building our linux Docker image; and 2) the Docker image would no longer build as is (#43435). The cause of the Docker image not building was that the code had changed upstream, breaking compatibility with Go version 1.7 (which was on our Docker image). I initially tried upgrading the base debian image, but this broke the Android SDK, because it pulled in a newer version of Java, and I found it easier installing a newer Go package than an older Java package. Thus, the Dockerfile now curl's the official 1.13.3 Go binary (currently the latest).

In regards to our Dockerfile not building on each release, there were several configuration errors we had, which Fedor helped me resolve. In short, now any time we tag a commit, cirrus will create a run that will include the docker_builder job. Note, this run is different from the one triggered by pushing to a branch (thus the roll_dev.dart script, which first tags a commit and then pushes the commit to the dev branch will trigger two cirrus runs). For us to catch the docker_builder failing, someone will need to monitor these cirrus jobs.

Finally, I also implemented cirrus' dockerfile-as-ci feature, so that our cirrus environment is actually based on our Dockerfile rather than a published image (these are cached based on the contents of the Dockerfile). Thus, this will catch failures based on changes to the Dockerfile (the docker_builder jobs will catch failures based on code changes...or upstream changes such as our fastlane issue).

Related Issues

Fixes #43204, fixes #43435

Tests

This restores the code of deploy_gallery-macos and gets it to pass.

Checklist

Before you create this PR confirm that it meets all requirements listed below by checking the relevant checkboxes ([x]). This will ensure a smooth and quick review process.

  • I read the [Contributor Guide] and followed the process outlined there for submitting PRs.
  • I signed the [CLA].
  • I read and followed the [Flutter Style Guide], including [Features we expect every widget to implement].
  • I updated/added relevant documentation (doc comments with ///).
  • All existing and new tests are passing.
  • The analyzer (flutter analyze --flutter-repo) does not report any problems on my PR.
  • I am willing to follow-up on review comments in a timely manner.

Breaking Change

Does your PR require Flutter developers to manually update their apps to accommodate your change?

  • Yes, this is a breaking change (Please read [Handling breaking changes]). Replace this with a link to the e-mail where you asked for input on this proposed change.
  • No, this is not a breaking change.

@fluttergithubbot fluttergithubbot added c: contributor-productivity Team-specific productivity, code health, technical debt. work in progress; do not review labels Oct 23, 2019
@christopherfujino christopherfujino force-pushed the fix-docker-build branch 6 times, most recently from aa0f7ee to 274bf8c Compare October 23, 2019 18:28
@christopherfujino christopherfujino changed the title Fix docker build Allow rebuilding of docker image, re-enable deploy gallery macos Oct 24, 2019
@christopherfujino christopherfujino marked this pull request as ready for review October 24, 2019 18:13
@christopherfujino christopherfujino added team-infra Owned by Infrastructure team and removed work in progress; do not review labels Oct 24, 2019
Copy link
Contributor

@mklim mklim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, huge long tail of issues here. Thanks for getting to the bottom of it.

.cirrus.yml Outdated
- which flutter
- sudo gem install cocoapods
- sudo gem install xcpretty
- sudo gem install cocoapods -N
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use bundler and a Gemfile instead for this here too? This is effectively using any for each the gem versions and will probably cause issues down the line, even if it appears to be working now. I think we could convert setup_script into an actual script file and echo the Gemfile out into the operating system from there and then run it similar to how the Dockerfile works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better if we fail fast if something breaks us upstream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmagman do you have opinions?

Copy link
Contributor

@mklim mklim Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general yes, but this would be exposing us to unnecessary upstream breakages IMHO, especially because it's ignoring semantic versioning.

Let's assume that our CI works with the current versions of these packages. The way that this is currently written, if and when a bad (or even "good" but breaking) version n+1 of any of these gems are published, the Flutter tree will immediately go red because we will greedily pull in the new version even if it's got a bug or if it's a major version bump. Then we will have a fire and immediately have to diagnose and figure out a workaround for the broken upstream package. I don't think there's any reason for us to be using the absolute latest version of these packages at all times, is there? It seems like it would be better to bump the versions deliberately and test the version bumps through CI like everything else.

In addition to that bundler will handle transitive dependency installation and matching similar to how pub does. I don't think two successive sudo gem install commands do the same thing. So I think there's another potential breakage here where future versions of these packages have overlapping but potentially conflicting transitive dependencies, and the first install grabs a version that's incompatible with the second one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think the reason why this initially failed was because we were using an old version of fastlane from our mac vm (essentially pinned to whenever the image was last created).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I'm concerned pinning fastlane will lead to this same breakage. But perhaps the lesser of two evils?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be surprised if an old version of fastlane that used to work suddenly spontaneously stopped working in a truly hermetic environment. It's more likely that something else in our environment changed and then broke with the version. I would rather we try to drive the environment towards general immutability so we can predict what's going on vs introducing more chaos in the system.

From reading the PR description though it sounds more like fastlane was getting implicitly installed by the OS, and the version that was getting installed that way was incompatible with one of our other dependencies which was itself being greedily installed to a random version so randomly broke. It's not necessarily that it was an older version, it's that it was a random one and that was combined with a random install of GCloud. If we were deliberately versioning all of the environment dependencies, we wouldn't be able to have anything surprise break based on a version bump like this. It's not so much that it was too old I don't think in that it was too unpredictable and changing outside of our PR review cycle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we decided to try to use newest cocoapods since we were able to catch some issues like #41144. If we can express that in a Gemfile, then pegging xcpretty and fastlane to a tight version range seems like a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. If we want the tree to turn red whenever cocoapods breaks because users will be using the latest, that makes sense. gem 'cocoapods' in the Gemfile would always grab the latest version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a Gemfile (and Gemfile.lock), pinning down the version of fastlane and xcpretty, and having cocoapods be latest.

Copy link
Contributor

@gspencergoog gspencergoog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

32384589-a60f0e74-c078-11e7-9bc1-e5b5287aea9d

Copy link
Contributor

@mklim mklim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@christopherfujino christopherfujino merged commit 8e8d235 into flutter:master Oct 25, 2019
@christopherfujino christopherfujino deleted the fix-docker-build branch October 25, 2019 22:10
jonahwilliams pushed a commit that referenced this pull request Oct 26, 2019
jonahwilliams pushed a commit that referenced this pull request Oct 26, 2019
Inconnu08 pushed a commit to Inconnu08/flutter that referenced this pull request Nov 26, 2019
Inconnu08 pushed a commit to Inconnu08/flutter that referenced this pull request Nov 26, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

c: contributor-productivity Team-specific productivity, code health, technical debt. team-infra Owned by Infrastructure team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Linux Docker image will not build Fastlane fails in deploy_gallery test because dependencies outdated

6 participants