Try making branches only after pulling.

/lgtm

ON-CALL.md
Every week, one Istio developer is on-call and is responsible for maintaining Istio build and help out users and other developers. The on-call person should prioritize on-call duties on top of their daily work.
## Schedule
The on-call schedule can be found [here][https://docs.google.com/spreadsheets/d/1FaHwPpad3F3hva2suJweNeTnocjKtLnbgLkyMRPzgUY/edit#gid=1475801904].

Incorrect md syntax.
Use () after [] for hyperlinks.
You can preview the text using the "view" button above too.

The [] works for me in my golang IDE viewer, but I will change it.
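The `[text][url]` form flagged above appears more than once in this file, so a mechanical rewrite can help. A minimal Go sketch, assuming the second bracket pair always holds an absolute http(s) URL (never a real reference label) and that URLs contain no `]`; `fixInlineLinks` is a hypothetical helper, not part of any Istio tooling:

```go
package main

import (
	"fmt"
	"regexp"
)

// badLink matches "[text][http...]" where the second bracket pair holds a raw
// URL instead of a reference label. Assumption: URLs never contain "]".
var badLink = regexp.MustCompile(`\[([^\]]+)\]\[(https?://[^\]]+)\]`)

// fixInlineLinks rewrites every "[text][url]" into the inline form "[text](url)".
// Proper reference links such as "[text][label]" are left untouched.
func fixInlineLinks(md string) string {
	return badLink.ReplaceAllString(md, "[${1}](${2})")
}

func main() {
	in := "* Join google groups [istio-oncall][https://groups.google.com/forum/#!forum/istio-oncall]"
	fmt.Println(fixInlineLinks(in))
}
```

Restricting the match to `http`/`https` targets is the design choice that keeps legitimate reference-style links intact.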
ON-CALL.md
* Help with creating the release when needed.
* Check the schedule sheet to make sure the next on call is defined.
* On Friday, notify the next on-call.
* On Tuesday morning, handoff to next oncall and send a summary to istio-dev containing these stats, that can be obtained by querying [github with dates ranges][https://help.github.com/articles/searching-issues-and-pull-requests/]

Please send the hand-off summary to istio-oncall@googlegroups.com.

* Join google groups [istio-oncall][https://groups.google.com/forum/#!forum/istio-oncall]
* Join the `oncall` Slack channel
## Responsibilities

Please add post-submit monitoring and fixing.

Also make sure nothing is lost.

/hold

(My lgtm was mostly to re-kick the build, btw. Also, I (wrongly) assumed the preview had been looked at and we'd iterate.)

/lgtm cancel //PR changed after LGTM, removing LGTM. @andraxylia @ldemailly

## Responsibilities
* Build cop: monitor the builds, the presubmit automated tests, the postsubmit automated tests:
* Familiarize yourself with the current [open issues affecting automated tests](https://github.com/istio/istio/issues?q=is%3Aopen+is%3Aissue+label%3Akind%2Ftest-failure).

That seems too long a list to be familiar with. Is there a shorter list just for build/critical test break issues?

This is the list, unfortunately: 20 open issues. If it is too long, people should prioritize fixing those bugs.

This is the list for critical issues, not all the issues.

#PRs with review approved / in flight:
53 baseline / 22 current
| ``` | ||

It would be good to have queries for these, for reuse.

They have to be adapted anyway with the new dates, so there is no point in adding them here. Please feel free to add them later if you find some that can be reused.
ON-CALL.md
* If there are new failures, open issues and label them with kind/test-failure, with the appropriate area label, with "prow" or "circleci" label,
and assign them either directly to an engineer or to the area lead.
The issue must contain a link to the failed test log.
Add a comment in github cc-ing the assignees and explaining this must be fixed or reverted with highest priority. Nag people when needed.

What is the process for disabling a required test when the test is determined to be unstable?

I have instructions for disabling prow in an email; I can add them, but this can only be done by an administrator.

I added instructions, PTAL.

#Issues total
488 baseline / 526 current

How do I interpret baseline? Is it the # of issues at the beginning of the on-call shift?

I wonder if we should put the data in an online spreadsheet so that we can easily see trends across multiple weeks. That'd be more interesting data.

I am keeping a log in the oncall folder; maybe we can turn it into a sheet with weekly data.

Sure, that can be done later. K8s does fancy graphs with the data; this is more for information purposes. It is not the on-call's duty to make judgment calls related to the number of open issues.

I find keeping an eye on https://github.com/orgs/istio/dashboard useful.

Only a handful of people have GitHub permissions to disable tests. The instructions are here for reference, but it will not actually be the on-call doing it. There is no need to add the note about the slippery slope; I hope those who have been given GitHub admin permissions understand this.

Can I get an approval to merge this? People can iterate subsequently.
ON-CALL.md
# Istio On-call Playbook
## Who
Every week, one Istio developer is on-call and is responsible for maintaining Istio build and for helping out users and other developers. The on-call person should prioritize on-call duties on top of the daily work.

maintaining THE Istio build PROCESS
on top of THEIR daily work

And should we mention it's not 24*7 on-call?

I will fix the grammar mistakes. I already mention that on-call happens during business hours for the on-call's time zone.
ON-CALL.md
* If there are new failures, open issues and label them with kind/test-failure, with the appropriate area label, with "prow" or "circleci" label,
and assign them either directly to an engineer or to the area lead.
The issue must contain a link to the failed test log.
Add a comment in github cc-ing the assignees and explaining this must be fixed or reverted with highest priority. Nag people when needed.

github -> GitHub
here and elsewhere in the doc.
ON-CALL.md
## Schedule
The on-call schedule can be found [here](https://docs.google.com/spreadsheets/d/1FaHwPpad3F3hva2suJweNeTnocjKtLnbgLkyMRPzgUY/edit#gid=1475801904).
On-call duty starts on Tuesday at noon PST, ends the following week on Tuesday at noon PST, and is performed during regular working hours for your time zone.

So, for folks outside PST, this implies that their on-call cycle might start at the beginning of their regular working hours some time thereafter, correct? And their cycles would end before the handoff?
For the on-call members in Israel, noon PST is 10PM, and 10AM is midnight PST. I'm not sure how much this matters, but it is worth mentioning, especially as it regards on-call expectations.

Right, it is kind of implied by the business hours I mention previously.
ON-CALL.md
* Help with creating the release when needed.
* Check the schedule sheet to make sure the next on call is defined.
* On Friday, notify the next on-call.
* On Tuesday morning, handoff to next oncall and send a summary to istio-oncall and istio-dev. Include the stats below, that can be obtained by querying [github with dates ranges:](https://help.github.com/articles/searching-issues-and-pull-requests/)

I'd be careful with "morning" given time zone differences. I think I'd phrase this as "at the end of your oncall shift".

* Join the `oncall` Slack channel
## Responsibilities
* Build cop: monitor the builds, the presubmit automated tests, the postsubmit automated tests:

Can we add a link to the test grid (which shows post-submit test status across PRs, etc.) somewhere here? http://k8s-testgrid.appspot.com/istio#Summary
ON-CALL.md
* Uncheck the affected tests.
ON-CALL.md
## Schedule

And how is this on-call rotation decided? Is it based on volunteers, or do we have some rotation mechanism like round robin?

I do not know this.

Updated the doc, PTAL. Remember this is not a legally binding document, so we can iterate if we find it does not work.

Thanks Andra! I'll add more after my shift too.

[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ldemailly
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing

/test all [submit-queue is verifying that this PR is safe to merge]

Automatic merge from submit-queue.
replaces https://goo.gl/9xUCRB and https://goo.gl/Hrg94p