Skip to content

roachprod: use exit-on-error:false for crdb log cfg#63472

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
tbg:crit-errors
Apr 14, 2021
Merged

roachprod: use exit-on-error:false for crdb log cfg#63472
craig[bot] merged 1 commit intocockroachdb:masterfrom
tbg:crit-errors

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Apr 12, 2021

See #62763.

We seem to frequently miss the runtime errors resulting from
out-of-memory conditions in the stderr logs. We don't understand
exactly why yet, but it is very likely that with exit-on-error
(which is true by default) we are hitting errors outputting to
the sink which then kill the process before the runtime errors
bubble up.

While we develop a proper fix, avoid the problematic configuration
on roachprod clusters, which notably includes roachtests.

Release note: None

@tbg tbg requested review from a team and ajwerner April 12, 2021 13:45
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Apr 12, 2021

bors r=ajwerner
TFTR!

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 12, 2021

Build failed:

@knz
Copy link
Copy Markdown
Contributor

knz commented Apr 12, 2021

NB: this change will cause nodes to continue running even when they cannot write to the log files on disk, and we'll become blind as to what is happening.

(Not pushing against the change, but this should be clear to everyone involved)

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Apr 14, 2021

NB: this change will cause nodes to continue running even when they cannot write to the log files on disk, and we'll become blind as to what is happening.

That is fine. We only log to files in roachtest, and when we fail to log to files, likely we can't write to the file system, so there is really nowhere for the messages to go and we're not materially worsening the situation.

Unfortunately to get this PR in I need to somehow sniff out the old version in the mixed version upgrade roachtest. Sigh, I'll cook something up.

See cockroachdb#62763.

We seem to frequently miss the runtime errors resulting from
out-of-memory conditions in the stderr logs. We don't understand
exactly why yet, but it is very likely that with `exit-on-error`
(which is true by default) we are hitting errors outputting to
the sink which then kill the process before the runtime errors
bubble up.

While we develop a proper fix, avoid the problematic configuration
on roachprod clusters, which notably includes roachtests.

v20.2 did not have the `--log` flag yet, so we only do this when
starting v21.1 (i.e. `master` at time of writing).

Release note: None
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Apr 14, 2021

Unfortunately to get this PR in I need to somehow sniff out the old version in the mixed version upgrade roachtest. Sigh, I'll cook something up.

This ended up being straightforward.

bors r=ajwerner

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 14, 2021

Canceled.

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Apr 14, 2021

bors r=ajwerner

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 14, 2021

Build succeeded:

@craig craig bot merged commit b2a6c7d into cockroachdb:master Apr 14, 2021
tbg added a commit to tbg/cockroach that referenced this pull request Apr 15, 2021
I broke this in cockroachdb#63472, twofold.

1. if the version switch returned false, we weren't passing log flags
   at all, so there weren't any log files.
2. the version switch returned false on current master, since 21.1 is
   not tagged yet (we're still in the prerelease versions, currently
   at 21.1.0-alpha.3).

I tested that this fix works.

Release note: None
craig bot pushed a commit that referenced this pull request Apr 15, 2021
63342: ui: add button to reset SQL stats r=Azhng a=Azhng

Release note (ui change): User can now reset SQL stats from DB Console.

Follow up to: cockroachdb/ui#271
Closes #33315 

63401: ui: e2e tests, improved cypress test for tsx, overview and db page r=nanduu04 a=nanduu04

Previously, only visual regression tests were written

Wrote cypress tests using the cypress-testing-library

[CRDB-2388](https://cockroachlabs.atlassian.net/browse/CRDB-2388?atlOrigin=eyJpIjoiOWIzNzY0ZjkzY2RhNGU3ZTgyZTUyOTY5MjE4ZmM5YmIiLCJwIjoiaiJ9)

Release note: None

63713: config: deflake TestMarshalableZoneConfigRoundTrip r=aayushshah15 a=aayushshah15

A previous change (#63079) made this test flakey by (needlessly) making
one of the marshalled fields a value derived from two other fields
(as opposed to just one). This commit fixes the flake.

Release note: None

63726: install: fix log flags in roachprod start r=knz a=tbg

I broke this in #63472, twofold.

1. if the version switch returned false, we weren't passing log flags
   at all, so there weren't any log files.
2. the version switch returned false on current master, since 21.1 is
   not tagged yet (we're still in the prerelease versions, currently
   at 21.1.0-alpha.3).

I tested that this fix works.

Release note: None


Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Nandu Pokhrel <nandup@cockroachlabs.com>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants