importccl: fix flaky test TestImportCSVStmt #34589

craig[bot] merged 1 commit into cockroachdb:master from
Conversation
hm, this test is the bulk of the test coverage on IMPORT so it feels unfortunate to have it behave differently than in reality if we could avoid it. Is it possible to teach it to find the right job record? Straw man: grab the last two records, take the first that starts with

👍 on a quick fix for the build break for now though
I think Dan already merged a PR to skip this test, so I think there's no rush. Inside

Does that seem like a good solution? Could also do

Yeah, either of those sounds good to me. Thanks!
Force-pushed from ead650f to d39529f

Great -- I went with the first option. PTAL. Thanks!
  SELECT job_type, description, user_name, descriptor_ids, status, running_status
- FROM crdb_internal.jobs ORDER BY created LIMIT 1 OFFSET $1`,
+ FROM crdb_internal.jobs WHERE job_type = $1 ORDER BY created LIMIT 1 OFFSET $2`,
+ expectedType.String(),
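The idea of the fix — index into the jobs of one type, rather than applying a global offset over the whole jobs table — can be sketched as a standalone helper. The `Job` struct and `nthJobOfType` function below are illustrative stand-ins; the real test resolves this through SQL against `crdb_internal.jobs`:

```go
package main

import "fmt"

// Job is a simplified stand-in for a row in crdb_internal.jobs
// (hypothetical struct, for illustration only).
type Job struct {
	Type        string
	Description string
}

// nthJobOfType mimics the fixed query:
//   SELECT ... FROM crdb_internal.jobs
//   WHERE job_type = $1 ORDER BY created LIMIT 1 OFFSET $2
// jobs is assumed to be ordered by creation time already.
func nthJobOfType(jobs []Job, jobType string, offset int) (Job, bool) {
	n := 0
	for _, j := range jobs {
		if j.Type != jobType {
			continue // skip unrelated jobs, e.g. automatic CreateStats
		}
		if n == offset {
			return j, true
		}
		n++
	}
	return Job{}, false
}

func main() {
	jobs := []Job{
		{Type: "IMPORT", Description: "import a"},
		{Type: "CREATE STATS", Description: "auto stats"},
		{Type: "IMPORT", Description: "import b"},
	}
	// The interleaved CreateStats job no longer shifts the offsets.
	j, ok := nthJobOfType(jobs, "IMPORT", 1)
	fmt.Println(ok, j.Description) // true import b
}
```

With a plain `OFFSET` over all jobs, the unpredictable position of the CreateStats job would have made offset 1 land on the wrong row.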
super, super minor nit: might make sense to rename expectedType to filterType or something that indicates it will filter
Force-pushed from d39529f to 5d019f5
pkg/testutils/jobutils/jobs_verification.go, line 149 at r2 (raw file): Previously, dt (David Taylor) wrote…
good point -- I also took out the check below since it seems unnecessary now.
Force-pushed from 4b9cf37 to 33b93ee
dt left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained
Force-pushed from 33b93ee to caf77ba
So the problem is that

Yeah, that sounds fine.

Hmm, it's still failing even with stats disabled. Any ideas would be appreciated! Otherwise I'll try to investigate more tomorrow...
Force-pushed from caf77ba to 26eaf64
TestImportCSVStmt tests that IMPORT jobs appear in a certain order in the system.jobs table. Automatic statistics were causing this test to be flaky since CreateStats jobs were present in the jobs table as well, in an unpredictable order. This commit fixes the problem by only selecting IMPORT jobs from the jobs table.

Fixes cockroachdb#34568

Release note: None
Force-pushed from 26eaf64 to db5f5db
I'm really baffled about why stressrace keeps timing out. I've tried completely disabling automatic stats (not just for this test, but in general), as well as testing older versions of the code (e.g., week-old versions of the master branch), and nothing seems to work. I've added some logic to reduce the number of files and rows per file during race builds, and that seems to have done the trick. Hopefully this is ok?
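Scaling a test workload down under the race detector is commonly done with a build-tag-driven constant; the sketch below assumes a `raceEnabled` flag along the lines of CockroachDB's `util.RaceEnabled`, and the function name and scaling ratios are illustrative, not the actual test change:

```go
package main

import "fmt"

// raceEnabled stands in for a build-tag-driven constant such as
// CockroachDB's util.RaceEnabled (set via a `race` build tag).
const raceEnabled = true

// testSize scales a test workload down when the race detector is on,
// since instrumented builds can be an order of magnitude slower.
func testSize(files, rowsPerFile int) (int, int) {
	if raceEnabled {
		return files / 2, rowsPerFile / 10
	}
	return files, rowsPerFile
}

func main() {
	f, r := testSize(30, 1000)
	fmt.Println(f, r) // 15 100 when raceEnabled is true
}
```

The test still exercises the same code paths; only the data volume shrinks, which keeps `stressrace` within its timeout.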
  defer cleanupFn()

  // Get the number of existing jobs.
  baseNumJobs := jobutils.GetSystemJobsCount(t, sqlDB)
Does this still need to be removed?
Yea -- that was needed to calculate the offset when the code was selecting all jobs. Now that it's just selecting BACKUP/RESTORE jobs, we don't need baseNumJobs
Gentle ping- is this ok to merge?
34296: storage: improve message on slow Raft proposal r=petermattis a=tbg

Touches #33007.

Release note: None

34589: importccl: fix flaky test TestImportCSVStmt r=rytaft a=rytaft

`TestImportCSVStmt` tests that `IMPORT` jobs appear in a certain order in the `system.jobs` table. Automatic statistics were causing this test to be flaky since `CreateStats` jobs were present in the jobs table as well, in an unpredictable order. This commit fixes the problem by only selecting `IMPORT` jobs from the jobs table.

Fixes #34568

Release note: None

34660: storage: make RaftTruncatedState unreplicated r=bdarnell a=tbg

This isn't 100% polished yet, but generally ready for review.

----

See #34287.

Today, Raft (or preemptive) snapshots include the past Raft log, that is, log entries which are already reflected in the state of the snapshot. Fundamentally, this is because we have historically used a replicated TruncatedState.

TruncatedState essentially tells us what the first index in the log is (though it also includes a Term). If the TruncatedState cannot diverge across replicas, we *must* send the whole log in snapshots, as the first log index must match what the TruncatedState claims it is.

The Raft log is typically, but not necessarily small. Log truncations are driven by a queue and use a complex decision process. That decision process can be faulty and even if it isn't, the queue could be held up. Besides, even when the Raft log contains only very few entries, these entries may be quite large (see SSTable ingestion during RESTORE).

All this motivates that we don't want to (be forced to) send the Raft log as part of snapshots, and in turn we need the TruncatedState to be unreplicated.

This change migrates the TruncatedState into unreplicated keyspace. It does not yet allow snapshots to avoid sending the past Raft log, but that is a relatively straightforward follow-up change.

VersionUnreplicatedRaftTruncatedState, when active, moves the truncated state into unreplicated keyspace on log truncations. The migration works as follows:

1. At any log position, the replicas of a Range either use the new (unreplicated) key or the old one, and exactly one of them exists.
2. When a log truncation evaluates under the new cluster version, it initiates the migration by deleting the old key. Under the old cluster version, it behaves like today, updating the replicated truncated state.
3. The deletion signals new code downstream of Raft and triggers a write to the new, unreplicated, key (atomic with the deletion of the old key).
4. Future log truncations don't write any replicated data any more, but (like before) send along the TruncatedState which is written downstream of Raft atomically with the deletion of the log entries. This actually uses the same code as 3. What's new is that the truncated state needs to be verified before replacing a previous one. If replicas disagree about their truncated state, it's possible for replica X at FirstIndex=100 to apply a truncated state update that sets FirstIndex to, say, 50 (proposed by a replica with a "longer" historical log). In that case, the truncated state update must be ignored (this is straightforward downstream-of-Raft code).
5. When a split trigger evaluates, it seeds the RHS with the legacy key iff the LHS uses the legacy key, and the unreplicated key otherwise. This makes sure that the invariant that all replicas agree on the state of the migration is upheld.
6. When a snapshot is applied, the receiver is told whether the snapshot contains a legacy key. If not, it writes the truncated state (which is part of the snapshot metadata) in its unreplicated version. Otherwise it doesn't have to do anything (the range will migrate later).

The following diagram visualizes the above. Note that it abuses sequence diagrams to get a nice layout; the vertical lines belonging to NewState and OldState don't imply any particular ordering of operations.

```
┌────────┐                              ┌────────┐
│OldState│                              │NewState│
└───┬────┘                              └───┬────┘
    │      Bootstrap under old version      │
 <─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
    │                                       │
    │      Bootstrap under new version      │
    │      <─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
    │                                       │
    │─ ─ ┐                                  │
    │    | Log truncation under old version │
    │< ─ ┘                                  │
    │                                       │
    │─ ─ ┐                                  │
    │    | Snapshot                         │
    │< ─ ┘                                  │
    │                                       │
    │                                  │─ ─ ┐
    │                                  │    | Snapshot
    │                                  │< ─ ┘
    │                                       │
    │      Log truncation under new version │
    │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ >│
    │                                       │
    │                                  │─ ─ ┐
    │                                  │    | Log truncation under new version
    │                                  │< ─ ┘
    │                                       │
    │                                  │─ ─ ┐
    │                                  │    | Log truncation under old version
    │                                  │< ─ ┘ (necessarily running new binary)
```

Release note: None

34762: distsqlplan: fix error in union planning r=jordanlewis a=jordanlewis

Previously, if 2 inputs to a UNION ALL had identical post processing except for renders, further post processing on top of that union all could invalidate the plan and cause errors or crashes.

Fixes #34437.

Release note (bug fix): fix a planning crash during UNION ALL operations that had projections, filters or renders directly on top of the UNION ALL in some cases.

34767: sql: reuse already allocated memory for the cache in a row container r=yuzefovich a=yuzefovich

Previously, we would always allocate new memory for every row that we put in the cache of DiskBackedIndexedRowContainer and simply discard the memory underlying the row that we remove from the cache. Now, we're reusing that memory.

Release note: None

34779: opt: add stats to tpch xform test r=justinj a=justinj

Since we have stats by default now, this should be the default testing mechanism. I left in tpch-no-stats since we also have that for tpcc, just for safety.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Justin Jaffray <justin@cockroachlabs.com>
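The verify-before-replace rule described for the TruncatedState migration in #34660 (a truncated state update that would move FirstIndex backwards must be ignored) can be sketched as a small helper; the `TruncatedState` struct and `maybeApply` function here are illustrative stand-ins, not the actual storage code:

```go
package main

import "fmt"

// TruncatedState is a simplified stand-in for the Raft truncated state
// (illustrative only). Index is the highest truncated log index, so the
// first available log index is Index+1.
type TruncatedState struct {
	Index uint64
	Term  uint64
}

// maybeApply only replaces the current truncated state if the proposed
// one moves the truncation point forward; otherwise it is ignored.
func maybeApply(cur, proposed TruncatedState) (TruncatedState, bool) {
	if proposed.Index <= cur.Index {
		// A replica with a "longer" historical log proposed a truncation
		// below our first index; applying it would appear to resurrect
		// log entries we no longer have.
		return cur, false
	}
	return proposed, true
}

func main() {
	cur := TruncatedState{Index: 100, Term: 5}
	_, applied := maybeApply(cur, TruncatedState{Index: 50, Term: 5})
	fmt.Println(applied) // false: would move FirstIndex backwards
	next, applied := maybeApply(cur, TruncatedState{Index: 120, Term: 6})
	fmt.Println(applied, next.Index) // true 120
}
```

This is the downstream-of-Raft check from step 4 in miniature: every replica applies the same proposals, but each one compares the proposed state against its own before writing.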
Build succeeded