importccl: set roachpb.Value checksums when creating data#24128

Merged
madelynnblue merged 1 commit into cockroachdb:master from madelynnblue:import-checksum
Mar 24, 2018

Conversation

@madelynnblue
Contributor

IMPORT was not setting value checksums when generating KV pairs from
CSV rows. This results in CPuts on, for example, secondary indexes
failing because the checksum doesn't match.

Fixes #23984

Release note (bug fix): correctly generate on-disk checksums during
IMPORT. If there is existing data created by IMPORT that cannot
be recreated by this fixed version of IMPORT, use `cockroach dump`
to rewrite any affected tables.

@madelynnblue madelynnblue requested review from a team, dt and nvb March 21, 2018 20:08
@cockroach-teamcity
Member

This change is Reviewable

@nvb
Contributor

nvb commented Mar 21, 2018

:lgtm: did you confirm that this fixes the issue?


Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


pkg/ccl/importccl/csv.go, line 711 at r1 (raw file):

		count += int64(len(kvBatch))
		for _, kv := range kvBatch {
			if err := writer.Put(kv.Key, kv.Value.RawBytes); err != nil {

Adding the call to InitChecksum would be a bit more natural here, but it's up to you.


Comments from Reviewable

@madelynnblue
Contributor Author

Yes, the test I added produces the error in the bug.


Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


pkg/ccl/importccl/csv.go, line 711 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Adding the call to InitChecksum would be a bit more natural here, but it's up to you.

Sadly this function (writeRocksDB) is only called in the local version of import, so adding it here would also mean adding it in a second place for the distributed version. The place where it is now is best because that's the latest point before the implementations diverge.


Comments from Reviewable
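The placement trade-off discussed above can be sketched with simplified stand-ins for the roachpb types (the structs and helper names below are illustrative assumptions, not the real CockroachDB definitions). Calling InitChecksum at the shared row-to-KV conversion, as the merged change does, covers both the local writeRocksDB path and the distributed path with a single call site:

```go
package main

import "fmt"

// Value and KeyValue are simplified stand-ins for the roachpb types.
type Value struct {
	RawBytes    []byte
	checksumSet bool
}

// InitChecksum stands in for roachpb.Value.InitChecksum(key); the real
// method folds a CRC of the key and value bytes into the value's header.
func (v *Value) InitChecksum(key []byte) { v.checksumSet = true }

type KeyValue struct {
	Key   []byte
	Value Value
}

// makeKVs models the shared row-to-KV conversion. Initializing the checksum
// here, before the local and distributed implementations diverge, means the
// call does not need to be duplicated in writeRocksDB and its distributed
// counterpart.
func makeKVs(rows map[string]string) []KeyValue {
	kvs := make([]KeyValue, 0, len(rows))
	for k, v := range rows {
		kv := KeyValue{Key: []byte(k), Value: Value{RawBytes: []byte(v)}}
		kv.Value.InitChecksum(kv.Key)
		kvs = append(kvs, kv)
	}
	return kvs
}

func main() {
	for _, kv := range makeKVs(map[string]string{"k1": "v1"}) {
		fmt.Printf("%s: checksum set = %v\n", kv.Key, kv.Value.checksumSet)
	}
}
```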

@bdarnell
Contributor

bdarnell commented Mar 21, 2018

How does this interact with #24126? Do we still need this for mixed-version clusters or is it obsolete?


Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


Comments from Reviewable

@nvb
Contributor

nvb commented Mar 21, 2018

#24126 will fix #23984 for any 2.0 clusters, but the change here is still a good idea for mixed-version clusters where some of the nodes may require checksums.

@madelynnblue
Contributor Author

I'm not opposed to closing this PR if #24126 is a sufficient fix. Will that make it into 2.0 or 2.0.1? We could choose not to merge this PR, since it would just be a performance decrease for no benefit.

@bdarnell
Contributor

We should merge and cherry-pick this (for 2.0.1, I think) since it will help for mixed-version clusters. Then we can remove the checksum computation at the same time we remove all the others.

@madelynnblue madelynnblue merged commit 6b1b600 into cockroachdb:master Mar 24, 2018
@madelynnblue madelynnblue deleted the import-checksum branch March 24, 2018 05:57

Development

Successfully merging this pull request may close these issues.

sql: duplicate key value on non-unique index

4 participants