Skip to content

importccl: support IMPORT INTO for DELIMITED and PGCOPY#52628

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
Anzoteh96:import-into-delim
Aug 18, 2020
Merged

importccl: support IMPORT INTO for DELIMITED and PGCOPY#52628
craig[bot] merged 1 commit intocockroachdb:masterfrom
Anzoteh96:import-into-delim

Conversation

@Anzoteh96
Copy link
Copy Markdown

@Anzoteh96 Anzoteh96 commented Aug 11, 2020

Previously, IMPORT INTO is supported only for CSV and AVRO. This PR extends the support of IMPORT INTO for DELIMITED and PGCOPY, in the hope that this will be useful when supporting default columns for imports from these files.

This involves populating the targeted columns from import processor when creating new input reader and row converter for DELIMITED and PGCOPY, when the targeted columns are specified in IMPORT INTO.

Fixes #52405

Release note (general): IMPORT INTO is now supported for DELIMITED and PGCOPY file formats.

@Anzoteh96 Anzoteh96 requested a review from pbardea August 11, 2020 15:00
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@Anzoteh96 Anzoteh96 marked this pull request as ready for review August 11, 2020 15:01
@Anzoteh96 Anzoteh96 requested a review from a team August 11, 2020 15:01
@Anzoteh96
Copy link
Copy Markdown
Author

Anzoteh96 commented Aug 11, 2020

Some notes:

  1. DEFAULT columns will not be supported (in this PR) for these formats yet because the targeted columns have not been correctly populated for these file formats (however, this will be fixed later on).

  2. Once this is successfully merged, the document needs to updated too.

Copy link
Copy Markdown
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer supporting these formats when they are fully supported (e.g. properly populate target cols). Maybe we can support one at a time in the same change that fixes the targeted column fix?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96 and @pbardea)


pkg/ccl/importccl/import_stmt.go, line 152 at r1 (raw file):

// File formats supported for IMPORT INTO
var allowedIntoFormats = []string{"CSV", "AVRO", "DELIMITED", "PGCOPY"}

If we define this as a map, then we don't need the isSupportedImportInto, correct?

@Anzoteh96 Anzoteh96 marked this pull request as draft August 11, 2020 15:28
@Anzoteh96 Anzoteh96 changed the title importccl: support IMPORT INTO for DELIMITED and PGCOPY importccl: [WIP] support IMPORT INTO for DELIMITED and PGCOPY Aug 11, 2020
Copy link
Copy Markdown
Author

@Anzoteh96 Anzoteh96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense for the targeted columns to be fixed too (though note that the targeted columns for AVRO will be less straightforward since it's extracted from the schema instead of the statement). I've marked it as WIP and will come to it again when that's being fixed.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96)

Copy link
Copy Markdown
Author

@Anzoteh96 Anzoteh96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96)


pkg/ccl/importccl/import_stmt_test.go, line 3176 at r2 (raw file):

Quoted 17 lines of code…
{
			name:            "delimited",
			data:            "1\t2\n3\t4",
			create:          "a INT, b INT DEFAULT 42, c INT",
			targetCols:      "c, a",
			format:          "DELIMITED",
			expectedResults: [][]string{{"2", "42", "1"}, {"4", "42", "3"}},
		},
		{
			name:            "pgcopy",
			data:            "1,2\n3,4",
			create:          "a INT, b INT DEFAULT 42, c INT",
			targetCols:      "c, a",
			with:            `delimiter = ","`,
			format:          "PGCOPY",
			expectedResults: [][]string{{"2", "42", "1"}, {"4", "42", "3"}},
		},

Some update: I've fixed the targeted column problem and included test case when some columns are not targeted (as per the default column example here). Feel free to suggest any additional test case.

(Also, the target cols above are done in such a way that tests the validity of this import when target columns are reordered).

@Anzoteh96 Anzoteh96 marked this pull request as ready for review August 12, 2020 18:04
@Anzoteh96 Anzoteh96 changed the title importccl: [WIP] support IMPORT INTO for DELIMITED and PGCOPY importccl: support IMPORT INTO for DELIMITED and PGCOPY Aug 12, 2020
@Anzoteh96 Anzoteh96 force-pushed the import-into-delim branch 3 times, most recently from d3f9c23 to 2bb4485 Compare August 18, 2020 00:21
Copy link
Copy Markdown
Author

@Anzoteh96 Anzoteh96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pbardea)


pkg/ccl/importccl/import_stmt.go, line 152 at r1 (raw file):

Previously, pbardea (Paul Bardea) wrote…

If we define this as a map, then we don't need the isSupportedImportInto, correct?

Done.


pkg/ccl/importccl/import_stmt_test.go, line 3176 at r2 (raw file):

Previously, Anzoteh96 (Anzo Teh) wrote…
{
			name:            "delimited",
			data:            "1\t2\n3\t4",
			create:          "a INT, b INT DEFAULT 42, c INT",
			targetCols:      "c, a",
			format:          "DELIMITED",
			expectedResults: [][]string{{"2", "42", "1"}, {"4", "42", "3"}},
		},
		{
			name:            "pgcopy",
			data:            "1,2\n3,4",
			create:          "a INT, b INT DEFAULT 42, c INT",
			targetCols:      "c, a",
			with:            `delimiter = ","`,
			format:          "PGCOPY",
			expectedResults: [][]string{{"2", "42", "1"}, {"4", "42", "3"}},
		},

Some update: I've fixed the targeted column problem and included test case when some columns are not targeted (as per the default column example here). Feel free to suggest any additional test case.

(Also, the target cols above are done in such a way that tests the validity of this import when target columns are reordered).

Update: added test cases for when targeted columns are reordered.


pkg/ccl/importccl/import_stmt_test.go, line 1176 at r4 (raw file):

			verifyQuery: "SELECT * FROM t ORDER BY a",
			expected:    [][]string{{"hello", "hello"}, {"hi", "hi"}},
		},

Note: we had this for PGDUMP and MYSQLDUMP as added by Rohan, but it was removed when we enforce the restriction of support to CSV and AVRO. I'm now adding DELIMITED and PGCOPY.

Previously, IMPORT INTO is supported only for CSV and AVRO.
This PR extends the support of IMPORT INTO for DELIMITED and
PGCOPY, in the hope that this will be useful when supporting
default columns for imports from these files. This involves
populating the targeted columns from import processor when
creating new input reader and row converter for DELIMITED and
PGCOPY, when the targeted columns are specified in IMPORT INTO.

Fixes cockroachdb#52405

Release note (general): IMPORT INTO is now supported for
DELIMITED and PGCOPY file formats.
Copy link
Copy Markdown
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@Anzoteh96
Copy link
Copy Markdown
Author

TFTR!

bors r=pbardea

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 18, 2020

Build succeeded:

@craig craig bot merged commit d1afb84 into cockroachdb:master Aug 18, 2020
@Anzoteh96 Anzoteh96 deleted the import-into-delim branch August 19, 2020 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

importccl: enable support for IMPORT INTO ... DELIMITED

3 participants