Skip to content

sql: fix the handling of string types#28951

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20180822-c10-fix-string-types
Aug 23, 2018
Merged

sql: fix the handling of string types#28951
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20180822-c10-fix-string-types

Conversation

@knz
Copy link
Copy Markdown
Contributor

@knz knz commented Aug 22, 2018

First commits from #28950 and priors.
Forked from #28690.

This patch corrects two long-standing bugs in CockroachDB which were
confusing both 3rd party tools/ORMs and users:

  • the SQL standard type called "CHAR" is supposed to have a default
    maximum width of 1 character.

    Prior to this patch, CockroachDB did not support this and instead
    set no default maximum width on CHAR. This would cause CockroachDB
    to fail validating the width of values inserted in tables, fail
    to constrain the width of values during casts to CHAR, and report
    incorrect information in introspection tables.

  • PostgresSQL's dialect distinguishes the table column types TEXT,
    CHAR, VARCHAR and a special thing called "char" (this is a legacy
    PostgreSQL type which is, unfortunately, still used by some
    pg_catalog tables).

    Prior to this patch, CockroachDB would map all of these types into
    the same column type "STRING". In turn this caused them to show
    up as "text" in introspection.

    While this was not incorrect from the perspective of value
    handling (all these types are, indeed, handled with about the same
    value semantics in both PostgreSQL and CockroachDB), ORMs
    do pay attention to this distinction and become confused if they
    see a different type in introspection (e.g. "text") than what
    the application model requires (e.g. "char"). The result of this
    confusion was to trigger a schema change to adapt the type, which
    in turn fails due to missing features in ALTER COLUMN TYPE.

This patch corrects both problems by giving the appropriate default
width to CHAR and preserving the distinction between the various string
types in column descriptors and thus introspection.

Additionally, a width constraint check on collated string columns was
missing. This is now fixed too.

Tables and columns created prior to this patch are unaffected.

Release note (bug fix): CockroachDB now distinguishes "CHAR" and
"VARCHAR" as mandated by the SQL standard and PostgreSQL
compatibility. When a width is not specified (e.g. CHAR(3)), the
maximum width of VARCHAR remains unconstrained whereas the maximum
width for CHAR is now 1 character.

Release note (bug fix): CockroachDB now properly checks the width of
strings inserted in a collated string column with a specified width.

Release note (sql change): CockroachDB now preserves the distinction
between different column types for string values like in PostgreSQL,
for compatibility with 3rd party tools and ORMs.

@knz knz requested a review from BramGruneir August 22, 2018 12:03
@knz knz requested a review from a team as a code owner August 22, 2018 12:03
@knz knz requested review from a team August 22, 2018 12:03
@knz knz requested a review from a team as a code owner August 22, 2018 12:03
@knz knz requested review from a team August 22, 2018 12:03
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@knz knz force-pushed the 20180822-c10-fix-string-types branch from 4ba970c to ac648a7 Compare August 22, 2018 12:04
@knz knz added the docs-todo label Aug 22, 2018
@knz
Copy link
Copy Markdown
Contributor Author

knz commented Aug 22, 2018

cc @andreimatei this fixes CHAR!

@knz knz force-pushed the 20180822-c10-fix-string-types branch 2 times, most recently from 3634cd4 to e39014e Compare August 22, 2018 12:30
@knz knz mentioned this pull request Aug 22, 2018
@knz knz force-pushed the 20180822-c10-fix-string-types branch from e39014e to bbb52c9 Compare August 22, 2018 12:57
@knz knz requested a review from a team August 22, 2018 12:57
@knz knz force-pushed the 20180822-c10-fix-string-types branch 2 times, most recently from 1b8bb20 to df7abd7 Compare August 22, 2018 15:41
Copy link
Copy Markdown
Member

@BramGruneir BramGruneir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we enforce a max length? If we do, we should test it.

Sadly, I think we also need to add a number of tests for "char". I didn't see any.

Also, is there a reason to not consolidate collated strings with strings now? Can the absence of the locale be enough to make this practical? It looks like most of their handling is repeated code.

Reviewed 4 of 4 files at r1, 1 of 1 files at r2, 18 of 18 files at r3, 2 of 2 files at r4, 7 of 7 files at r5, 3 of 3 files at r6, 13 of 13 files at r7, 31 of 31 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/coltypes/aliases.go, line 77 at r8 (raw file):

	Interval = &TInterval{}

	// Char is an immutable T instance.

A comment here about looking into strings.go to see what these types are would be helpful.


pkg/sql/coltypes/strings.go, line 94 at r8 (raw file):

func (node *TString) Format(buf *bytes.Buffer, f lex.EncodeFlags) {
	buf.WriteString(node.TypeName())
	if !(node.Variant == TStringVariantCHAR && node.N == 1) && node.N > 0 {

Ugh, this took a second to reason about.

Can you swap the order here. node.N > 0 then the check for not char(1)?


pkg/sql/coltypes/strings.go, line 123 at r8 (raw file):

// TCollatedString represents a STRING, CHAR, QCHAR or VARCHAR type
// with a collation locale.
type TCollatedString struct {

Why not embed a TString?


pkg/sql/parser/sql.y, line 6159 at r8 (raw file):

character_base:
  CHARACTER

Are CHARACTER and CHAR still different? Or can they be consolidated?


pkg/sql/sqlbase/column_type_properties.go, line 216 at r8 (raw file):

		typName = "CHAR"
	case ColumnType_QCHAR:
		// yes, that's the name. The ways of PostgreSQL are inscrutable.

nit: Yes needs to be capitalized. And this comment is golden. :)


pkg/sql/sqlbase/column_type_properties.go, line 230 at r8 (raw file):

//
// See also InformationSchemaVisibleType() below.
func (c *ColumnType) SQLString() string {

This is so similar to the other function, it would be nice to DRY them up.


pkg/sql/sqlbase/column_type_properties.go, line 234 at r8 (raw file):

	case ColumnType_STRING:
		typName := c.stringTypeName()
		if !(c.VisibleType == ColumnType_CHAR && c.Width == 1) && c.Width > 0 {

Again, put the c.Width check first for readability.


pkg/sql/sqlbase/structured.proto, line 155 at r8 (raw file):

    REAL = 5;
    DOUBLE_PRECISION = 6; // Deprecated, remove post-2.2.
	VARCHAR = 7;

lint: please remove the tabs

@knz knz force-pushed the 20180822-c10-fix-string-types branch from df7abd7 to a55820b Compare August 23, 2018 11:03
@knz knz requested a review from a team August 23, 2018 11:03
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we enforce a max length? If we do, we should test it.

Done. (logic test)

Sadly, I think we also need to add a number of tests for "char". I didn't see any.

  • the parser tests are already alongside the others.
  • the coltypes tests (in sem/tree/col_types_test.go) are already alongside the others.
  • added the case to TestMakeTableDescColumns.
  • added the logic tests.

Also, is there a reason to not consolidate collated strings with strings now? Can the absence of the locale be enough to make this practical? It looks like most of their handling is repeated code.

Filed a separate issue for that: #29001.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/coltypes/aliases.go, line 77 at r8 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

A comment here about looking into strings.go to see what these types are would be helpful.

Good idea. Done.


pkg/sql/coltypes/strings.go, line 94 at r8 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Ugh, this took a second to reason about.

Can you swap the order here. node.N > 0 then the check for not char(1)?

Done. Also added a comment to clarify.


pkg/sql/coltypes/strings.go, line 123 at r8 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Why not embed a TString?

Good idea! Done.


pkg/sql/parser/sql.y, line 6159 at r8 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Are CHARACTER and CHAR still different? Or can they be consolidated?

Done.


pkg/sql/sqlbase/column_type_properties.go, line 230 at r8 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

This is so similar to the other function, it would be nice to DRY them up.

I fail to agree; they have extremely little in common. Not even the switch cases are the same.

I did take your comment as a hint to merge the cases for string and collatedstring, though. I did that.


pkg/sql/sqlbase/column_type_properties.go, line 234 at r8 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Again, put the c.Width check first for readability.

Done. Also the comment :)


pkg/sql/sqlbase/structured.proto, line 155 at r8 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

lint: please remove the tabs

Done.

@knz knz force-pushed the 20180822-c10-fix-string-types branch 2 times, most recently from 111c0b1 to 05ebd0b Compare August 23, 2018 11:15
This patch corrects two long-standing bugs in CockroachDB which were
confusing both 3rd party tools/ORMs and users:

- the SQL standard type called "CHAR" is supposed to have a default
  maximum width of 1 character.

  Prior to this patch, CockroachDB did not support this and instead
  set no default maximum width on CHAR. This would cause CockroachDB
  to fail validating the width of values inserted in tables, fail
  to constrain the width of values during casts to CHAR, and report
  incorrect information in introspection tables.

- PostgresSQL's dialect distinguishes the table column types TEXT,
  CHAR, VARCHAR and a special thing called `"char"` (this is a legacy
  PostgreSQL type which is, unfortunately, still used by some
  pg_catalog tables).

  Prior to this patch, CockroachDB would map all of these types into
  the same column type "STRING". In turn this caused them to show
  up as "text" in introspection.

  While this was not incorrect from the perspective of value
  handling (all these types are, indeed, handled with about the same
  value semantics in both PostgreSQL and CockroachDB), ORMs
  do pay attention to this distinction and become confused if they
  see a different type in introspection (e.g. "text") than what
  the application model requires (e.g. "char"). The result of this
  confusion was to trigger a schema change to adapt the type, which
  in turn fails due to missing features in ALTER COLUMN TYPE.

This patch corrects both problems by giving the appropriate default
width to CHAR and preserving the distinction between the various string
types in column descriptors and thus introspection.

Additionally, a width constraint check on collated string columns was
missing. This is now fixed too.

Tables and columns created prior to this patch are unaffected.

Release note (bug fix): CockroachDB now distinguishes "CHAR" and
"VARCHAR" as mandated by the SQL standard and PostgreSQL
compatibility. When a width is not specified (e.g. `CHAR(3)`), the
maximum width of `VARCHAR` remains unconstrained whereas the maximum
width for `CHAR` is now 1 character.

Release note (bug fix): CockroachDB now properly checks the width of
strings inserted in a collated string column with a specified width.

Release note (sql change): CockroachDB now preserves the distinction
between different column types for string values like in PostgreSQL,
for compatibility with 3rd party tools and ORMs.
@knz knz force-pushed the 20180822-c10-fix-string-types branch from 05ebd0b to 9c63fe4 Compare August 23, 2018 11:38
@craig craig bot merged commit 9c63fe4 into cockroachdb:master Aug 23, 2018
@knz knz deleted the 20180822-c10-fix-string-types branch August 23, 2018 12:51
craig bot pushed a commit that referenced this pull request Aug 23, 2018
29006: backport-2.1: fix the handling of SQL types r=knz a=knz

Backport 1/1 commits from #28937.
Backport 1/1 commits from #28942.
Backport 17/17 commits from #28943.
Backport 1/1 commits from #28944.
Backport 1/1 commits from #28945.
Backport 1/1 commits from #28949.
Backport 2/2 commits from #28950.
Backport 1/1 commits from #28951 (#28959).
Backport 1/1 commits from #28952 (#28959).
Backport 1/1 commits from #28955 (#28959).
Backport 1/1 commits from #28959.
Backport 1/1 commits from #28959.
Backport 1/1 commits from #28690.

cc @cockroachdb/release 

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
craig bot pushed a commit that referenced this pull request Aug 24, 2018
29006: backport-2.1: fix the handling of SQL types r=knz a=knz

Backport 1/1 commits from #28937.
Backport 1/1 commits from #28942.
Backport 17/17 commits from #28943.
Backport 1/1 commits from #28944.
Backport 1/1 commits from #28945.
Backport 1/1 commits from #28949.
Backport 2/2 commits from #28950.
Backport 1/1 commits from #28951 (#28959).
Backport 1/1 commits from #28952 (#28959).
Backport 1/1 commits from #28955 (#28959).
Backport 1/1 commits from #28959.
Backport 1/1 commits from #28959.
Backport 1/1 commits from #28690.

cc @cockroachdb/release 

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants