sql: fix the handling of string types#28951
Conversation
4ba970c to
ac648a7
Compare
|
cc @andreimatei this fixes CHAR! |
3634cd4 to
e39014e
Compare
e39014e to
bbb52c9
Compare
1b8bb20 to
df7abd7
Compare
BramGruneir
left a comment
There was a problem hiding this comment.
Do we enforce a max length? If we do, we should test it.
Sadly, I think we also need to add a number of tests for "char". I didn't see any.
Also, is there a reason to not consolidate collated strings with strings now? Can the absence of the locale be enough to make this practical? It looks like most of their handling is repeated code.
Reviewed 4 of 4 files at r1, 1 of 1 files at r2, 18 of 18 files at r3, 2 of 2 files at r4, 7 of 7 files at r5, 3 of 3 files at r6, 13 of 13 files at r7, 31 of 31 files at r8.
Reviewable status:complete! 0 of 0 LGTMs obtained
pkg/sql/coltypes/aliases.go, line 77 at r8 (raw file):
Interval = &TInterval{} // Char is an immutable T instance.
A comment here about looking into strings.go to see what these types are would be helpful.
pkg/sql/coltypes/strings.go, line 94 at r8 (raw file):
func (node *TString) Format(buf *bytes.Buffer, f lex.EncodeFlags) { buf.WriteString(node.TypeName()) if !(node.Variant == TStringVariantCHAR && node.N == 1) && node.N > 0 {
Ugh, this took a second to reason about.
Can you swap the order here. node.N > 0 then the check for not char(1)?
pkg/sql/coltypes/strings.go, line 123 at r8 (raw file):
// TCollatedString represents a STRING, CHAR, QCHAR or VARCHAR type // with a collation locale. type TCollatedString struct {
Why not embed a TString?
pkg/sql/parser/sql.y, line 6159 at r8 (raw file):
character_base: CHARACTER
Are CHARACTER and CHAR still different? Or can they be consolidated?
pkg/sql/sqlbase/column_type_properties.go, line 216 at r8 (raw file):
typName = "CHAR" case ColumnType_QCHAR: // yes, that's the name. The ways of PostgreSQL are inscrutable.
nit: Yes needs to be capitalized. And this comment is golden. :)
pkg/sql/sqlbase/column_type_properties.go, line 230 at r8 (raw file):
// // See also InformationSchemaVisibleType() below. func (c *ColumnType) SQLString() string {
This is so similar to the other function, it would be nice to DRY them up.
pkg/sql/sqlbase/column_type_properties.go, line 234 at r8 (raw file):
case ColumnType_STRING: typName := c.stringTypeName() if !(c.VisibleType == ColumnType_CHAR && c.Width == 1) && c.Width > 0 {
Again, put the c.Width check first for readability.
pkg/sql/sqlbase/structured.proto, line 155 at r8 (raw file):
REAL = 5; DOUBLE_PRECISION = 6; // Deprecated, remove post-2.2. VARCHAR = 7;
lint: please remove the tabs
df7abd7 to
a55820b
Compare
knz
left a comment
There was a problem hiding this comment.
Do we enforce a max length? If we do, we should test it.
Done. (logic test)
Sadly, I think we also need to add a number of tests for "char". I didn't see any.
- the parser tests are already alongside the others.
- the coltypes tests (in sem/tree/col_types_test.go) are already alongside the others.
- added the case to TestMakeTableDescColumns.
- added the logic tests.
Also, is there a reason to not consolidate collated strings with strings now? Can the absence of the locale be enough to make this practical? It looks like most of their handling is repeated code.
Filed a separate issue for that: #29001.
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/sql/coltypes/aliases.go, line 77 at r8 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
A comment here about looking into strings.go to see what these types are would be helpful.
Good idea. Done.
pkg/sql/coltypes/strings.go, line 94 at r8 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
Ugh, this took a second to reason about.
Can you swap the order here. node.N > 0 then the check for not char(1)?
Done. Also added a comment to clarify.
pkg/sql/coltypes/strings.go, line 123 at r8 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
Why not embed a TString?
Good idea! Done.
pkg/sql/parser/sql.y, line 6159 at r8 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
Are CHARACTER and CHAR still different? Or can they be consolidated?
Done.
pkg/sql/sqlbase/column_type_properties.go, line 230 at r8 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
This is so similar to the other function, it would be nice to DRY them up.
I fail to agree; they have extremely little in common. Not even the switch cases are the same.
I did take your comment as a hint to merge the cases for string and collatedstring, though. I did that.
pkg/sql/sqlbase/column_type_properties.go, line 234 at r8 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
Again, put the c.Width check first for readability.
Done. Also the comment :)
pkg/sql/sqlbase/structured.proto, line 155 at r8 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
lint: please remove the tabs
Done.
111c0b1 to
05ebd0b
Compare
This patch corrects two long-standing bugs in CockroachDB which were confusing both 3rd party tools/ORMs and users: - the SQL standard type called "CHAR" is supposed to have a default maximum width of 1 character. Prior to this patch, CockroachDB did not support this and instead set no default maximum width on CHAR. This would cause CockroachDB to fail validating the width of values inserted in tables, fail to constrain the width of values during casts to CHAR, and report incorrect information in introspection tables. - PostgresSQL's dialect distinguishes the table column types TEXT, CHAR, VARCHAR and a special thing called `"char"` (this is a legacy PostgreSQL type which is, unfortunately, still used by some pg_catalog tables). Prior to this patch, CockroachDB would map all of these types into the same column type "STRING". In turn this caused them to show up as "text" in introspection. While this was not incorrect from the perspective of value handling (all these types are, indeed, handled with about the same value semantics in both PostgreSQL and CockroachDB), ORMs do pay attention to this distinction and become confused if they see a different type in introspection (e.g. "text") than what the application model requires (e.g. "char"). The result of this confusion was to trigger a schema change to adapt the type, which in turn fails due to missing features in ALTER COLUMN TYPE. This patch corrects both problems by giving the appropriate default width to CHAR and preserving the distinction between the various string types in column descriptors and thus introspection. Additionally, a width constraint check on collated string columns was missing. This is now fixed too. Tables and columns created prior to this patch are unaffected. Release note (bug fix): CockroachDB now distinguishes "CHAR" and "VARCHAR" as mandated by the SQL standard and PostgreSQL compatibility. When a width is not specified (e.g. `CHAR(3)`), the maximum width of `VARCHAR` remains unconstrained whereas the maximum width for `CHAR` is now 1 character. Release note (bug fix): CockroachDB now properly checks the width of strings inserted in a collated string column with a specified width. Release note (sql change): CockroachDB now preserves the distinction between different column types for string values like in PostgreSQL, for compatibility with 3rd party tools and ORMs.
05ebd0b to
9c63fe4
Compare
29006: backport-2.1: fix the handling of SQL types r=knz a=knz Backport 1/1 commits from #28937. Backport 1/1 commits from #28942. Backport 17/17 commits from #28943. Backport 1/1 commits from #28944. Backport 1/1 commits from #28945. Backport 1/1 commits from #28949. Backport 2/2 commits from #28950. Backport 1/1 commits from #28951 (#28959). Backport 1/1 commits from #28952 (#28959). Backport 1/1 commits from #28955 (#28959). Backport 1/1 commits from #28959. Backport 1/1 commits from #28959. Backport 1/1 commits from #28690. cc @cockroachdb/release Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
29006: backport-2.1: fix the handling of SQL types r=knz a=knz Backport 1/1 commits from #28937. Backport 1/1 commits from #28942. Backport 17/17 commits from #28943. Backport 1/1 commits from #28944. Backport 1/1 commits from #28945. Backport 1/1 commits from #28949. Backport 2/2 commits from #28950. Backport 1/1 commits from #28951 (#28959). Backport 1/1 commits from #28952 (#28959). Backport 1/1 commits from #28955 (#28959). Backport 1/1 commits from #28959. Backport 1/1 commits from #28959. Backport 1/1 commits from #28690. cc @cockroachdb/release Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
First commits from #28950 and priors.
Forked from #28690.
This patch corrects two long-standing bugs in CockroachDB which were
confusing both 3rd party tools/ORMs and users:
the SQL standard type called "CHAR" is supposed to have a default
maximum width of 1 character.
Prior to this patch, CockroachDB did not support this and instead
set no default maximum width on CHAR. This would cause CockroachDB
to fail validating the width of values inserted in tables, fail
to constrain the width of values during casts to CHAR, and report
incorrect information in introspection tables.
PostgresSQL's dialect distinguishes the table column types TEXT,
CHAR, VARCHAR and a special thing called
"char"(this is a legacyPostgreSQL type which is, unfortunately, still used by some
pg_catalog tables).
Prior to this patch, CockroachDB would map all of these types into
the same column type "STRING". In turn this caused them to show
up as "text" in introspection.
While this was not incorrect from the perspective of value
handling (all these types are, indeed, handled with about the same
value semantics in both PostgreSQL and CockroachDB), ORMs
do pay attention to this distinction and become confused if they
see a different type in introspection (e.g. "text") than what
the application model requires (e.g. "char"). The result of this
confusion was to trigger a schema change to adapt the type, which
in turn fails due to missing features in ALTER COLUMN TYPE.
This patch corrects both problems by giving the appropriate default
width to CHAR and preserving the distinction between the various string
types in column descriptors and thus introspection.
Additionally, a width constraint check on collated string columns was
missing. This is now fixed too.
Tables and columns created prior to this patch are unaffected.
Release note (bug fix): CockroachDB now distinguishes "CHAR" and
"VARCHAR" as mandated by the SQL standard and PostgreSQL
compatibility. When a width is not specified (e.g.
CHAR(3)), themaximum width of
VARCHARremains unconstrained whereas the maximumwidth for
CHARis now 1 character.Release note (bug fix): CockroachDB now properly checks the width of
strings inserted in a collated string column with a specified width.
Release note (sql change): CockroachDB now preserves the distinction
between different column types for string values like in PostgreSQL,
for compatibility with 3rd party tools and ORMs.