| title | STRING |
|---|---|
| summary | The STRING data type stores a string of Unicode characters. |
| toc | true |
| docs_area | reference.sql |
The STRING [data type]({% link {{ page.version.version }}/data-types.md %}) stores a string of Unicode characters.
{{site.data.alerts.callout_info}}
STRING is not a data type supported by PostgreSQL. For PostgreSQL compatibility, CockroachDB supports additional aliases and STRING-related types.
{{site.data.alerts.end}}
CockroachDB supports the following alias for STRING:
TEXT
For PostgreSQL compatibility, CockroachDB supports the following STRING-related types and their aliases:
VARCHAR(and aliasCHARACTER VARYING)CHAR(and aliasCHARACTER)NAME
These types are functionally identical to STRING.
CockroachDB also supports the single-byte "char" special character type. As in PostgreSQL, this special type is intended for internal use in [system catalogs]({% link {{ page.version.version }}/system-catalogs.md %}), and has a storage size of 1 byte. CockroachDB truncates all values of type "char" to a single character.
To limit the length of a string column, use STRING(n), where n is the maximum number of Unicode code points (normally thought of as "characters") allowed. This applies to all related types as well (e.g., to limit the length of a VARCHAR type, use VARCHAR(n)). To reduce performance issues caused by storing very large string values in indexes, Cockroach Labs recommends setting length limits on string-typed columns.
{{site.data.alerts.callout_danger}} {% include {{page.version.version}}/sql/add-size-limits-to-indexed-columns.md %} {{site.data.alerts.end}}
When inserting a STRING value or a STRING-related-type value:
-
If the value is cast with a length limit (e.g.,
CAST('hello world' AS STRING(5))), CockroachDB truncates to the limit. This applies toSTRING(n)and all related types. -
If the value exceeds the column's length limit, CockroachDB returns an error. This applies to
STRING(n)and all related types. -
For
STRING(n)andVARCHAR(n)/CHARACTER VARYING(n)types, if the value is under the column's length limit, CockroachDB does not add space padding to the end of the value. -
For
CHAR(n)/CHARACTER(n)types, if the value is under the column's length limit, CockroachDB adds space padding from the end of the value to the length limit.Type | Length
--------------------------------------------------|------------------------------
CHARACTER, CHARACTER(n), CHAR, CHAR(n) | Fixed-length
CHARACTER VARYING(n), VARCHAR(n), STRING(n) | Variable-length, with a limit
TEXT, VARCHAR, CHARACTER VARYING, STRING | Variable-length, with no limit
"char" (special type) | 1 byte
A value of type STRING can be expressed using a variety of formats.
See [string literals]({% link {{ page.version.version }}/sql-constants.md %}#string-literals) for more details.
When printing out a STRING value in the [SQL shell]({% link {{ page.version.version }}/cockroach-sql.md %}), the shell uses the simple
SQL string literal format if the value doesn't contain special character,
or the escaped format otherwise.
STRING values accept [collations]({% link {{ page.version.version }}/collate.md %}), which lets you sort strings according to language- and country-specific rules.
The size of a STRING value is variable, but it's recommended to keep values under 64 kilobytes to ensure performance. Above that threshold, [write amplification]({% link {{ page.version.version }}/architecture/storage-layer.md %}#write-amplification) and other considerations may cause significant performance degradation.
{% include_cached copy-clipboard.html %}
> CREATE TABLE strings (a STRING PRIMARY KEY, b STRING(4), c TEXT);{% include_cached copy-clipboard.html %}
> SHOW COLUMNS FROM strings; column_name | data_type | is_nullable | column_default | generation_expression | indices | is_hidden
--------------+-----------+-------------+----------------+-----------------------+-----------+------------
a | STRING | false | NULL | | {primary} | false
b | STRING(4) | true | NULL | | {primary} | false
c | STRING | true | NULL | | {primary} | false
(3 rows)
{% include_cached copy-clipboard.html %}
> INSERT INTO strings VALUES ('a1b2c3d4', 'e5f6', 'g7h8i9');{% include_cached copy-clipboard.html %}
> SELECT * FROM strings; a | b | c
+----------+------+--------+
a1b2c3d4 | e5f6 | g7h8i9
(1 row)
{% include_cached copy-clipboard.html %}
> CREATE TABLE aliases (a STRING PRIMARY KEY, b VARCHAR, c CHAR);{% include_cached copy-clipboard.html %}
> SHOW COLUMNS FROM aliases; column_name | data_type | is_nullable | column_default | generation_expression | indices | is_hidden
+-------------+-----------+-------------+----------------+-----------------------+-------------+-----------+
a | STRING | false | NULL | | {primary} | false
b | VARCHAR | true | NULL | | {primary} | false
c | CHAR | true | NULL | | {primary} | false
(3 rows)
STRING values can be [cast]({% link {{ page.version.version }}/data-types.md %}#data-type-conversions-and-casts) to any of the following data types:
| Type | Details |
|---|---|
ARRAY |
Requires supported [ARRAY]({% link {{ page.version.version }}/array.md %}) string format, e.g., '{1,2,3}'.Note that string literals can be implicitly cast to any supported ARRAY data type except [BYTES]({% link {{ page.version.version }}/bytes.md %}), [ENUM]({% link {{ page.version.version }}/enum.md %}), [JSONB]({% link {{ page.version.version }}/jsonb.md %}), SERIAL, and the spatial data types Box2D, GEOGRAPHY, and GEOMETRY. |
BIT |
Requires supported [BIT]({% link {{ page.version.version }}/bit.md %}) string format, e.g., '101001' or 'xAB'. |
BOOL |
Requires supported [BOOL]({% link {{ page.version.version }}/bool.md %}) string format, e.g., 'true'. |
BYTES |
For more details, [see here]({% link {{ page.version.version }}/bytes.md %}#supported-conversions). |
CITEXT |
Preserves the original letter case, but value comparisons are treated case-insensitively. Refer to [CITEXT]({% link {{ page.version.version }}/citext.md %}). |
DATE |
Requires supported [DATE]({% link {{ page.version.version }}/date.md %}) string format, e.g., '2016-01-25'. |
DECIMAL |
Requires supported [DECIMAL]({% link {{ page.version.version }}/decimal.md %}) string format, e.g., '1.1'. |
FLOAT |
Requires supported [FLOAT]({% link {{ page.version.version }}/float.md %}) string format, e.g., '1.1'. |
INET |
Requires supported [INET]({% link {{ page.version.version }}/inet.md %}) string format, e.g, '192.168.0.1'. |
INT |
Requires supported [INT]({% link {{ page.version.version }}/int.md %}) string format, e.g., '10'. |
INTERVAL |
Requires supported [INTERVAL]({% link {{ page.version.version }}/interval.md %}) string format, e.g., '1h2m3s4ms5us6ns'. |
JSONPATH |
Requires a valid [JSONPath]({% link {{ page.version.version }}/jsonpath.md %}) expression string, e.g., '$' or '$.players[*] ? (@.stats.ppg > 30)'. |
LTREE |
Requires supported [LTREE]({% link {{ page.version.version }}/ltree.md %}) string format, e.g., 'Animals.Mammals.Carnivora'. |
TIME |
Requires supported [TIME]({% link {{ page.version.version }}/time.md %}) string format, e.g., '01:22:12' (microsecond precision). |
TIMESTAMP |
Requires supported [TIMESTAMP]({% link {{ page.version.version }}/timestamp.md %}) string format, e.g., '2016-01-25 10:10:10.555555'. |
TSQUERY |
Requires supported [TSQUERY]({% link {{ page.version.version }}/tsquery.md %}) string format, e.g., 'Requires & supported & TSQUERY & string & format'.Note that casting a string to a TSQUERY will not normalize the tokens into lexemes. To do so, use to_tsquery(), plainto_tsquery(), or phraseto_tsquery(). |
TSVECTOR |
Requires supported [TSVECTOR]({% link {{ page.version.version }}/tsvector.md %}) string format, e.g., 'Requires supported TSVECTOR string format.'.Note that casting a string to a TSVECTOR will not normalize the tokens into lexemes. To do so, use to_tsvector(). |
UUID |
Requires supported [UUID]({% link {{ page.version.version }}/uuid.md %}) string format, e.g., '63616665-6630-3064-6465-616462656562'. |
While both STRING and BYTES can appear to have similar behavior in many situations, one should understand their nuance before casting one into the other.
STRING treats all of its data as characters, or more specifically, Unicode code points. BYTES treats all of its data as a byte string. This difference in implementation can lead to dramatically different behavior. For example, let's take a complex Unicode character such as ☃ (the snowman emoji):
{% include_cached copy-clipboard.html %}
> SELECT length('☃'::string); length
+--------+
1
> SELECT length('☃'::bytes); length
+--------+
3
In this case, [LENGTH(string)]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) measures the number of Unicode code points present in the string, whereas [LENGTH(bytes)]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) measures the number of bytes required to store that value. Each character (or Unicode code point) can be encoded using multiple bytes, hence the difference in output between the two.
A literal entered through a SQL client will be translated into a different value based on the type:
BYTESgives a special meaning to the pair\xat the beginning, and translates the rest by substituting pairs of hexadecimal digits to a single byte. For example,\xffis equivalent to a single byte with the value of 255. For more information, see [SQL Constants: String literals with character escapes]({% link {{ page.version.version }}/sql-constants.md %}#string-literals-with-character-escapes).STRINGdoes not give a special meaning to\x, so all characters are treated as distinct Unicode code points. For example,\xffis treated as aSTRINGwith length 4 (\,x,f, andf).
You can cast a STRING value of hexadecimal digits prefixed by x or X to a BIT value.
For example:
{% include_cached copy-clipboard.html %}
> SELECT 'XAB'::BIT(8) bit
------------
10101011
(1 row)
STRING values can be concatenated with any non-ARRAY, non-NULL type, resulting in a STRING value.
For example:
{% include_cached copy-clipboard.html %}
> SELECT 1 || 'item'; ?column?
------------
1item
(1 row)
{% include_cached copy-clipboard.html %}
> SELECT true || 'item'; ?column?
------------
titem
(1 row)
Concatenating a STRING value with a [NULL value]({% link {{ page.version.version }}/null-handling.md %}) results in a NULL value.
For example:
{% include_cached copy-clipboard.html %}
> SELECT NULL || 'item'; ?column?
------------
NULL
(1 row)
You can use the [parse_timestamp() function]({% link {{ page.version.version }}/functions-and-operators.md %}) to parse strings in TIMESTAMP format.
{% include_cached copy-clipboard.html %}
SELECT parse_timestamp ('2022-05-28T10:53:25.160Z'); parse_timestamp
--------------------------
2022-05-28 10:53:25.16
(1 row)
You can use the [to_tsvector() function]({% link {{ page.version.version }}/functions-and-operators.md %}#full-text-search-functions) to parse strings in [TSVECTOR]({% link {{ page.version.version }}/tsvector.md %}) format. This will normalize the tokens into lexemes, and will add an integer position to each lexeme.
{% include_cached copy-clipboard.html %}
SELECT to_tsvector('How do trees get on the internet?'); to_tsvector
---------------------------------
'get':4 'internet':7 'tree':3
For more information on usage, see [Full-Text Search]({% link {{ page.version.version }}/full-text-search.md %}).
You can use the [to_tsquery(), plainto_tsquery(), and phraseto_tsquery() functions]({% link {{ page.version.version }}/functions-and-operators.md %}#full-text-search-functions) to parse strings in [TSQUERY]({% link {{ page.version.version }}/tsquery.md %}) format. This will normalize the tokens into lexemes.
When using to_tsquery(), the string input must be formatted as a [TSQUERY]({% link {{ page.version.version }}/tsquery.md %}#syntax), with operators separating tokens.
{% include_cached copy-clipboard.html %}
SELECT to_tsquery('How & do & trees & get & on & the & internet?'); to_tsquery
-------------------------------
'tree' & 'get' & 'internet'
For more information on usage, see [Full-Text Search]({% link {{ page.version.version }}/full-text-search.md %}).
- [Data Types]({% link {{ page.version.version }}/data-types.md %})
- [String literal syntax]({% link {{ page.version.version }}/sql-constants.md %}#string-literals)