Add new DataType Map(key,value)#15806
Conversation
|
@CurtizJ |
|
You can try to merge with fresh master.
|
|
Have you considered using separate columns for each key? In that case it would be easy to implement combinators and such stuff, since then it would be no need to collect values by keys first. |
In the first implement version for Map Datatype, I use tow array columns, one for key and the other for value. I think it's the simplest way to implement this. |
|
@CurtizJ I have finished coding. Can you review it when free? Thank you. |
|
Yes, I will try to review your PR by the end of the week. BTW, please, add tests with other table engines, at least with CREATE TABLE table_map(a Map(String, String)) ENGINE = MergeTree ORDER BY tuple();
insert into table_map values ({'name':'zhangsan', 'gender':'male'}), ({'name':'lisi', 'gender':'female'});stacktrace
|
|
@CurtizJ Ok, I will add tests with MergeTree Engine |
CurtizJ
left a comment
There was a problem hiding this comment.
Also need to implement function map (similar to array and tuple) to allow creation of maps not only for literals.
To allow something like this:
insert into table_map(m) select {'key1' : number, 'key2' : number * 2} from numbers(1000)insert into table_map(m) select map('key1', number, 'key2', number * 2) from numbers(1000)
src/Functions/array/arrayElement.cpp
Outdated
| } | ||
| else | ||
| { | ||
| // Default value for unmatched keys |
There was a problem hiding this comment.
It's better to use default value of type.
There was a problem hiding this comment.
how to distinguish the real value and the default value?
for example, Integer type's default value is 0
src/Functions/array/arrayElement.cpp
Outdated
| const DataTypePtr & key_type = (typeid_cast<const DataTypeArray *>(kv_types[0].get()))->getNestedType(); | ||
| const DataTypePtr & value_type = (typeid_cast<const DataTypeArray *>(kv_types[1].get()))->getNestedType(); | ||
|
|
||
| Field index = (*columns[arguments[1]].column)[0]; |
There was a problem hiding this comment.
Seems, that only const indices are supported. Need to add check for constness of argument. But it's better to implement support of non-const columns as index, like it's done in arrayElement function.
There was a problem hiding this comment.
ok, I will improve this.
src/Functions/array/arrayElement.cpp
Outdated
| } | ||
|
|
||
| template <typename DataType> | ||
| bool FunctionArrayElement::executeMappedValueNumber(const ColumnArray * column, std::vector<int> matched_idxs, |
There was a problem hiding this comment.
Need a const reference for vector.
| const ColumnArray::Offsets & offsets = column->getOffsets(); | ||
| size_t rows = offsets.size(); | ||
|
|
||
| for (size_t i = 0; i < rows; i++) |
There was a problem hiding this comment.
Since you know number of elements, it's better to do reserve for vector before loop for better performance.
src/DataTypes/DataTypeMap.cpp
Outdated
| std::string DataTypeMap::doGetName() const | ||
| { | ||
| WriteBufferFromOwnString s; | ||
| s << "Map(" << (typeid_cast<const DataTypeArray *>(keys.get()))->getNestedType()->getName() |
There was a problem hiding this comment.
No need to typeid_cast here. You have already saved key_type and value_type.
| ColumnMap::ColumnMap(MutableColumns && mutable_columns) | ||
| { | ||
| columns.reserve(mutable_columns.size()); | ||
| for (auto & column : mutable_columns) |
There was a problem hiding this comment.
Here you can init ColumnMap::columns to have less than 2 columns, that would cause a crash at run-time in say ColumnMap::cloneEmpty() or any other method that assumes that there are exactly 2 columns.
There was a problem hiding this comment.
Consider checking that each of mutable_columns is actually a ColumnArray.
SELECT CAST(([1, 2, 3], ['1', '2', 'foo']), 'Map(UInt8, String)') AS map
src/DataTypes/DataTypeMap.cpp
Outdated
|
|
||
| void DataTypeMap::serializeTextJSON(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const | ||
| { | ||
| writeChar('[', ostr); |
There was a problem hiding this comment.
It would probably be more natural to serialize Map as a JSON object.
src/Core/Field.cpp
Outdated
| UInt8 type; | ||
| DB::readBinary(type, buf); | ||
|
|
||
| switch (type) |
There was a problem hiding this comment.
Consider using Field::dispatch + overloaded functions, to avoid enumerating all Field types in many places.
There was a problem hiding this comment.
As an example, see writeFieldText or toString(Field) in Field.cpp.
Allow casting Tuple as Map.
|
@akuzm @CurtizJ @Enmk But this is repeated with query parameter. From the test I found the syntax of DataType Map is confused with below test cases:
3)01056_prepared_statements_null_and_escaping.sh 4)01015_insert_values_parametrized.sh
7)00954_client_prepared_statements.sh But can also be parsed as Map . |
|
@CurtizJ ready for review. |
|
@alexey-milovidov @CurtizJ After this PR is reviewed, in the next stage:
|
|
Thanks. I will review and merge soon. |
|
This PR is moved to #17829 |
|
🥇 |
Hi, |
|
@wangxin05 maybe this feature will be implemented in #23932 |
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add new datatype Map for supporting storage k:v .
related to issue: #1841
1st version for Map only support String type of key and value.
Later I will implement Int and other type for key and value.
Detailed description / Documentation draft:
Add new datatype Map for supporting storage k:v .
related to issue: #1841
1st version for Map only support String type of key and value.
Later I will implement Int and other type for key and value.
Usage:
Addition:
PostgreSQL Json data type
postgresql does not support map data type but json, Operator for json in PostgreSQL:
And postgreSQL support other json functions like : xxx_to_json, json_each....
Hive Map data type
Usage: