[opt](catalog) support nested namespaces of iceberg #56415
### What problem does this PR solve?

Iceberg has three levels of metadata (catalog, namespace, and table), which map to Doris' catalog, database, and table.

Iceberg supports nested namespaces, so all of the following are valid namespaces:

```
ns1
ns1.ns2
ns1.ns2.ns3
```

So we need to support mapping a nested namespace to a Doris database.

This PR adds a global variable, `enable_nested_namespace`, to control this behavior. The default is `false`, in which case the existing behavior is unchanged. If it is set to `true`, Doris supports statements like the following:

```
mysql> switch iceberg;
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| nested             |
| nested.db1         |
| nested.db2         |
+--------------------+

mysql> use iceberg.nested.db1;
ERROR 1049 (42000): Only one dot can be in the name: iceberg.nested.db1
mysql> use iceberg.`nested.db1`;
ERROR 5086 (42000): errCode = 2, detailMessage = Unknown catalog 'nested'

mysql> set global enable_nested_namespace=true;
mysql> use iceberg.nested.db1;
Database changed

mysql> select k1 from iceberg.`nested.db1`.nested1;
mysql> select nested1.k1 from `nested.db1`.nested1;
mysql> select `nested.db1`.nested1.k1 from iceberg.`nested.db1`.nested1;
mysql> select iceberg.`nested.db1`.nested1.k1 from nested1;
+------+
| k1   |
+------+
|    1 |
+------+

mysql> refresh catalog iceberg;
mysql> refresh database iceberg.`nested.db1`;
mysql> refresh table iceberg.`nested.db1`.nested1;
Query OK, 0 rows affected (0.01 sec)
```

However, I still cannot execute a statement like:

```
use iceberg.`nested.db1`;
```

I don't know why, but the MySQL client behaves strangely here: when backquotes are added, the INIT_DB command only receives the `nested.db1` part instead of the expected `iceberg.nested.db1`.

This PR also supports creating a database with a nested (dotted) name in the internal catalog:

```
create database `db1.db2`
```
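The session implies a simple resolution rule for the target of `USE`. The Java sketch below illustrates one such rule under stated assumptions: the class, record, and method names are hypothetical (this is not Doris' actual parser), and treating the text before the first dot as the catalog name is an assumption rather than something this PR confirms. Under this rule, a client that forwards only `nested.db1` would get exactly the `Unknown catalog 'nested'` error seen in the session.

```java
import java.util.Optional;

// Hypothetical sketch of splitting a USE target such as "iceberg.nested.db1"
// into catalog and database. This is NOT Doris' actual parser.
public class UseTargetDemo {

    // catalog is empty when the name has no dot (i.e. use the current catalog).
    public record Target(Optional<String> catalog, String database) {}

    public static Target resolve(String name, boolean enableNestedNamespace) {
        int firstDot = name.indexOf('.');
        if (firstDot < 0) {
            // No dot: a database in the current catalog.
            return new Target(Optional.empty(), name);
        }
        if (!enableNestedNamespace && name.indexOf('.', firstDot + 1) >= 0) {
            // Mirrors the error shown in the session when the variable is off.
            throw new IllegalArgumentException("Only one dot can be in the name: " + name);
        }
        // Assumption: the text before the first dot is the catalog, and the rest
        // (which may contain further dots) is the database name.
        return new Target(Optional.of(name.substring(0, firstDot)), name.substring(firstDot + 1));
    }

    public static void main(String[] args) {
        Target t = resolve("iceberg.nested.db1", true);
        System.out.println(t.catalog().orElse("<current>") + " / " + t.database()); // iceberg / nested.db1

        // If the client sends only "nested.db1" (the INIT_DB behavior described above),
        // this rule reads "nested" as the catalog, matching "Unknown catalog 'nested'".
        Target c = resolve("nested.db1", true);
        System.out.println(c.catalog().orElse("<current>") + " / " + c.database()); // nested / db1
    }
}
```

A real implementation would also have to deal with backquoted identifiers in SQL text, which this sketch ignores because it only sees the already-unquoted name.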
### What problem does this PR solve?

Followup #56415

Problem Summary:

1. The previous `getNamespace` logic is wrong: the `dbName` should be split by `.` to create the namespace levels.
2. Allow omitting `oauth.uri` for the Iceberg REST catalog, to follow the new Iceberg REST Catalog (IRC) spec.

With these changes, we can connect to Snowflake Open Catalog like this:

```
CREATE CATALOG ice PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'yy_external_catalog3',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'https://xxx.snowflakecomputing.com/polaris/api/catalog',
    'iceberg.rest.security.type' = 'oauth2',
    'iceberg.rest.oauth2.credential' = 'id:secret',
    'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:yy_sn_principal_role',
    'iceberg.rest.nested-namespace-enabled' = 'true',
    's3.endpoint' = 'https://s3.us-west-2.amazonaws.com',
    's3.region' = 'us-west-2'
);
```
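Item 1 amounts to turning a dotted database name into an ordered list of namespace levels. A minimal Java sketch follows; `NamespaceLevels` and `toNamespaceLevels` are hypothetical names, not the actual `getNamespace` code, and rejecting empty levels (as in `ns1..ns2`) is an assumption of this sketch. In Iceberg's Java API, such levels could be passed to `org.apache.iceberg.catalog.Namespace.of(String...)`; the sketch avoids that dependency so it runs on its own.

```java
import java.util.List;

// Hypothetical helper, not the actual Doris getNamespace implementation.
public class NamespaceLevels {

    public static List<String> toNamespaceLevels(String dbName) {
        // String.split takes a regex, so the dot must be escaped.
        // A limit of -1 keeps trailing empty strings, so "ns1." is rejected below.
        String[] levels = dbName.split("\\.", -1);
        for (String level : levels) {
            if (level.isEmpty()) {
                // Assumption: empty levels are treated as invalid input.
                throw new IllegalArgumentException("Empty namespace level in: " + dbName);
            }
        }
        return List.of(levels);
    }

    public static void main(String[] args) {
        System.out.println(toNamespaceLevels("ns1"));         // [ns1]
        System.out.println(toNamespaceLevels("ns1.ns2.ns3")); // [ns1, ns2, ns3]
    }
}
```

Splitting on every dot means a single namespace level can never itself contain a dot; that is the trade-off of encoding the whole namespace path in one Doris database name.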
### Release note

None