@alexreid-db alexreid-db commented Jun 19, 2024

PR Checklist

  • A description of the changes is added to the description of this PR.
  • If there is a related issue, make sure it is linked to this PR.
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added or modified a feature, documentation in docs is updated

Description of changes

  • Adds registerTable endpoint to Iceberg REST catalog service
  • Adds "ICEBERG" as a data source format

This makes a native Iceberg table a top-level table type, like a Delta table, and adds the ability to register (store) an existing native Iceberg table in UC. Previously, UC treated an Iceberg table as just a Delta table with UniForm enabled, i.e. one with a UniForm Iceberg metadata location set.
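For readers who want to see what a register-table call looks like on the wire, here is a minimal sketch that builds (but does not send) the request described by the Iceberg REST OpenAPI spec: a POST to `/v1/{prefix}/namespaces/{namespace}/register` with a JSON body carrying `name` and `metadata-location`. The function name `build_register_table_request` is hypothetical, and whether this PR's server expects the two-level `unity.test` namespace encoded this way, or requires a prefix, is an assumption; Iceberg clients normally discover the prefix from the catalog's config endpoint.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# The Iceberg REST spec joins multi-level namespace parts with the 0x1F unit separator.
NAMESPACE_SEPARATOR = "\x1f"


def build_register_table_request(base_uri, namespace, table_name,
                                 metadata_location, prefix=None):
    """Build (but do not send) a register-table request.

    Hypothetical helper: the path layout follows the Iceberg REST spec,
    not necessarily the exact routing of the server in this PR.
    """
    encoded_namespace = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    parts = [base_uri.rstrip("/"), "v1"]
    if prefix:
        parts.append(prefix.strip("/"))
    parts += ["namespaces", encoded_namespace, "register"]
    body = {"name": table_name, "metadata-location": metadata_location}
    return Request(
        "/".join(parts),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; a real client would also need whatever authentication the server requires.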

Here's an example of it working locally with OSS Spark, using the iceberg-spark 'register_table' procedure. (Note: you first need to create the new schema/namespace manually using the CLI.)

spark-sql (default)> create table hadoop.uctest.table1 (id int);
Time taken: 0.501 seconds
spark-sql (default)> insert into hadoop.uctest.table1 values (1),(2),(3);
Time taken: 0.919 seconds
spark-sql (default)> select * from hadoop.uctest.table1;
1
2
3
Time taken: 0.065 seconds, Fetched 3 row(s)
spark-sql (default)> select * from iceberg.unity.test.table1;
...
[TABLE_OR_VIEW_NOT_FOUND] The table or view `iceberg`.`unity`.`test`.`table1` cannot be found. Verify the spelling and correctness of the schema and catalog.
...
spark-sql (default)> CALL iceberg.system.register_table (table => 'unity.test.ice1', metadata_file => 'file:///tmp/warehouse/uctest/table1/metadata/v2.metadata.json');
8507157311815341168	3	3
Time taken: 0.034 seconds, Fetched 1 row(s)
spark-sql (default)> select * from iceberg.unity.test.ice1;
1
2
3
Time taken: 0.269 seconds, Fetched 3 row(s)
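The `metadata_file` argument in the session above points at a specific `v2.metadata.json`. As a rough sketch, the helper below picks the highest-numbered `v<N>.metadata.json` from a table's `metadata` directory, which is the naming seen in the Hadoop-catalog path used here. The function name `latest_metadata_file` is hypothetical, and the sketch assumes a local filesystem and uncompressed metadata files.

```python
import re
from pathlib import Path

# Matches Hadoop-catalog style metadata files such as v2.metadata.json.
_VERSIONED_METADATA = re.compile(r"^v(\d+)\.metadata\.json$")


def latest_metadata_file(table_dir):
    """Return the highest-numbered v<N>.metadata.json under <table_dir>/metadata.

    Returns None when the directory is missing or holds no such file.
    Hypothetical helper for local paths only.
    """
    metadata_dir = Path(table_dir) / "metadata"
    if not metadata_dir.is_dir():
        return None
    best, best_version = None, -1
    for path in metadata_dir.iterdir():
        match = _VERSIONED_METADATA.match(path.name)
        # Compare numerically so v10 sorts after v9.
        if match and int(match.group(1)) > best_version:
            best_version = int(match.group(1))
            best = path
    return best
```

For the example table this would be called as `latest_metadata_file("/tmp/warehouse/uctest/table1")`; calling `.resolve().as_uri()` on the result yields a `file://` URI suitable for `metadata_file`.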

Spark conf for above example:

spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.1,org.apache.iceberg:iceberg-aws-bundle:1.5.1

spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

spark.sql.catalog.iceberg org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg.catalog-impl org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.iceberg.uri http://127.0.0.1:8080/api/2.1/unity-catalog/iceberg
spark.sql.catalog.iceberg.token not_used

spark.sql.catalog.hadoop org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hadoop.type hadoop
spark.sql.catalog.hadoop.warehouse /tmp/warehouse
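The same settings can be applied programmatically. The sketch below mirrors the conf listed above as a Python dict and feeds it to a PySpark session builder; the function names `build_spark_conf` and `create_session` and the app name are hypothetical, and `pyspark` is imported lazily so the conf can be inspected without it installed.

```python
ICEBERG_VERSION = "1.5.1"


def build_spark_conf(
    uc_iceberg_uri="http://127.0.0.1:8080/api/2.1/unity-catalog/iceberg",
    warehouse="/tmp/warehouse",
):
    """Return the Spark settings from the example above as a dict."""
    return {
        "spark.jars.packages": ",".join([
            f"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{ICEBERG_VERSION}",
            f"org.apache.iceberg:iceberg-aws-bundle:{ICEBERG_VERSION}",
        ]),
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # REST catalog backed by the Unity Catalog server.
        "spark.sql.catalog.iceberg": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.iceberg.catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
        "spark.sql.catalog.iceberg.uri": uc_iceberg_uri,
        "spark.sql.catalog.iceberg.token": "not_used",
        # Local Hadoop catalog that holds the source table.
        "spark.sql.catalog.hadoop": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.hadoop.type": "hadoop",
        "spark.sql.catalog.hadoop.warehouse": warehouse,
    }


def create_session(conf):
    """Create a SparkSession from a conf dict (requires pyspark to be installed)."""
    from pyspark.sql import SparkSession  # lazy import: only needed here

    builder = SparkSession.builder.appName("uc-iceberg-register-table")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session from `create_session(build_spark_conf())`, the statements from the example could be issued through `spark.sql(...)` instead of the `spark-sql` shell.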

Related Issue

Addresses part of adding additional Iceberg REST Endpoints #3

Testing

Test cases for the above-mentioned scenarios have been added and pass.

@alexreid-db alexreid-db force-pushed the iceberg-register-table-impl branch from e80b0af to 64f4abe Compare June 24, 2024 16:34
@alexreid-db alexreid-db marked this pull request as ready for review June 24, 2024 16:36
dennyglee pushed a commit that referenced this pull request Sep 2, 2024
* start of login page

* env and google auth button

* merge with main, remove params reference

* start of okta auth

* initial commit for handling auth token (#67)

* start of login with keycloak

* handle google sign in with token

* more google auth

* profile dropdown

* merge with main

* merge with main

* convert to axios

* start of readme instructions

* get current user endpoint (#70)

clean up some other endpoints

* commenting out UI until repositories are merged

* clean up current user (#74)

* yarn lock file

* remove keycloak for now, node version error in jwt-decode dependency

* commit yarn lock

* remove state as useEffect dependency, comment out currentUser call for now

---------

Co-authored-by: Xiang Xu <xiang.xu@databricks.com>
tdas pushed a commit that referenced this pull request Sep 5, 2024
rtyler pushed a commit to rtyler/unitycatalog that referenced this pull request Sep 5, 2024

nicor88 commented Nov 24, 2024

@dennyglee this is a pretty amazing feature. Is there any plan to make the "register_table" endpoint available?
This would unlock use cases where Iceberg tables live in another catalog but need to be made available (even if read-only) in Unity Catalog (Databricks or not).

@dennyglee

Hey @nicor88 - I don't believe we have a plan around this yet, but it would be great if you're up for helping with or documenting a proposed design for this. I think that's a great idea!
