feat(storage-azdls): pick Add Azure Datalake Storage support (#1368)#74

Merged
xxhZs merged 4 commits into dev_rebase_main_20250325 from li0k/pick_b5b8aa8_to_0325
Sep 29, 2025

@Li0k Li0k commented Sep 23, 2025

@Li0k Li0k requested review from chenzl25 and xxhZs September 23, 2025 07:58
@Li0k Li0k force-pushed the li0k/pick_b5b8aa8_to_0325 branch from c6edcdd to 2a50069 Compare September 23, 2025 09:06
- Closes apache#1360.

This PR adds an integration for the Azure Datalake storage service. At
its core, it adds parsing logic for configuration properties. The
finished config struct is simply passed down to OpenDAL. In addition, it
adds logic to parse fully qualified file URIs and match them against
expected (previously configured) values.
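
As a rough illustration of the property parsing described above, the sketch below moves recognized keys out of a property map into a config struct. The key names (`adls.account-name`, `adls.account-key`, `adls.sas-token`), the `AzdlsConfigSketch` struct, and `parse_azdls_props` are assumptions made for this example; they are not taken from the PR and stand in for whatever struct is actually handed to OpenDAL.

```rust
use std::collections::HashMap;

// Hypothetical property keys, used only for this illustration.
const ADLS_ACCOUNT_NAME: &str = "adls.account-name";
const ADLS_ACCOUNT_KEY: &str = "adls.account-key";
const ADLS_SAS_TOKEN: &str = "adls.sas-token";

/// Stand-in for the config struct that would be passed down to OpenDAL.
#[derive(Debug, Default, PartialEq)]
pub struct AzdlsConfigSketch {
    pub account_name: Option<String>,
    pub account_key: Option<String>,
    pub sas_token: Option<String>,
}

/// Moves recognized properties into the config struct.
/// Keys this sketch does not know about are silently ignored.
pub fn parse_azdls_props(mut props: HashMap<String, String>) -> AzdlsConfigSketch {
    AzdlsConfigSketch {
        account_name: props.remove(ADLS_ACCOUNT_NAME),
        account_key: props.remove(ADLS_ACCOUNT_KEY),
        sas_token: props.remove(ADLS_SAS_TOKEN),
    }
}
```

Every field is optional in this sketch, so a caller can supply only the credentials it has and leave validation to a later step.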

It also creates a new `Storage::Azdls` enum variant based on OpenDAL's
existing `Scheme::Azdls` enum variant. It then fits the parsing logic
into the existing framework to build the storage integration from an
`io::FileIOBuilder`.
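
The dispatch from a URI scheme to a storage variant might look roughly like the following. `StorageSketch` and `storage_for_scheme` are hypothetical names for this example, the enum is heavily simplified, and the exact set of scheme strings is an assumption rather than the list used in the PR.

```rust
/// Simplified, hypothetical stand-in for the storage enum.
#[derive(Debug, PartialEq)]
pub enum StorageSketch {
    S3,
    Azdls,
}

/// Picks a storage variant from the scheme string of a location.
/// The accepted scheme strings are assumptions made for this sketch.
pub fn storage_for_scheme(scheme: &str) -> Option<StorageSketch> {
    match scheme {
        "s3" | "s3a" => Some(StorageSketch::S3),
        // Both the ADLS and the WASB scheme families map to one variant.
        "abfs" | "abfss" | "wasb" | "wasbs" => Some(StorageSketch::Azdls),
        _ => None,
    }
}
```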

Other Iceberg ADLS integrations ([pyiceberg +
Java](https://github.com/apache/iceberg-go/pull/313/files#r2021460617))
also support the `wasb://` and `wasbs://` schemes.
WASB refers to a client-side implementation of hierarchical namespaces
on top of Blob Storage. ADLS (v2), on the other hand, is a service offered
by Azure, also built on top of Blob Storage.
IIUC we can accept both schemes because objects written to Blob Storage
via `wasb://` will also be accessible via `abfs://` (which operates on
the same Blob Storage).
Even though the URIs differ slightly in format when they refer to the
same object, we can largely reuse existing logic.
```diff
-wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
+abfs[s]://<filesystemname>@<accountname>.dfs.core.windows.net/<path>
```
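
A minimal sketch of how both URI shapes could share one parser, assuming the `abfs[s]` and `wasb[s]` layouts shown above. `AzureLocation`, `parse_azure_location`, and `check_account` are hypothetical names for this illustration and are not the functions added by the PR.

```rust
/// Components of a fully qualified ADLS or WASB location (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct AzureLocation {
    pub secure: bool,            // true for abfss:// and wasbs://
    pub filesystem: String,      // filesystem (ADLS) or container (WASB) name
    pub account: String,         // storage account name
    pub endpoint_suffix: String, // e.g. "core.windows.net"
    pub path: String,            // always starts with '/'
}

/// Parses `<scheme>://<filesystem>@<account>.<service>.<suffix>/<path>`.
pub fn parse_azure_location(uri: &str) -> Result<AzureLocation, String> {
    let (scheme, rest) = uri
        .split_once("://")
        .ok_or_else(|| format!("missing scheme in {uri}"))?;
    // Each scheme family implies the service label expected in the host name.
    let (secure, service) = match scheme {
        "abfs" => (false, "dfs"),
        "abfss" => (true, "dfs"),
        "wasb" => (false, "blob"),
        "wasbs" => (true, "blob"),
        other => return Err(format!("unsupported scheme: {other}")),
    };
    // Split the authority from the path; a missing path becomes "/".
    let (authority, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };
    let (filesystem, host) = authority
        .split_once('@')
        .ok_or_else(|| format!("missing '@' in {uri}"))?;
    let (account, after_account) = host
        .split_once('.')
        .ok_or_else(|| format!("malformed host in {uri}"))?;
    let (found_service, suffix) = after_account
        .split_once('.')
        .ok_or_else(|| format!("malformed host in {uri}"))?;
    if found_service != service {
        return Err(format!("scheme {scheme} expects a '{service}' endpoint"));
    }
    if filesystem.is_empty() || account.is_empty() {
        return Err(format!("empty filesystem or account in {uri}"));
    }
    Ok(AzureLocation {
        secure,
        filesystem: filesystem.to_string(),
        account: account.to_string(),
        endpoint_suffix: suffix.to_string(),
        path: path.to_string(),
    })
}

/// Rejects a location whose account differs from a previously configured one.
pub fn check_account(loc: &AzureLocation, configured: Option<&str>) -> Result<(), String> {
    match configured {
        Some(name) if name != loc.account => Err(format!(
            "location uses account {} but {name} is configured",
            loc.account
        )),
        _ => Ok(()),
    }
}
```

The only scheme-specific parts are the secure flag and the expected service label (`dfs` or `blob`); everything after that is shared, which is the reuse the description refers to.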

I added minor unit tests to validate the configuration property parsing
logic.

I decided **not** to add integration tests because:
1. ADLS is not S3-compatible, which means that we can't reuse our Minio
setup
2. the Azure-specific alternative for local testing, Azurite, doesn't
support ADLS

I have yet to test it in a functioning environment.

---------

Signed-off-by: Jannik Steinmann <jannik.steinmann@datadoghq.com>
@Li0k Li0k force-pushed the li0k/pick_b5b8aa8_to_0325 branch from 2a50069 to af79eae Compare September 23, 2025 09:11
Collaborator

@chenzl25 chenzl25 left a comment

Rest LGTM. Thanks!

Collaborator

@chenzl25 chenzl25 left a comment

Rest LGTM

Comment on lines -155 to -158
pub(crate) fn s3_config_build(
client: &reqwest::Client,
cfg: &S3Config,
path: &str,
Collaborator

Please keep `crates/iceberg/src/io/storage_s3.rs` unchanged, since the client inside the S3 storage is used to optimize performance.

@xxhZs xxhZs merged commit 8c4a29f into dev_rebase_main_20250325 Sep 29, 2025
13 of 21 checks passed
@xxhZs xxhZs deleted the li0k/pick_b5b8aa8_to_0325 branch September 29, 2025 05:57