Skip to content

Add storage module from google-cloud-cpp to build system#82881

Merged
pamarcos merged 42 commits intomasterfrom
google-cloud-cpp-storage-build
Jul 16, 2025
Merged

Add storage module from google-cloud-cpp to build system#82881
pamarcos merged 42 commits intomasterfrom
google-cloud-cpp-storage-build

Conversation

@pamarcos
Copy link
Copy Markdown
Member

@pamarcos pamarcos commented Jun 30, 2025

Preparation work for adding native GCS table function/engine

Add storage module from google-cloud-cpp to build system:

  • Clean CMake files to avoid so much commented code. WIN32 and unnecessary installation steps removed
  • Add nlohmann's JSON dependency to contrib
  • Add crc32c dependency to contrib. Seems everyone and their cousing integrates their own version of crc32 somehow. We have it within ClickHouse in rocksdb, arrow, libhdfs3, librdkafka and abseil. However, I didn't think it was clean to use a dependency of a dependency to build yet another dependency. I did try to use the abseil's one, though, but it required changes in google-cloud-cpp, so rather than modifying our fork, I thought it'd be simpler to maintain to just use google's take on crc32c.
  • Split ENABLE_GOOGLE_CLOUD_CPP for Storage (and others that may come in the future) and ENABLE_GOOGLE_CLOUD_CPP_KMS for KMS which is only used in private
  • Add ci-build label support to avoid running all tests. Reasoning is that in order to build darwin and freebsd, all tests are blocking them and sometimes (as in this case) we are only interested in builds

Check that gcloud_storage example builds and links for Linux, macOS and FreeBSD for archs x86_64 and aarch64

I added to build_clickhouse.py the DENABLE_EXAMPLES for all flavors to ensure they work ok. In macOS and FreeBSD the field example builds linking, but it's unrelated to these changes.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=82881&sha=73242ae58fc8157e4b60379d2f4bce2576537360&name_0=PR

Changelog category (leave one):

  • Build/Testing/Packaging Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add storage module from google-cloud-cpp to build system

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jun 30, 2025

Workflow [PR], commit [e9c7461]

Summary:

job_name test_name status info comment
Build (arm_v80compat) failure
Build ClickHouse failure

@clickhouse-gh clickhouse-gh bot added pr-build Pull request with build/testing/packaging improvement submodule changed At least one submodule changed in this PR. labels Jun 30, 2025
- nlohmann's json
- crc32c

Also silence CMP0177 policy
@pamarcos pamarcos force-pushed the google-cloud-cpp-storage-build branch from f112099 to c6f3391 Compare June 30, 2025 06:59
This is just to check that the CI builds ok.
TODO: revert this commit before merging
@pamarcos pamarcos force-pushed the google-cloud-cpp-storage-build branch from c6f3391 to 6c95be2 Compare June 30, 2025 07:22
@pamarcos pamarcos marked this pull request as ready for review June 30, 2025 10:45
@azat azat self-assigned this Jun 30, 2025
@azat
Copy link
Copy Markdown
Member

azat commented Jun 30, 2025

In 6c95be2 I added a sample file to be compiled among the examples to ensure we can use the Google's storage API. I've reverted the commit instead of removing it completely for visibility. I can squash commits to remove that noise before the merge if needed:

Maybe we can enable it for debug builds only? (it should build all examples)

@pamarcos
Copy link
Copy Markdown
Member Author

Maybe we can enable it for debug builds only? (it should build all examples)

That makes sense. I've created a separate PR to address that: #82905
I added the GCS sample just to confirm that my changes can link a binary that uses the SDK. Once I start working on the implementation, that example doesn't make any sense anymore, which is why I think it's not worth adding it into the repo since it'll get obsolete in the next PR.

@pamarcos pamarcos requested a review from azat July 2, 2025 09:49
Copy link
Copy Markdown
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pamarcos haven't you forgot to include example (revert of revert) or you decided not to?

In general LGTM, but I think that crc32c cmake rules should be simplified.

pamarcos added 3 commits July 2, 2025 10:32
Originally it was a copy of a subset of crc32c's CMake.
I didn't want to change too much to ease syncing, but
let's tidy it up to keep unecessary stuff out.
@pamarcos
Copy link
Copy Markdown
Member Author

pamarcos commented Jul 2, 2025

haven't you forgot to include example (revert of revert) or you decided not to?

@azat I reverted the example on purpose because as it is right now we're not even building google-cloud-cpp anyway. It's only built for ClickHouse Cloud. I'll have to modify that in my next PR once we have something in public that actually uses the SDK, though.

I've brought back the example after simplifying the crc32c's CMake to ensure it's still working for all flavors, but my idea is to revert it once again once the CI is green.

but I think that crc32c cmake rules should be simplified.

And you are right. I did it following your comments. Thanks for the thoughtful review on that.

@azat
Copy link
Copy Markdown
Member

azat commented Jul 3, 2025

@azat I reverted the example on purpose because as it is right now we're not even building google-cloud-cpp anyway. It's only built for ClickHouse Cloud. I'll have to modify that in my next PR once we have something in public that actually uses the SDK, though.

I've brought back the example after simplifying the crc32c's CMake to ensure it's still working for all flavors, but my idea is to revert it once again once the CI is green.

Yep, but what I meant is that it will be nice to have some binary that will use storage API to ensure that everything at least links correctly (since clickhouse binary does not use storage API AFAIU)

But I am not insist, I'm OK with your approach as well.

@pamarcos pamarcos marked this pull request as ready for review July 15, 2025 10:13
@pamarcos
Copy link
Copy Markdown
Member Author

pamarcos commented Jul 15, 2025

@azat, @rschu1ze please take another look whenever you can.
In https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=82881&sha=d40f814abea63affd58b3e048b196c9976e3c78f&name_0=PR there's the proof that gcloud_storage builds and links successfully for Linux, macOS (x86_64 and aarch64) and FreeBSD.

Of course, ai-cpp-sdk was landed recently and it include yet another own version of nlohmann's JSON. But seems to be a patched version, so I decided to change its name to avoid a clash.

pamarcos added 8 commits July 15, 2025 15:42
| /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/contrib/crc32c/src/crc32c_arm64.cc:87:20: error: always_inline function 'vmull_p64' requires target feature 'aes', but would be inlined into function 'ExtendArm64' that is compiled without support for 'aes'
|    87 |     t1 = (uint64_t)vmull_p64(crc1, k1);
|       |                    ^
| /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/contrib/crc32c/src/crc32c_arm64.cc:88:20: error: always_inline function 'vmull_p64' requires target feature 'aes', but would be inlined into function 'ExtendArm64' that is compiled without support for 'aes'
|    88 |     t0 = (uint64_t)vmull_p64(crc0, k0);
|       |                    ^
@pamarcos pamarcos added this pull request to the merge queue Jul 16, 2025
Merged via the queue into master with commit 7c158a0 Jul 16, 2025
2 of 119 checks passed
@pamarcos pamarcos deleted the google-cloud-cpp-storage-build branch July 16, 2025 14:01
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 16, 2025
baibaichen pushed a commit to Kyligence/gluten that referenced this pull request Jul 19, 2025
baibaichen pushed a commit to Kyligence/gluten that referenced this pull request Jul 20, 2025
baibaichen pushed a commit to apache/gluten that referenced this pull request Jul 20, 2025
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250720)

* Fix gtest due to ClickHouse/ClickHouse#83599

* fxi build due to ClickHouse/ClickHouse#83631

* Fix Build due to ClickHouse/ClickHouse#83294

* Fix Build due to ClickHouse/ClickHouse#82881

---------

Co-authored-by: kyligence-git <gluten@kyligence.io>
Co-authored-by: Chang chen <chenchang@apache.com>
@pamarcos pamarcos mentioned this pull request Aug 18, 2025
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-build CI with just build jobs, style check and fast test pr-build Pull request with build/testing/packaging improvement pr-synced-to-cloud The PR is synced to the cloud repo submodule changed At least one submodule changed in this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants