Skip to content

Conversation

@coryan
Copy link
Contributor

@coryan coryan commented Sep 29, 2021

This adds the GCS library to ThirdpartyToolchain, and enables the
library in some of the builds. A future PR will add the library to the
package manager configuration files and to any remaining builds where it
needs to go.

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@coryan coryan marked this pull request as ready for review September 29, 2021 16:31
@coryan
Copy link
Contributor Author

coryan commented Sep 29, 2021

The "Java JNI" test passes locally with archery docker run debian-java-jni. To me, it seems unlikely that my PR caused the failure, but it is possible I do not understand something here.

The appveyor failure seems to be a race condition in a test. The test has a SleepFor(0.01) which is generally a code smell.

In any case, if I am wrong do let me know and I will investigate / change things as needed.

This adds the GCS library to ThirdpartyToolchain, and enables the
library in some of the builds. A future PR will add the library to the
package manager configuration files and to any remaining builds where it
needs to go.
@coryan
Copy link
Contributor Author

coryan commented Sep 30, 2021

/cc: @kou or @pitrou are you the right folks to look at this PR? If so, please take a look when you have a chance.

-DCMAKE_CXX_STANDARD=11
"-DCMAKE_CXX_FLAGS=${EP_CXX_FLAGS}"
-DCMAKE_INSTALL_LIBDIR=lib
"-DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use EP_COMMON_CMAKE_ARGS here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Also did the same for nlohmann_json (add EP_COMMON_CMAKE_ARGS and sort the arguments), but maybe it has no effect there.

-DCMAKE_INSTALL_RPATH=$ORIGIN
-DCMAKE_INSTALL_LIBDIR=lib
-DGOOGLE_CLOUD_CPP_ENABLE=storage
-DBUILD_TESTING=OFF)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you sort this list in alphabetical order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 3682 to 3691
if(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "AUTO")
find_package(google_cloud_cpp_storage QUIET)
if(NOT google_cloud_cpp_storage_FOUND)
build_google_cloud_cpp()
endif()
elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "BUNDLED")
build_google_cloud_cpp()
elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "SYSTEM")
find_package(google_cloud_cpp_storage REQUIRED)
endif()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use resolve_dependency() here?

Suggested change
if(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "AUTO")
find_package(google_cloud_cpp_storage QUIET)
if(NOT google_cloud_cpp_storage_FOUND)
build_google_cloud_cpp()
endif()
elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "BUNDLED")
build_google_cloud_cpp()
elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "SYSTEM")
find_package(google_cloud_cpp_storage REQUIRED)
endif()
resolve_dependency(google_cloud_cpp_storage)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry. My suggestion was insufficient.

We need to add google_cloud_cpp_storage entry to build_dependency():

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 36c8c33ab..e12abefc5 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -151,6 +151,8 @@ macro(build_dependency DEPENDENCY_NAME)
     build_gflags()
   elseif("${DEPENDENCY_NAME}" STREQUAL "GLOG")
     build_glog()
+  elseif("${DEPENDENCY_NAME}" STREQUAL "google_cloud_cpp_storage")
+    build_google_cloud_cpp_storage()
   elseif("${DEPENDENCY_NAME}" STREQUAL "gRPC")
     build_grpc()
   elseif("${DEPENDENCY_NAME}" STREQUAL "GTest")

And we should use google_cloud_cpp_storage_SOURCE and build_google_cloud_cpp_storage for variable/function names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I think.

Comment on lines 3511 to 3512
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
"-DCMAKE_CXX_FLAGS=${EP_CXX_FLAGS}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove them because EP_COMMON_CMAKE_ARGS include them?

Suggested change
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
"-DCMAKE_CXX_FLAGS=${EP_CXX_FLAGS}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

PROPERTIES IMPORTED_LOCATION
"${GOOGLE_CLOUD_CPP_STATIC_LIBRARY_COMMON}"
INTERFACE_INCLUDE_DIRECTORIES
"${GOOGLE_CLOUD_CPP_INSTALL_PREFIX}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a typo?

Suggested change
"${GOOGLE_CLOUD_CPP_INSTALL_PREFIX}")
"${GOOGLE_CLOUD_CPP_INCLUDE_DIR}")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

Threads::Threads
OpenSSL::SSL
OpenSSL::Crypto)
add_dependencies(google-cloud-cpp::storage GOOGLE_CLOUD_CPP_ep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use lowercase here?

Suggested change
add_dependencies(google-cloud-cpp::storage GOOGLE_CLOUD_CPP_ep)
add_dependencies(google-cloud-cpp::storage google_cloud_cpp_ep)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 3682 to 3691
if(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "AUTO")
find_package(google_cloud_cpp_storage QUIET)
if(NOT google_cloud_cpp_storage_FOUND)
build_google_cloud_cpp()
endif()
elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "BUNDLED")
build_google_cloud_cpp()
elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "SYSTEM")
find_package(google_cloud_cpp_storage REQUIRED)
endif()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry. My suggestion was insufficient.

We need to add google_cloud_cpp_storage entry to build_dependency():

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 36c8c33ab..e12abefc5 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -151,6 +151,8 @@ macro(build_dependency DEPENDENCY_NAME)
     build_gflags()
   elseif("${DEPENDENCY_NAME}" STREQUAL "GLOG")
     build_glog()
+  elseif("${DEPENDENCY_NAME}" STREQUAL "google_cloud_cpp_storage")
+    build_google_cloud_cpp_storage()
   elseif("${DEPENDENCY_NAME}" STREQUAL "gRPC")
     build_grpc()
   elseif("${DEPENDENCY_NAME}" STREQUAL "GTest")

And we should use google_cloud_cpp_storage_SOURCE and build_google_cloud_cpp_storage for variable/function names.

set(CRC32C_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/crc32c_ep-install")
set(CRC32C_CMAKE_ARGS
${EP_COMMON_CMAKE_ARGS}
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. FWIW, there are many other uses of this in the file, you may want to clean that up in a separate bug+PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

${EP_COMMON_CMAKE_ARGS}
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
-DCMAKE_CXX_STANDARD=11
"-DCMAKE_CXX_FLAGS=${EP_CXX_FLAGS}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"-DCMAKE_CXX_FLAGS=${EP_CXX_FLAGS}"

set(GOOGLE_CLOUD_CPP_CMAKE_ARGS
${EP_COMMON_CMAKE_ARGS}
-DBUILD_TESTING=OFF
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 3685 to 3694
if(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "AUTO")
find_package(google_cloud_cpp_storage QUIET)
if(NOT google_cloud_cpp_storage_FOUND)
build_google_cloud_cpp_storage()
endif()
elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "BUNDLED")
build_google_cloud_cpp_storage()
elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "SYSTEM")
find_package(google_cloud_cpp_storage REQUIRED)
endif()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve_dependency() does this.

Suggested change
if(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "AUTO")
find_package(google_cloud_cpp_storage QUIET)
if(NOT google_cloud_cpp_storage_FOUND)
build_google_cloud_cpp_storage()
endif()
elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "BUNDLED")
build_google_cloud_cpp_storage()
elseif(GOOGLE_CLOUD_CPP_SOURCE STREQUAL "SYSTEM")
find_package(google_cloud_cpp_storage REQUIRED)
endif()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, thanks. Removed.

ARROW_WITH_ZSTD=ON \
ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-${llvm}/bin/llvm-symbolizer \
AWSSDK_SOURCE=BUNDLED \
GOOGLE_CLOUD_CPP_SOURCE=BUNDLED \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GOOGLE_CLOUD_CPP_SOURCE=BUNDLED \
google_cloud_cpp_storage_SOURCE=BUNDLED \

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Note that this could be a nuisance if you ever want to use Bigtable or Spanner as data sources.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to implement custom resolve_dependency() like feature for them when we also want to use other Google Cloud C++ libraries.

If Google Cloud C++ provides google_cloud_cpp CMake package with storage, bigtable and so on components, we can use google_cloud_cpp_SOURCE and find_package(google_cloud_cpp COMPONENTS storage bigtable ...).

-DCMAKE_INSTALL_PREFIX=${CMAKE_INSTALL_PREFIX:-${ARROW_HOME}} \
-DCMAKE_UNITY_BUILD=${CMAKE_UNITY_BUILD:-OFF} \
-Dgflags_SOURCE=${gflags_SOURCE:-} \
-DGOOGLE_CLOUD_CPP_SOURCE=${GOOGLE_CLOUD_CPP_SOURCE:-} \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-DGOOGLE_CLOUD_CPP_SOURCE=${GOOGLE_CLOUD_CPP_SOURCE:-} \
-Dgoogle_cloud_cpp_storage_SOURCE=${google_cloud_cpp_storage_SOURCE:-} \

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

-DCRC32C_BUILD_TESTS=OFF
-DCRC32C_BUILD_BENCHMARKS=OFF
-DCRC32C_USE_GLOG=OFF)
set(CRC32C_BUILD_BYPRODUCTS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
set(CRC32C_BUILD_BYPRODUCTS)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops, thanks. Removed.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Thanks.

@kou kou closed this in 2a6f78b Oct 5, 2021
@kou
Copy link
Member

kou commented Oct 5, 2021

Oh, arm64 build on Travis CI is failed:

https://app.travis-ci.com/github/apache/arrow/jobs/541265250#L2124

FAILED: CMakeFiles/crc32c.dir/src/crc32c.cc.o 

/usr/bin/c++   -Iinclude -I/build/cpp/crc32c_ep-prefix/src/crc32c_ep/include -fdiagnostics-color=always -ggdb -O0 -g -fPIC -Wall -Wextra -Werror -fno-exceptions -fno-rtti   -fdiagnostics-color=always -ggdb -O0 -g -fPIC   -std=gnu++11 -MD -MT CMakeFiles/crc32c.dir/src/crc32c.cc.o -MF CMakeFiles/crc32c.dir/src/crc32c.cc.o.d -o CMakeFiles/crc32c.dir/src/crc32c.cc.o -c /build/cpp/crc32c_ep-prefix/src/crc32c_ep/src/crc32c.cc

In file included from /build/cpp/crc32c_ep-prefix/src/crc32c_ep/src/crc32c.cc:11:

/build/cpp/crc32c_ep-prefix/src/crc32c_ep/src/./crc32c_arm64_linux_check.h: In function 'bool crc32c::CanUseArm64Linux()':

/build/cpp/crc32c_ep-prefix/src/crc32c_ep/src/./crc32c_arm64_linux_check.h:36:37: error: the address of 'long unsigned int getauxval(long unsigned int)' will never be NULL [-Werror=address]

   36 |   unsigned long hwcap = (&getauxval != nullptr) ? getauxval(AT_HWCAP) : 0;

      |                          ~~~~~~~~~~~^~~~~~~~~~

cc1plus: all warnings being treated as errors

ARROW_WITH_ZSTD=ON \
ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-${llvm}/bin/llvm-symbolizer \
AWSSDK_SOURCE=BUNDLED \
google_cloud_cpp_storage_SOURCE=BUNDLED \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was GCS_SOURCE too ambiguous? This is quite a long parameter name...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use CMake's package name for XXX_SOURCE.
If we don't follow the convention, we can choose GCS_SOURCE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know how to proceed, I can do either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's the CMake name then fair enough.

@kou
Copy link
Member

kou commented Oct 5, 2021

Forgot to update ARROW_THIRDPARTY_DEPENDENCIES in this pull request.

@coryan
Copy link
Contributor Author

coryan commented Oct 5, 2021

Oh, arm64 build on Travis CI is failed:

https://app.travis-ci.com/github/apache/arrow/jobs/541265250#L2124

That seems to be fixed upstream in crc32c, but the fix is not in any tag or release. Would you be Okay with me pinning to a SHA of this dependency? Can I reproduce these builds myself to verify I have fixed such problems?

@kou
Copy link
Member

kou commented Oct 5, 2021

That seems to be fixed upstream in crc32c, but the fix is not in any tag or release. Would you be Okay with me pinning to a SHA of this dependency?

Yes. We sometimes use this approach.

Can I reproduce these builds myself to verify I have fixed such problems?

Yes.

  1. Install Archery: python3 -m pip install -e dev/archery
    See also: https://arrow.apache.org/docs/developers/archery.html
  2. Run ubuntu-cpp service with ARCH=arm64v8 environment variable: ARCH=arm64v8 archery docker run ubuntu-cpp
    Note that you need qemu-user-static to run this on x86_64.

BTW, it's strange that Travis CI jobs weren't run for this pull request...

kou pushed a commit that referenced this pull request Oct 5, 2021
In #11268 I neglected to add `google_cloud_cpp_storage` to the
third-party dependencies.

Closes #11320 from coryan/ARROW-14223-add-google-cloud-cpp-to-thirdparty-dependencies

Authored-by: Carlos O'Ryan <coryan@google.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@coryan coryan deleted the ARROW-8147-add-google-cloud-cpp-to-ThirdpartyToolchain branch October 5, 2021 21:24
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
This adds the GCS library to ThirdpartyToolchain, and enables the
library in some of the builds. A future PR will add the library to the
package manager configuration files and to any remaining builds where it
needs to go.

Closes apache#11268 from coryan/ARROW-8147-add-google-cloud-cpp-to-ThirdpartyToolchain

Authored-by: Carlos O'Ryan <coryan@google.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
In apache#11268 I neglected to add `google_cloud_cpp_storage` to the
third-party dependencies.

Closes apache#11320 from coryan/ARROW-14223-add-google-cloud-cpp-to-thirdparty-dependencies

Authored-by: Carlos O'Ryan <coryan@google.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants