Skip to content

Conversation

@josiahyan
Copy link
Contributor

I'm hoping to work on parts of the GCS filesystem implementation, and this is a prerequisite.

@josiahyan
Copy link
Contributor Author

Running the cmake autoformatter heavily modifies the touched cmake files; should I be doing that?

@josiahyan josiahyan marked this pull request as draft November 24, 2020 15:27
@github-actions
Copy link

@xhochy
Copy link
Member

xhochy commented Nov 24, 2020

Running the cmake autoformatter heavily modifies the touched cmake files; should I be doing that?

Maybe you have a different version of that installed and that formats differently? You can also summon the autoformatter with @github-actions<space>autotune on the PR.

@josiahyan
Copy link
Contributor Author

I'll take a look! Didn't know I could do that.

@josiahyan
Copy link
Contributor Author

@github-actions autotune

@josiahyan
Copy link
Contributor Author

josiahyan commented Nov 25, 2020

@xhochy I can't seem to summon the bot. But I read what the action did, and saw that it required a very specific version of the cmake autoformatter, as you suggested. It works now! Thanks for your help!

@josiahyan josiahyan marked this pull request as ready for review November 25, 2020 04:31
@xhochy
Copy link
Member

xhochy commented Nov 25, 2020

@xhochy I can't seem to summon the bot. But I read what the action did, and saw that it required a very specific version of the cmake autoformatter, as you suggested. It works now! Thanks for your help!

The bot did run but didn't make any changes to your CMake files: https://github.com/apache/arrow/runs/1450988053?check_suite_focus=true#step:10:11

@josiahyan
Copy link
Contributor Author

josiahyan commented Nov 25, 2020

Oh, I know why! The CI queue was full (my test runs were taking hours to go through). By the time the runner got to my PR (Wed, 25 Nov 2020 02:15:47 GMT), my final commit with the manual cmake autoformat had gone in about an hour before.

@josiahyan josiahyan force-pushed the ARROW-8147-add-gcp-sdk branch from 7d1014a to 4fd6a36 Compare December 1, 2020 14:43
@josiahyan
Copy link
Contributor Author

@github-actions rebase

@josiahyan josiahyan force-pushed the ARROW-8147-add-gcp-sdk branch 2 times, most recently from 979c43a to 953de85 Compare December 1, 2020 16:43
@josiahyan josiahyan force-pushed the ARROW-8147-add-gcp-sdk branch from 3c26730 to b9313da Compare December 1, 2020 21:21
@github-actions github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Dec 24, 2020
@nealrichardson
Copy link
Member

Hi @josiahyan, apologies for the delay here. Could you please rebase this? Hopefully we can get this reviewed and merged soon after.

@josiahyan
Copy link
Contributor Author

josiahyan commented Jan 29, 2021 via email

@josiahyan josiahyan force-pushed the ARROW-8147-add-gcp-sdk branch from b9313da to 850c239 Compare February 28, 2021 01:03
@josiahyan josiahyan marked this pull request as draft February 28, 2021 01:03
@josiahyan josiahyan marked this pull request as ready for review February 28, 2021 04:29
@josiahyan
Copy link
Contributor Author

Sorry about the delay! I've rebased the commits and manually merged in the changes made in master.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for the exact pin here?

Copy link
Contributor Author

@josiahyan josiahyan Feb 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The currently available version in conda (1.24) is incompatible with the way I initially packaged it. I believe I saw in the CI run that storage_client was missing or something, as the google-cloud-cpp package has started namespacing its exports (finally!).

Given the hard link against abseil libraries (as gRPC does too), to reduce packaging size, I decided that the version was safer pinned. The abseil libraries actually loaded probably depend on the matrix of abseil versions and users of it (gRPC, and the proposed google-cloud-cpp inclusion), making it more brittle. Personally, I extract it by a bazel query (provided in comments) and a Python script to lookup the symbols in the ar files. I believe the gRPC builds do something similar, or just pull them out from the compiler invocations (does @pitrou or @kou know how they were determined in commits like these?). Ideally the linker would do the dead section elimination, but I didn't dig into why the build system wouldn't allow so (I'm guessing the libs have to be bundled in and linked separately somehow?).

I'm hoping to upgrade the dependency when there is code using it (which would be the GCS implementation of the Arrow file interfaces).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I determined required abseil libraries manually. CI error helped me.

@salqadri
Copy link

What's the status of this PR?

@josiahyan
Copy link
Contributor Author

What's the status of this PR?

Sorry, but I lost track of following up on this PR! I believed it was ready to merge in Feb, but assumed people had lost interest in pursuing (native) GCS support, or that this change was not quite worth making for now. Is there still a need for this? If so, I can attempt to rebase it again.

@nealrichardson
Copy link
Member

What's the status of this PR?

Sorry, but I lost track of following up on this PR! I believed it was ready to merge in Feb, but assumed people had lost interest in pursuing (native) GCS support, or that this change was not quite worth making for now. Is there still a need for this? If so, I can attempt to rebase it again.

I think there's interest (I'm interested, at least). I'd like to see someone more skilled in cmake than I to review and approve: @kou @pitrou @xhochy ?

@kou
Copy link
Member

kou commented Aug 1, 2021

Could you rebase on the master? Then I'll review this.

@josiahyan
Copy link
Contributor Author

josiahyan commented Aug 1, 2021 via email

@emkornfield
Copy link
Contributor

@josiahyan do you think you will have time to rebase? Otherwise I can see how difficult this might be.

@josiahyan josiahyan force-pushed the ARROW-8147-add-gcp-sdk branch from 7076b4c to 7531ca5 Compare August 23, 2021 05:44
@josiahyan
Copy link
Contributor Author

@josiahyan do you think you will have time to rebase? Otherwise I can see how difficult this might be.

Whoops sorry, I lost track of this. I’ll be trying to rebase this - I don’t see any major changes (so far) that should result in conflicts.

@josiahyan josiahyan marked this pull request as draft August 23, 2021 07:25
autoconf
ccache
google-cloud-cpp=1.20.0
nlohmann_json>=3.4.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the new JSON dependency for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess google's SDK must rely on it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am one of the authors of google's GCS SDK. Yes, the SDK definitely depends on nlohmann_json.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conda has dependency handling, so there is no reason to mention it explicitly, though.

GCPSDK_PREFIX_PATH)

set(GCPSDK_CXX_FLAGS "${EP_CXX_FLAGS}")
# workaround for crc32c not declaring its header interface in its Find*.cmake file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid this by using crc32c 1.1.1:

google/crc32c@b377ce4

@josiahyan
Copy link
Contributor Author

I’ve discussed this PR with @coryan , and we think its better to start over, given the need to upgrade dependencies, and handle the Abseil libraries (which I believe, when used with gRPC are causing the build to fail on some platforms) with more scripting. He will be taking over this work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: C++ needs-rebase A PR that needs to be rebased by the author

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants