Support DatabaseID in Datastore beam connector#27987
Support DatabaseID in Datastore beam connector#27987johnjcasey merged 34 commits intoapache:masterfrom
Conversation
|
Assigning reviewers. If you would like to opt out of this review, comment R: @robertwb for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
|
Run Java_GCP_IO_Direct PreCommit |
|
Looks like the failing tests in Direct IO are coming from BigQuery and unrelated to the change: |
...io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
Outdated
Show resolved
Hide resolved
...io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
Outdated
Show resolved
Hide resolved
...io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
Outdated
Show resolved
Hide resolved
| public void startBundle(StartBundleContext c) { | ||
| datastore = datastoreFactory.getDatastore(c.getPipelineOptions(), projectId.get(), localhost); | ||
| // Read Firestore DatabaseID from FirestoreOptions. | ||
| String firestoreDatabaseId = |
There was a problem hiding this comment.
why are we passing the databaseId around, but querying for it from options here? I would think that it should be passed in from the constructor.
There was a problem hiding this comment.
We hope users can run the datastore beam with flag for write. StartBundleContext.
"--firestoreDb=${firestoreDb}",
Similar example in Firestore:
There was a problem hiding this comment.
This doesn't track to for me. It looks like you pass the value around when constricting the IO, but then query from pipeline options here. That is very confusing for a user.
There was a problem hiding this comment.
A user could do .withDatabaseId(ID), and that doesn't appear to do anything
There was a problem hiding this comment.
Thanks for pointing this out! Now I see why this is confusing.
It's unexpected users can do DatabaseID in WriteFn. The only way that can pass DatabaseID is from FirestoreOptions. Removed withDatabaseId references for Write.
There was a problem hiding this comment.
I think this is an antipattern. I don't think we should be passing in options via options. What if the user wanted to query from two databases in a single pipeline?
...io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
Show resolved
Hide resolved
johnjcasey
left a comment
There was a problem hiding this comment.
We shouldn't have databaseID passed in as a pipeline option. It doesn't support reading from two databases in a single pipeline
...io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
Outdated
Show resolved
Hide resolved
bvolpato
left a comment
There was a problem hiding this comment.
In general it LGTM -- my only concern is on the ValueProvider usage, as it might break and throw IllegalStateException when provided in runtime.
|
I run the failed tests locally and looks like they passed. |
|
Run Java_GCP_IO_Direct PreCommit |
|
Run Java_GCP_IO_Direct PreCommit |
[Feature] Support DatabaseID in Datastore beam connector