Skip to content

Add BigTableIOLT#759

Merged
copybara-service[bot] merged 1 commit intoGoogleCloudPlatform:mainfrom
akvelon:vitalii.terentev/bi-23-bigtableio-test
Jun 9, 2023
Merged

Add BigTableIOLT#759
copybara-service[bot] merged 1 commit intoGoogleCloudPlatform:mainfrom
akvelon:vitalii.terentev/bi-23-bigtableio-test

Conversation

@Amar3tto
Copy link
Copy Markdown
Collaborator

@Amar3tto Amar3tto commented Jun 1, 2023

No description provided.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cloud documentation suggests ..."Bigtable can scale to billions of rows and thousands of columns" so this diagonal structured key-value might not be the typical use case for Bigtable

We possibly need a numColumns in Configuration, default set to 1 and make the column key String.format("value%09d", index%numColumn)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the reason of apache/beam#27022. Also going to investigate it

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to set multiple columns as you suggested, it didn't help with the issue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the cause is

  • cell does not have set timestamp, so it default to epoch (1970-01-01)
  • the createTable has a garbage collection policy of 1h, so large amount data written triggers GC and some records get deleted

We need to use .setTimestampMicros(java.time.Instant.now().toEpochMilli()*1000) for Mutation.SetCell

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createTable has a default maxAge of 1h. This causes cells without setting timestamp (default to epoch) gets garbage collected. I think it should be never by default. What do you think @pranavbhandari24

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the javadoc to make it clearer, please check

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting. Yeah, I'd agree that the data should never be garbage collected (if that is an option)
Will look into this.

@Amar3tto Amar3tto force-pushed the vitalii.terentev/bi-23-bigtableio-test branch from eeb6bcd to fbcbd4e Compare June 7, 2023 06:58
@Amar3tto Amar3tto marked this pull request as ready for review June 7, 2023 14:08
Copy link
Copy Markdown
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks pretty solid to me, just had a few minor comments.

@Amar3tto Amar3tto force-pushed the vitalii.terentev/bi-23-bigtableio-test branch from a9eb628 to 751cf7f Compare June 7, 2023 18:04
Copy link
Copy Markdown
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick iteration. Last few comments.

@Abacn Abacn added the Google LGTM Approval of a pull request to be merged into the repository label Jun 8, 2023
@Amar3tto Amar3tto force-pushed the vitalii.terentev/bi-23-bigtableio-test branch from 1a5c6d7 to 64b55d8 Compare June 9, 2023 11:16
@Abacn
Copy link
Copy Markdown
Contributor

Abacn commented Jun 9, 2023

Thanks, no need to deal with "This branch is out-of-date with the base branch" in the GitHub side. This will be merged once internal test passes

@copybara-service copybara-service bot merged commit 2db1937 into GoogleCloudPlatform:main Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Google LGTM Approval of a pull request to be merged into the repository size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants