In short, if targetParallelism > BigtableSource#getEstimatedSizeBytes, then desiredBundleSizeBytes is set to 0, which makes BigtableSource#splitKeyRangeIntoBundleSizedSubranges angry.
What happened?
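Imagine a case where, in beam/runners/direct-java/src/main/java/org/apache/beam/runners/direct/BoundedReadEvaluatorFactory.java (lines 215 to 217 in 282d027):

    long estimatedBytes = source.getEstimatedSizeBytes(options);
    long bytesPerBundle = estimatedBytes / targetParallelism;
    List<? extends BoundedSource<T>> bundles = source.split(bytesPerBundle, options);
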
targetParallelism is 32 and source.getEstimatedSizeBytes(options) is 10.

Then bytesPerBundle will be 0 (integer division, 10 / 32), so

    List<? extends BoundedSource<T>> bundles = source.split(bytesPerBundle, options);

will be called with the values:

    source.split(0L, options)

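To make the arithmetic concrete, here is a minimal standalone sketch of the same division. The class name BundleSizeDemo and its helper method are made up for illustration and are not part of Beam; only the `estimatedBytes / targetParallelism` expression comes from the snippet above.

```java
class BundleSizeDemo {
  // Same arithmetic as the direct runner snippet: long divided by int
  // is integer division, so the fractional part is dropped.
  static long bytesPerBundle(long estimatedBytes, int targetParallelism) {
    return estimatedBytes / targetParallelism;
  }

  public static void main(String[] args) {
    // Estimated size smaller than the parallelism: the quotient is 0.
    System.out.println(bytesPerBundle(10L, 32)); // prints 0
    // Estimated size equal to the parallelism: the quotient is 1.
    System.out.println(bytesPerBundle(32L, 32)); // prints 1
    // Estimated size larger than the parallelism.
    System.out.println(bytesPerBundle(64L, 32)); // prints 2
  }
}
```

Any estimated size strictly smaller than targetParallelism produces a desired bundle size of 0 this way.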
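In OffsetBasedSource#split (beam/sdks/java/core/src/main/java/org/apache/beam/sdk/io/OffsetBasedSource.java, lines 115 to 116 in 282d027), this desired-0-sized split is handled:

    long desiredBundleSizeOffsetUnits =
        Math.max(Math.max(1, desiredBundleSizeBytes / getBytesPerOffset()), minBundleSize);
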
But BigtableSource#split (beam/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java, lines 1328 to 1333 in 282d027) does not seem to handle the desired-0-sized split:

    desiredBundleSizeBytes =
        Math.max(sizeEstimate / maximumNumberOfSplits, desiredBundleSizeBytes);

    // Delegate to testable helper.
    List<BigtableSource> splits =
        splitBasedOnSamples(desiredBundleSizeBytes, getSampleRowKeys(options));

so a few frames down the road from BigtableSource#split you'll end up violating this checkArgument in BigtableSource#splitKeyRangeIntoBundleSizedSubranges (same file, lines 1623 to 1626 in 71c8459):

    checkArgument(
        desiredBundleSizeBytes > 0,
        "Desired bundle size %s bytes must be greater than 0.",
        desiredBundleSizeBytes);

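One possible guard, sketched here purely as an assumption and not as the fix Beam adopted, would be to floor the computed size at 1 byte in the same spirit as the Math.max(1, ...) in OffsetBasedSource#split. The class BundleSizeGuard and the helper clampDesiredBundleSize are hypothetical names, and the value 4000 passed for maximumNumberOfSplits is only an illustrative number.

```java
class BundleSizeGuard {
  // Hypothetical helper, not part of Beam. The inner max(...) mirrors the
  // expression quoted from BigtableSource#split; the outer max(1, ...) is
  // the assumed extra guard that keeps the result strictly positive.
  static long clampDesiredBundleSize(
      long sizeEstimate, long maximumNumberOfSplits, long desiredBundleSizeBytes) {
    long current = Math.max(sizeEstimate / maximumNumberOfSplits, desiredBundleSizeBytes);
    return Math.max(1L, current);
  }

  public static void main(String[] args) {
    // Tiny table, desired size 0: without the floor this would be 0.
    System.out.println(clampDesiredBundleSize(10L, 4000L, 0L)); // prints 1
    // Larger table: the size-based lower bound wins.
    System.out.println(clampDesiredBundleSize(10_000_000L, 4000L, 0L)); // prints 2500
    // A positive desired size is passed through unchanged.
    System.out.println(clampDesiredBundleSize(10L, 4000L, 64L)); // prints 64
  }
}
```

With a floor like this, the value reaching the checkArgument above would always be greater than 0; whether 1 byte is a sensible bundle size for Bigtable is a separate question.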
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components