What happened?
Reported from GoogleCloudPlatform/DataflowTemplates#759
When implementing a load test for BigTableIO, we encountered the following:
- Load tests with up to 200 MB of data pass stably.
- Beyond roughly 5 million records, not all of the data ends up in BigTable, even though the pipeline logs indicate that all data was written.
The Dataflow write pipeline logs say that 10M records were written. However, the read job shows only 1.6M records read. Counting rows with the cbt utility (the cbt -instance count command) confirmed that the BigTableIO write did not work correctly: although the logs report that all 10M records were written, the table contained only as many records as the read pipeline processed (1.6M). Some of the records processed by the write pipeline never reached the table.
- Dataflow write pipeline logs: 2023-06-05_03_51_23-9051905355392445711
- Dataflow read pipeline logs: 2023-06-05_03_58_18-7016807525741705033
- Project: apache-beam-testing
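
For context, a minimal sketch of the kind of write pipeline involved, assuming a synthetic source of 10M records. This is not the actual load-test code: the instance id, table id, column family, and record layout below are placeholders; only the project id and record count come from this report.

```java
// Minimal sketch (placeholders, not the actual load test): generate 10M synthetic
// records and write them through BigtableIO.
import java.util.Collections;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.IterableCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.extensions.protobuf.ByteStringCoder;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.TypeDescriptors;

import com.google.bigtable.v2.Mutation;
import com.google.protobuf.ByteString;

public class BigtableWriteLoadTest {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("GenerateRecords", GenerateSequence.from(0).to(10_000_000L))
        .apply("ToMutations",
            MapElements
                .into(TypeDescriptors.kvs(
                    TypeDescriptor.of(ByteString.class),
                    TypeDescriptors.iterables(TypeDescriptor.of(Mutation.class))))
                .via((Long i) -> KV.of(
                    // Row key per record; format is a placeholder.
                    ByteString.copyFromUtf8(String.format("key-%010d", i)),
                    Collections.singletonList(
                        Mutation.newBuilder()
                            .setSetCell(Mutation.SetCell.newBuilder()
                                .setFamilyName("cf")                                  // placeholder column family
                                .setColumnQualifier(ByteString.copyFromUtf8("value"))
                                .setTimestampMicros(System.currentTimeMillis() * 1000L)
                                .setValue(ByteString.copyFromUtf8("payload-" + i)))
                            .build()))))
        .setCoder(KvCoder.of(
            ByteStringCoder.of(),
            IterableCoder.of(ProtoCoder.of(Mutation.class))))
        .apply("WriteToBigtable",
            BigtableIO.write()
                .withProjectId("apache-beam-testing")  // project from this report
                .withInstanceId("my-instance")         // placeholder instance id
                .withTableId("load-test-table"));      // placeholder table id

    p.run().waitUntilFinish();
  }
}
```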
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components