-
Notifications
You must be signed in to change notification settings - Fork 136
Description
When using PDML for long running tasks (> 1 hour) the task fails after one hour with a deadline exceeded exception (Exception in thread "main" com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 3599.998701666s.).
Note that by default the timeout should be 2 hours:
java-spanner/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SpannerOptions.java
Line 313 in 45acd89
| private Duration partitionedDmlTimeout = Duration.ofHours(2L); |
This is however not respected and overriden by the RPC timeout configured, which by default is 1 hour:
Steps to reproduce
- Create a default spanner client.
- Run a PDML update task that takes more than 1 hour. I have created a table with > 1 billion records to reproduce the problem.
Code example
final SpannerOptions options = SpannerOptions.newBuilder().build();
final Spanner spanner = options.getService();
final DatabaseId id = DatabaseId.of("project", "instance", "database");
final DatabaseClient client = spanner.getDatabaseClient(id);
// This times out after 1 hour with a DeadlineExceededException
client.executePartitionedUpdate(Statement.of(
"UPDATE Table SET Column = 'Value' WHERE true"
));Stack trace
Exception in thread "main" com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 3599.998701666s. [buffered_nanos=115604067, remote_addr=spanner.googleapis.com/142.250.67.10:443]
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:211)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:57)
at com.google.cloud.spanner.SpannerExceptionFactory.fromApiException(SpannerExceptionFactory.java:226)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:144)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:94)
at com.google.cloud.spanner.PartitionedDMLTransaction.executeStreamingPartitionedUpdate(PartitionedDMLTransaction.java:156)
at com.google.cloud.spanner.SessionImpl.executePartitionedUpdate(SessionImpl.java:118)
at com.google.cloud.spanner.SessionPool$PooledSession.executePartitionedUpdate(SessionPool.java:1349)
at com.google.cloud.spanner.SessionPool$PooledSessionFuture.executePartitionedUpdate(SessionPool.java:1236)
at com.google.cloud.spanner.DatabaseClientImpl$3.apply(DatabaseClientImpl.java:227)
at com.google.cloud.spanner.DatabaseClientImpl$3.apply(DatabaseClientImpl.java:224)
at com.google.cloud.spanner.DatabaseClientImpl.runWithSessionRetry(DatabaseClientImpl.java:241)
at com.google.cloud.spanner.DatabaseClientImpl.executePartitionedUpdate(DatabaseClientImpl.java:222)
at com.google.cloud.sample.app.pdml.dao.UserDao.activateUsers(UserDao.java:61)
at com.google.cloud.sample.app.pdml.PdmlApp.main(PdmlApp.java:17)
Caused by: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 3599.998701666s. [buffered_nanos=115604067, remote_addr=spanner.googleapis.com/142.250.67.10:443]
at io.grpc.Status.asRuntimeException(Status.java:533)
at com.google.api.gax.grpc.GrpcDirectStreamController$ResponseObserverAdapter.onClose(GrpcDirectStreamController.java:149)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor$1$1.onClose(SpannerErrorInterceptor.java:100)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
at io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
at io.grpc.internal.ClientCallImpl$1CloseInContext.runInContext(ClientCallImpl.java:416)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Any additional information below
It seems that the following is happening:
- By default we set the
streamWaitTimeoutto 2 hours in the context for the gRPC call. - By default we set the
streamIdleTimeoutto 30 mins in the context for the gRPC call. - By default we set the
timeoutto null in the context for the gRPC call. - During the call to the gRPC endpoint we merge the read context with a default read context, which changes the value of the
timeoutto 1 hour. - The PDML change fails after 1 hour (not 2 hours) with a deadline exceeded exception, according to the
timeoutvariable set.
I am not sure where the streamWaitTimeout and streamIdleTimeout values are used.
I think that we would need to set the also the timeout value based on the partitionedDmlTimeout for PDML executions.