Skip to content

PDML Timeout not Respected #336

@thiagotnunes

Description

@thiagotnunes

When using PDML for long running tasks (> 1 hour) the task fails after one hour with a deadline exceeded exception (Exception in thread "main" com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 3599.998701666s.).

Note that by default the timeout should be 2 hours:

private Duration partitionedDmlTimeout = Duration.ofHours(2L);

This is however not respected and overriden by the RPC timeout configured, which by default is 1 hour:

https://github.com/googleapis/googleapis/blob/50ae1c72fd94a3ae4269394b09e4b7fbb9251146/google/spanner/v1/spanner_grpc_service_config.json#L14

Steps to reproduce

  1. Create a default spanner client.
  2. Run a PDML update task that takes more than 1 hour. I have created a table with > 1 billion records to reproduce the problem.

Code example

final SpannerOptions options = SpannerOptions.newBuilder().build();
final Spanner spanner = options.getService();
final DatabaseId id = DatabaseId.of("project", "instance", "database");
final DatabaseClient client = spanner.getDatabaseClient(id);

// This times out after 1 hour with a DeadlineExceededException
client.executePartitionedUpdate(Statement.of(
    "UPDATE Table SET Column = 'Value' WHERE true"
));

Stack trace

Exception in thread "main" com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 3599.998701666s. [buffered_nanos=115604067, remote_addr=spanner.googleapis.com/142.250.67.10:443]
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:211)
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:57)
	at com.google.cloud.spanner.SpannerExceptionFactory.fromApiException(SpannerExceptionFactory.java:226)
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:144)
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:94)
	at com.google.cloud.spanner.PartitionedDMLTransaction.executeStreamingPartitionedUpdate(PartitionedDMLTransaction.java:156)
	at com.google.cloud.spanner.SessionImpl.executePartitionedUpdate(SessionImpl.java:118)
	at com.google.cloud.spanner.SessionPool$PooledSession.executePartitionedUpdate(SessionPool.java:1349)
	at com.google.cloud.spanner.SessionPool$PooledSessionFuture.executePartitionedUpdate(SessionPool.java:1236)
	at com.google.cloud.spanner.DatabaseClientImpl$3.apply(DatabaseClientImpl.java:227)
	at com.google.cloud.spanner.DatabaseClientImpl$3.apply(DatabaseClientImpl.java:224)
	at com.google.cloud.spanner.DatabaseClientImpl.runWithSessionRetry(DatabaseClientImpl.java:241)
	at com.google.cloud.spanner.DatabaseClientImpl.executePartitionedUpdate(DatabaseClientImpl.java:222)
	at com.google.cloud.sample.app.pdml.dao.UserDao.activateUsers(UserDao.java:61)
	at com.google.cloud.sample.app.pdml.PdmlApp.main(PdmlApp.java:17)
Caused by: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 3599.998701666s. [buffered_nanos=115604067, remote_addr=spanner.googleapis.com/142.250.67.10:443]
	at io.grpc.Status.asRuntimeException(Status.java:533)
	at com.google.api.gax.grpc.GrpcDirectStreamController$ResponseObserverAdapter.onClose(GrpcDirectStreamController.java:149)
	at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor$1$1.onClose(SpannerErrorInterceptor.java:100)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
	at io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
	at io.grpc.internal.ClientCallImpl$1CloseInContext.runInContext(ClientCallImpl.java:416)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Any additional information below

It seems that the following is happening:

  • By default we set the streamWaitTimeout to 2 hours in the context for the gRPC call.
  • By default we set the streamIdleTimeout to 30 mins in the context for the gRPC call.
  • By default we set the timeout to null in the context for the gRPC call.
  • During the call to the gRPC endpoint we merge the read context with a default read context, which changes the value of the timeout to 1 hour.
  • The PDML change fails after 1 hour (not 2 hours) with a deadline exceeded exception, according to the timeout variable set.

I am not sure where the streamWaitTimeout and streamIdleTimeout values are used.

I think that we would need to set the also the timeout value based on the partitionedDmlTimeout for PDML executions.

Metadata

Metadata

Assignees

Labels

🚨This issue needs some love.api: spannerIssues related to the googleapis/java-spanner API.triage meI really want to be triaged.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions