Skip to content

Query Cancellation Hangs When Using Multi Subnet Failover #144

@sehrope

Description

@sehrope

Normally we use the standard Statement.cancel() method to cancel running queries. If it's called on a connection made with multiSubnetFailover=true, it has no effect and both the original Thread executing the Statement and the Thread issuing the cancel() will hang until the statement completes.

This issue been tested and confirmed on the latest stable SQL Server JDBC driver as well as a number of older drivers (prior to the 6.x branch).

See below for a self-contained example of this. Replace the JDBC url with any recent version of SQL Server and flip the boolean multiSubnetFailover property between true/false to see the impact. The sample code issues a dummy WAITFOR statement to simulate a slow query and then tries to cancel it in a separate thread. If multiSubnetFailover is enabled, the cancellation never occurs as the canceler thread blocks waiting to write to the nio Channel (probably trying to write the ATTENTION cancellation packet).

Stack trace for thread: stmt-canceller

java.nio.channels.Channels.writeFully(Channels.java:96)
java.nio.channels.Channels.access$000(Channels.java:61)
java.nio.channels.Channels$1.write(Channels.java:174)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyOutputStream.writeInternal(IOBuffer.java:1059)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyOutputStream.write(IOBuffer.java:1051)
sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:876)
sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:847)
sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
com.microsoft.sqlserver.jdbc.TDSChannel.write(IOBuffer.java:1887)
com.microsoft.sqlserver.jdbc.TDSWriter.flush(IOBuffer.java:4285)
com.microsoft.sqlserver.jdbc.TDSWriter.writePacket(IOBuffer.java:4186)
com.microsoft.sqlserver.jdbc.TDSWriter.endMessage(IOBuffer.java:3192)
com.microsoft.sqlserver.jdbc.TDSWriter.sendAttention(IOBuffer.java:4164)
com.microsoft.sqlserver.jdbc.TDSCommand.interrupt(IOBuffer.java:7647)
com.microsoft.sqlserver.jdbc.SQLServerStatement.cancel(SQLServerStatement.java:1193)
Main$2.run(Main.java:64)

Stack trace for thread: main

sun.nio.ch.FileDispatcherImpl.read0(Native Method)
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
sun.nio.ch.IOUtil.read(IOUtil.java:197)
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:192)
sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:949)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:937)
sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
sun.security.ssl.InputRecord.read(InputRecord.java:503)
sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1871)
com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6602)
com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7919)
com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:879)
com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:778)
com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7465)
com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2271)
com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:191)
com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:166)
com.microsoft.sqlserver.jdbc.SQLServerStatement.execute(SQLServerStatement.java:751)
Main.main(Main.java:77)

Stepping through a debugger and comparing the stacks with multiSubnetFailover enabled/disabled, we noticed that the underlying SocketInputStream is a different class. When the feature is disabled, it's a java.net.SocketInputStream. When it's enabled it's a SocketAdaptor.SocketInputStream.

If I had to guess, I'd say that whatever the JDK is doing to spoof a Socket interface atop a Channel isn't as concurrent as a true Socket. It's a probably a shared lock acquired by a reader on the channel blocking the writer. These two lines look suspicious as they're both synchronizing on the same object:

https://github.com/openjdk-mirror/jdk/blob/jdk8u/jdk8u/master/src/share/classes/sun/nio/ch/SocketAdaptor.java#L188

https://github.com/openjdk-mirror/jdk/blob/jdk8u/jdk8u/master/src/share/classes/java/nio/channels/Channels.java#L95

While this could be a bug in the JDK, I'm not sure that non-blocking between readers/writers is part of the API contract for Sockets or at least Sockets derived from Channels. As such, this might be something you'd need to have the driver handle. The answer might be to switch to using real Sockets rather than Channels even when multiSubnetFailover is enabled.


Sample Program Reproduce Issue - QueryCancelTest.java

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.LocalDateTime;
import java.util.Properties;

public class QueryCancelTest {
  private static final long CANCEL_WAIT_MILLIS = 2000;

  private static void log(String format, Object... args) {
    String timestamp = LocalDateTime.now().toString();
    String threadName = Thread.currentThread().getName();
    System.out.printf(timestamp + " [" + threadName + "] - " + format + "\n", args);
  }

  private static void dumpStackTrace(Thread thread) {
    StringBuilder sb = new StringBuilder();
    sb.append("Stack trace for thread: " + thread.getName() + "\n");
    for (StackTraceElement elem : thread.getStackTrace()) {
      sb.append("    " + elem.toString() + "\n");
    }
    System.err.println(sb.toString());
  }

  private static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    }
  }

  private static void delayPrintStackTrace(final Thread thread) {
    new Thread() {
      public void run() {
        sleepQuietly(2000);
        dumpStackTrace(thread);
      }
    }.start();
  }

  public static void main(String args[]) {
    String host = "db.example.com";
    int port = 1433;
    String databaseName = "test";
    String user = "myuser";
    String password = "mypass";
    boolean multiSubnetFailover = true;

    String url = "jdbc:sqlserver://" + host + ":" + port;
    Properties props = new Properties();
    props.put("databaseName", databaseName);
    props.put("user", user);
    props.put("password", password);
    if (multiSubnetFailover) {
      props.put("multiSubnetFailover", "true");
    }

    try {
      try (Connection conn = DriverManager.getConnection(url, props)) {
        log("Created Connection");
        try (final Statement stmt = conn.createStatement()) {
          log("Created Statement");

          final Thread mainThread = Thread.currentThread();
          final Thread cancellationThread = new Thread() {
            public void run() {
              log("Started");
              try {
                log("Waiting for %s ms for statement to begin", CANCEL_WAIT_MILLIS);
                Thread.sleep(CANCEL_WAIT_MILLIS);
                log("Finished waiting, will attempt to cancel statement");
                delayPrintStackTrace(this);
                delayPrintStackTrace(mainThread);
                log("Before stmt.cancel()");
                stmt.cancel();
                log("After stmt.cancel()");
              } catch (Exception e) {
                e.printStackTrace();
              }
              log("Finished");
            }
          };
          cancellationThread.setName("stmt-canceller");
          cancellationThread.start();

          try {
            log("Before stmt.execute(...)");
            boolean hasResultSet = stmt.execute("WAITFOR DELAY '00:00:15'");
            log("After stmt.execute(...) hasResultSet=%s", hasResultSet);
          } catch (SQLException e) {
            e.printStackTrace();
          }
        }
      }
      log("Finished successfully");
    } catch (Exception e) {
      e.printStackTrace();
      System.exit(1);
    }
    System.exit(0);
  }
}

Metadata

Metadata

Assignees

Labels

Under ReviewUsed for pull requests under review

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions