Address of closed listening socket isn't immediately available for reuse #3650

@stevenengler

When a listening socket is closed and the plugin immediately tries to bind a new socket to the same address, the bind fails with EADDRINUSE ("Address already in use").

For example, see this workaround in one of our tests:

// https://github.com/shadow/shadow/issues/3563: some non-zero time needs to pass
// for shadow to clean up state and recognize the address as being available again.
// Unclear why.
std::thread::sleep(std::time::Duration::from_nanos(1));

I believe that this bug is preventing netcat from working correctly in Shadow: #3564

Here's an example Shadow configuration that reproduces the issue:

general:
  stop_time: 30s
  #model_unblocked_syscall_latency: true
experimental:
  strace_logging_mode: standard
  #max_unapplied_cpu_latency: 1 ns
network:
  graph:
    type: 1_gbit_switch
hosts:
  server:
    network_node_id: 0
    processes:
    - path: python3
      args:
        - '-u'
        - '-c'
        - |
          import socket

          server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
          server.bind(('0.0.0.0', 8080))
          server.listen(100)
          print("Listening")

          (s, _) = server.accept()
          s.close()
          print("Accepted socket")

          server.close()
          print("Closed listening socket")

          # all sockets have been closed, so we expect this to work
          server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
          server.bind(('0.0.0.0', 8080))
          server.listen(100)
          server.close()
  client:
    network_node_id: 0
    processes:
    - path: python3
      start_time: 100 ms
      expected_final_state: running
      args:
        - '-u'
        - '-c'
        - |
          import socket, time

          s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          s.connect(('server', 8080))
          print("Connected")

          # don't close the socket, we want to keep the socket open until the
          # end of the simulation so that closing the socket doesn't send a FIN
          print("Sleeping")
          time.sleep(100000)

The server process fails at the second bind (line 19 of its script):

Traceback (most recent call last):
  File "<string>", line 19, in <module>
    server.bind(('0.0.0.0', 8080))
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
OSError: [Errno 98] Address already in use

Note that if you uncomment the model_unblocked_syscall_latency and max_unapplied_cpu_latency lines, or if you add a short sleep before the second server bind, the simulation will succeed.
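
Until the underlying bug is fixed, the "short sleep before the second bind" workaround could be wrapped in a small retry helper. The following is only a sketch: the helper name bind_with_retry, the attempt count, and the 1 ms delay are illustrative assumptions, not part of Shadow or of the test suite, and the helper does not address the root cause.

```python
import errno
import socket
import time


def bind_with_retry(address, attempts=5, delay=0.001):
    """Bind and listen on `address`, sleeping briefly between failed attempts.

    Hypothetical helper: `attempts` and `delay` are arbitrary illustrative values.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind(address)
            sock.listen(100)
            return sock
        except OSError as e:
            sock.close()
            # Retry only on "Address already in use"; re-raise any other
            # error, and give up once the last attempt has failed.
            if e.errno != errno.EADDRINUSE or attempt == attempts - 1:
                raise
            # Let some (simulated) time pass before trying again.
            time.sleep(delay)
```

In the server script above, the second socket()/setsockopt()/bind()/listen() sequence would then become server = bind_with_retry(('0.0.0.0', 8080)). On a system where the address is released immediately, the first attempt succeeds and no sleep occurs.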

Labels: Type: Bug (error or flaw producing unexpected results)
