
local ports exhaust quickly due to TCP TIME_WAIT when reconnect_interval is small  #232

@minhuw


When --reconnect-interval is small, local ports are exhausted before the experiment completes, as the log below shows.

$ memtier_benchmark -s 192.168.1.2 -t 1 -p 7777 -c 128 -n 10000 --json-out-file experiment.json --reconnect-interval 1
Json file experiment.json created...
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 1%,   0 secs]  1 threads:       14335 ops,   14340 (avg:   14340) ops/sec, 611.74KB/sec (avg: 611.74KB/sec),  5.19 (avg:  5.19) msec latency
<some logs omitted>
[RUN #1 2%,  20 secs]  1 threads:       27477 ops,     692 (avg:    1373) ops/sec, 28.74KB/sec (avg: 58.36KB/sec), 47.35 (avg: 28.27) msec latency
connect failed, error = Cannot assign requested address
memtier_benchmark: shard_connection.cpp:470: void shard_connection::process_response(): Assertion `ret == 0' failed.

I found that SO_LINGER is not enabled, so closed TCP connections enter the TIME_WAIT state instead of releasing their local ports immediately.

https://github.com/RedisLabs/memtier_benchmark/blob/4203084b085e0f93bb9130c281362ea651d2140c/shard_connection.cpp#L229-L235

It works if I enable SO_LINGER with a zero timeout as follows, so that the connection is aborted immediately when it is closed and never enters TIME_WAIT.

-        struct linger ling = {0, 0};
+        struct linger ling = {1, 0};
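For anyone who wants to try the same setting outside memtier_benchmark, here is a minimal standalone sketch of what the one-line change does. The helper names `set_abortive_close` and `is_abortive_close` are hypothetical and not part of memtier_benchmark; only the standard `setsockopt`/`getsockopt` calls with `SO_LINGER` are used. With `l_onoff = 1` and `l_linger = 0`, `close()` on a connected TCP socket typically sends an RST rather than a FIN, which is why the local port does not sit in TIME_WAIT.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not memtier_benchmark code): enable SO_LINGER with a
 * zero timeout so that close() aborts the connection instead of lingering.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int set_abortive_close(int fd)
{
    struct linger ling;
    memset(&ling, 0, sizeof(ling));
    ling.l_onoff = 1;   /* turn SO_LINGER on */
    ling.l_linger = 0;  /* zero timeout: abort on close */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));
}

/* Hypothetical helper: read the option back.
 * Returns 1 if abortive close is configured, 0 if not, -1 on error. */
static int is_abortive_close(int fd)
{
    struct linger ling;
    socklen_t len = sizeof(ling);
    memset(&ling, 0, sizeof(ling));
    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &ling, &len) != 0)
        return -1;
    return (ling.l_onoff != 0 && ling.l_linger == 0) ? 1 : 0;
}

/* Hypothetical helper: create a TCP socket that is already set up for
 * abortive close. Returns the descriptor, or -1 on failure. */
static int open_abortive_socket(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (set_abortive_close(fd) != 0) {
        close(fd);  /* do not leak the descriptor on failure */
        return -1;
    }
    return fd;
}
```

Note the trade-off: an abortive close discards any unsent data and the peer sees a connection reset, so this looks reasonable for a benchmark client but is usually not what a production client wants.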

Is there a reason SO_LINGER is not enabled? Is there a workaround that would let me test the scenario where --reconnect-interval is very small?
