High Load
Precaution 1: 3proxy was not initially developed for high load and is positioned as a SOHO product. The main reason is the "one connection - one thread" model 3proxy uses. 3proxy is known to work with over 200,000 connections under proper configuration, but use it in a production environment under high loads at your own risk and do not expect too much.
Precaution 2: This documentation is incomplete and insufficient. High loads may require very specific system tuning including, but not limited to, specific or customized kernels, builds, settings, sysctls, options, etc. All of this is not covered by this documentation.
The number of simultaneous connections per service is limited by the 'maxconn' option. The default maxconn value since 3proxy 0.8 is 500. You may want to set 'maxconn' to a higher value. Under this configuration:
maxconn 1000
proxy -p3129
proxy -p3128
socks
maxconn for every service is 1000, and there are 3 services running (2 proxy and 1 socks), so for all services there can be up to 3000 simultaneous connections to 3proxy.
Avoid setting 'maxconn' to an arbitrarily high value; it should be carefully chosen to protect the system and proxy from resource exhaustion. Setting maxconn above available resources can lead to denial of service conditions.
Each running service requires:
- 1 thread (process)
- 1 socket (file descriptor)
- 1 stack memory segment + some heap memory, ~64K-128K depending on the system
Each client connection requires:
- 1 thread (process)
- 2 sockets (file descriptors). For FTP, 4 sockets are required.
  Under Linux since 0.9, splice() is used. It is much more efficient but requires
  2 sockets (file descriptors) + 2 pipes (file descriptors) = 4 file descriptors.
  For FTP with splice(), 4 sockets and 2 pipes are required.
  Up to 128K (up to 256K in the case of splice()) of kernel buffer memory. This is the theoretical maximum; actual numbers depend on connection quality and traffic amount.
- 1 additional socket (file descriptor) during name resolution for non-cached names
- 1 additional socket during RADIUS authentication or logging
- 1 ephemeral port (3 ephemeral ports for FTP connections)
- 1 stack memory segment of ~32K-128K depending on the system + at least 16K and up to a few MB (for 'proxy' and 'ftppr') of heap memory. If you are short on memory, prefer 'socks' over 'proxy' and 'ftppr'.
- Many system buffers, especially in the case of slow network connections.
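The per-service and per-connection figures above can be turned into a rough budget. The following is a minimal sketch, not part of 3proxy: the function and field names are hypothetical, it uses the low end of the stack and heap ranges (32K + 16K per connection), the theoretical maximum for kernel buffers, and it ignores FTP, name resolution and RADIUS sockets as well as the per-service stack segment.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    threads: int
    file_descriptors: int
    stack_and_heap_bytes_min: int
    kernel_buffer_bytes_max: int


def estimate(services: int, maxconn: int, splice: bool = True) -> Estimate:
    """Rough resource budget when every service is filled up to maxconn."""
    connections = services * maxconn
    # 2 sockets per connection, plus 2 pipes when splice() is in use.
    fds_per_conn = 4 if splice else 2
    # Theoretical maximum of kernel buffer memory per connection.
    kernel_per_conn = (256 if splice else 128) * 1024
    # Low end of the ranges: 32K stack + 16K heap per connection (assumption).
    user_per_conn = (32 + 16) * 1024
    return Estimate(
        threads=services + connections,
        file_descriptors=services + connections * fds_per_conn,
        stack_and_heap_bytes_min=connections * user_per_conn,
        kernel_buffer_bytes_max=connections * kernel_per_conn,
    )


# The example configuration: 3 services with maxconn 1000.
e = estimate(services=3, maxconn=1000)
print(e.threads, e.file_descriptors)
```

For the 3-service example this gives 3003 threads and 12003 file descriptors with splice(), which is the kind of number the ulimits have to exceed with some margin.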
Hard and soft ulimits must be set above calculated requirements. Under Linux, you can check the limits of a running process with
cat /proc/PID/limits
where PID is the process ID. Validate that ulimits match your expectations, especially if you run 3proxy under a dedicated account by adding, e.g.:
system "ulimit -Ha >>/tmp/3proxy.ulim.hard"
system "ulimit -Sa >>/tmp/3proxy.ulim.soft"
at the beginning (before the first service is started) and at the end of the config file. Perform both a hard restart (i.e., kill and start the 3proxy process) and a soft restart by sending SIGUSR1 to the 3proxy process; check that the ulimits recorded to files match your expectations. In systemd-based distros (e.g., latest Debian/Ubuntu), changing limits.conf is not enough; limits must be adjusted in the systemd configuration, e.g., by setting:
DefaultLimitDATA=infinity
DefaultLimitSTACK=infinity
DefaultLimitCORE=infinity
DefaultLimitRSS=infinity
DefaultLimitNOFILE=102400
DefaultLimitAS=infinity
DefaultLimitNPROC=10240
DefaultLimitMEMLOCK=infinity
in user.conf / system.conf
Check the manuals/documentation for your system's limitations, e.g., the system-wide limit for the number of open files (fs.file-max in Linux). You may need to change sysctls or even rebuild the kernel from source.
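Comparing the limits of a running process against calculated requirements can be scripted. The sketch below is an illustration only: it parses text in the format of /proc/PID/limits on Linux, the helper names are hypothetical, and the sample text and required values are made up for the example.

```python
import re

# Sample in the layout of /proc/PID/limits (values are illustrative).
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max open files            1024                 4096                 files
Max processes             63432                63432                processes
Max stack size            8388608              unlimited            bytes
"""


def to_number(value):
    """Convert a limit column to a number; 'unlimited' becomes infinity."""
    return float("inf") if value == "unlimited" else int(value)


def parse_limits(text):
    """Return {limit name: (soft, hard)} from /proc/PID/limits style text."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header line
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3:
            continue
        limits[fields[0]] = (to_number(fields[1]), to_number(fields[2]))
    return limits


def shortfalls(limits, required):
    """Names of limits whose soft value is below the required value."""
    return [name for name, need in required.items()
            if limits.get(name, (0, 0))[0] < need]


limits = parse_limits(SAMPLE)
print(shortfalls(limits, {"Max open files": 12003, "Max processes": 3003}))
```

On a live system the text would come from reading /proc/PID/limits for the 3proxy process instead of the sample string.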
To help with socket-based system-dependent settings, since 0.9-devel, 3proxy supports different socket options which can be set via the -ol option for the listening socket, -oc for the proxy-to-client socket, and -os for the proxy-to-server socket. Example:
proxy -olSO_REUSEADDR,SO_REUSEPORT -ocTCP_TIMESTAMPS,TCP_NODELAY -osTCP_NODELAY
Available options are system-dependent.
If 3proxy is used in a VPS environment, there can be additional limitations. For example, kernel resources, system CPU usage, and IOCTLs can be limited differently, and this can become a bottleneck. Since 0.9-devel, 3proxy uses splice() by default on Linux. splice() prevents network traffic from being copied from kernel space to the 3proxy process and generally increases throughput, especially in the case of high-volume traffic. This is especially true for virtual environments (it can improve throughput up to 10 times) unless there are additional kernel limitations. Since some work is moved to the kernel, it requires up to 2 times more kernel resources in terms of CPU, memory, and IOCTLs. If your hosting additionally limits kernel resources (you can see this as nearly 100% CPU usage without any real CPU activity for any application performing IOCTLs), use the -s0 option to disable splice() usage for a given service, e.g.:
socks -s0
Check the ephemeral port range for your system and extend it to the number of ports required. The ephemeral range is always limited to the maximum number of ports (64K). To extend the number of outgoing connections above this limit, extending the ephemeral port range is not enough; you need additional actions:
- Configure multiple outgoing IPs
- Make sure 3proxy is configured to use a different outgoing IP by either setting the external IP via RADIUS:
radius secret 1.2.3.4
auth radius
proxy
or by using multiple services with different external interfaces, for example:
allow user1,user11,user111
proxy -p1111 -e1.1.1.1
flush
allow user2,user22,user222
proxy -p2222 -e2.2.2.2
flush
allow user3,user33,user333
proxy -p3333 -e3.3.3.3
flush
allow user4,user44,user444
proxy -p4444 -e4.4.4.4
flush
or via "parent extip" rotation, e.g.:
allow user1,user11,user111
parent 1000 extip 1.1.1.1 0
allow user2,user22,user222
parent 1000 extip 2.2.2.2 0
allow user3,user33,user333
parent 1000 extip 3.3.3.3 0
allow user4,user44,user444
parent 1000 extip 4.4.4.4 0
proxy
or
allow *
parent 250 extip 1.1.1.1 0
parent 250 extip 2.2.2.2 0
parent 250 extip 3.3.3.3 0
parent 250 extip 4.4.4.4 0
socks
Under the latest Linux versions, you can also start multiple services with different external addresses on a single port with SO_REUSEPORT on the listening socket to evenly distribute incoming connections between outgoing interfaces:
socks -olSO_REUSEPORT -p3128 -e1.1.1.1
socks -olSO_REUSEPORT -p3128 -e2.2.2.2
socks -olSO_REUSEPORT -p3128 -e3.3.3.3
socks -olSO_REUSEPORT -p3128 -e4.4.4.4
For web browsing, the last two examples are not recommended because the same client can get a different external address for different requests; you should choose the external interface with user-based rules instead.
- You may need additional system-dependent actions to use the same port on different IPs, usually by adding the SO_REUSEADDR (SO_PORT_SCALABILITY for Windows) socket option to the external socket. This option can be set (since 0.9-devel) with the -os option:
proxy -p3128 -e1.2.3.4 -osSO_REUSEADDR
The behavior for SO_REUSEADDR and SO_REUSEPORT is different between different systems, even between different kernel versions, and can lead to unexpected results. The specifics are described here. Use these options only if actually required and if you fully understand the possible consequences. For example, SO_REUSEPORT can help establish more connections than the number of client ports available, but it can also lead to situations where connections randomly fail due to IP+port pair collisions if the remote or local system doesn't support this trick.
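To get a feel for how many outgoing IPs are needed, the ephemeral port arithmetic can be sketched as below. This is a conservative estimate that treats the ephemeral range as a hard per-IP cap and ignores any port reuse via SO_REUSEADDR/SO_REUSEPORT. The function names are hypothetical, and the default range 32768-60999 is the usual Linux default for net.ipv4.ip_local_port_range; check the value on your own system.

```python
import math


def ephemeral_ports(low: int, high: int) -> int:
    """Number of ports in an inclusive ephemeral range."""
    return high - low + 1


def outgoing_ips_needed(connections: int, low: int = 32768, high: int = 60999,
                        ports_per_connection: int = 1) -> int:
    """Outgoing IPs required if each IP can use every ephemeral port once."""
    per_ip = ephemeral_ports(low, high) // ports_per_connection
    return math.ceil(connections / per_ip)


# 200,000 connections with the default Linux range.
print(outgoing_ips_needed(200_000))
# The same load with the range extended to 1024-65535.
print(outgoing_ips_needed(200_000, low=1024, high=65535))
# FTP connections need 3 ephemeral ports each.
print(outgoing_ips_needed(200_000, ports_per_connection=3))
```

Under these assumptions, 200,000 connections need 8 outgoing IPs with the default range and 4 with the extended one.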
'stacksize' is a size added to all stack allocations and can be both positive and negative. Stack is required for function calls. 3proxy itself doesn't require a large stack, but it can be required if some poorly written libc, 3rd party libraries, or system functions are called. There is known dirty code in Unix ODBC implementations and built-in DNS resolvers, especially in the case of IPv6 and a large number of interfaces. Under most 64-bit systems, extending stacksize will lead to additional memory space usage but does not require actual committed memory, so you can increase stacksize to a relatively large value (e.g., 1024000) without the need to add additional physical memory, but it's system/libc dependent and requires additional testing under your installation. Don't forget about memory-related ulimits.
For 32-bit systems, address space can be a bottleneck you should consider. If you're short on address space, you can try using a negative stack size.
There are known race condition issues in the Linux/glibc resolver. The probability of hitting a race condition rises in configurations with IPv6 or with a large number of interfaces, IP addresses, or configured resolvers. In this case, install a local recursor and use 3proxy's built-in resolver (nserver / nscache / nscache6).
Public resolvers like those from Google have rate limits. For a large number of requests, install a local caching recursor (ISC bind named, PowerDNS recursor, etc).
Currently, 3proxy is not optimized to use large ACLs, user lists, etc. All lists are processed linearly. In the devel version, you can use RADIUS authentication to avoid user lists and ACLs in 3proxy itself. Also, RADIUS allows you to easily set an outgoing IP on a per-user basis or implement more sophisticated logic. RADIUS is a new beta feature; test it before using it in production.
Every configuration reload requires additional resources. Do not make frequent changes, such as user addition/deletion via configuration; use alternative authentication methods instead, like RADIUS.
The 'force' behavior (default) re-authenticates all connections after configuration reload; it may be resource-consuming with a large number of connections. Consider adding the 'noforce' command before services are started to prevent connection re-authentication.
Using a configuration file directly in 'monitor' can lead to a race condition where the configuration is reloaded while the file is being written. To avoid race conditions:
- Update config files only if there is no lock file
- Create a lock file when the 3proxy configuration is updated, e.g., with "touch /some/path/3proxy/3proxy.lck". If you generate config files asynchronously, e.g., by a user's request via web, you should consider implementing existence checking and file creation as an atomic operation.
- Add
system "rm /some/path/3proxy/3proxy.lck"
at the end of the config file to remove it after the configuration is successfully loaded
- Use a dedicated version file to monitor, e.g.:
monitor "/some/path/3proxy/3proxy.ver"
- After the config is updated, change the version file for 3proxy to reload the configuration, e.g., with "touch /some/path/3proxy/3proxy.ver".
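The update procedure above can be sketched in Python for a config generator. This is a minimal illustration, not part of 3proxy: the function name and the file name 3proxy.cfg are hypothetical, and writing through a temporary file with os.replace() is an extra precaution not required by the steps above. O_CREAT with O_EXCL makes the existence check and the lock creation a single atomic operation.

```python
import os
from pathlib import Path


def update_config(conf_dir: Path, new_config: str) -> bool:
    """Write a new config and signal 3proxy; return False if locked."""
    lock = conf_dir / "3proxy.lck"
    try:
        # Atomic "check and create": fails if the lock file already exists.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False  # a previous update has not been loaded yet
    os.close(fd)

    # The config is expected to end with a system "rm .../3proxy.lck" line,
    # so 3proxy itself removes the lock after a successful load.
    tmp = conf_dir / "3proxy.cfg.tmp"   # hypothetical file names
    tmp.write_text(new_config)
    os.replace(tmp, conf_dir / "3proxy.cfg")

    # Changing the monitored version file triggers the reload.
    (conf_dir / "3proxy.ver").touch()
    return True
```

A caller that gets False back should retry later rather than overwrite the config, since the lock means the previous version has not been picked up yet.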
If most requests require an exchange with a small amount of data in both directions without the need for bandwidth, e.g., messengers or small web requests, you can eliminate Nagle's algorithm delay with the TCP_NODELAY flag. Usage example:
proxy -osTCP_NODELAY -ocTCP_NODELAY
sets TCP_NODELAY for client (oc) and server (os) connections.
Do not use TCP_NODELAY on slow connections with high delays when connection bandwidth is a bottleneck.
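For readers unfamiliar with the flag, what TCP_NODELAY asks of the operating system can be shown with a plain socket. This is an illustration in Python of the standard socket option only, not 3proxy code; the function name is hypothetical.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Small writes are sent immediately instead of being coalesced.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


s = make_nodelay_socket()
# A non-zero value means the option is active on this socket.
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
s.close()
```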
splice() allows copying data between connections without copying to the process address space. It can speed up the proxy on high-bandwidth connections if most connections require large data transfers. Splice is enabled by default on Linux since 0.9; "-s0" disables splice usage. Example:
proxy -s0
Splice is only available on Linux. Splice requires more system buffers and file descriptors and produces more IOCTLs but reduces process memory and overall CPU usage. Disable splice if there are a lot of short-lived connections with no bandwidth requirements.
Use splice only on high-speed connections (e.g., 10GbE) when the processor, memory speed, or system bus are bottlenecks.
TCP_NODELAY and splice are not contrary to each other and should be combined on high-speed connections.
A grace delay can be configured with the -g option, e.g.:
proxy -g8000,3,10
The first parameter is the average read size we want to keep, the second parameter is the minimal number of packets in the same direction to apply the algorithm, and the last value is the delay added after polling and prior to reading data. The example above adds a 10-millisecond delay before reading data if the average polling size is below 8000 bytes and 3 read operations have been made in the same direction. It's especially useful with splice.
'logdump 1 1' is useful to see how grace delays work; choose a delay value to avoid filling the read pipe/buffer (typically 64K) but keep the request sizes close to the chosen average on large file uploads/downloads.
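The rule described for the three -g parameters can be modeled as a small decision function. This is only one reading of the description above, not 3proxy's implementation: the function name is hypothetical, and how 3proxy actually computes the average is an assumption here (a plain mean over the reads seen so far in one direction).

```python
def grace_delay_ms(read_sizes, target_avg=8000, min_reads=3, delay_ms=10):
    """Delay to apply before the next read, modeled on -g8000,3,10.

    read_sizes: sizes of consecutive reads in the same direction.
    """
    if len(read_sizes) < min_reads:
        return 0  # not enough reads in this direction yet
    average = sum(read_sizes) / len(read_sizes)
    # Small average reads: wait so more data accumulates before reading.
    return delay_ms if average < target_avg else 0


print(grace_delay_ms([1000, 1200]))           # too few reads, no delay
print(grace_delay_ms([1000, 1200, 900]))      # small reads, delay applies
print(grace_delay_ms([16000, 16000, 16000]))  # reads already large
```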