Linux Network Stack

On Linux systems, the network stack is a core kernel component, and device drivers are additional modules. Packets are passed through these kernel components as the struct sk_buff (socket buffer) data type. Note that packets may also be queued in the IP layer (not pictured) for packet reassembly.

TCP Connection Queues

Bursts of inbound connections are handled by using backlog queues. There are two such queues, one for incomplete connections while the TCP handshake completes (also known as the SYN backlog), and one for established sessions waiting to be accepted by the application (also known as the listen backlog). These are pictured in Figure 10.9.

Figure 10.9 TCP backlog queues

Only one queue was used in earlier kernels, and it was vulnerable to SYN floods. A SYN flood is a type of DoS attack that involves sending numerous SYNs to the listening TCP port from bogus IP addresses. This fills the backlog queue while TCP waits to complete the handshake, preventing real clients from connecting.

With two queues, the first can act as a staging area for potentially bogus connections, which are promoted to the second queue only once the connection is established. The first queue can be made long to absorb SYN floods and optimized to store only the minimum amount of metadata necessary.

The use of SYN cookies bypasses the first queue, as they show the client is already authorized.
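From the application's side, the listen backlog is what a server requests when it calls listen(). A minimal Python sketch of this, using the standard socket API (the kernel silently caps the requested value at the net.core.somaxconn setting):

```python
import socket

# The backlog argument to listen() requests the depth of the
# established-connection (listen) queue for this socket; the kernel
# caps it at the net.core.somaxconn sysctl value.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
s.listen(1024)             # request a deep listen backlog
addr, port = s.getsockname()
print("listening on", addr, port)
s.close()
```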

TCP Buffering

Data throughput is improved by using send and receive buffers associated with the socket. These are pictured in Figure 10.10.

Figure 10.10 TCP send and receive buffers

The size of both the send and receive buffers is tunable. Larger sizes improve throughput performance, at the cost of more main memory spent per connection. One buffer may be set to be larger than the other if the server is expected to perform more sending or receiving. The Linux kernel will also dynamically increase the size of these buffers based on connection activity and allows tuning of their minimum, default, and maximum sizes.
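These buffers can be inspected and resized per socket. A minimal sketch using the standard socket API (note that on Linux the kernel typically doubles the requested value to allow for bookkeeping overhead, and clamps it to the wmem/rmem sysctl limits, so the value read back may differ from the value requested):

```python
import socket

# Read the default send buffer size, then request a larger receive buffer.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
default_snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)  # ask for 1 Mbyte
granted_rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("default send buffer:", default_snd, "granted receive buffer:", granted_rcv)
s.close()
```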

Segmentation Offload: GSO and TSO

Network devices and networks accept packet sizes up to a maximum segment size (MSS) that may be as small as 1500 bytes. To avoid the network stack overheads of sending many small packets, Linux uses generic segmentation offload (GSO) to send packets up to 64 Kbytes in size (“super packets”), which are split into MSS-sized segments just before delivery to the network device. If the NIC and driver support TCP segmentation offload (TSO), GSO leaves splitting to the device, improving network stack throughput. There is also a generic receive offload (GRO) complement to GSO. GRO and GSO are implemented in-kernel software, and TSO is implemented by NIC hardware.
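As a toy illustration of the arithmetic involved (not kernel code), here is how one 64 Kbyte super packet breaks into MSS-sized segments:

```python
# Splitting one 64 Kbyte "super packet" into MSS-sized segments, as GSO
# does just before handing data to a driver whose NIC lacks TSO.
MSS = 1448  # typical TCP payload per segment on a 1500-byte MTU path
            # with TCP timestamps enabled

def segment(payload_len, mss=MSS):
    sizes = [mss] * (payload_len // mss)
    if payload_len % mss:
        sizes.append(payload_len % mss)  # final short segment
    return sizes

sizes = segment(64 * 1024)
print(len(sizes), "segments, last one", sizes[-1], "bytes")
```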

CPU Scaling

High packet rates can be achieved by engaging multiple CPUs to process packets and the TCP/IP stack. Linux supports various methods for multi-CPU packet processing:

  • RSS: Receive Side Scaling: For modern NICs that support multiple queues and can hash packets to different queues, which are in turn processed by different CPUs, interrupting them directly. This hash may be based on the IP address and TCP port numbers, so that packets from the same connection end up being processed by the same CPU.
  • RPS: Receive Packet Steering: A software implementation of RSS, for NICs that do not support multiple queues. This involves a short interrupt service routine to map the inbound packet to a CPU for processing. A similar hash can be used to map packets to CPUs, based on fields from the packet headers.
  • RFS: Receive Flow Steering: This is similar to RPS, but with affinity for where the socket was last processed on-CPU, to improve CPU cache hit rates and memory locality.
  • Accelerated Receive Flow Steering: This achieves RFS in hardware, for NICs that support this functionality. It involves updating the NIC with flow information so that it can determine which CPU to interrupt.
  • XPS: Transmit Packet Steering: For NICs with multiple transmit queues, this supports transmission by multiple CPUs to the queues.
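The RSS-style hashing described above can be sketched with a toy model (real NICs use a Toeplitz hash seeded with a key; the queue count here is hypothetical):

```python
# Hash the connection 4-tuple so that every packet of one flow lands on
# the same queue, and hence is processed by the same CPU.
NUM_QUEUES = 8  # hypothetical number of NIC receive queues

def queue_for(src_ip, src_port, dst_ip, dst_port):
    return hash((src_ip, src_port, dst_ip, dst_port)) % NUM_QUEUES

q = queue_for("10.0.0.1", 40000, "10.0.0.2", 443)
print("flow mapped to queue", q)
```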

Optimizations

  • Pacing: This controls when to send packets, spreading out transmissions (pacing) to avoid bursts that may hurt performance. This may help avoid TCP micro-bursts that can lead to queueing delays, or even cause network switches to drop packets. It may also help with the incast problem, where many endpoints transmit to one at the same time.
  • TCP Small Queues (TSQ): This controls (reduces) how much is queued by the network stack to avoid problems including bufferbloat.
  • Byte Queue Limits (BQL): These automatically size the driver queues large enough to avoid starvation, but also small enough to reduce the maximum latency of queued packets, and to avoid exhausting NIC TX descriptors. It works by pausing the addition of packets to the driver queue when necessary, and was added in Linux 3.3.
  • Earliest Departure Time (EDT): This uses a timing wheel instead of a queue to order packets sent to the NIC. Timestamps are set on every packet based on policy and rate configuration. This was added in Linux 4.20, and has BQL- and TSQ-like capabilities.
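As a toy sketch of the EDT idea (the pacing rate here is hypothetical), packets can be assigned departure timestamps computed from a rate, instead of being queued FIFO:

```python
# Rather than queueing packets back-to-back, compute a departure
# timestamp for each packet from a pacing rate, spreading transmissions
# out instead of sending them in a burst.
PACING_RATE = 125_000  # bytes per second; a hypothetical 1 Mbit/s pace

def departure_times(packet_sizes, start=0.0, rate=PACING_RATE):
    t, times = start, []
    for size in packet_sizes:
        times.append(t)          # earliest departure time for this packet
        t += size / rate         # the next packet must not depart earlier
    return times

print(departure_times([1500, 1500, 1500]))  # three packets spaced 12 ms apart
```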

Congestion Control Algorithms

  • Reno: Triple duplicate ACKs trigger: halving of the congestion window, halving of the slow-start threshold, fast retransmit, and fast recovery.
  • Tahoe: Triple duplicate ACKs trigger: fast retransmit, halving the slow-start threshold, congestion window set to one maximum segment size (MSS), and slow-start state. (Along with Reno, Tahoe was first developed for 4.3BSD.)
  • CUBIC: Uses a cubic function (hence the name) to scale the window, and a “hybrid start” function to exit slow start. CUBIC tends to be more aggressive than Reno, and is the default in Linux.
  • BBR: Instead of window-based, BBR builds an explicit model of the network path characteristics (RTT and bandwidth) using probing phases. BBR can provide dramatically better performance on some network paths, while hurting performance on others. BBRv2 is currently in development and promises to fix some of the deficiencies of v1.
  • DCTCP: DataCenter TCP relies on switches configured to emit Explicit Congestion Notification (ECN) marks at a very shallow queue occupancy to rapidly ramp up to the available bandwidth (RFC 8257). This makes DCTCP unsuitable for deployment across the Internet, but in a suitably configured controlled environment it can improve performance significantly.
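The Reno behavior in the first bullet can be sketched as a toy AIMD model (a deliberate simplification: real implementations track bytes, ACK clocking, and recovery state, not just a window counter):

```python
# A toy model of Reno in units of MSS: exponential growth in slow start,
# additive increase in congestion avoidance, multiplicative decrease on
# triple duplicate ACKs.
def reno_step(cwnd, ssthresh, loss):
    if loss:                       # triple duplicate ACKs detected
        ssthresh = max(cwnd // 2, 2)
        cwnd = ssthresh            # halve the congestion window
    elif cwnd < ssthresh:
        cwnd *= 2                  # slow start: double per round trip
    else:
        cwnd += 1                  # congestion avoidance: +1 MSS per RTT
    return cwnd, ssthresh

cwnd, ssthresh = 1, 64
for rtt in range(10):
    cwnd, ssthresh = reno_step(cwnd, ssthresh, loss=(rtt == 6))
print(cwnd, ssthresh)
```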

The Math behind Fibonacci Numbers

In this blog we will see how to find the index of a particular number in the Fibonacci series, and how to find the number at a particular index in the Fibonacci series.

1. Find the number at a particular index in the Fibonacci series

The Fibonacci numbers are defined recursively by the following difference equation:

\left\{\begin{matrix} F_{n} = F_{n-1} + F_{n-2} \\ F_{1} = 1 \\ F_{0} = 0 \end{matrix}\right.

It is easy to compute the first few elements in the sequence:

0,1,1,2,3,5,8,13,21,34⋯
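These terms can be generated directly from the recurrence:

```python
# Generate the first `count` Fibonacci numbers iteratively.
def fib_sequence(count):
    seq, a, b = [], 0, 1
    for _ in range(count):
        seq.append(a)
        a, b = b, a + b
    return seq

print(fib_sequence(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```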

Derivation of the general formula

It is possible to derive a general formula for F_{n} without computing all the previous numbers in the sequence. If a geometric series (i.e. a series with a constant ratio between consecutive terms, with terms of the form r^n) is to solve the difference equation, we must have

r^n = r^{n-1} + r^{n-2}

which is equivalent to

r^2 = r + 1

This equation has two distinct solutions

\varphi = \frac {1 + \sqrt{5}}{2} \approx 1.61803

\frac {1 - \sqrt{5}}{2} = 1 - \varphi = - \frac {1}{\varphi } \approx -0.61803

In particular the larger root is known as the golden ratio
\varphi = \frac {1 + \sqrt{5}}{2} \approx 1.61803

Now, since both roots solve the difference equation for Fibonacci numbers, any linear combination of the two sequences also solves it

a\left( \frac {1 + \sqrt{5}}{2} \right)^n + b\left( \frac {1 - \sqrt{5}}{2} \right)^n

It’s not hard to see that all Fibonacci numbers must be of this general form because we can uniquely solve for a and b such that the initial conditions of F_{1}=1 and F_{0}=0 are met

\left\{\begin{matrix} F_{0} = a\left( \frac {1 + \sqrt{5}}{2} \right)^0 + b\left( \frac {1 - \sqrt{5}}{2} \right)^0 \\ F_{1} = a\left( \frac {1 + \sqrt{5}}{2} \right)^1 + b\left( \frac {1 - \sqrt{5}}{2} \right)^1 \end{matrix}\right.

yielding

\left\{\begin{matrix} a= \frac {1}{\sqrt{5}}\\ b= \frac {-1}{\sqrt{5}} \end{matrix}\right.

We have therefore derived the general formula for the n-th Fibonacci number

F_{n}= \frac {1}{\sqrt{5}}\left( \frac {1 + \sqrt{5}}{2} \right)^n - \frac {1}{\sqrt{5}}\left( \frac {1 - \sqrt{5}}{2} \right)^n

Since the second term has an absolute value smaller than 1, we can see that the ratios of Fibonacci numbers converge to the golden ratio
\lim_{n \to \infty } \frac {F_{n}}{F_{n-1}} = \frac {1 + \sqrt{5}}{2}
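This convergence is easy to verify numerically:

```python
import math

# Check that the ratio of consecutive Fibonacci numbers approaches phi.
a, b = 0, 1                      # F_0, F_1
for _ in range(30):
    a, b = b, a + b              # advance to F_30, F_31
ratio = b / a
print(ratio, (1 + math.sqrt(5)) / 2)
```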

Python code:

import math

def findFibIndexValue(index):
    golden_ratio = (1 + math.sqrt(5)) / 2
    return round(((golden_ratio ** index) - ((1 - golden_ratio) ** index)) / math.sqrt(5))

2. Find the index of a particular number in the Fibonacci series

We know that
F_{n} = \frac {1}{\sqrt{5}}\left ( \varphi ^n - \hat \varphi^n \right )
where
\varphi = \frac {1}{2}\left ( 1 + \sqrt{5} \right ) and \hat \varphi = \frac {1}{2}\left ( 1 - \sqrt{5} \right )

n = \operatorname{round}\left( \frac {\log F_{n}+\log \sqrt{5}}{\log \varphi} \right)

where “round” means round to the nearest integer. For speed of computation we should simplify this to:
n = \operatorname{round}\left( \alpha \cdot \log F_{n} + \beta \right)

where the \alpha and \beta constants are precomputed as:
\alpha = \frac {1}{\log \varphi } \approx 2.078087
\beta = \frac {\log \sqrt{5}}{\log \varphi} \approx 1.672276

Python code:

import math

def findFibIndex(val):
    return round(2.078087 * math.log(val) + 1.672276)
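As a quick sanity check, this formula correctly inverts the closed form for small Fibonacci numbers:

```python
import math

def findFibIndex(val):
    return round(2.078087 * math.log(val) + 1.672276)

assert findFibIndex(55) == 10    # F_10 = 55
assert findFibIndex(6765) == 20  # F_20 = 6765
print("index checks passed")
```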

3. Count number of digits in the nth Fibonacci number

F_{n}= \frac {1}{\sqrt{5}}\left( \frac {1 + \sqrt{5}}{2} \right)^n - \frac {1}{\sqrt{5}}\left( \frac {1 - \sqrt{5}}{2} \right)^n

The above formula can be simplified to
F_{n} = \operatorname{round}\left ( \frac {\varphi ^n}{\sqrt {5}} \right )
The count of digits in Fib(n) is then
\left \lceil \log_{10}F_{n} \right \rceil
= \left \lceil \log_{10}\left ( \frac {\varphi ^n}{\sqrt{5}} \right ) \right \rceil
= \left \lceil n \cdot \log_{10}\varphi - \log_{10}\sqrt{5} \right \rceil
= \left \lceil n \cdot \log_{10}\varphi - \frac {\log_{10}5}{2} \right \rceil


Python code:

import math

phi = (1 + 5**.5) / 2

def numberOfDig(n):
    if n == 1:
        return 1
    return math.ceil(n * math.log10(phi) -
                     .5 * math.log10(5))
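This can be cross-checked against the actual digit counts of the first Fibonacci numbers:

```python
import math

phi = (1 + 5**.5) / 2

def numberOfDig(n):
    if n == 1:
        return 1
    return math.ceil(n * math.log10(phi) - .5 * math.log10(5))

# Compare against the digits of the actual values F_1 .. F_29.
a, b = 0, 1
for n in range(1, 30):
    a, b = b, a + b              # a is now F_n
    assert numberOfDig(n) == len(str(a))
print("digit counts match")
```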

Fastest way to compute 9^(9^9) exactly all digits?

Is there a way to exactly compute all of the ca. 370 million decimal digits of 9^(9^9) very fast? I used an out-of-the-box bignum library(*), which took 16 minutes.

(*) I used the Java BigInteger pow() method:

?- time((_ is 9^(9^9))).
% Up 974,318 ms, GC 9,302 ms, Thread Cpu 962,688 ms (Current 06/01/18
19:54:01)
Yes

?- statistics.
Max Memory           7,635,730,432 Bytes
Used Memory           765,913,440 Bytes
Free Memory          2,891,739,304 Bytes
Uptime                 2,466,670 Millis
GC Time                    9,315 Millis
Thread Cpu Time          963,812 Millis
Current Time          06/01/18 20:18:37 

The Java BigInteger implementation uses Karatsuba and Toom Cook:
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/math/BigInteger.java

P.S.: To additionally output the ca. 370 million decimal digits, if it took 1 μs per digit, would only need 370’000 ms. That’s about a third of the time I needed to compute the exact digits in the above, i.e. 974,318 ms without displaying.

Solution:

Try https://raw.githubusercontent.com/tbuktu/bigint/master/src/main/java/java/math/BigInteger.java for an improved BigInteger that switches to Schoenhage-Strassen once integers get past 74,000 digits. My back-of-the-envelope estimate says that should be an order of magnitude faster once you get to hundreds of millions of digits.
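As a side note, the size of the result can be predicted without computing it, since the number of decimal digits of 9^k is floor(k·log10 9) + 1:

```python
import math

# Predict the digit count of 9**(9**9) without materializing the number.
k = 9 ** 9                           # 387,420,489
digits = math.floor(k * math.log10(9)) + 1
print(digits)                        # ca. 370 million decimal digits
```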

Return list of primes up to n using for loop

I have just picked up learning Python and I am trying to create a simple function which accepts an integer and returns a list of all primes from 2 to that integer.

I have created the function but the code doesn’t seem to work. I have found solutions only for more efficient (and complex) methods (like this one: Finding prime numbers using list comprehension) for this problem, which don’t really help me in finding my mistake.

def list_of_primes(n):
    primes = []
    for y in range (2, n):
        for z in range(2, y):
            if y % x == 0:
                continue
            else:
                primes.append(y)
        primes.sort()
        return primes

What is wrong with the code?

Solution:

There are several errors in your code. Below is a working implementation of your algorithm.

def list_of_primes(n):
    primes = []
    for y in range (2, n):
        for z in range(2, y):
            if y % z == 0:
                break
        else:
            primes.append(y)
    primes.sort()
    return primes

list_of_primes(20)

# [2, 3, 5, 7, 11, 13, 17, 19]

Explanation

  • Indentation is crucial in Python.
  • You need to test if y is divisible by z, not by a variable x which has not been defined.
  • Sort your list and return at the very end, both outside your outer for loop.
  • Use break to skip a number when it is found to be non-prime.
  • Apply your else statement on the inner for loop, not as part of the if / else clause.
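Beyond those fixes, one modest optimization keeps the same structure: trial division only needs to test divisors up to the square root of y.

```python
# Same for/else structure, but the inner loop stops at sqrt(y): any
# composite y has a divisor no larger than its square root.
def list_of_primes(n):
    primes = []
    for y in range(2, n):
        for z in range(2, int(y ** 0.5) + 1):
            if y % z == 0:
                break
        else:
            primes.append(y)
    return primes

print(list_of_primes(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```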

How to efficiently find the indices of matching elements in two lists

I am working on two large data sets, and my question is as follows.

Suppose I have two lists:

list1 = [A,B,C,D]

list2 = [B,D,A,G]

How can I efficiently find the matching indices, using Python, without O(n^2) searching? The result should look like:

matching_index(list1,list2) -> [(0,2),(1,0),(3,1)]

Solution:

Without duplicates

If your objects are hashable and your lists have no duplicates, you can create an inverse index of the first list and then traverse the second list. This traverses each list only once and thus is O(n).

def find_matching_index(list1, list2):

    inverse_index = { element: index for index, element in enumerate(list1) }

    return [(index, inverse_index[element])
        for index, element in enumerate(list2) if element in inverse_index]

find_matching_index([1,2,3], [3,2,1]) # [(0, 2), (1, 1), (2, 0)]

With duplicates

You can extend the previous solution to account for duplicates. You can keep track of multiple indices with a set.

def find_matching_index(list1, list2):

    # Create an inverse index which keys are now sets
    inverse_index = {}

    for index, element in enumerate(list1):

        if element not in inverse_index:
            inverse_index[element] = {index}

        else:
            inverse_index[element].add(index)

    # Traverse the second list    
    matching_index = []

    for index, element in enumerate(list2):

        # We have to create one pair by element in the set of the inverse index
        if element in inverse_index:
            matching_index.extend([(x, index) for x in inverse_index[element]])

    return matching_index

find_matching_index([1, 1, 2], [2, 2, 1]) # [(2, 0), (2, 1), (0, 2), (1, 2)]

Unfortunately, this is no longer O(n). Consider the case where you input [1, 1] and [1, 1]: the output is [(0, 0), (0, 1), (1, 0), (1, 1)]. Thus, by the size of the output, the worst case is O(n^2).

This solution is still O(n) if there are no duplicates, though.

Python iterate through array while finding the mean of the top k elements

Suppose I have a Python array a=[3, 5, 2, 7, 5, 3, 6, 8, 4]. My goal is to iterate through this array 3 elements at a time returning the mean of the top 2 of the three elements.

Using the array above, during my iteration step, the first three elements are [3, 5, 2] and the mean of the top 2 elements is 4. The next three elements are [5, 2, 7] and the mean of the top 2 elements is 6. The next three elements are [2, 7, 5] and the mean of the top 2 elements is again 6. …

Hence, the result for the above array would be [4, 6, 6, 6, 5.5, 7, 7].

What is the nicest way to write such a function?

Solution:

Solution

You can use some fancy slicing of your list to manipulate subsets of elements. Simply grab each three element sublist, sort to find the top two elements, and then find the simple average (aka. mean) and add it to a result list.

Code

def get_means(input_list):
    means = []
    for i in range(len(input_list) - 2):
        three_elements = input_list[i:i+3]
        top_two = sorted(three_elements, reverse=True)[:2]
        means.append(sum(top_two) / 2.0)
    return means

Example

print(get_means([3, 5, 2, 7, 5, 3, 6, 8, 4]))
# [4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
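An equivalent sketch using heapq.nlargest, which avoids fully sorting each window (the window and top parameters are a generalization not in the original question):

```python
import heapq

# Slide a fixed-size window over the list, averaging the `top` largest
# elements of each window via a partial selection instead of a full sort.
def get_means(input_list, window=3, top=2):
    return [sum(heapq.nlargest(top, input_list[i:i + window])) / top
            for i in range(len(input_list) - window + 1)]

print(get_means([3, 5, 2, 7, 5, 3, 6, 8, 4]))
```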

Maximum distance between two different elements in an array

I have a problem where I need to find the maximum distance between two different elements in an array.

For example: given an array 4,6,2,2,6,6,4 , the method should return 5 as the max distance.

I am able to solve the problem using two for loops, but it is not an optimized solution. I am trying to optimize it by doing it in a single for loop.

here is my current solution:

int [] A = {4,6,2,2,6,6,4};
int N = A.length;
int result = 0;

for (int i = 0; i < N; i++){
    for (int j = i; j < N; j++) {
        if(A[i] != A[j]){
            result = Math.max(result, j - i);
        }
    }
}

// tried below code but it is not efficient
//      for (int i = 0; i < N; i++){
//          
//          if(A[N-1] != A[i]){
//              result = Math.max(result, N-1-i);
//          }
//      }

System.out.println(result);

How to make this better in terms of time complexity?

Solution:

Simple (not nested) loop is enough, but two cases should be taken into
account: either the best result is

  4,6,2,2,6,6,4
    ^         ^ - moving first

or

  4,6,2,2,6,6,4
  ^         ^   - moving last

for instance: [4, 2, 4, 4, 4] moving first brings the answer, when in case of [4, 4, 4, 2, 4] moving last should be used.

  int first = 0;
  int last = A.length - 1;

  // 1st case: moving "first"
  while (first < last) {
    if (A[first] == A[last])
      first++;
    else
      break;
  }

  int diff1 = last - first;

  first = 0;
  last = A.length - 1;

  // 2nd case: moving "last"
  while (first < last) {
    if (A[first] == A[last])
      last--;
    else
      break;
  }

  int diff2 = last - first;

  // result is the max between two cases
  int result = diff1 > diff2
    ? diff1
    : diff2;

So we have O(N) time complexity.

Edit: Let’s prove that at least one of the indexes is either 0 or length - 1. Let’s do it by contradiction. Suppose we have a solution like

  a, b, c, .... d, e, f, g
        ^ ..... ^  <- solution indexes (no borders)

Items to the left of c must be d, otherwise we could take the a or b indexes and have an improved solution. Items to the right of d must be c, or we could once again push the last index to the right and have a better solution. So we have

  d, d, c .... d, c, c, c
        ^ .... ^  <- solution indexes 

Now, since d != c (c..d is a solution), we can improve the solution into

  d, d, c .... d, c, c, c
        ^ .... ^           <- solution indexes 
  ^       ....          ^  <- better solution

We have a contradiction (the supposed solution is not one – we have a better choice) and that’s why at least one index must be 0 or length - 1.

Now we have 2 scenarios to test:

  a, b, ..... y, z
     ^  ......   ^ <- moving first
  ^  ......   ^    <- moving last

We can combine both conditions into one if and have just one loop:

  int result = 0;

  for (int i = 0; i < A.length; ++i)
    if (A[i] != A[A.length - 1] || A[0] != A[A.length - 1 - i]) {
      result = A.length - i - 1;

      break;
    }
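For comparison, the combined loop translates directly to Python; a sketch checked against the examples discussed above:

```python
# Single pass: stop at the first index i where either endpoint pairing
# (i against the last element, or the first element against n-1-i)
# yields differing values; the distance is then n - i - 1.
def max_distance(a):
    n = len(a)
    for i in range(n):
        if a[i] != a[n - 1] or a[0] != a[n - 1 - i]:
            return n - i - 1
    return 0  # all elements equal

print(max_distance([4, 6, 2, 2, 6, 6, 4]))  # 5
```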

Java – Check if array is sorted descending

I need to check if an array is sorted strictly descending.
I wrote the following code:

public boolean isSortedDescendant(int [] array){
    if ((array.length == 0) || (array.length == 1)) {
        return true;
    } else {
        for(int i = 0; i < array.length - 1; i++){
            if (array[i] > array[i + 1]) {
                return true;
            }
        }
        return false;
    }
}

But it is not working correctly, for

   int[] array2 = {3, 2, 2};

at least. I spent a lot of time on different approaches, but without any luck.

Solution:

You should only return true after checking all the pairs of elements:

public boolean isSortedDescendant(int [] array){
    if ((array.length == 0) || (array.length == 1)) {
        return true;
    } else {
        for(int i = 0; i < array.length - 1; i++){
            if (array[i] <= array[i + 1]) {
                return false;
            }
        }
        return true;
    }
}

Finding the odd number out in an array

I am trying to solve a problem where I’m given an array, such as [0, 0, 1, 1, 2, 2, 6, 6, 9, 10, 10] where all numbers are duplicated twice, excluding one number, and I need to return the number that is not duplicated.

I am trying to do it like this:

def findNumber(self, nums):

    if (len(nums) == 1):
        return nums[0]

    nums_copy = nums[:]

    for i in nums:
        nums_copy.remove(i)

        if i not in nums:
            return i
        else:
            nums_copy.remove(i)

However when it reaches the else statement, there is the following error:

ValueError: list.remove(x): x not in list

This is occurring when i is in nums_copy, so I do not understand why this error occurs in this situation?

Solution:

You already called nums_copy.remove(i), so when i is the unique element there is no second copy left and nums_copy.remove(i) fails. (Note also that if i not in nums can never be true, since i comes from nums; the test should check nums_copy.)

You could do:

a = [0, 0, 1, 1, 2, 2, 6, 6, 9, 10, 10]

def get_single_instance(array):
  d = {}

  # count occurrences of each value
  for item in array:
    if item not in d:
      d[item] = 1
    else:
      d[item] += 1

  # return the value that occurs exactly once
  for k, v in d.items():
    if v == 1:
      return k

print(get_single_instance(a))

Result: 9
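An alternative approach worth noting: when every value except one appears exactly twice, XOR-ing all the elements cancels the pairs and leaves the unique value, with no dictionary needed.

```python
from functools import reduce
from operator import xor

# x ^ x == 0 and x ^ 0 == x, so the paired values cancel out and only
# the unpaired value survives the fold.
def get_single_instance(array):
    return reduce(xor, array)

print(get_single_instance([0, 0, 1, 1, 2, 2, 6, 6, 9, 10, 10]))  # 9
```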