About - Sandeep Dasgupta

My Reading List

2021-04-01T00:00:00+00:00

My reading list, organised by topic.

Topics	Video	Papers	Blogs
Symbolic execution	Dynamic Symbolic Execution	X	X
Symbolic execution	X	X	X
Dataflow Analysis	Dataflow Analysis	X	X
Dataflow Analysis	X	X	X
Instruction Scheduling	X	Swing Modulo Scheduling: A Lifetime-Sensitive Approach	X
Instruction Scheduling	X	Thesis: AN IMPLEMENTATION OF SWING MODULO SCHEDULING WITH EXTENSIONS FOR SUPERBLOCKS	X

X86 FAQs & Binary Analysis Tools

2018-05-14T00:00:00+00:00


Tools

objdump

objdump -d -M x86-64,att --no-show-raw-insn ./a.out

Calling C from Assembly

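  .text
  .globl  main
main:
  pushq   %rax            # realign %rsp to 16 bytes before the call
  movq    $-1, %rdi
  movq    $-1, %rax
  movl    $65, %edi       # writing %edi zero-extends into %rdi, so %rdi = 65 ('A')
  callq   putchar
  xorl    %eax, %eax      # main returns 0
  popq    %rcx            # undo the earlier push
  retq

# as test.s -o test.o
# gcc test.o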

Some useful gcc options

// Command used by compiler explorer V0.1
gcc test.c -O2 -c -S -o - -masm=att | c++filt | grep -vE '\s+\.'

-fno-asynchronous-unwind-tables: disable CFI directives on gas assembler output


-march=haswell // Targeting ISA

Articles

Ida

Tutorial on x86 assembly programming (Syntax/Semantics)

Calling Conventions

x86 Disassembly/Floating Point Numbers

Addressing modes

x86-64 is a complex instruction set (CISC), so the MOV instruction has many different variants that move different types of data between different locations.

MOV, like most instructions, has a single letter suffix that determines the amount of data to be moved. The following names are used to describe data values of various sizes:

Suffix	Name	Size (bytes)
B	BYTE	1
W	WORD	2
L	LONG	4
Q	QUADWORD	8

It is possible to leave off the suffix, and the assembler will attempt to choose the right size based on the arguments. However, this is not recommended, as it can have unexpected effects.

The arguments to MOV can have one of several addressing modes. Here is an example of using each kind of addressing mode to load a 64-bit value into %rax:

Mode	Example
Global Symbol	MOVQ x, %rax
Immediate	MOVQ $56, %rax
Register	MOVQ %rbx, %rax
Indirect	MOVQ (%rsp), %rax
Base-Relative	MOVQ -8(%rbp), %rax
Offset-Scaled-Base-Relative	MOVQ -16(%rbx,%rcx,8), %rax

-16(%rbx,%rcx,8) refers to the value at the address -16+%rbx+%rcx*8
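As a quick sanity check of that formula, here is a minimal Python sketch of the effective-address arithmetic for the disp(base,index,scale) form. The helper name `effective_address` and the register values are hypothetical; it only models the arithmetic, including the x86 restriction that the scale is 1, 2, 4 or 8 and that the 64-bit sum wraps around.

```python
def effective_address(disp=0, base=0, index=0, scale=1):
    """AT&T disp(base,index,scale) addresses disp + base + index * scale."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only supports scale factors 1, 2, 4 or 8")
    # Address arithmetic is done modulo 2**64 on x86-64
    return (disp + base + index * scale) % (1 << 64)


# -16(%rbx,%rcx,8) with hypothetical values %rbx = 0x1000 and %rcx = 3
addr = effective_address(disp=-16, base=0x1000, index=3, scale=8)
print(hex(addr))  # 0x1008
```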

For the most part, the same addressing modes may be used to store data into registers and memory locations. However, not all combinations are supported: MOV cannot take a memory operand as both its source and its destination, so, for example, MOVQ -8(%rbx), -8(%rbx) is not valid. A memory-to-memory move has to go through a register.

FAQs

How main works

Why %eax is made zero before printf

Code in C:

printf("%d", 1);
Assembly output:

movl    $1, %esi
leaq    LC0(%rip), %rdi
movl    $0, %eax        # why?
call    _printf

From the x86_64 System V ABI:

  Register    Usage
  %rax        temporary register; with variable arguments
            passes information about the number of vector
            registers used; 1st return register

For calls that may call functions that use varargs or
stdargs (prototype-less calls or calls to functions
containing ellipsis (. . . ) in the declaration) %al is
used as hidden argument to specify the number of vector
registers used. The contents of %al do not need to
match exactly the number of registers, but must be an
upper bound on the number of vector registers used
and is in the range 0–8 inclusive.

printf is a function with variable arguments. This call passes no floating-point (vector-register) arguments, so the number of vector registers used is zero and the compiler sets %eax to 0. Note that printf must check only %al, because the caller is allowed to leave garbage in the higher bytes of %rax.
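To make the %al rule concrete, the Python sketch below computes the value a caller could place in %al for a variadic call. `al_for_varargs_call` is a hypothetical helper and it models only scalar arguments: `float`/`double` values (passed in %xmm0 to %xmm7) count as vector registers, while integers and pointers do not. Structs, `long double` and vector types are deliberately ignored.

```python
def al_for_varargs_call(arg_types):
    """Upper bound on vector registers used, as the caller would put in %al.

    Only scalar C types are modelled: 'float' and 'double' go in %xmm0-%xmm7,
    anything else is treated as an integer-class argument. Floating-point
    arguments beyond the eighth are passed on the stack, so the count is
    capped at 8.
    """
    vector_regs = sum(1 for t in arg_types if t in ("float", "double"))
    return min(vector_regs, 8)


print(al_for_varargs_call(["char*", "int"]))     # printf("%d", 1)   -> 0
print(al_for_varargs_call(["char*", "double"]))  # printf("%f", 1.0) -> 1
```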

Reference

link

Decimal-Hexadecimal-2s Complement Binary Converter

2017-11-01T00:00:00+00:00

Getting rid of my dependencies on online converters. An online version is coming soon.

#!/usr/bin/env python3
"""Extract information about a number.

Example usage: python3 infonum.py --bit 4 0xf
Output:
        Base 10: -1
        Base 16: f
        2's Complement binary: 1111

Example usage: python3 infonum.py --bit 64 -1
Output:
        Base 10: -1
        Base 16: ff ff ff ff ff ff ff ff
        2's Complement binary: 11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111
"""

import argparse
import re

BIT = 64
NEGATE = {'1': '0', '0': '1'}


# Convert a hex string, read as a BIT-wide 2's complement value, to decimal
def toDec(hexstr):
    value = int(hexstr, 16) & ((1 << BIT) - 1)  # keep only the low BIT bits
    if value >= 1 << (BIT - 1):                 # sign bit set: negative number
        value -= 1 << BIT
    return str(value)


# Convert a decimal number to BIT-wide 2's complement hex (assumes BIT % 4 == 0)
def toHex(n):
    num = int(n)
    M = '0123456789abcdef'  # nibble value -> hex digit
    ans = ''
    for _ in range(BIT // 4):     # integer division: one hex digit per 4 bits
        ans = M[num & 15] + ans   # num & 1111b is the lowest nibble
        num >>= 4                 # arithmetic shift, so negatives keep their 1s
    return ans


def twocomplement(n, size_in_bits):
    number = int(n)
    if number < 0:
        # -x == ~(x - 1): invert the bits of |number| - 1, then pad with 1s
        return negate(bin(abs(number) - 1)[2:]).rjust(size_in_bits, '1')
    else:
        return bin(number)[2:].rjust(size_in_bits, '0')


def negate(value):
    return ''.join(NEGATE[x] for x in value)


# Insert a space every `separator` characters
def prettybinary(value, separator):
    return ' '.join(value[i:i + separator] for i in range(0, len(value), separator))


if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    parser.add_argument(
        "--bit",
        type=int,
        default=64,
        help="Bit width of the number (default: 64)")
    parser.add_argument(
        "num",
        help="Number to be analyzed: decimal, or hex with a 0x prefix")

    args = parser.parse_args()
    BIT = args.bit  # module-level assignment, so the helpers above see it
    print("Using {} bit".format(BIT))

    num = args.num
    matchObj = re.match(r'0x([0-9a-f]+)$', num, re.I)

    if matchObj:
        # Hex input
        decimalnum = toDec(matchObj.group(1))
    else:
        # Decimal input
        decimalnum = num
    hexnum = toHex(decimalnum)  # re-derive the hex string to get the right padding
    binarynum = twocomplement(decimalnum, BIT)

    print("\tBase 10: {}".format(decimalnum))
    print("\tBase 16: {}".format(prettybinary(hexnum, 2)))
    print("\t2's Complement binary: {}".format(prettybinary(binarynum, 8)))

x86-64 Stack Frame Layout

2017-10-12T00:00:00+00:00


Process Stack

High Address --> -----------
                 |         |
                 |         |
                 |         |
                 |         |
                 |         |
                 -----------
  0xffffff08     | foo     |  <-- %rsp
                 -----------
  0xffffff00     |         |
Low Address -->  |         |

To push new data onto the stack we use the push instruction. push %rax is equivalent to the following pair of instructions (except that push does not modify the flags, while sub does):

sub $8, %rsp
mov %rax, (%rsp)

High Address --> -----------
                 |         |
                 |         |
                 |         |
                 |         |
                 |         |
                 -----------
  0xffffff08     | foo     |
                 -----------
  0xffffff00     | %rax val| <-- %rsp
                 -----------
Low Address -->  |         |

Similarly, the pop instruction takes the value off the top of the stack, places it in its operand, and then increases the stack pointer. In other words, pop %rax is equivalent to this (again apart from the flags, which add modifies and pop does not):

mov (%rsp), %rax
add  $8, %rsp

High Address --> -----------
                 |         |
                 |         |
                 |         |
                 |         |
                 |         |
                 -----------
  0xffffff08     | foo     |  <-- %rsp
                 -----------
  0xffffff00     | %rax val| <-- still there
                 -----------
Low Address -->  |         |
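The three pictures can be replayed with a toy Python model of the stack. The `Stack` class is hypothetical: memory is a dict from address to value, push and pop follow the sub/mov and mov/add expansions above, and pop deliberately leaves the old slot untouched, which is why the value is "still there".

```python
class Stack:
    """Toy model: memory is a dict mapping 8-byte-aligned addresses to values."""

    def __init__(self, rsp):
        self.rsp = rsp
        self.mem = {}

    def push(self, value):
        # push: sub $8, %rsp ; mov value, (%rsp)
        self.rsp -= 8
        self.mem[self.rsp] = value

    def pop(self):
        # pop: mov (%rsp), reg ; add $8, %rsp
        value = self.mem[self.rsp]
        self.rsp += 8
        return value  # the old slot is not cleared


s = Stack(rsp=0xffffff08)
s.mem[0xffffff08] = "foo"
s.push("%rax val")
print(hex(s.rsp))           # 0xffffff00
print(s.pop(), hex(s.rsp))  # %rax val 0xffffff08
print(s.mem[0xffffff00])    # %rax val  <- still there
```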

Stack frames & Calling convention

int foobar(int a, int b, int c)
{
    int xx = a + 2;
    int yy = b + 3;
    int zz = c + 4;
    int sum = xx + yy + zz;

    return xx * yy * zz + sum;
}

int main()
{
    return foobar(77, 88, 99);
}

Right before the return statement, the stack frame for foobar looks like this:

The green data were pushed onto the stack by the calling function, and the blue ones by foobar itself.

An x86-64 instruction may be at most 15 bytes in length. It consists of the following components in the given order, where the prefixes are at the least-significant (lowest) address in memory.

Argument Passing

According to the System V AMD64 ABI, the first 6 integer or pointer arguments to a function are passed in registers, in this order:

 rdi:rsi:rdx:rcx:r8:r9

The 7th argument and onwards are passed on the stack.

long myfunc(long a, long b, long c, long d,
            long e, long f, long g, long h)
{
    /* ... */
}

rdi: a
rsi: b
rdx: c
rcx: d
r8:  e
r9:  f
g and h are passed on the stack
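The same mapping can be written as a short Python sketch. `arg_locations` is a hypothetical helper that handles only integer-class arguments (floats, structs and so on are ignored). It reports stack arguments relative to %rsp at function entry: (%rsp) holds the return address pushed by call, so the seventh argument sits at 8(%rsp) and the eighth at 16(%rsp).

```python
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def arg_locations(names):
    """Map integer/pointer arguments to their System V x86-64 locations at entry."""
    locations = {}
    for i, name in enumerate(names):
        if i < len(INT_ARG_REGS):
            locations[name] = "%" + INT_ARG_REGS[i]
        else:
            # Stack arguments start just above the return address at (%rsp)
            offset = 8 * (i - len(INT_ARG_REGS) + 1)
            locations[name] = "{}(%rsp)".format(offset)
    return locations


# myfunc(a, b, c, d, e, f, g, h)
for name, loc in arg_locations("abcdefgh").items():
    print(name, loc)
```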

Reference

Stack frame layout on x86-64

LLVM Compiler Bugs related to UB

2017-10-06T00:00:00+00:00

Here I am collecting some important points from various articles mentioned in the reference section.

LLVM has three distinct kinds of undefined behavior. Together, they enable many desirable optimizations, and LLVM aggressively exploits these opportunities.

Undefined behavior in LLVM resembles undefined behavior in C/C++: anything may happen to a program that executes it. The compiler may simply assume that undefined behavior does not occur; this assumption places a corresponding obligation on the program developer (or on the compiler and language runtime, when a safe language is compiled to LLVM) to ensure that undefined operations are never executed. An instruction that executes undefined behavior can be replaced with an arbitrary sequence of instructions. When an instruction executes undefined behavior, all subsequent instructions can be considered undefined as well.

The table below lists the conditions under which the arithmetic instructions have defined behavior, following the LLVM IR specification. For example, the shl instruction is defined only when the shift amount is less than the bit width of the instruction.

Coming to the memory-related instructions, the getelementptr instruction supports structured address computations: it uses a sequence of additions and multiplications to compute the address of a specific array element or structure field. For example, an array dereference in C such as val = a[b][c] can be translated to the following LLVM code (types omitted for brevity):

%ptr = getelementptr %a, %b, %c
%val = load %ptr

Unstructured memory accesses are supported by the inttoptr instruction. The load and store instructions support typed memory reads and writes. Out-of-bounds and unaligned loads and stores result in true undefined behavior, but a load from valid, uninitialized memory returns an undef.

Instruction	Definedness Constraint
sdiv a, b	b != 0 ∧ (a != INT_MIN ∨ b != −1)
udiv a, b	b != 0
srem a, b	b != 0 ∧ (a != INT_MIN ∨ b != −1)
urem a, b	b != 0
shl a, b	b <u n
lshr a, b	b <u n
ashr a, b	b <u n

(n is the bit width of the operands and <u is an unsigned comparison.)
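The constraints in the table can be written as a small predicate over n-bit bit patterns. This is a Python sketch with a hypothetical helper name, `is_defined`; a and b are given as unsigned patterns, so INT_MIN is 1000... and -1 is 111.... Note that in the LLVM LangRef an out-of-range shift amount produces poison rather than immediate UB; the predicate only reports whether the table's constraint holds.

```python
def is_defined(op, a, b, n):
    """Check the table's definedness constraint for n-bit operands.

    a and b are unsigned bit patterns in [0, 2**n), so INT_MIN is 1 << (n - 1)
    and -1 is 2**n - 1.
    """
    int_min = 1 << (n - 1)
    minus_one = (1 << n) - 1
    if op in ("sdiv", "srem"):
        return b != 0 and (a != int_min or b != minus_one)
    if op in ("udiv", "urem"):
        return b != 0
    if op in ("shl", "lshr", "ashr"):
        return b < n  # unsigned comparison, since b is an unsigned pattern
    raise ValueError("unknown instruction: " + op)


print(is_defined("sdiv", 0b1000, 0b1111, 4))  # False: INT_MIN / -1 overflows
print(is_defined("shl", 0b0001, 4, 4))        # False: shift amount == bit width
# Undefined (a, b) pairs for i4 sdiv: 16 with b == 0, plus INT_MIN / -1
print(sum(not is_defined("sdiv", a, b, 4) for a in range(16) for b in range(16)))  # 17
```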

undef

Explicit value in the IR
Acts like a free-floating hardware register
- Takes all possible bit patterns at the specified width
- Can take a different value every time it is used
Comes from uninitialized variables
Further reading

poison

Ephemeral effect of math instructions that violate
- nsw – no signed wrap for add, sub, mul, shl
- nuw – no unsigned wrap for add, sub, mul, shl
- exact – no remainder for sdiv, udiv, lshr, ashr
Designed to support speculative execution of operations that might overflow. For example, a loop-invariant x+1 can be hoisted out of a loop even though the signed add might overflow.
Poison propagates via instruction results
If poison reaches a side-effecting instruction, the result is true UB.

True UB

True undefined behavior

Triggered by
- Divide by zero
- Illegal memory accesses

Example 1

%1 = add  %x, 1

=>
%1 = add nsw %x, 1

ERROR: Target is more poisonous than Source for i4 %1

Example:
%x i4 = 0x7 (7)
Source value: 0x8 (8, -8)
Target value: poison

Example 2

%1 = add nsw %x, 1

=>
%1 = add  %x, 1

Optimization is correct

Example 3

%1 = add nsw %x, 1
%2 = icmp sgt %1, %x

=>

%2 = true

Done: 1
Optimization is correct
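The three examples can be checked by brute force over all 16 values of an i4, in the spirit of the verifier output above. This Python sketch is a toy model, not the verifier itself: it represents poison as a marker value, ignores undef, and treats a rewrite as correct when, for every input, the source is poison or both sides produce the same value.

```python
N = 4
POISON = "poison"


def to_signed(v):
    v &= (1 << N) - 1
    return v - (1 << N) if v >> (N - 1) else v


def add(x, y, nsw=False):
    """i4 add on bit patterns; with nsw, signed overflow yields poison."""
    if POISON in (x, y):
        return POISON
    total = to_signed(x) + to_signed(y)
    if nsw and not -(1 << (N - 1)) <= total < (1 << (N - 1)):
        return POISON
    return (x + y) & ((1 << N) - 1)


def icmp_sgt(a, b):
    if POISON in (a, b):
        return POISON
    return to_signed(a) > to_signed(b)


def refines(src, tgt):
    """The target may only be more defined: a poison source allows any target."""
    return src == POISON or src == tgt


xs = range(1 << N)
# Example 1: add -> add nsw is wrong (fails for x = 7)
print([x for x in xs if not refines(add(x, 1), add(x, 1, nsw=True))])   # [7]
# Example 2: add nsw -> add is fine
print(all(refines(add(x, 1, nsw=True), add(x, 1)) for x in xs))          # True
# Example 3: (x +nsw 1) >s x -> true is fine
print(all(refines(icmp_sgt(add(x, 1, nsw=True), x), True) for x in xs))  # True
```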

shl

<result> = shl <ty> <op1>, <op2>   ; yields ty:result

Both arguments to the ‘shl‘ instruction must be the same integer or vector of integer type. ‘op2‘ is treated as an unsigned value.

The value produced is op1 * 2^op2 mod 2^n, where n is the width of the result. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, this instruction returns a poison value. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.

If the nuw keyword is present, then the shift produces a poison value if it shifts out any non-zero bits. Equivalently, the result is poison unless

(a << b) >>u b == a, where >>u is a logical shift right.

If the nsw keyword is present, then the shift produces a poison value if it shifts out any bits that disagree with the resultant sign bit. Equivalently, the result is poison unless

(a << b) >>s b == a, where >>s is an arithmetic shift right.

For n = 4, shl nsw 0111, 1 leads to poison, because

0111 << 1 == 1110, and 1110 >>s 1 == 1111 (!= 0111)
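Both flag rules can be expressed directly through the round-trip equivalences above. This is a small Python sketch for i4 bit patterns; the names `ashr` and `shl` are just local helpers, and poison is represented by a marker string.

```python
N = 4
MASK = (1 << N) - 1
POISON = "poison"


def ashr(v, k):
    """Arithmetic shift right of an N-bit pattern (copies the sign bit)."""
    signed = v - (1 << N) if v >> (N - 1) else v
    return (signed >> k) & MASK


def shl(a, b, nuw=False, nsw=False):
    """N-bit shl on bit patterns, with the poison rules described above."""
    if b >= N:
        return POISON                # shift amount >= bit width
    r = (a << b) & MASK
    if nuw and (r >> b) != a:        # (a << b) >>u b != a: a set bit was shifted out
        return POISON
    if nsw and ashr(r, b) != a:      # (a << b) >>s b != a: a bit disagreeing with the sign left
        return POISON
    return r


print(format(shl(0b0111, 1), "04b"))  # 1110
print(shl(0b0111, 1, nuw=True))       # 14 (no set bit is shifted out)
print(shl(0b0111, 1, nsw=True))       # poison
```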

sdiv

Division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by doing a 32-bit division of -2147483648 by -1.

If the exact keyword is present, the result value of the sdiv is a poison value if the result would be rounded.

srem

Taking the remainder of a division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by taking the remainder of a 32-bit division of -2147483648 by -1. (The remainder doesn’t actually overflow, but this rule lets srem be implemented using instructions that return both the result of the division and the remainder.)

Bug 20186

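The transformation of -(X/C) to X/(-C) is invalid if C == 1 or C == INT_MIN. The counterexample below uses C == 1: then -C == -1, and INT_MIN / -1 overflows, which is UB.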

%a = sdiv %X, C
%r = sub 0, %a
=>
%r = sdiv %X, -C

ERROR: Domain of definedness of Target is smaller than Source's for i4 %r

Example:
%X i4 = 0x8 (8, -8)
C i4 = 0x1 (1)
%a i4 = 0x8 (8, -8)
Source value: 0x8 (8, -8)
Target value: undef
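An exhaustive search over i4 confirms the bug and shows which constants break the rewrite. In this Python sketch (a toy model with UB represented by a marker string), values are ordinary signed integers in [-8, 7], sdiv rounds toward zero as LLVM's does, and negation wraps like `sub 0, x`. The search reports C == 1 (the counterexample above) and C == INT_MIN (-8), where -C wraps back to INT_MIN.

```python
N = 4
MASK = (1 << N) - 1
INT_MIN = -(1 << (N - 1))  # -8 for i4
UB = "UB"


def wrap(v):
    """Wrap an integer to the signed i4 range (two's complement)."""
    v &= MASK
    return v - (1 << N) if v >> (N - 1) else v


def sdiv(a, b):
    """Signed i4 division rounding toward zero; UB for b == 0 or INT_MIN / -1."""
    if b == 0 or (a == INT_MIN and b == -1):
        return UB
    q = abs(a) // abs(b)
    return wrap(q if (a < 0) == (b < 0) else -q)


bad = set()
for X in range(INT_MIN, -INT_MIN):
    for C in range(INT_MIN, -INT_MIN):
        src = sdiv(X, C)
        if src == UB:
            continue                 # source already UB: any target is acceptable
        tgt = sdiv(X, wrap(-C))      # X / (-C), where -C wraps like "sub 0, C"
        if tgt != wrap(-src):        # -(X / C) computed as "sub 0, %a"
            bad.add(C)

print(sorted(bad))  # [-8, 1]
```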

Bug 20189

%B = sub 0, %A
%C = sub nsw %x, %B
=>
%C = add nsw %x, %A

ERROR: Target is more poisonous than Source for i4 %C

Example:
%A i4 = 0x8 (8, -8)
%x i4 = 0x8 (8, -8)
%B i4 = 0x8 (8, -8)
Source value: 0x0 (0) // -8 - (0 - (-8)) == -8 - (-8)
Target value: poison // -8 + -8

Bug 21242

Pre: isPowerOf2(C1)
%r = mul nsw %x, C1

=>

%r = shl nsw %x, log2(C1)

ERROR: Target is more poisonous than Source for i4 %r

Example:
%x i4 = 0x1 (1)
C1 i4 = 0x8 (8, -8)
Source value: 0x8 (8, -8)
Target value: poison

// Source : mul nsw 1 * -8 (defined)
// Target : shl nsw 0001, 3 (poison)
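The same brute-force idea reproduces this counterexample. In the Python sketch below (a toy model; `mul_nsw` and `shl_nsw` are local helpers), C1 = 1000 is a power of two when read as unsigned (8), but as a signed i4 it is -8, so 1 * -8 fits while shl nsw 0001, 3 flips the sign bit.

```python
N = 4
MASK = (1 << N) - 1
POISON = "poison"


def to_signed(v):
    v &= MASK
    return v - (1 << N) if v >> (N - 1) else v


def mul_nsw(x, c):
    """i4 mul nsw: poison if the signed product does not fit in 4 bits."""
    p = to_signed(x) * to_signed(c)
    return p & MASK if -(1 << (N - 1)) <= p < (1 << (N - 1)) else POISON


def shl_nsw(x, k):
    """i4 shl nsw: poison if shifting back arithmetically does not recover x."""
    r = (x << k) & MASK
    return r if ((to_signed(r) >> k) & MASK) == x else POISON


C1 = 0b1000                  # 8 read unsigned, -8 as a signed i4
bad = []
for x in range(1 << N):
    src = mul_nsw(x, C1)
    tgt = shl_nsw(x, 3)      # log2(8) == 3
    if src != POISON and src != tgt:
        bad.append((x, src, tgt))

print(bad)  # [(1, 8, 'poison')]
```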

Bug 21243

Pre: !WillNotOverflowSignedMul(C1, C2)
%Op0 = sdiv %X, C1
%r = sdiv %Op0, C2

=>
%r = 0

ERROR: Mismatch in values of i4 %r

Example:
%X i4 = 0x8 (8, -8)
C1 i4 = 0x2 (2)
C2 i4 = 0x4 (4)
%Op0 i4 = 0xC (12, -4)
Source value: 0xF (15, -1)
Target value: 0x0 (0)

Source: (-8 / 2) / 4 = -4 / 4 = -1
Target : 0

Bug 21245

Pre: C2 % (1<
%r = sdiv %X, (C2 / (1 << C1))

ERROR: Mismatch in values of i4 %r

Example:
%X i4 = 0xF (15, -1)
C1 i4 = 0x3 (3)
C2 i4 = 0x8 (8, -8)
%s i4 = 0x8 (8, -8)
Source value: 0x1 (1)
Target value: 0xF (15, -1)

Source:
1111 shl 3 bit == 1000 == -8
-8/C2 = 1

Target:
C2 / (1 << C1) = -8 / (1 << 3) = -8 / -8 = 1 (1 << 3 == 1000 == -8 in i4), so %r = -1 / 1 == -1 (0xF)

Bug 21255

%Op0 = lshr %X, C1
%r = udiv %Op0, C2
  =>
%r = udiv %X, (C2 << C1)

ERROR: Domain of definedness of Target is smaller than Source's for i4 %r

Example:
%X i4 = 0x0 (0)
C1 i4 = 0x4 (4)
C2 i4 = 0x1 (1)
%Op0 i4 = poison
Source value: 0x0 (0)
Target value: UB

Source: lshr 0, 4 is poison, because the shift amount is >= the bit width
Target: C2 << C1 == 1 << 4 wraps to 0 in i4, so udiv %X, 0 is a division by zero, i.e. UB

Ruling out the division by zero with a precondition (and using lshr exact):

Pre: ((C2 << C1) != 0)
%Op0 = lshr exact %X, C1
%r = udiv %Op0, C2
  =>
%r = udiv %X, (C2 << C1)


ERROR: Mismatch in values of i4 %r

Example:
%X i4 = 0x8 (8, -8)
C1 i4 = 0x2 (2)
C2 i4 = 0x9 (9, -7)
%Op0 i4 = 0x2 (2)
Source value: 0x0 (0)
Target value: 0x2 (2)

Source: lshr exact 1000, 2 == 0010 == 2, and 2 /u 9 == 0
Target: 8 /u (1001 << 2) == 8 /u 0100 == 8 /u 4 == 2

And finally we have:

Pre: WillNotOverflowUnsignedShl(C2, C1)
%Op0 = lshr %X, C1
%r = udiv %Op0, C2
  =>
%r = udiv %X, (C2 << C1)

Done
Optimization is correct!

Bug 21256

%Op1 = sub 0, %X
%r = srem %Op0, %Op1
  =>
%r = srem %Op0, %X


ERROR: Domain of definedness of Target is smaller than Source's for i4 %r

Example:
%X i4 = 0xF (15, -1)
%Op0 i4 = 0x8 (8, -8)
%Op1 i4 = 0x1 (1)
Source value: 0x0 (0)
Target value: undef

Source: -8 % (0 - (-1)) = -8 % 1 = 0
Target: -8 % -1 = UB (srem INT_MIN, -1 overflows)

Bug 31633

InstCombine currently folds "select %c, undef, %foo" into %foo, because it assumes that undef can take any value that %foo may take.

%y2 = add nsw i32 %y, 1
%s = select i1 %c, i32 undef, i32 %y2
%r = icmp sgt i32 %s, %y

=>
%r = true

ERROR: Mismatch in values of i1 %r

Example:
%y i32 = 0x7FFFFFFF (2147483647)
%c i1 = 0x1 (1, -1)
%y2 i32 = poison
%s i32 = 0x00000000 (0)
Source value: 0x0 (0)
Target value: 0x1 (1, -1)

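When %c is true, the source selects undef, not %y2, so %s is merely undef and %r can be false (e.g. with undef chosen as 0). Folding the select into %y2 replaces that undef with a value that becomes poison when %y + 1 overflows, and poison is what would justify folding %r to true. The original program never produced poison here, so the combined transformation is wrong.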

References

Lambda Calculus

2017-09-12T00:00:00+00:00


Lambda Calculus

syntax

E → ID
E → λ ID. E
E → E E
E → (E)

The grammar is ambiguous; for example:

xyz could be x(yz) or (xy)z
λx.yz could be (λx.y)z  or λx.(yz)

The grammar rules are not changed to make it unambiguous, but some disambiguation rules are added outside of the grammar.

E → E E is left associative:  xyz == (xy)z
λ x.yz == λ x.(yz)
λx.λy.zw == λx.(λy.(zw))

Note: let x = e in e' is nothing but syntactic sugar for

(λ x . e') e

Semantics

Every ID that we see in lambda calculus is called a variable
E → λ ID . E is called an abstraction
- The ID is the variable of the abstraction (also called the metavariable)
- E is called the body of the abstraction
E → E E
- This is called an application
λ ID . E defines a new anonymous function
- ID is the formal parameter of the function
- E is the body of the function
E → E1 E2, function application, is similar to calling function E1 and setting its formal parameter to the actual parameter E2

Examples

Expl I

λ x . + x 1 == λ x . (+ x 1)
- Represents a function that adds one to its argument
(λ x . + x 1) 2
- Represents calling the original function by supplying 2 for x and it would "reduce" to (+ 2 1) = 3
Computing with lambda expressions involves rewriting; for each application, we replace all occurrences of the formal parameter variable in the function body with the value of the actual parameter (a lambda expression). It is easier to understand if we use the abstract-syntax tree of a lambda expression instead of just the text. Here's our simple example application again:

(λx.x+1)3

In the abstract-syntax trees below, λ is the abstraction operator and apply is the application operator.

We rewrite the abstract syntax tree by finding applications of functions to arguments, and for each, replacing the formal parameter with the argument in the function body. To do this,

we must find an apply node whose left child is a lambda node, since only lambda nodes represent functions.
The right subtree of the apply node is the argument.
The left subtree of the apply node (with a lambda at its root) is the function.
The left child of the lambda is the formal parameter.
The right child of the lambda is the function body.

There is only one apply node in our example; the argument is 3, the function is λx.x+1; the formal parameter is x, and the function body is x+1. Here's the rewriting step:

        apply      =>      +
        /   \             / \
       λ     3           3   1
      / \
     x   +
        / \
       x   1

Here's an example with two applications: (λx.x+1)((λy.y+2)3)

        apply         =>   apply     =>  apply   =>  +  =>  6
       /     \             /   \         /   \      / \
      λ       apply       λ     +       λ     5    5   1
     / \       /  \      / \   / \     / \
    x   +     λ    3    x   + 3   2   x   +
       / \   / \           / \           / \
      x   1 y   +         x   1         x   1
               / \
              y   2

Rewriting the outer application first gives the same result:

        apply         =>        +   =>      +   =>    +   =>  6
       /     \                 / \         / \       / \
      λ       apply         apply 1       +   1     5   1
     / \       /  \          / \         / \
    x   +     λ    3        λ   3       3   2
       / \   / \           / \
      x   1 y   +         y   +
               / \           / \
              y   2         y   2

Expl II

Note that the result of rewriting a non-pure lambda expression can be a constant (as in the examples above), but the result can also be a lambda expression: a variable, or an abstraction, or an application. For a pure lambda expression, the result of rewriting will always itself be a lambda expression. Here are some more examples:

(λf.λx.fx)λy.y+1

        apply      =>   λ        =>    λ        λx.x+1
       /     \         / \            / \
      λ       λ       x  apply       x   +
     / \     / \         /   \          / \
    f   λ   y   +       λ     x        x   1
       / \     / \     / \
      x  apply y  1   y   +
         /  \            / \
        f    x          y   1

Note that the result of the rewriting is a function. Also note that in this example, although there are initially two "apply" nodes, only one of them has a lambda node as its left child, so there is only one rewrite that can be done initially.

(λx.λy.x)(λz.z)

           apply            λ         λy.λz.z
          /     \          / \
         λ       λ    =>  y   λ
        / \     / \          / \
       x   λ   z   z        z   z
          / \
         y   x

(λx.λy.xy)(λz.z)

       apply      =>      λ          =>    λ     λy.y
      /     \            / \              / \
     λ       λ          y   apply        y   y
    / \     / \              /   \
   x   λ   z   z            λ     y
      / \                  / \
     y   apply            z   z
          /   \
         x     y

Currying

Technique to translate the evaluation of a function that takes multiple arguments into a sequence of functions that each take a single argument
Define adding two parameters together with functions that only take one parameter:
- λ x . λ y . ((+ x) y)
- (λ x . λ y . ((+ x) y)) 1
  - λ y . ((+ 1) y)
- (λ x . λ y . ((+ x) y)) 10 20
  - (λ y . ((+ 10) y)) 20
  - ((+ 10) 20) = 30
Example in the context of programming

  #include <iostream>
  using namespace std;

  int F(int a, int b, int c) { return a + b + c; }

  int F_curry() {
    auto f = [](int a) {
      return [a](int b) { return [a, b](int c) { return a + b + c; }; };
    };

    return ((f(1))(2))(3);
  }

  int main() {
    cout << F(1, 2, 3) << endl;
    cout << F_curry() << endl;
  }

Problems with the naive rewriting rule

Problem 1

We don't, in general, want to replace all occurrences of x.

To see why, consider the following (non-pure) lambda expression: (λx.(x + ((λx.x+1)3)))2

This expression should reduce to 6;

the inner expression: (λx.x+1)3 takes one argument, the value 3, and adds 1, producing 4. The outer expression is now: (λx.(x + 4))2 i.e., it takes one argument, the value 2, and adds 4, producing 6.

However, if we rewrite the outer application first, using the naive rewriting rule, here's what happens:

  apply
   /\
  λ  2
 / \
x   +                        +
   / \                      / \
 x   apply     =>          2  apply     =>  +    => 5
      / \    (bad              / \         / \
     λ   3   application)     λ   3       2   +
    / \                      / \             / \
   x   +                    x   +           2   1
      / \                      / \
     x   1                    2   1

We get the wrong answer (5 instead of 6), because we replaced the occurrence of x in the inner expression with the value supplied as the parameter for the outer expression.

Problem 2

Consider the (pure) lambda expression

((λx.λy.x)y)z

The expression λx.λy.x should simply return the first argument, so in this case the result of rewriting should be y. However, if we use the naive rewriting rule, replacing all occurrences of the formal parameter x with the argument y, we get: (λy.y)z

and now if we rewrite that expression we get z

i.e., we got the second argument instead of the first one!

This example illustrates what is called the "capture" or "name clash" problem.

Free variable

An occurrence of a variable is free if it does not appear within the body of an abstraction whose metavariable has the same name

x free in λ x . x y z? No
y free in λ x . x y z? Yes
x free in (λ x . (+ x 1)) x? Yes: the x inside the abstraction is not free, but the final x is
z free in λ x . λ y . λ z . z y x? No
x free in (λ x . z foo) (λ y . y x)? Yes

x is free in E if:

E = x
E = λ y . E1, where y != x and x is free in E1
E = E1 E2, where x is free in E1 or E2
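The three rules translate almost line by line into a recursive function. In this Python sketch, terms are represented as tuples, an encoding chosen here for illustration: ("var", x), ("lam", x, body) and ("app", e1, e2); the constructor helpers are hypothetical conveniences.

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", e1, e2)


def Var(x):
    return ("var", x)


def Lam(x, body):
    return ("lam", x, body)


def App(e1, e2):
    return ("app", e1, e2)


def free_vars(e):
    """Set of variables that occur free in e, following the three rules above."""
    if e[0] == "var":                          # E = x
        return {e[1]}
    if e[0] == "lam":                          # E = λ y . E1: free in E1, minus y
        return free_vars(e[2]) - {e[1]}
    return free_vars(e[1]) | free_vars(e[2])   # E = E1 E2


# (λ x . z foo) (λ y . y x): x is free in the second abstraction
e = App(Lam("x", App(Var("z"), Var("foo"))), Lam("y", App(Var("y"), Var("x"))))
print(sorted(free_vars(e)))  # ['foo', 'x', 'z']
```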

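x free in x λ x . x? Yes: the term parses as x (λ x . x), and by rule 3 x is free in it because x is free in the first subterm (even though it is not free in λ x . x)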
x free in (λ x . x y) x ? Yes
x free in λ x . y x ? No

Bound Variables

If an occurrence of x is free in E, then it is bound by λ x . in λ x . E
If an occurrence of x is bound by a particular λ x . in E, then x is bound by the same λ x . in λ z . E
- Even if z == x
- Example: λ x . λ x . x
  - This is equivalent to λ y . λ x . x, so x is bound by the nearest λ x preceding it.
- If an occurrence of x is bound by a particular λ x . in E1, then that occurrence is bound by the same abstraction λ x . in E1 E2 and E2 E1

Example

(λ x . x (λ y . x y z y) x) x y
- (λ x . x (λ y . x y z(Free) y) x) x(Free) y(Free)
(λ x . λ y . x y) (λ z . x(Free) z)
(λ x . x λ x . z x)
- λ x . (x (λ x . z(Free) x)), since λ extends as far to the right as possible

Alpha Reduction

To solve Problem 2, we use a technique called alpha-reduction. The basic idea is that formal parameter names are unimportant; so rename them as needed to avoid capture. Alpha-reduction is used to modify expressions of the form "λx.M". It renames all the occurrences of x that are free in M to some other variable z that does not occur in M (and then λx is changed to λz). For example, consider λx.λy.x+y (this is of the form λx.M). Variable z is not in M, so we can rename x to z; i.e., λx.λy.x+y alpha-reduces to λz.λy.z+y

Beta Reduction

Defined by:

(λx. e1) e2 ⇒ e1[e2/x]

where the notation e1[e2/x] denotes the result of substituting e2 for all free occurrences of x in e1.
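Putting free variables, alpha renaming and beta reduction together gives a capture-avoiding evaluator. This Python sketch is one possible implementation, not the only one: it uses the same tuple encoding as a plain-data choice, picks fresh names v0, v1, ... (an arbitrary scheme), and reduces in normal order (leftmost-outermost). It handles pure terms only and loops forever on terms without a normal form.

```python
import itertools

# Terms: ("var", x) | ("lam", x, body) | ("app", e1, e2)


def free_vars(e):
    if e[0] == "var":
        return {e[1]}
    if e[0] == "lam":
        return free_vars(e[2]) - {e[1]}
    return free_vars(e[1]) | free_vars(e[2])


def fresh(avoid):
    """Return a name v0, v1, ... that is not in `avoid`."""
    for i in itertools.count():
        name = "v{}".format(i)
        if name not in avoid:
            return name


def subst(e, x, s):
    """e[s/x]: substitute s for the free occurrences of x in e, avoiding capture."""
    if e[0] == "var":
        return s if e[1] == x else e
    if e[0] == "app":
        return ("app", subst(e[1], x, s), subst(e[2], x, s))
    y, body = e[1], e[2]
    if y == x:
        return e                                 # x is rebound here (Problem 1)
    if y in free_vars(s):                        # λy would capture a free y of s (Problem 2)
        z = fresh(free_vars(body) | free_vars(s) | {x})
        y, body = z, subst(body, y, ("var", z))  # alpha-rename λy to λz first
    return ("lam", y, subst(body, x, s))


def step(e):
    """One normal-order beta step, or None if e is in normal form."""
    if e[0] == "app" and e[1][0] == "lam":
        return subst(e[1][2], e[1][1], e[2])     # (λx. e1) e2 => e1[e2/x]
    if e[0] == "app":
        left = step(e[1])
        if left is not None:
            return ("app", left, e[2])
        right = step(e[2])
        return None if right is None else ("app", e[1], right)
    if e[0] == "lam":
        body = step(e[2])
        return None if body is None else ("lam", e[1], body)
    return None


def normalize(e):
    while True:
        nxt = step(e)
        if nxt is None:
            return e
        e = nxt


# Problem 2: ((λx.λy.x) y) z must reduce to y, not z
problem2 = ("app", ("app", ("lam", "x", ("lam", "y", ("var", "x"))), ("var", "y")), ("var", "z"))
print(normalize(problem2))  # ('var', 'y')

# (λz.(z z)) (λx.λy.(x y)) reduces to λy.λv0.(y v0), i.e. λw.(λy.wy) up to renaming
L = ("lam", "x", ("lam", "y", ("app", ("var", "x"), ("var", "y"))))
print(normalize(("app", ("lam", "z", ("app", ("var", "z"), ("var", "z"))), L)))
```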

(λz.(z z)) (λx.λy.(x y)), where A stands for apply:

    A                        A                 λ        λ
                          /     \              /\       /\
   / \                   λ      λ             w  A     w λ
  /   \                  /\     /\               /\     /\
 λ     λ        ==>     x  λ   x  λ    ->        λ w -> y A
 /\   / \                  /\     /\            /\       /\
z  A  x  λ                w  A   y  A          x λ      w y
   / \    /\                  /\    /\           /\
 z   z   y A                 x w    x y         y A
            /\                                    /\
           x  y                                  x y

λw.(λy.wy)

References

Compilers Leveraging Undefined Behaviour

2017-09-12T00:00:00+00:00

One of the classic examples of compilers making use of undefined behaviour is the following: the C standard says that signed integer overflow is undefined.

Knowing this helps the compiler optimize x+1>x to true. Since INT_MAX+1 is undefined, the compiler may assume it never happens, so it can safely make the optimization.

Had signed integer overflow been defined (say, as wrap-around), the optimization would not be possible, because x + 1 is not > x when x == INT_MAX (under the wrap-around definition).
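The wrap-around counterexample is easy to check. This Python sketch emulates a hypothetical C where 32-bit signed addition wraps (the helper names are illustrative), and shows that x + 1 > x then fails exactly at INT_MAX.

```python
INT_MAX = 2**31 - 1


def wrap32(v):
    """Two's complement wrap-around to a 32-bit signed int."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def x_plus_1_gt_x(x):
    # x + 1 > x, evaluated under the hypothetical wrap-around semantics
    return wrap32(x + 1) > x


print(x_plus_1_gt_x(41))       # True
print(x_plus_1_gt_x(INT_MAX))  # False: INT_MAX + 1 wraps to INT_MIN
```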

Leveraging undefined behaviour by optimizing compilers.

Undefined behaviors facilitate optimizations by permitting a compiler to assume that programs will only execute defined operations.

Case I

#include <iostream>

int fermat() {
  const int MAX = 1000;
  int a=1,b=1,c=1;
  // Endless loop with no side effects is UB
  while (1) {
    if (((a*a*a) == ((b*b*b)+(c*c*c)))) return 1;
    a++;
    if (a>MAX) { a=1; b++; }
    if (b>MAX) { b=1; c++; }
    if (c>MAX) { c=1;}
  }
  return 0;
}

int main() {
  if (fermat())
    std::cout << "Fermat's Last Theorem has been disproved.\n";
  else
    std::cout << "Fermat's Last Theorem has not been disproved.\n";
}

Result:

Fermat's Last Theorem has been disproved.

Despite the fact that this program does not contain any arithmetic overflows (the factors range from 1 to 1000, so the sum of two cubes never exceeds 2 * 10^9 < 2^31), the C++ standard treats an infinite loop that does not change external state as undefined behavior. That's why C++ compilers are entitled to assume that such loops terminate.

The compiler can easily see that the only way out of the while(1) loop is the return 1; statement, while the return 0; statement at the end of fermat() cannot be reached. Therefore, it optimizes this function to

int fermat(void) { return 1; }

In other words, the only way to write an infinite loop that the compiler cannot remove is to add a modification of the external state to the loop body.

Case II

int table[4];
bool exists_in_table(int v)
{
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;
}

First of all, you might notice the off-by-one error in the loop control. The result is that the function reads one past the end of the table array before giving up. A classical compiler wouldn't particularly care. It would just generate the code to read the out-of-bounds array element (despite the fact that doing so is a violation of the language rules), and it would return true if the memory one past the end of the array happened to match.

A post-classical compiler, on the other hand, might perform the following analysis:

The first four times through the loop, the function might return true.
When i is 4, the code performs undefined behavior. Since undefined behavior lets me do anything I want, I can totally ignore that case and proceed on the assumption that i is never 4. (If the assumption is violated, then something unpredictable happens, but that's okay, because undefined behavior grants me permission to be unpredictable.)
The case where i is 5 never occurs, because in order to get there, I first have to get through the case where i is 4, which I have already assumed cannot happen.
Therefore, all legal code paths return true. As a result, a post-classical compiler can optimize the function to:

bool exists_in_table(int v) { return true; }

Case III

int foo(int x) {
    return x+1 > x; // either true or UB due to signed overflow
}

may be compiled as (demo)

foo(int):
        movl    $1, %eax
        ret

Because in all legal cases true is returned.

Case IV

std::size_t f(int x)
{
    std::size_t a;
    if(x) // either x nonzero or UB
        a = 42;
    return a;
}

May be compiled as

f(int):
        mov     eax, 42
        ret

This is because every execution with defined behavior returns 42.

bool p; // uninitialized local variable
if(p) // UB access to uninitialized scalar
    std::puts("p is true");
if(!p) // UB access to uninitialized scalar
    std::puts("p is false");

Possible output:

p is true
p is false

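Because both reads of p are undefined behavior, the compiler may optimize each test independently (here, effectively dropping both conditions), so both lines can be printed.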
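Both snippets above become well-defined once the variables are initialized. The compiler then has to honor the `x == 0` path in `f`, and it can print at most one of the two messages for `p`. This sketch assumes 0 is an acceptable default for `a`, which the original does not specify. The wrapper name `print_p` is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>

// Initializing `a` removes the uninitialized read, so f(0) must return the
// (assumed) default 0 instead of being folded to 42.
std::size_t f(int x) {
  std::size_t a = 0;
  if (x)
    a = 42;
  return a;
}

// With p initialized, exactly one of the two messages is printed.
void print_p() {
  bool p = false;
  if (p)
    std::puts("p is true");
  if (!p)
    std::puts("p is false");
}
```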

Case V

int foo(int* p) {
    int x = *p;
    if(!p) return x; // Either UB above or this branch is never taken
    else return 0;
}

may be compiled as

foo(int*):
        xorl    %eax, %eax
        ret

If p is null, the dereference `*p` on the first line is already undefined behavior. If p is non-null, the function returns 0. So every execution with defined behavior returns 0, which justifies the optimization.
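Moving the null check before the dereference makes both paths legal, so the compiler has to keep the branch. The intended semantics of the original are unclear. This sketch assumes "return the pointee, or 0 for a null pointer". The name `read_or_zero` is hypothetical.

```cpp
// Check first, dereference second: no path reads through a null pointer,
// so the compiler cannot discard the check.
int read_or_zero(const int* p) {
  if (!p) return 0;
  return *p;
}
```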

Pointer Overflow

It is undefined behavior to perform pointer arithmetic where the result is outside of an object, with the exception that it is permissible to point one element past the end of an array:

int a[10];
int *p1 = a - 1; // UB
int *p2 = a; // ok
int *p3 = a + 9; // ok
int *p4 = a + 10; // ok, but can't be dereferenced
int *p5 = a + 11; // UB
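The one-past-the-end pointer (`a + 10` above) is what makes the usual half-open range idiom legal. You may form it and compare against it, but you may not dereference it. Here is a minimal sketch using `std::find`, where getting the end pointer back means "not found". The helper `index_of` is hypothetical.

```cpp
#include <algorithm>  // std::find
#include <cstddef>

// Returns the index of v in a[0..n), or n if v is absent.
// a + n is the one-past-the-end pointer: legal to compute and compare,
// and never dereferenced here.
std::size_t index_of(const int* a, std::size_t n, int v) {
  const int* end = a + n;
  const int* it = std::find(a, end, v);
  return static_cast<std::size_t>(it - a);
}
```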

Interesting case

char buffer[BUFLEN];
char *buffer_end = buffer + BUFLEN;

/* ... */
unsigned int len;

if (buffer + len >= buffer_end)
  die_a_gory_death("len is out of range\n");

Here, the programmer is trying to ensure that len (which might come from an untrusted source) fits within the range of buffer. There is a problem, though, in that if len is very large, the addition could cause an overflow, yielding a pointer value which is less than buffer. So a more diligent programmer might check for that case by changing the code to read:

if (buffer + len >= buffer_end || buffer + len < buffer)
  loud_screaming_panic("len is out of range\n");

This code should catch all cases, ensuring that len is within range. There is only one little problem: recent versions of GCC will optimize out the second test (returning the if statement to the first form shown above), making overflows possible again. So any code which relies upon this kind of test may, in fact, become vulnerable to a buffer overflow attack.

This behavior is allowed by the C standard, which states that, in a correct program, pointer addition will not yield a pointer value outside of the same object. So the compiler can assume that the test for overflow is always false and may thus be eliminated from the expression. It turns out that GCC is not alone in taking advantage of this fact: some research by GCC developers turned up other compilers (including PathScale, xlC, LLVM, TI Code Composer Studio, and Microsoft Visual C++ 2005) which perform the same optimization. So it seems that the GCC developers have a legitimate reason to be upset: CERT, whose vulnerability note on this behavior singled out GCC, would appear to be telling people to avoid their compiler in favor of others, which do exactly the same thing.

The right solution to the problem, of course, is to write code which complies with the C standard. In this case, rather than doing pointer comparisons, the programmer should simply write something like:

if (len >= BUFLEN)
    launch_photon_torpedoes("buffer overflow attempt thwarted\n");
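Here is a self-contained sketch of that check. It assumes a hypothetical `BUFLEN` of 64 and a hypothetical helper `copy_into_buffer` that refuses out-of-range lengths. The bound is compared as an integer, so no out-of-object pointer is ever formed and the compiler has nothing to optimize away.

```cpp
#include <cstddef>
#include <cstring>  // std::memcpy

constexpr std::size_t BUFLEN = 64;  // hypothetical buffer size
char buffer[BUFLEN];

// Returns false (and copies nothing) when len is out of range; the check is
// done on integers before any pointer arithmetic takes place.
bool copy_into_buffer(const char* src, unsigned int len) {
  if (len >= BUFLEN)
    return false;
  std::memcpy(buffer, src, len);
  return true;
}
```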


Build SPEC2006 using LLVM’s cmake infrastructure

2017-08-21T00:00:00+00:00

Instructions to build SPEC CPU2006 using LLVM's CMake infrastructure.

  # Setup the environment.
  export LLVM_SRC=<path to LLVM source code tree>
  export LLVM_BLD=<path to LLVM build directory>
  export SPEC_SRC=<path to SPEC CPU2006 source code tree>
  TESTSUITE_BUILD_DIR=<build dir of test-suite>

  # Make SPEC source available to LLVM build system.
  mkdir $LLVM_SRC/projects/test-suite/test-suite-externals
  ln -s $SPEC_SRC $LLVM_SRC/projects/test-suite/test-suite-externals/speccpu2006

  mkdir $TESTSUITE_BUILD_DIR && cd $TESTSUITE_BUILD_DIR
  # Configure
  cmake $LLVM_SRC/projects/test-suite -DCMAKE_C_COMPILER=<path to C compiler> -DCMAKE_CXX_COMPILER=<path to C++ compiler>
  # Build the binaries.
  cd External/SPEC/CINT2006/
  make -j 8
  # Run
  lit -v -j 8 . -o results.json

C++ Timers

2017-08-09T00:00:00+00:00

The following code illustrates the usage of various APIs to time execution in C++ code. Some of these are applicable in C as well.


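#include <chrono>     /* std::chrono clocks and durations */
#include <ctime>      /* time, difftime */
#include <math.h>     /* sqrt */
#include <stdio.h>    /* printf */
#include <sys/time.h> /* gettimeofday, timeval (POSIX) */
#include <time.h>     /* clock_t, clock, CLOCKS_PER_SEC */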

using namespace std::chrono;

// The function to be timed: counts the primes in [2, n].
int frequency_of_primes(int n) {
  int i, j;
  int freq = n - 1;
  for (i = 2; i <= n; ++i)
    for (j = sqrt(i); j > 1; --j)
      if (i % j == 0) {
        --freq;
        break;
      }
  return freq;
}

int main() {
  clock_t start1, end1;
  time_t start2, end2;
  timeval start3, end3;
  std::chrono::time_point<std::chrono::system_clock> start4, end4;
  std::chrono::high_resolution_clock::time_point start5, end5;

  /* Collect the start times using different methods. */
  // Method 1
  start1 = clock();
  // Method 2
  time(&start2);
  // Method 3
  gettimeofday(&start3, NULL);
  // Method 4
  start4 = std::chrono::system_clock::now();
  // Method 5
  start5 = std::chrono::high_resolution_clock::now();

  // The computation to be timed.
  printf("Calculating...\n");
  int f = frequency_of_primes(99999);
  printf("The number of primes lower than 100,000 is: %d\n\n", f);

  /* Collect the end times using different methods. */
  // Method 1
  end1 = clock();
  // Method 2
  time(&end2);
  // Method 3
  gettimeofday(&end3, NULL);
  // Method 4
  end4 = std::chrono::system_clock::now();
  // Method 5
  end5 = std::chrono::high_resolution_clock::now();

  /* Compute elapsed times using different methods. */
  // Method 1: clock() counts processor-time ticks, not seconds.
  long int elapsed_seconds1 = end1 - start1;
  // Method 2: time() typically has a resolution of one second.
  double elapsed_seconds2 = difftime(end2, start2);
  // Method 3: elapsed time in microseconds.
  double elapsed_seconds3 = (double(end3.tv_sec - start3.tv_sec)) * 1000000.00 +
                            double(end3.tv_usec - start3.tv_usec);
  // Method 4: duration<double> counts seconds as a double.
  std::chrono::duration<double> elapsed_seconds4 = end4 - start4;
  // Method 5
  std::chrono::duration<double> elapsed_seconds5 = (end5 - start5);
  double elapsed_seconds6 =
      double(duration_cast<microseconds>(end5 - start5).count());

  /* Display the elapsed times */
  printf("Clock: %ld clicks (%f us).\n", elapsed_seconds1,
         (double(elapsed_seconds1)) * 1000000.00 / CLOCKS_PER_SEC);
  printf("Time: %f us.\n", elapsed_seconds2 * 1000000.00);
  printf("gettimeofday: %f us.\n", elapsed_seconds3);
  printf("chrono::system_clock: %f us.\n",
         elapsed_seconds4.count() * 1000000.00);
  printf("chrono::high_resolution_clock: %f us.\n",
         elapsed_seconds5.count() * 1000000.00);
  printf("chrono::high_resolution_clock (duration_cast): %f us.\n",
         elapsed_seconds6);

  return 0;
}
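Of the clocks above, `std::chrono::steady_clock` is the one the standard guarantees to be monotonic. `high_resolution_clock` may be an alias of `system_clock`, which can jump when the wall clock is adjusted. Below is a minimal sketch of a reusable timing helper built on `steady_clock`. The name `elapsed_ms` is hypothetical. It could be used as `elapsed_ms([] { frequency_of_primes(99999); })`.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns how long it took in milliseconds, measured
// with the monotonic steady_clock.
template <typename F>
double elapsed_ms(F&& work) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<F>(work)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```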

Assembly Language Debugging with gdb

2016-12-16T00:00:00+00:00

Assembly Language Debugging with gdb.

Useful commands

Signal Handlers

info signal SIGUSR1
handle SIGUSR1 noprint nostop

Run gdb in Text User Interface (TUI) mode

gdb -tui

Change the layout to Assembly

layout asm

Or split the layout to C and Assembly

layout split

Apply the following customizations. Pick either att or intel for the disassembly flavor. disassemble-next-line on makes gdb show the next instruction every time execution stops.

set disassembly-flavor att
set print asm-demangle on
set disassemble-next-line on

Put a temporary breakpoint on main and run the program:

start

Examine memory: x/FMT ADDRESS.
- ADDRESS is an expression for the memory address to examine.
- FMT == [NUM][FORMAT][SIZE]
  - [Format] ::= [ o(octal), x(hex), d(decimal), u(unsigned decimal), t(binary), f(float), a(address), i(instruction), c(char), s(string), z(hex, zero padded on the left)].
  - [Size] ::= [b(byte), h(halfword), w(word), g(giant, 8 bytes)]
  - The specified number of objects of the specified size are printed according to the format. If a negative number is specified, memory is examined backward from the address.
  - Example
```
x/10hb $rsp
x/Ni $pc
```
Use nexti/stepi (ni/si), which advance one machine instruction, instead of next/step, which advance one source line.

Printing registers

info registers
info all-registers
info registers regname …
info registers eflags

Printing & setting xmm registers

(gdb) print $xmm1
$1 = {
  v4_float = {0, 3.43859137e-038, 1.54142831e-044, 1.821688e-044},
  v2_double = {9.92129282474342e-303, 2.7585945287983262e-313},
  v16_int8 = "\000\000\000\000\3706;\001\v\000\000\000\r\000\000",
  v8_int16 = {0, 0, 14072, 315, 11, 0, 13, 0},
  v4_int32 = {0, 20657912, 11, 13},
  v2_int64 = {88725056443645952, 55834574859},
  uint128 = 0x0000000d0000000b013b36f800000000
}

To set values of such registers, you need to tell GDB which view of the register you wish to change, as if you were assigning value to a struct member:

 (gdb) set $xmm1.uint128 = 0x000000000000000000000000FFFFFFFF

References

Debugging with GDB