Decide on (Cleartext) Integer Semantics for HEIR's Python Frontend

Python famously has arbitrary-precision integers, though a lot of the time "interesting" programs use `np.int64` or something similar anyway. This raises the question of how a basic `a + b` statement between two integers should be typed.

This is of course assuming we know the type of `a` and `b`, either from an annotation (ahead of time compilation) or because we have their actual runtime values (jit compilation). 

There are, afaik, basically three different approaches we can take:

1. "Pythonic Ideal": Track the bitwidth required to represent the result perfectly: 16-bit + 16-bit is 17-bit (or 32-bit, if rounded up to the next power of two), 16-bit * 16-bit is 32-bit, etc.
2. "Numba NBEP1": Numba at some point made the decision to basically upcast smaller types to the machine type, so `int16 + int16` is actually `int64` (at least on a 64-bit machine). See https://numba.readthedocs.io/en/stable/proposals/integer-typing.html. Note that Numba does _not_ do this for arrays, so `array(int32) + array(int32)` is still `array(int32)`.
3. "MLIR / overflow=none": This is how we currently use the `arith` dialect. Adding or multiplying two values of type `i32` still results in an `i32`. Overflow is essentially considered "Undefined Behavior" and the compiler simply says "not my problem".
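
To make the three options concrete, here is a rough sketch of the result bitwidth each rule would assign to a signed scalar binary op. The function names and the `MACHINE_BITS` constant are made up for illustration, and the Numba rule is simplified to the same-signedness scalar case described above. `wrap_signed` shows what a two's-complement wraparound would produce under option 3 if an overflow goes unnoticed; since overflow is treated as undefined there, that is only one possible outcome.

```python
MACHINE_BITS = 64  # assumption: a 64-bit target, as in the int16 + int16 example


def ideal_add_bits(a_bits: int, b_bits: int) -> int:
    """Option 1: signed addition needs one extra bit to be exact."""
    return max(a_bits, b_bits) + 1


def ideal_mul_bits(a_bits: int, b_bits: int) -> int:
    """Option 1: signed multiplication needs the sum of both widths."""
    return a_bits + b_bits


def numba_scalar_bits(a_bits: int, b_bits: int) -> int:
    """Option 2 (simplified): scalars are upcast to at least the machine width."""
    return max(a_bits, b_bits, MACHINE_BITS)


def fixed_bits(a_bits: int, b_bits: int) -> int:
    """Option 3: operands must match and the result keeps their width."""
    if a_bits != b_bits:
        raise TypeError(f"mismatched operand widths: i{a_bits} vs i{b_bits}")
    return a_bits


def wrap_signed(value: int, bits: int) -> int:
    """Reinterpret an exact Python int as a two's-complement value of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


for name, rule in [("ideal add", ideal_add_bits), ("ideal mul", ideal_mul_bits),
                   ("numba scalar", numba_scalar_bits), ("fixed", fixed_bits)]:
    print(f"{name}: i16 op i16 -> i{rule(16, 16)}")

# The exact sum 32767 + 1 does not fit in 16 bits; wrapping is one possible result.
print(wrap_signed(32767 + 1, 16))
```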

In discussions about this so far, we've always gone with Option 3, as it's by far the easiest to deal with for arithmetic FHE, where we need to impose a fixed plaintext modulus. While Option 1 has some appeal, I think the costs of trying to handle this far outweigh the benefits. Finally, while I understand why Numba chose to "snap to pointer size", I don't think adopting this makes sense for us.

If there's consensus on this (or at least no active outcries against), I propose we go for option 3. However, that poses a bit of an issue with just using Numba's Type Inference out of the box:

<details>
<summary><b>Python code to see Numba Type Inference in action</b></summary>

```python
from numba.core.registry import cpu_target
from numba.core import compiler, sigutils
from numba.core.typed_passes import type_inference_stage

# Define a test function
def example_function(x, y):
    z = x + y
    return z

sig_string = "int16(int16, int16)"

test_ir = compiler.run_frontend(example_function)
typingctx = cpu_target.typing_context
targetctx = cpu_target.target_context
typingctx.refresh()
targetctx.refresh()

fn_args, fn_retty = sigutils.normalize_signature(sig_string)
typing_res = type_inference_stage(typingctx, targetctx, test_ir, fn_args,
                                    None)

# Get inferred types
typemap = typing_res.typemap
for var, typ in typemap.items():
    print(f"Variable: {var}, Type: {typ}")

# Variable: arg.x, Type: int16
# Variable: arg.y, Type: int16
# Variable: x, Type: int16
# Variable: y, Type: int16
# Variable: z, Type: int64
# Variable: $16return_value.4, Type: int64
```

</details>

We can get around this by (a) doing some hacky stuff with a Numba fork (the culprit is `integer_binop_cases` in `numba.core.typing.builtins`), which is what I did for testing, (b) not relying on Numba's type inference at all, or (c) adding our own custom integer types via Numba extensions. We probably need to do some of this anyway, because Numba treats all array types as dynamically sized and we probably want statically known shapes.
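
For a feel of what route (b) could look like, here is a very rough sketch of a hand-rolled inference pass over Python's standard `ast` module that applies the option 3 rule (the result has the same type as both operands). The `infer_types` helper and the use of plain strings as types are hypothetical and only cover straight-line assignments, binary ops, and returns; it is not how HEIR or Numba actually does it.

```python
import ast


def infer_types(source: str, arg_types: dict) -> dict:
    """Infer variable types for a single function under the same-width rule."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    env = dict(arg_types)  # variable name -> type name, seeded with the arguments

    def type_of(node):
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.BinOp):
            lhs, rhs = type_of(node.left), type_of(node.right)
            if lhs != rhs:
                raise TypeError(f"operand types differ: {lhs} vs {rhs}")
            return lhs  # option 3: no widening, the result keeps the operand type
        raise NotImplementedError(type(node).__name__)

    for stmt in func.body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            env[stmt.targets[0].id] = type_of(stmt.value)
        elif isinstance(stmt, ast.Return):
            env["return"] = type_of(stmt.value)
        else:
            raise NotImplementedError(type(stmt).__name__)
    return env


source = """
def example_function(x, y):
    z = x + y
    return z
"""

for var, typ in infer_types(source, {"x": "int16", "y": "int16"}).items():
    print(f"Variable: {var}, Type: {typ}")
```

Unlike the Numba output above, `z` stays `int16` here, and mixing operand types is rejected outright rather than promoted.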

Decide on (Cleartext) Integer Semantics for HEIR's Python Frontend #1252
