Decide on (Cleartext) Integer Semantics for HEIR's Python Frontend #1252

@AlexanderViand-Intel

Python famously has arbitrary-precision integers, though a lot of the time "interesting" programs use np.int64 or something similar anyway. This raises the question of how a basic a + b statement between two integers should be typed.

This is of course assuming we know the type of a and b, either from an annotation (ahead of time compilation) or because we have their actual runtime values (jit compilation).

There are, afaik, basically three different approaches we can take:

  1. "Pythonic Ideal": Track the bitwidth required to represent the result perfectly: 16-bit + 16-bit is 17-bit (or 32-bit, if rounded up to next power of two), 16-bit * 16-bit is 32-bit, etc.
  2. "Numba NBEP1": Numba at some point made the decision to basically upcast smaller types to the machine type, so int16 + int16 is actually int64 (at least on a 64-bit machine). See https://numba.readthedocs.io/en/stable/proposals/integer-typing.html. Note that Numba does not do this for arrays, so array(int32) + array(int32) is still array(int32).
  3. "MLIR / overflow=none": This is how we currently use the arith dialect. Adding/Multiplying two values of type i32 still results in an i32. Overflow is essentially considered "Undefined Behavior" and the compiler simply says "not my problem".
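To make the three options concrete, here is a minimal sketch of the result type each one assigns to `+` and `*`. The names `IntType`, `next_pow2`, and `result_type` are hypothetical helpers made up for illustration, not part of HEIR, Numba, or MLIR. The "machine" policy assumes a 64-bit machine and scalar operands, and the "same_width" policy assumes both operands already have the same type, as the arith dialect requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """A signed integer type with a fixed bit width (hypothetical helper)."""
    bits: int


def next_pow2(n: int) -> int:
    """Round n up to the next power of two (n >= 1)."""
    return 1 << (n - 1).bit_length()


def result_type(op: str, lhs: IntType, rhs: IntType, policy: str) -> IntType:
    """Result type of `lhs op rhs`, for op in {"+", "*"}, under one policy."""
    if op not in ("+", "*"):
        raise ValueError(f"unsupported op: {op}")
    if policy == "exact":
        # Option 1: widen so that every possible result is representable.
        if op == "+":
            return IntType(max(lhs.bits, rhs.bits) + 1)
        return IntType(lhs.bits + rhs.bits)
    if policy == "machine":
        # Option 2: scalars snap up to the machine word (64 bits assumed).
        return IntType(max(lhs.bits, rhs.bits, 64))
    if policy == "same_width":
        # Option 3: operands must match; the result keeps their type and
        # overflow is simply not tracked.
        if lhs != rhs:
            raise TypeError(f"operand types differ: {lhs} vs {rhs}")
        return lhs
    raise ValueError(f"unknown policy: {policy}")


i16 = IntType(16)
exact_add = result_type("+", i16, i16, "exact")         # 17 bits
exact_mul = result_type("*", i16, i16, "exact")         # 32 bits
machine_add = result_type("+", i16, i16, "machine")     # 64 bits
same_add = result_type("+", i16, i16, "same_width")     # 16 bits
rounded_add = IntType(next_pow2(exact_add.bits))        # 32 bits
```

Note how only the "exact" policy makes types grow with every operation, which is what makes it hard to combine with a fixed-size encoding.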

In discussions about this so far, we've always gone with Option 3, as it's by far the easiest to deal with for arithmetic FHE, where we need to impose a fixed plaintext modulus. While Option 1 has some appeal, I think the costs of trying to handle this far outweigh the benefits. Finally, while I understand why Numba chose to "snap to pointer size", I don't think adopting this makes sense for us.
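As a rough illustration of why a fixed plaintext modulus pairs naturally with Option 3: arithmetic modulo a fixed `t` never changes the "type" of a value, it just wraps around. The sketch below is plain cleartext Python, not FHE code. The modulus `T = 65537` and the helper names are assumptions chosen for the example only and say nothing about the parameters HEIR actually uses.

```python
T = 65537  # assumed plaintext modulus, for illustration only


def centered(x: int, t: int = T) -> int:
    """Map x to its representative modulo t in the range [-(t // 2), t // 2]."""
    r = x % t
    return r - t if r > t // 2 else r


def add_mod(a: int, b: int, t: int = T) -> int:
    """Addition in Z_t; the result lives in the same set as the inputs."""
    return centered(a + b, t)


def mul_mod(a: int, b: int, t: int = T) -> int:
    """Multiplication in Z_t; again no growth in the result 'type'."""
    return centered(a * b, t)


small = add_mod(100, 200)        # 300, no wraparound
wrapped = add_mod(30000, 30000)  # 60000 wraps to 60000 - 65537 = -5537
product = mul_mod(300, 300)      # 90000 wraps to 90000 - 65537 = 24463
```

As with `i32 + i32 -> i32`, the caller is responsible for picking a modulus large enough that wraparound does not affect the values they care about.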

If there's consensus on this (or at least no active outcries against), I propose we go for option 3. However, that poses a bit of an issue with just using Numba's Type Inference out of the box:

Python code to see Numba Type Inference in action
from numba.core.registry import cpu_target
from numba.core import compiler, sigutils
from numba.core.typed_passes import type_inference_stage

# Define a test function
def example_function(x, y):
    z = x + y
    return z

sig_string = "int16(int16, int16)"

test_ir = compiler.run_frontend(example_function)
typingctx = cpu_target.typing_context
targetctx = cpu_target.target_context
typingctx.refresh()
targetctx.refresh()

# Only the argument types are passed on; the return type is left to inference.
fn_args, fn_retty = sigutils.normalize_signature(sig_string)
typing_res = type_inference_stage(typingctx, targetctx, test_ir, fn_args,
                                  None)

# Get inferred types
typemap = typing_res.typemap
for var, typ in typemap.items():
    print(f"Variable: {var}, Type: {typ}")

# Output (note that z and the return value are widened to int64):
# Variable: arg.x, Type: int16
# Variable: arg.y, Type: int16
# Variable: x, Type: int16
# Variable: y, Type: int16
# Variable: z, Type: int64
# Variable: $16return_value.4, Type: int64

We can get around this by (a) forking Numba and patching it (the culprit is integer_binop_cases in numba.core.typing.builtins), which is what I did for testing, (b) not relying on Numba's type inference at all, or (c) adding our own custom integer types via Numba extensions. We probably need to do some of this anyway, because Numba treats all array types as dynamically sized and we probably want statically known shapes.
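For a sense of what (b) could look like, here is a minimal sketch of a standalone type inference pass with Option 3 semantics, built on the standard library's `ast` module. It only handles straight-line assignments and returns over `+`, `-`, and `*` on named variables, and it represents types as plain strings. The function `infer_types` and the `"return"` key are hypothetical and made up for this example; a real frontend would need far more than this.

```python
import ast


def infer_types(source: str, arg_types: dict[str, str]) -> dict[str, str]:
    """Infer variable types for one simple function (hypothetical helper)."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise TypeError("expected a single function definition")
    env = dict(arg_types)  # variable name -> type name

    def type_of(node: ast.expr) -> str:
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.BinOp) and isinstance(
            node.op, (ast.Add, ast.Sub, ast.Mult)
        ):
            lhs, rhs = type_of(node.left), type_of(node.right)
            if lhs != rhs:
                raise TypeError(f"mismatched operand types: {lhs} vs {rhs}")
            return lhs  # Option 3: the result keeps the operand type
        raise NotImplementedError(ast.dump(node))

    for stmt in func.body:
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        ):
            env[stmt.targets[0].id] = type_of(stmt.value)
        elif isinstance(stmt, ast.Return) and stmt.value is not None:
            env["return"] = type_of(stmt.value)
        else:
            raise NotImplementedError(ast.dump(stmt))
    return env


SOURCE = """
def example_function(x, y):
    z = x + y
    return z
"""

types = infer_types(SOURCE, {"x": "int16", "y": "int16"})
for var, typ in types.items():
    print(f"Variable: {var}, Type: {typ}")
```

In contrast to the Numba output above, `z` and the return value stay `int16` here. The trade-off is that we would have to reimplement everything else Numba's inference gives us for free.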

Labels: python frontend (Issues related to the python frontend)
