Unions of Floats on ARM64 #1177

@cfis

For a long time, some of the proj4rb tests have failed on macOS on ARM64 (Apple silicon). The failures look like this:

 1) Failure:
ConversionTest#test_pipeline [test/conversion_test.rb:276]:
Expected |1450880.2910605003 - 2.0e-323| (1450880.2910605003) to be <= 0.001.

See https://github.com/cfis/proj4rb/actions/runs/22570514174/job/65376882406.

This looks like some ABI problem to me, so I finally took a look. The code in question uses this Union:

    class PJ_COORD < FFI::Union
      layout :v, [:double, 4],
             :xyzt, PJ_XYZT,
             :uvwt, PJ_UVWT,
             :lpzt, PJ_LPZT,
             :geod, PJ_GEOD,
             :opk, PJ_OPK,
             :enu, PJ_ENU,
             :xyz, PJ_XYZ,
             :uvw, PJ_UVW,
             :lpz, PJ_LPZ,
             :xy,  PJ_XY,
             :uv, PJ_UV,
             :lp, PJ_LP
    end

https://github.com/cfis/proj4rb/blob/master/lib/api/api_5_0.rb#L128C1-L141C24

Since I don't know anything about the ARM64 ABI, I asked Claude to see if it could work out what was going on, and it pointed to this code:

https://github.com/ffi/ffi/blob/master/ext/ffi_c/StructLayout.c#L558

    const ffi_type *alignment_types[] = { &ffi_type_sint8, &ffi_type_sint16, &ffi_type_sint32, &ffi_type_sint64,
                                          &ffi_type_float, &ffi_type_double, &ffi_type_longdouble, NULL };
    StructLayout* layout;
    ffi_type *t = NULL;
    int count, i;

    TypedData_Get_Struct(self, StructLayout, &rbffi_struct_layout_data_type, layout);

    for (i = 0; alignment_types[i] != NULL; ++i) {
        if (alignment_types[i]->alignment == layout->align) {
            t = (ffi_type *) alignment_types[i];
            break;
        }
    }

Notice that my union contains only doubles, but ffi_type_sint64 (alignment 8) appears before ffi_type_double (alignment 8) in the array, so the loop stops at the first match. As a result, ffi describes the union to libffi as [sint64, sint64, sint64, sint64] instead of [double, double, double, double].
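To make the first-match behaviour concrete, here is a rough pure-Ruby model of that search. It is not the ffi source: the `ALIGNMENT_TYPES` table, the alignment values (assumed for a typical 64-bit platform, with long double left out) and the `first_type_with_alignment` helper are all made up for illustration, and filling the union with size / 8 elements is my assumption about how the element list is built.

```ruby
# Simplified model of the alignment search quoted above (hypothetical, not ffi code).
# Alignments are assumed values for a typical 64-bit platform.
ALIGNMENT_TYPES = [
  [:sint8,  1],
  [:sint16, 2],
  [:sint32, 4],
  [:sint64, 8],
  [:float,  4],
  [:double, 8]
].freeze

# Return the name of the first entry whose alignment matches, like the C loop.
def first_type_with_alignment(align)
  entry = ALIGNMENT_TYPES.find { |_name, type_align| type_align == align }
  entry && entry.first
end

# A union of four doubles: 32 bytes, alignment 8.
union_size  = 32
union_align = 8

element      = first_type_with_alignment(union_align)
element_size = 8 # both 8-aligned candidates in the table are 8 bytes wide
described_as = Array.new(union_size / element_size, element)

p described_as # the search stops at :sint64 and never reaches :double
```

Because the lookup is keyed on alignment alone, any union whose alignment is 8 gets the integer type in this model, no matter what its members are.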

The problem is that on ARM64 (aarch64), the calling convention for passing and returning structs by value depends on whether the type is a Homogeneous Floating-point Aggregate (HFA). HFAs are passed in floating-point registers (d0-d3 for doubles), while non-HFAs use integer registers or memory.
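As I understand the rule, an HFA is an aggregate of at most four members that all share one floating-point type. The sketch below is a deliberately simplified Ruby model of that test, not the real ABI classification: `hfa?` and `register_class` are hypothetical helpers, only `float` and `double` are considered, and the aggregate is assumed to be already flattened into a list of member types.

```ruby
# Simplified model of the HFA test (hypothetical helpers, not a full ABI classifier).
FLOATING_POINT_TYPES = %i[float double].freeze

# True when the flattened member list has one to four members,
# all of the same floating-point type.
def hfa?(members)
  return false if members.empty? || members.size > 4

  members.uniq.size == 1 && FLOATING_POINT_TYPES.include?(members.first)
end

# Which kind of location the aggregate would travel in, per the description above.
def register_class(members)
  hfa?(members) ? :floating_point_registers : :integer_registers_or_memory
end

p register_class(%i[double double double double]) # what the C function expects
p register_class(%i[sint64 sint64 sint64 sint64]) # what ffi tells libffi
```

The same 32 bytes land in two different places depending only on the element type in the description, which is why the description matters here.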

This doesn't matter on x86_64 because both type classes use the same register/stack passing mechanism there. On ARM64 it causes a silent ABI mismatch: libffi passes the union in integer registers while the C function expects it in floating-point registers, resulting in garbage values (typically ~2.48e-314, i.e. denormalized near-zero doubles read from uninitialized FP registers).
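For a sense of why the garbage shows up as near-zero numbers: any 64-bit pattern whose upper bits are all zero reads back as a denormalized double. The snippet below reinterprets raw bits with `Array#pack` and `String#unpack1`; the helper names are made up, and it only illustrates the bit-level effect. It does not show where the leftover register contents in the failing test actually came from.

```ruby
# Reinterpret a 64-bit integer bit pattern as a double and back.
# "q" is a native-endian signed 64-bit integer, "d" a native-endian double.
def bits_as_double(int_bits)
  [int_bits].pack("q").unpack1("d")
end

def double_as_bits(value)
  [value].pack("d").unpack1("q")
end

tiny = bits_as_double(4)
puts tiny                     # a denormalized double
puts tiny < Float::MIN        # true: below the smallest normalized double
puts double_as_bits(2.0e-323) # 4
```

So the 2.0e-323 in the failure message above has the same bit pattern as the small integer 4, which is consistent with reading something other than a real coordinate.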
