Description
For a long time, some of the proj4rb tests have failed on macOS on ARM64 (Apple silicon). The failures look like this:
1) Failure:
ConversionTest#test_pipeline [test/conversion_test.rb:276]:
Expected |1450880.2910605003 - 2.0e-323| (1450880.2910605003) to be <= 0.001.
See https://github.com/cfis/proj4rb/actions/runs/22570514174/job/65376882406.
This looks like some ABI problem to me, so I finally took a look. The code in question uses this Union:
class PJ_COORD < FFI::Union
  layout :v, [:double, 4],
         :xyzt, PJ_XYZT,
         :uvwt, PJ_UVWT,
         :lpzt, PJ_LPZT,
         :geod, PJ_GEOD,
         :opk, PJ_OPK,
         :enu, PJ_ENU,
         :xyz, PJ_XYZ,
         :uvw, PJ_UVW,
         :lpz, PJ_LPZ,
         :xy, PJ_XY,
         :uv, PJ_UV,
         :lp, PJ_LP
https://github.com/cfis/proj4rb/blob/master/lib/api/api_5_0.rb#L128C1-L141C24
Since I don't know anything about the ARM64 ABI, I asked Claude to see if it could work out what was going on, and it pointed to this code:
https://github.com/ffi/ffi/blob/master/ext/ffi_c/StructLayout.c#L558
const ffi_type *alignment_types[] = { &ffi_type_sint8, &ffi_type_sint16, &ffi_type_sint32, &ffi_type_sint64,
                                      &ffi_type_float, &ffi_type_double, &ffi_type_longdouble, NULL };
StructLayout* layout;
ffi_type *t = NULL;
int count, i;
TypedData_Get_Struct(self, StructLayout, &rbffi_struct_layout_data_type, layout);
for (i = 0; alignment_types[i] != NULL; ++i) {
    if (alignment_types[i]->alignment == layout->align) {
        t = (ffi_type *) alignment_types[i];
        break;
    }
}
Notice that my union contains only doubles, but ffi_type_sint64 (alignment 8) appears before ffi_type_double (alignment 8) in the array, so the loop stops at ffi_type_sint64. ffi therefore describes the union to libffi as [sint64, sint64, sint64, sint64] instead of [double, double, double, double].
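To make the first-match behaviour concrete, here is a minimal pure-Ruby model of that lookup. It is only a sketch: the names `AlignType`, `ALIGNMENT_TYPES`, `first_type_with_alignment` and `describe_union` are hypothetical, the alignments are the usual values on 64-bit platforms, `longdouble` is left out because its alignment varies by platform, and the element count (size divided by alignment) is an assumption that happens to reproduce the four-element result described above.

```ruby
# Hypothetical model of the alignment lookup; not the ffi gem's actual code.
AlignType = Struct.new(:name, :alignment)

# Same order as the C array above (longdouble omitted, see lead-in).
ALIGNMENT_TYPES = [
  AlignType.new(:sint8, 1),
  AlignType.new(:sint16, 2),
  AlignType.new(:sint32, 4),
  AlignType.new(:sint64, 8),
  AlignType.new(:float, 4),
  AlignType.new(:double, 8)
].freeze

# Return the first table entry whose alignment matches, like the C loop.
def first_type_with_alignment(align)
  ALIGNMENT_TYPES.find { |t| t.alignment == align }
end

# Assumed: the union is described as size / alignment copies of that type.
def describe_union(size, align)
  chosen = first_type_with_alignment(align)
  Array.new(size / align, chosen.name)
end

# PJ_COORD is four doubles wide: 32 bytes, 8-byte aligned.
p describe_union(32, 8)
```

Because the lookup only compares alignments, nothing in it can tell an 8-byte integer from an 8-byte double; whichever comes first in the table wins.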
The problem is that on ARM64 (aarch64), the calling convention for passing and returning structs by value depends on whether the type is a Homogeneous Floating-point Aggregate (HFA). HFAs are passed in floating-point registers (d0-d3), while non-HFAs use integer registers or memory.
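The HFA rule can be sketched as a small predicate. This is a simplified illustration rather than the full AAPCS64 classification: it assumes the aggregate has already been flattened to a list of member type names, and `hfa?`, `register_class` and `HFA_FLOAT_TYPES` are hypothetical names introduced only for this example.

```ruby
# Simplified HFA check: one to four members, all the same floating-point type.
HFA_FLOAT_TYPES = %i[float double].freeze

def hfa?(members)
  return false if members.empty? || members.size > 4

  first = members.first
  HFA_FLOAT_TYPES.include?(first) && members.all? { |m| m == first }
end

# Where the aggregate travels, following the description above.
def register_class(members)
  hfa?(members) ? :floating_point_registers : :integer_registers_or_memory
end

as_c_sees_it   = %i[double double double double]
as_ffi_told_it = %i[sint64 sint64 sint64 sint64]

p register_class(as_c_sees_it)
p register_class(as_ffi_told_it)
```

The two descriptions have the same size and alignment, yet they classify differently, which is exactly the mismatch between what the C function expects and what libffi is told.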
This doesn't matter on x86_64 here because a 32-byte aggregate like this one is passed in memory whether its members are integers or doubles. On ARM64 it causes a silent ABI mismatch: libffi passes the union in integer registers while the C function expects it in floating-point registers, resulting in garbage values (typically ~2.48e-314, i.e. denormalized near-zero doubles read from uninitialized FP registers).
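One way to see why the garbage tends to look like a tiny near-zero number: any 64-bit pattern whose exponent bits are all zero, such as a small integer, reads as a subnormal double. The sketch below reinterprets integer bit patterns as doubles using only `Array#pack` and `String#unpack1`; `bits_as_double` is a hypothetical helper, and the inputs are arbitrary examples, not values taken from the failing run.

```ruby
# Reinterpret the bits of a 64-bit integer as an IEEE 754 double
# (native byte order for both the pack and the unpack).
def bits_as_double(int64)
  [int64].pack('q').unpack1('d')
end

# Float::MIN is the smallest positive *normal* double; anything positive
# below it is subnormal (denormalized).
def subnormal?(value)
  value > 0.0 && value < Float::MIN
end

[1, 4, 5_000_000_000].each do |bits|
  value = bits_as_double(bits)
  puts "#{bits} -> #{value} (subnormal: #{subnormal?(value)})"
end
```

A register that still holds a leftover integer or nothing meaningful at all will therefore usually show up as a value far below 1e-300 rather than as something that looks like a coordinate.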