Why 16-byte alignment for `long double`?

64-bit architectures like x86-64 have a word size of 64 bits. In that case, if a memory access crosses a word boundary, it takes twice as long to access the data, so alignment is required. This is what I know; correct me if I am wrong.

Now, GCC uses 16-byte alignment for `long double` (MSVC, at least, uses 8-byte alignment), even though its non-padding size is 10 bytes. But either way, with 8-byte alignment it takes two read cycles, and the same is true with 16-byte alignment. So why the stricter 16-byte alignment? What is the purpose of alignment other than the one I mentioned above?

Also, since the non-padding part of `long double` (the 80-bit x87 extended-precision format) is 10 bytes, 4-byte alignment should actually be sufficient. In that case, too, the data can be read in two read cycles (either 4 + 6 or 8 + 2 bytes). So please also explain where this assumption goes wrong.

(The actual `sizeof(long double)` is 12 in the i386 System V ABI and 16 in the x86-64 System V ABI: multiples of their respective `alignof()` values of 4 and 16.)

Asked Jul 10, 2021 at 6:02 by Sourav Kannantha B; edited Jul 10, 2021 at 18:48 by Peter Cordes.

x86-64 doesn't have a "word size"; that's not a meaningful concept for x86, which can load/store any power-of-2 width from 1 byte to 32 bytes (or 64 with AVX-512-capable CPUs) with near-equal performance, as long as the load doesn't cross a 64-byte cache-line boundary.

– Peter Cordes, Jul 10, 2021 at 18:50

Probably because older CPUs could only do 16-byte loads/stores efficiently when the data was naturally aligned (see "Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?"). 80-bit x87 is slow anyway, so it's a somewhat questionable decision to use that much extra space in arrays. That said, `fld m80` does decode into a 2-byte and an 8-byte load, so 8- or 16-byte alignment is sufficient to avoid cache-line splits in either half. But that only holds if the size is 16 bytes, so you might as well make the alignment match the size for SSE copying.

– Peter Cordes, Jul 10, 2021 at 18:55

The Intel Optimization Manual recommends 16-byte alignment for 80-bit `long double`, but it does not explain why or what the impact is. My quick experiments showed no impact from (mis)alignment, only from crossing cache-line boundaries, as expected.

– user555045, Jul 10, 2021 at 19:06

Re: the concept of a "word": see "Weird data sizes?" and "Does Word length == number of bits transferred between memory and CPU per access?", and my longish answer at "How does the CPU reads a double value?" re: how CPUs access memory through cache. Also "What's the actual effect of successful unaligned accesses on x86?" / "How can I accurately benchmark unaligned access speed on x86_64".

– Peter Cordes, Jul 10, 2021 at 21:06

Aligning each object to a multiple of its size is the easiest way to ensure that no object crosses a cache line boundary.

– prl, Jul 10, 2021 at 22:12
