23106 – Division followed by modulo generates longer machine code than vice versa

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations
Version:	trunk
Hardware:	PC Linux

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

Reported:	2015-04-02 07:43 PDT by Ed Schouten
Modified:	2015-05-10 15:08 PDT
CC List:	4 users

Description Ed Schouten 2015-04-02 07:43:24 PDT

Consider the following piece of C code:

#include <stdint.h>

struct tv {
  int64_t tv_sec;
  int32_t tv_usec;
};

void convert1(uint64_t ts, struct tv *tv) {
  tv->tv_sec = ts / 1000000000;
  tv->tv_usec = (ts % 1000000000) / 1000;
}

void convert2(uint64_t ts, struct tv *tv) {
  ts /= 1000;
  tv->tv_sec = ts / 1000000;
  tv->tv_usec = ts % 1000000;
}

Essentially, both functions convert a UNIX timestamp in nanoseconds to a struct timeval-like structure (with microsecond precision). They should produce identical results for every input.
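
The claim that both functions agree can be spot-checked with a small harness. The following is only a sketch: same_result(), count_mismatches() and check_all() are hypothetical helper names that are not part of the original report, and a sampled comparison is evidence rather than a proof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tv {
  int64_t tv_sec;
  int32_t tv_usec;
};

/* The two functions from the report, repeated so the sketch is self-contained. */
void convert1(uint64_t ts, struct tv *tv) {
  tv->tv_sec = ts / 1000000000;
  tv->tv_usec = (ts % 1000000000) / 1000;
}

void convert2(uint64_t ts, struct tv *tv) {
  ts /= 1000;
  tv->tv_sec = ts / 1000000;
  tv->tv_usec = ts % 1000000;
}

/* Hypothetical helper: returns 1 if both functions agree for ts. */
int same_result(uint64_t ts) {
  struct tv a, b;
  convert1(ts, &a);
  convert2(ts, &b);
  return a.tv_sec == b.tv_sec && a.tv_usec == b.tv_usec;
}

/* Hypothetical helper: counts disagreements over a few boundary values
   and one million pseudo-random 64-bit inputs (xorshift64). */
unsigned count_mismatches(void) {
  static const uint64_t edge[] = {0, 1, 999, 1000, 999999999,
                                  1000000000, 1000000001, UINT64_MAX};
  unsigned bad = 0;
  for (size_t i = 0; i < sizeof edge / sizeof edge[0]; i++)
    bad += !same_result(edge[i]);
  uint64_t x = 88172645463325252ULL;
  for (int i = 0; i < 1000000; i++) {
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    bad += !same_result(x);
  }
  return bad;
}

/* Aborts if any sampled input disagrees. */
void check_all(void) {
  assert(count_mismatches() == 0);
}
```

Calling check_all() from a test program should return normally if the two functions are equivalent on the sampled inputs.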

However, Clang r233700 with -O3 generates the following machine code for them:

0000000000000000 <convert1>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 89 f8                mov    %rdi,%rax
   7:   48 c1 e8 09             shr    $0x9,%rax
   b:   48 b9 53 5a 9b a0 2f    mov    $0x44b82fa09b5a53,%rcx
  12:   b8 44 00 
  15:   48 f7 e1                mul    %rcx
  18:   48 c1 ea 0b             shr    $0xb,%rdx
  1c:   48 89 16                mov    %rdx,(%rsi)
  1f:   48 69 c2 00 ca 9a 3b    imul   $0x3b9aca00,%rdx,%rax
  26:   48 29 c7                sub    %rax,%rdi
  29:   48 c1 ef 03             shr    $0x3,%rdi
  2d:   48 b9 cf f7 53 e3 a5    mov    $0x20c49ba5e353f7cf,%rcx
  34:   9b c4 20 
  37:   48 89 f8                mov    %rdi,%rax
  3a:   48 f7 e1                mul    %rcx
  3d:   48 c1 ea 04             shr    $0x4,%rdx
  41:   89 56 08                mov    %edx,0x8(%rsi)
  44:   5d                      pop    %rbp
  45:   c3                      retq   

0000000000000000 <convert2>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 89 f8                mov    %rdi,%rax
   7:   48 c1 e8 03             shr    $0x3,%rax
   b:   48 b9 cf f7 53 e3 a5    mov    $0x20c49ba5e353f7cf,%rcx
  12:   9b c4 20 
  15:   48 f7 e1                mul    %rcx
  18:   48 89 d1                mov    %rdx,%rcx
  1b:   48 c1 e9 04             shr    $0x4,%rcx
  1f:   48 c1 ef 09             shr    $0x9,%rdi
  23:   48 ba 53 5a 9b a0 2f    mov    $0x44b82fa09b5a53,%rdx
  2a:   b8 44 00 
  2d:   48 89 f8                mov    %rdi,%rax
  30:   48 f7 e2                mul    %rdx
  33:   48 c1 ea 0b             shr    $0xb,%rdx
  37:   48 89 16                mov    %rdx,(%rsi)
  3a:   48 ba db 34 b6 d7 82    mov    $0x431bde82d7b634db,%rdx
  41:   de 1b 43 
  44:   48 89 c8                mov    %rcx,%rax
  47:   48 f7 e2                mul    %rdx
  4a:   48 c1 ea 12             shr    $0x12,%rdx
  4e:   69 c2 40 42 0f 00       imul   $0xf4240,%edx,%eax
  54:   29 c1                   sub    %eax,%ecx
  56:   89 4e 08                mov    %ecx,0x8(%rsi)
  59:   5d                      pop    %rbp
  5a:   c3                      retq   
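
For readers less familiar with this pattern: each mov/mul/shr group in the listings above is a division by a constant, done as a multiply by a precomputed reciprocal followed by a shift. A rough C rendering, using the constants and shift counts copied from the listings, might look as follows. The function names are made up for illustration, and unsigned __int128 is a GCC/Clang extension that is assumed to be available.

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of a 64x64-bit product, i.e. what `mul` leaves in %rdx. */
uint64_t mulhi(uint64_t a, uint64_t b) {
  return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* ts / 1000000000: shr $0x9, mul by 0x44b82fa09b5a53, shr $0xb. */
uint64_t div_1e9(uint64_t ts) {
  return mulhi(ts >> 9, 0x44b82fa09b5a53ULL) >> 11;
}

/* ts / 1000: shr $0x3, mul by 0x20c49ba5e353f7cf, shr $0x4. */
uint64_t div_1e3(uint64_t ts) {
  return mulhi(ts >> 3, 0x20c49ba5e353f7cfULL) >> 4;
}

/* t / 1000000: mul by 0x431bde82d7b634db, shr $0x12 (convert2 only). */
uint64_t div_1e6(uint64_t t) {
  return mulhi(t, 0x431bde82d7b634dbULL) >> 18;
}

/* Hypothetical helper: compares each sequence against plain division. */
void check_magic(uint64_t n) {
  assert(div_1e9(n) == n / 1000000000);
  assert(div_1e3(n) == n / 1000);
  assert(div_1e6(n) == n / 1000000);
}
```

Read this way, convert1 contains two such groups (ts / 1000000000 and the division by 1000), while convert2 contains three (ts / 1000, ts / 1000000000, and a separate division by 1000000 that feeds the imul/sub computing the remainder).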

As a 30% increase in code size (91 bytes for convert2 versus 70 for convert1) is not negligible, I thought it would make sense to file a bug. Maybe there is room for an optimization here?
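
One possible shape for such an optimization, written at the source level: the seconds value is already the quotient of the microsecond count divided by 1000000, so the remainder can be formed with a multiply and a subtract instead of a third reciprocal multiplication. The sketch below uses a hypothetical convert3 to show the idea. It has not been compiled with r233700, so whether it actually yields shorter code is an assumption.

```c
#include <assert.h>
#include <stdint.h>

struct tv {
  int64_t tv_sec;
  int32_t tv_usec;
};

/* Reference version from the report. */
void convert1(uint64_t ts, struct tv *tv) {
  tv->tv_sec = ts / 1000000000;
  tv->tv_usec = (ts % 1000000000) / 1000;
}

/* Hypothetical variant of convert2: the remainder reuses the seconds
   quotient, since (ts / 1000) / 1000000 == ts / 1000000000. */
void convert3(uint64_t ts, struct tv *tv) {
  uint64_t us = ts / 1000;          /* microseconds since the epoch */
  uint64_t sec = ts / 1000000000;   /* same value as us / 1000000 */
  tv->tv_sec = sec;
  tv->tv_usec = us - sec * 1000000; /* us % 1000000 without another division */
}

/* Hypothetical helper: aborts if convert3 disagrees with convert1 for ts. */
void check_convert3(uint64_t ts) {
  struct tv a, b;
  convert1(ts, &a);
  convert3(ts, &b);
  assert(a.tv_sec == b.tv_sec);
  assert(a.tv_usec == b.tv_usec);
}
```

This only illustrates the arithmetic identity; the actual fix would presumably belong in the optimizer, so that convert2 as written benefits from it.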