Skip to content
This repository was archived by the owner on Oct 8, 2025. It is now read-only.

Conversation

@alejandro-colomar
Copy link
Contributor

@alejandro-colomar alejandro-colomar commented Jun 10, 2022

That allows the linker to put them in read-only memory, and
optimize program startup time.

I also merged identical definitions into a single one, which the
linker would have done anyway, but that way we reduce source code
duplication.

See also: https://www.akkadia.org/drepper/dsohowto.pdf 2.4.3
Closes: #720

@ac000
Copy link
Member

ac000 commented Jun 10, 2022

I'm assuming it's called nxt_week and not nxt_day as it's matching tm.tm_wday (week day)?

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 10, 2022

I'm assuming it's called nxt_week and not nxt_day as it's matching tm.tm_wday (week day)?

@ac000
Now you ask, I don't know why it was called that way. I just kept the old name. I'd maybe call it nxt_wday now?

@hongzhidao
Copy link
Contributor

The patch does two things.

  1. Avoid relocation modifying that's done by the linker.
  2. Removed duplicated code about a few same variables.

With (1), from the point of readability, the original way is better. And the way is often used.
With (2), the patch introduced two global scope variables. We should introduce global variables as less as possible.

But there is no issue with the original way, its code is clear, and it hasn't changed in a long time.
Compared to their cons and prods, I'm not sure if it's good to do the refactoring.

If similar operations are required in the future, how to use them?
For example, we must ask devs to use static const char s[][n] instead of const char *s[]?
And if there are similar duplicated variables, do we have to refactor them out to global variables?

Both of the ways are fine for me, what I'm afraid of is it introduces more rules.

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 11, 2022

The patch does two things.

1. Avoid relocation modifying that's done by the linker.

Yes. In the case of a standalone program as unitd(8) (as opposed to a shared library), I'm not sure who does the job of initializing the array of pointers; I'm not sure if it's ld(1) or who does it. If it's at runtime, we have a startup performance problem. @ac000 do you know the answer to this?

With (1), from the point of readability, the original way is better.

I think the readability of const char *strings[] and const char strings[][4] should be mostly the same. I agree that having to write the 4 is something I'm not entirely happy with; especially since C has a feature (or bug, depending on your needs) that allows unterminated strings as char buf[3] = "asd";. But overall, making sure we get the size right, I think readability should not be a problem here.

2. Removed duplicated code about a few same variables.

With (2), the patch introduced two global scope variables. We should introduce global variables as less as possible.

Global variables are normally undesired for many reasons.

For one (applicable here), they pollute the global namespace, but so do functions, and I don't think we might want nxt_month or nxt_wday for something else.

But the most common reason to avoid globals is that they can change value in places that one wouldn't expect, since they can be called anywhere in the code (not volatile, but almost). But since this is a const global, it's stored in the read-only section of the program, and should be fine.

But there is no issue with the original way, its code is clear, and it hasn't changed in a long time. Compared to their cons and prods, I'm not sure if it's good to do the refactoring.

There are two issues actually:

  • Uses more memory (needs arrays for all of the strings, and then another array to store the pointers). Exactly (if I'm correct, and I think I am) 96 bytes more than the array of arrays in the case of the months. 96 comes from 12 * 8, since the array of pointers needs to store 12 pointers (of 8 bytes each) for the 12 strings. Then another 7 * 8 = 56 bytes for nxt_wday. Not a huge thing, but we could avoid that. And I suspect that even though the string literals are shared thanks to linker optimizations, the array of pointers is probably not shared (not sure about this either), so maybe we need to multiply this by the times the arrays appear in code. In the worst case, that sums close to 1 KiB, which is clearly not a good thing.

  • It slows startup (I'm not sure about this; maybe since it's a standalone program, the static linker (ld(1)) can create the array of pointers). Using const char *const strings[] might still be a good move (but I think the array of arrays is the best move).

If similar operations are required in the future, how to use them? For example, we must ask devs to use const char s[][n] instead of const char *s[]?

I'd say yes. The array of arrays reduces code size, and maybe also startup time. Not a bad thing if the only bad thing is getting dev's eyes used to it, I think. :)

And if an array of pointers is preferred, we should use const char *const s[], to help the array of pointers be read-only too (although I'm not sure it's possible).

And if there are similar duplicated variables, do we have to refactor them out to global variables?

I'd also prefer that, yes; but only for read-only things. Global variables are bad. Global constants are good.

Both of the ways are fine for me, what I'm afraid of is it introduces more rules.

Yup. I hope they are not complex ones.

I opened the following question to help clarify what happens in this case: https://software.codidact.com/posts/286578

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 11, 2022

Some experiment seems to confirm my theory:

$ git switch master
$ git clean -dffx
$ ./configure
$ make -j
$ size build/unitd
   text	   data	    bss	    dec	    hex	filename
 374088	  29640	   1224	 404952	  62dd8	build/unitd
$ git switch const_array
$ git clean -dffx
$ ./configure
$ make -j
$ size build/unitd
   text	   data	    bss	    dec	    hex	filename
 375266	  29000	   1224	 405490	  62ff2	build/unitd

The numbers don't lie.

I'm actually surprised that the total binary size is larger after this patch. But the good thing is that the extra size goes to the text section. The data section is smaller by 0.6 KiB (wiped 2% of the data section in a single no-op patch; I'm proud of that xD), which is a considerable improvement.

I'll try to use static const char *const str just to see what it does.

Edit: making it static const char *const doesn't improve anything (makes sense):

$ size build/unitd
   text	   data	    bss	    dec	    hex	filename
 374088	  29640	   1224	 404952	  62dd8	build/unitd

Edit2:

More detailed sizes:

master:

$ size -A build/unitd   
build/unitd  :
section                 size     addr
.interp                   28      736
.note.gnu.build-id        36      764
.note.ABI-tag             32      800
.gnu.hash               2512      832
.dynsym                11040     3344
.dynstr                 7041    14384
.gnu.version             920    21426
.gnu.version_r           336    22352
.rela.dyn              28128    22688
.rela.plt               3720    50816
.init                     23    57344
.plt                    2496    57376
.plt.got                   8    59872
.text                 219793    59888
.fini                      9   279684
.rodata                43490   282624
.eh_frame_hdr           8164   326116
.eh_frame              46312   334280
.tbss                    560   386544
.init_array                8   386544
.fini_array                8   386552
.data.rel.ro           14192   386560
.dynamic                 560   400752
.got                      88   401312
.got.plt                1264   401408
.data                  13520   402688
.bss                     664   416224
.comment                  39        0
.debug_aranges          4656        0
.debug_info          1523680        0
.debug_abbrev         100266        0
.debug_line           292434        0
.debug_str             68973        0
.debug_loc            665566        0
.debug_ranges          22384        0
Total                3082950

const_array:

$ size -A build/unitd
build/unitd  :
section                 size     addr
.interp                   28      736
.note.gnu.build-id        36      764
.note.ABI-tag             32      800
.gnu.hash               2520      832
.dynsym                11088     3352
.dynstr                 7071    14440
.gnu.version             924    21512
.gnu.version_r           336    22440
.rela.dyn              26472    22776
.rela.plt               3720    49248
.init                     23    53248
.plt                    2496    53280
.plt.got                   8    55776
.text                 221921    55792
.fini                      9   277716
.rodata                43938   278528
.eh_frame_hdr           8180   322468
.eh_frame              46464   330648
.tbss                    560   383088
.init_array                8   383088
.fini_array                8   383096
.data.rel.ro           13552   383104
.dynamic                 560   396656
.got                      88   397216
.got.plt                1264   397312
.data                  13520   398592
.bss                     664   412128
.comment                  39        0
.debug_aranges          4704        0
.debug_info          1526026        0
.debug_abbrev         100821        0
.debug_line           294460        0
.debug_str             69162        0
.debug_loc            673514        0
.debug_ranges          22384        0
Total                3096600

And a diff, which is more interesting (removing the addresses, of course):

$ diff -u master const_array
--- master	2022-06-12 00:12:39.011952337 +0200
+++ const_array	2022-06-12 00:13:07.047952166 +0200
@@ -4,36 +4,36 @@
 .interp                   28
 .note.gnu.build-id        36
 .note.ABI-tag             32
-.gnu.hash               2512
-.dynsym                11040
-.dynstr                 7041
-.gnu.version             920
+.gnu.hash               2520
+.dynsym                11088
+.dynstr                 7071
+.gnu.version             924
 .gnu.version_r           336
-.rela.dyn              28128
+.rela.dyn              26472
 .rela.plt               3720
 .init                     23
 .plt                    2496
 .plt.got                   8
-.text                 219793
+.text                 221921
 .fini                      9
-.rodata                43490
-.eh_frame_hdr           8164
-.eh_frame              46312
+.rodata                43938
+.eh_frame_hdr           8180
+.eh_frame              46464
 .tbss                    560
 .init_array                8
 .fini_array                8
-.data.rel.ro           14192
+.data.rel.ro           13552
 .dynamic                 560
 .got                      88
 .got.plt                1264
 .data                  13520
 .bss                     664
 .comment                  39
-.debug_aranges          4656
-.debug_info          1523680
-.debug_abbrev         100266
-.debug_line           292434
-.debug_str             68973
-.debug_loc            665566
+.debug_aranges          4704
+.debug_info          1526026
+.debug_abbrev         100821
+.debug_line           294460
+.debug_str             69162
+.debug_loc            673514
 .debug_ranges          22384
-Total                3082950
+Total                3096600

@ac000
Copy link
Member

ac000 commented Jun 12, 2022

Fascinating stuff!

Whether this helps or not, I'll join the party!

Lets cut it down to the bare bones.

a.c

const char *wday[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
$ gcc -S a.c
$ cat a.s
        .file   "a.c"
        .text
        .globl  wday
        .section        .rodata
.LC0:
        .string "Sun"
.LC1:
        .string "Mon"
.LC2:
        .string "Tue"
.LC3:
        .string "Wed"
.LC4:
        .string "Thu"
.LC5:
        .string "Fri"
.LC6:
        .string "Sat"
        .data
        .align 32
        .type   wday, @object
        .size   wday, 56
wday:
        .quad   .LC0
        .quad   .LC1
        .quad   .LC2
        .quad   .LC3
        .quad   .LC4
        .quad   .LC5
        .quad   .LC6
        .ident  "GCC: (GNU) 12.1.1 20220507 (Red Hat 12.1.1-1)"
        .section        .note.GNU-stack,"",@progbits

So we can see the seven strings in the .rodata section and the seven pointers to those strings in the .data section and the wday variable is also in .data.

$ gcc -c -o a.o a.c
$ size -A a.o
a.o  :
section              size   addr
.text                   0      0
.data                  56      0
.bss                    0      0
.rodata                28      0
.comment               47      0
.note.GNU-stack         0      0
.note.gnu.property     48      0
Total                 179

So yeah, .data is our seven pointers and .rodata is the seven strings.

a2.c

const char wday[][4] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
$ gcc -S a2.c
$ cat a2.s
        .file   "a2.c"
        .text
        .globl  wday
        .section        .rodata
        .align 16
        .type   wday, @object
        .size   wday, 28
wday:
        .string "Sun"
        .string "Mon"
        .string "Tue"
        .string "Wed"
        .string "Thu"
        .string "Fri"
        .string "Sat"
        .ident  "GCC: (GNU) 12.1.1 20220507 (Red Hat 12.1.1-1)"
        .section        .note.GNU-stack,"",@progbits

So this time we just have the seven strings in .rodata along with the wday variable.

$ gcc -c -o a2.o a2.c
$ size -A a2.o
a2.o  :
section              size   addr
.text                   0      0
.data                   0      0
.bss                    0      0
.rodata                28      0
.comment               47      0
.note.GNU-stack         0      0
.note.gnu.property     48      0
Total                 123

And just for completeness here's how 'const char *const wday[] =' changes things

$ gcc -S aa.c
$ cat aa.s
        .file   "aa.c"
        .text
        .globl  wday
        .section        .rodata
.LC0:
        .string "Sun"
.LC1:
        .string "Mon"
.LC2:
        .string "Tue"
.LC3:
        .string "Wed"
.LC4:
        .string "Thu"
.LC5:
        .string "Fri"
.LC6:
        .string "Sat"
        .align 32
        .type   wday, @object
        .size   wday, 56
wday:
        .quad   .LC0
        .quad   .LC1
        .quad   .LC2
        .quad   .LC3
        .quad   .LC4
        .quad   .LC5
        .quad   .LC6
        .ident  "GCC: (GNU) 12.1.1 20220507 (Red Hat 12.1.1-1)"
        .section        .note.GNU-stack,"",@progbits

So you can see that the wday variable has moved into the .rodata section.

$ gcc -c -o aa.o aa.c
$ size -A aa.o
aa.o  :
section              size   addr
.text                   0      0
.data                   0      0
.bss                    0      0
.rodata                88      0
.comment               47      0
.note.GNU-stack         0      0
.note.gnu.property     48      0
Total                 183

.rodata is now 56 + 28 for pointers and strings + 4 bytes padding.

Compiling the 'const char wday[][4] =' version as a shared object doesn't really change anything while compiling the 'const char *wday[] =' as a shared object seems to place the wday variable and pointers into a relocation section, so I guess that matches up with Ulrich's paper.

So while I agree that char *wday{} seems nicer, I think in this case wday[][4] (and the same with month) is worth doing, Alex showed a nice saving, all the strings are the same length so there's no wasted memory there either.

As for making them globals and removing duplicated code. Of course generally you want to minimize duplicated code and whilst sometimes it may be warranted, I don't think that applies in this case. And as Alex says they will be marked const and placed in read-only memory and the compiler will nosily complain if some code tries to modify them.

@hongzhidao
Copy link
Contributor

Thanks!

In my feeling, it's a good thing to make devs happy. If the patch is to introduce a new function, it's fine to use [][n] and global helper variables as they like. There are no problems at all, both their ways are fine.

Since it's a refactoring but not a new function, we need to care about its cons and pros.
About the size:

const char *wday[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };

The size is 56: sizeof(char *) * 7

const char wday[][4] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };

The size is 28: = 4 * 7
Since we want to optimize performance, sometimes we have to lose a bit of readability. But actually, it's ok compared to the performance. Just take a look at another way.

diff -r 40aca190c76e src/nxt_router.c
--- a/src/nxt_router.c  Wed Jun 08 13:12:51 2022 +0100
+++ b/src/nxt_router.c  Sun Jun 12 09:22:41 2022 +0800
@@ -3821,8 +3821,9 @@ nxt_router_access_log_date(u_char *buf,
     u_char  sign;
     time_t  gmtoff;

-    static const char  *month[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
-                                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
+    static const char  *month = "Jan" "\0" "Feb" "\0" "Mar" "\0" "Apr" "\0"
+                                "May" "\0" "Jun" "\0" "Jul" "\0" "Aug" "\0"
+                                "Sep" "\0" "Oct" "\0" "Nov" "\0" "Dec" "\0";

     gmtoff = nxt_timezone(tm) / 60;

@@ -3835,7 +3836,7 @@ nxt_router_access_log_date(u_char *buf,
     }

     return nxt_sprintf(buf, buf + size, format,
-                       tm->tm_mday, month[tm->tm_mon], tm->tm_year + 1900,
+                       tm->tm_mday, &month[tm->tm_mon * 4], tm->tm_year + 1900,
                        tm->tm_hour, tm->tm_min, tm->tm_sec,
                        sign, gmtoff / 60, gmtoff % 60);
 }

The size would be only 8. I would say there are one more ways to improve performance.

With the duplicated code. There is also another way.

/* some header file */
static struct {
        const char *week[7];
        const char *month[12];
        int mday[12];

    } nxt_{some naming} = {
        { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" },

        { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" },

        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
    };

We should put all the related data into only one place, maybe it's ok to just in one .h file not .c file.
But we need to find a good naming, and which file to put (I'm afraid that nxt_parse.h is not a good place).
And I think we can find a way to eliminate nxt_controller_date(). It needs more work.
If the nxt_week is only used in one place, we will need to think of changing it into a local variable again.
I don't think there is a maintenance problem currently for the two variables.

Back to the previous point, if it's a patch to add new features, it's good. But it's a refactoring, we need to compare its cons and prods.

@ac000
Copy link
Member

ac000 commented Jun 12, 2022

+    static const char  *month = "Jan" "\0" "Feb" "\0" "Mar" "\0" "Apr" "\0"
+                                "May" "\0" "Jun" "\0" "Jul" "\0" "Aug" "\0"
+                                "Sep" "\0" "Oct" "\0" "Nov" "\0" "Dec" "\0";

Heh, OK, this one I can definitely say I'm not a fan of!.

     return nxt_sprintf(buf, buf + size, format,
-                       tm->tm_mday, month[tm->tm_mon], tm->tm_year + 1900,
+                       tm->tm_mday, &month[tm->tm_mon * 4], tm->tm_year + 1900,

And here we now have a magic number that you need to remember and no, I don't think making it a #define would be any better.

The size would be only 8. I would say there are one more ways to improve performance.

Well 8 + the length of the string which is 48 (actually 49, there is an extra. '.string ""' entry at the end) so 57. Compared to 48 for month[][4]

$ size -A s.o
s.o  :
section              size   addr
.text                   0      0
.data                   8      0
.bss                    0      0
.rodata                49      0
.comment               47      0
.note.GNU-stack         0      0
.note.gnu.property     48      0
Total                 152

With the duplicated code. There is also another way.

/* some header file */
static struct {
        const char *week[7];
        const char *month[12];
        int mday[12];

    } nxt_{some naming} = {
        { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" },

        { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" },

        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
    };

OK, so I like that better than the above!, Although it does take quite a bit more memory (than if it were just arrays of arrays). Even without the mday field we get

$ size -A m.o
m.o  :
section              size   addr
.text                   0      0
.data                 152      0
.bss                    0      0
.rodata                76      0
.comment               47      0
.note.GNU-stack         0      0
.note.gnu.property     48      0
Total                 323

.data is the 19 pointers for the day/months.
.rodata is the day/month strings.

We should put all the related data into only one place, maybe it's ok to just in one .h file not .c file. But we need to find a good naming, and which file to put (I'm afraid that nxt_parse.h is not a good place). And I think we can find a way to eliminate nxt_controller_date(). It needs more work. If the nxt_week is only used in one place, we will need to think of changing it into a local variable again. I don't think there is a maintenance problem currently for the two variables.

Back to the previous point, if it's a patch to add new features, it's good. But it's a refactoring, we need to compare its cons and prods.

Perhaps this should be split into two patches, one to do the de-duplication and one to do the optimisation, or not, depending on which way the first patch is done.

@hongzhidao
Copy link
Contributor

int mday[12];

It's inspired by nxt_timer_parse.c.

nxt_time_t
nxt_time_parse(const u_char *p, size_t len)
{
   ...

    static nxt_int_t  mday[12] = {
        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
    };

And here we now have a magic number that you need to remember and no

Well, it can be replaced with strlen(...).

Well 8 + the length of the string which is 48 (actually 49, there is an extra. '.string ""' entry at the end) so 57. Compared to 48 for month[][4]

There are four types:

static const char *wday[] = {"Sun", "Mon", "Tue"};
static const char wday2[][4] = {"Sun", "Mon", "Tue"};
static const char *wday3 = "Sun\0" "Mon\0" "Tue";
static const char wday4[] = "Sun\0" "Mon\0" "Tue";

For me all of them are fine, it's important to make devs happy with clear stuff. I would not ask them to use only one way.

Currently the cons and pods:
pods:
a) The data section is smaller by 0.6 KiB
b) duplicated 2-3 same variables.

cons:
a) have to introduce a refactoring
b) introduce one or more global varibles.
c) the original way works well and is clear but has to be changed.

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 12, 2022

Re: @ac000

Fascinating stuff!

Whether this helps or not, I'll join the party!

It is =)
And yes it helps!

Lets cut it down to the bare bones.

[...]

So you can see that the wday variable has moved into the .rodata section [with the addition of const].

Just a catch: as we use compiler optimizations, and the variables are local to a function, the compiler is being smart enough to act as if we were using const char *const, but this would still be interesting in global (extern) variables.

As expected, const helps (in unoptimized code, or in globals, which cannot be optimized).

[...]

Compiling the 'const char wday[][4] =' version as a shared object doesn't really change anything while compiling the 'const char *wday[] =' as a shared object seems to place the wday variable and pointers into a relocation section, so I guess that matches up with Ulrich's paper.

Interesting.

So while I agree that char *wday{} seems nicer, I think in this case wday[][4] (and the same with month) is worth doing, Alex showed a nice saving, all the strings are the same length so there's no wasted memory there either.

Actually, I changed another string array which does introduce some wasted 0s, but for some reason we're not using that code yet: *nxt_unit_log_levels[] from nxt_unit.c (or the compiler is optimizing it so heavily that I don't notice any changes in there).

Re: @hongzhidao

About the size:

const char *wday[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };

The size is 56: sizeof(char *) * 7

And you're forgetting the 28 bytes of the strings themselves. So this is 56 + 28 = 84 B.

const char wday[][4] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };

The size is 28: = 4 * 7 Since we want to optimize performance, sometimes we have to lose a bit of readability. But actually, it's ok compared to the performance. Just take a look at another way.

diff -r 40aca190c76e src/nxt_router.c
--- a/src/nxt_router.c  Wed Jun 08 13:12:51 2022 +0100
+++ b/src/nxt_router.c  Sun Jun 12 09:22:41 2022 +0800
@@ -3821,8 +3821,9 @@ nxt_router_access_log_date(u_char *buf,
     u_char  sign;
     time_t  gmtoff;

-    static const char  *month[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
-                                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
+    static const char  *month = "Jan" "\0" "Feb" "\0" "Mar" "\0" "Apr" "\0"
+                                "May" "\0" "Jun" "\0" "Jul" "\0" "Aug" "\0"
+                                "Sep" "\0" "Oct" "\0" "Nov" "\0" "Dec" "\0";

This is almost literally what the compiler does when I do the array of arrays approach (month[][4]).

But, this code has some issues:

  • Now the string is not aligned to 4, but to 1 (which is the alignment of its members, which are bytes). You need to use _Alignas(4) to not drop performance considerably.
  • You have to manually calculate the offsets (* 4).
  • The code is really ugly. Much worse than the array approach.
 gmtoff = nxt_timezone(tm) / 60;

@@ -3835,7 +3836,7 @@ nxt_router_access_log_date(u_char *buf,
}

 return nxt_sprintf(buf, buf + size, format,
  •                   tm->tm_mday, month[tm->tm_mon], tm->tm_year + 1900,
    
  •                   tm->tm_mday, &month[tm->tm_mon * 4], tm->tm_year + 1900,
                      tm->tm_hour, tm->tm_min, tm->tm_sec,
                      sign, gmtoff / 60, gmtoff % 60);
    

}


The size would be only 8. I would say there are one more ways to improve performance.

You're forgetting about the string literals themselves, which need to be stored in memory. This uses the exact same size as the array of arrays, + 8.

With the duplicated code. There is also another way.

/* some header file */
static struct {
        const char *week[7];
        const char *month[12];
        int mday[12];

    } nxt_{some naming} = {
        { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" },

        { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" },

        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
    };

I like the idea of having a new file for all of this. Not sure about the struct, but I'll have it floating in my mind.

Maybe nxt_calendar.c and nxt_calendar.h would be good names. I'm not inspired to do the naming now. But I'll use that for now.

We should put all the related data into only one place, maybe it's ok to just in one .h file not .c file. But we need to find a good naming, and which file to put (I'm afraid that nxt_parse.h is not a good place).

Yeah, I wasn't very happy with that file.

And I think we can find a way to eliminate nxt_controller_date(). It needs more work.

That'll be for another topic :)

If the nxt_week is only used in one place, we will need to think of changing it into a local variable again. I don't think there is a maintenance problem currently for the two variables.

Okay, but remember you have a patch that's adding another use case for month and I don't remember if it also uses week. Seems like a commonly used variable.

Back to the previous point, if it's a patch to add new features, it's good. But it's a refactoring, we need to compare its cons and prods.

Sure, I'm going to do a more detailed review. Coming soon!

Re: @ac000

Perhaps this should be split into two patches, one to do the de-duplication and one to do the optimisation, or not, depending on which way the first patch is done.

Yes, I'm going to do that. It helps review the performance implications of each small change. I didn't expect this patch to so fascinating when I first wrote it!

Re: @hongzhidao

There are four types:

static const char *wday[] = {"Sun", "Mon", "Tue"};

That's inefficient, as we've talked.

static const char wday2[][4] = {"Sun", "Mon", "Tue"};

That's the most efficient thing to do. And not so ugly, IMO.

static const char *wday3 = "Sun\0" "Mon\0" "Tue";

That's even more inefficient than the first one (*wday[]), due to alignment issues.
It also has serious usability issues.
And ugly as hell.

static const char wday4[] = "Sun\0" "Mon\0" "Tue";

Not very different from 3. Same memory savings as 2, but still has serious alignment and usability issues; and even uglier than 3.


For me all of them are fine, it's important to make devs happy with clear stuff. I would not ask them to use only one way.

The savings are huge. Let's continue discussing about it, but if the final findings are similar to what we've seen until now, I think the minor readability concerns are compensated by the performance improvements. I'd kindly ask devs to get used to seeing strings as char arrays, which is what they've always been underneath :)

I'll publish some very interesting things soon.

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 12, 2022

@hongzhidao

Demonstration of alignment issues of 3 and 4 (as numbered in the discussion above):

$ cat string.c
extern const char s[];

const char *foo(int i)
{
	return &s[i * 4];
}
$ cat string.s
	.file	"pointer_string.c"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	leal	0(,%rdi,4), %eax
	leaq	s(%rip), %rdi
	cltq
	addq	%rdi, %rax
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Debian 10.2.1-6) 10.2.1 20210110"
	.section	.note.GNU-stack,"",@progbits
$ cat array_of_strings4.c 
extern const char s[][4];

const char *foo(int i)
{
	return s[i];
}
$ cat array_of_strings4.s 
	.file	"array_of_strings4.c"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movslq	%edi, %rdi
	leaq	s(%rip), %rax
	leaq	(%rax,%rdi,4), %rax
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Debian 10.2.1-6) 10.2.1 20210110"
	.section	.note.GNU-stack,"",@progbits

All of them compiled with cc -O3 -S ...

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 12, 2022

Other important issues to consider:

non-power-of-two array sizes should be rounded up to the next power of two (we waste a little bit of space in exchange for a faster run time); this aligns the inner arrays (i.e., the strings) to power-of-two boundaries:

$ cat array_of_strings7.c
extern const char s[][7];

const char *foo(int i)
{
	return s[i];
}
$ cat array_of_strings7.s
	.file	"array_of_strings.c"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movslq	%edi, %rdi
	leaq	s(%rip), %rdx
	leaq	0(,%rdi,8), %rax
	subq	%rdi, %rax
	addq	%rdx, %rax
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Debian 10.2.1-6) 10.2.1 20210110"
	.section	.note.GNU-stack,"",@progbits
$ cat array_of_strings8.c
extern const char s[][8];

const char *foo(int i)
{
	return s[i];
}
$ cat array_of_strings8.s
	.file	"array_of_strings8.c"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movslq	%edi, %rdi
	leaq	s(%rip), %rax
	leaq	(%rax,%rdi,8), %rax
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Debian 10.2.1-6) 10.2.1 20210110"
	.section	.note.GNU-stack,"",@progbits

@hongzhidao
Copy link
Contributor

hongzhidao commented Jun 12, 2022

Makes sense. Good to know.

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 12, 2022

I pushed a set of commits that break this change into very specific changes. All of them have a very detailed analysis of the performance impact they create in our code (and a theoretical rationale that explains it). I hope you enjoy reading them. I enjoyed optimizing this! :)

I hope this documents for posterity that string literals should always be stored in arrays of char (if possible).

The first commit, Added const to string pointers that never change., is not strictly needed, but I kept it to show that I tried all possible improvements, and that no cards are under my sleeve.

@ac000
Copy link
Member

ac000 commented Jun 12, 2022

So I'll summarize my feelings on this.

I'm perfectly happy with the global arrays of arrays for this case. Although I do like the struct idea...

FWIW the patches look good to me.

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 12, 2022

So I'll summarize my feelings on this.

I'm perfectly happy with the global arrays of arrays for this case. Although I do like the struct idea...

Oh, yeah, after the travel I forgot some of this. I wanted to test a 5th commit that puts the variables in a struct, per @hongzhidao suggestion, to see how it affects the resulting binary. We may see surprises there. Hold my hand :)

FWIW the patches look good to me.

=)

Constant string pointers that don't change the string to which
they point can use const to help the compiler create optimal code.

This patch doesn't affect at all the current resulting binary,
because since our strings are stored in static variables, the
compiler ha enough information to know that it can make them
const, but it's still a good thing to add it, because the compiler
may not always be able to optimize things, and also readers can
see that the pointers never point to other things.

===

See the sizes of the binary before and after this patch
(`size -B build/unitd > ...`):

$ cat sz.B.ptr
   text	   data	    bss	    dec	    hex	filename
 436246	  29648	   1224	 467118	  720ae	build/unitd
$ cat sz.B.const
   text	   data	    bss	    dec	    hex	filename
 436246	  29648	   1224	 467118	  720ae	build/unitd
If a string is stored as an array, the binary only contains the
string literal.  But if it is stored as a pointer, the binary
needs to contain the string literal _and_ a pointer to it, which
means an extra 8 bytes for every string.

In the case of arrays of strings, the size of the inner arrays
(the strings) has to be the same for all of them, so we need, at
least, the size required by the longest string (strlen + 1).

===

See the sizes of the binary before and after this patch
(`size -B build/unitd > ...`):

$ cat sz.B.const
   text	   data	    bss	    dec	    hex	filename
 436246	  29648	   1224	 467118	  720ae	build/unitd
$ cat sz.B.arr
   text	   data	    bss	    dec	    hex	filename
 434846	  29008	   1224	 465078	  718b6	build/unitd

Both 'text' and 'data' have decreased after this patch.  Good.
text by 1.4 KB (0.3%), and data by 0.6 KB (2%).

===

See a more detailed difference of the sizes
(`size -A build/unitd | sed -E 's/^(\S+\s+\S+)\S+\s+/\1/' > ...`):

$ diff -u sz.A.{const,arr}
--- sz.A.const	2022-06-12 20:32:15.029643211 +0200
+++ sz.A.arr	2022-06-12 20:33:12.187521769 +0200
@@ -9,20 +9,20 @@
 .dynstr                 7151
 .gnu.version             932
 .gnu.version_r           352
-.rela.dyn              27984
+.rela.dyn              26328
 .rela.plt               3744
 .init                     23
 .plt                    2512
 .plt.got                   8
 .text                 264353
 .fini                      9
-.rodata                57626
+.rodata                57882
 .eh_frame_hdr           8220
 .eh_frame              49488
 .tbss                    560
 .init_array                8
 .fini_array                8
-.data.rel.ro           14192
+.data.rel.ro           13552
 .dynamic                 560
 .got                      88
 .got.plt                1272
@@ -30,13 +30,13 @@
 .bss                     664
 .comment                  30
 .debug_aranges          4656
-.debug_info          1610066
+.debug_info          1610114
 .debug_abbrev         112065
-.debug_line           250577
+.debug_line           250579
 .debug_str             67547
 .debug_line_str         4061
 .debug_loclists       282985
 .debug_rnglists        12075
-Total                2811180
+Total                2809190

Ignoring debug sections, we see that:

.rela.dyn    decreased by ~ 1.6 KB (6%)
.rodata      INCREASED by ~ 0.3 KB (0.6%)
.data.rel.ro decreased by ~ 0.6 KB (4%)

This also looks great.  Most of the changes are for good, and
there's a section that increases, but not very much (and .rodata
will decrease again in the last patch of this patch set,
"Merged identical strings into unique extern constant variables.").
Arrays of strings are being stored as arrays of arrays of char.
Since arrays can't have padding, the compiler is forced to create
arrays of arrays of the size we specify.  If that size is not a
power of 2, the compiler is able to align the outer array to the
next power of two, but it can't help with the alignment of the
inner arrays.

To improve performance when getting pointers to a string contained
in an array of strings, we can add some padding ourselves, by
rounding up the array size to a power of two.

Since the strings are not huge, the padding shouldn't cause too
much wasted space, and it's still far from the waste that an array
of pointers would cause.

Although this should have an impact in the binary size, it
doesn't, which makes me think this code is being unused.

===

See the sizes of the binary before and after this patch
(`size -B build/unitd > ...`):

$ cat sz.B.arr
   text	   data	    bss	    dec	    hex	filename
 434846	  29008	   1224	 465078	  718b6	build/unitd
$ cat sz.B.pow
   text	   data	    bss	    dec	    hex	filename
 434846	  29008	   1224	 465078	  718b6	build/unitd

===

Although our code doesn't show an improvement, see the following
experiment, which demonstrates the improvement:

```c
$ cat array_of_strings7.c
extern const char s[][7];

const char *foo(int i)
{
	return s[i];
}
```
```asm
$ cat array_of_strings7.s
	.file	"array_of_strings.c"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movslq	%edi, %rdi
	leaq	s(%rip), %rdx
	leaq	0(,%rdi,8), %rax
	subq	%rdi, %rax
	addq	%rdx, %rax
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Debian 10.2.1-6) 10.2.1 20210110"
	.section	.note.GNU-stack,"",@progbits

```
```c
$ cat array_of_strings8.c
extern const char s[][8];

const char *foo(int i)
{
	return s[i];
}
```
```asm
$ cat array_of_strings8.s
	.file	"array_of_strings8.c"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movslq	%edi, %rdi
	leaq	s(%rip), %rax
	leaq	(%rax,%rdi,8), %rax
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Debian 10.2.1-6) 10.2.1 20210110"
	.section	.note.GNU-stack,"",@progbits
```
Although the linker is smart enough to merge most of the
duplicated code, it's better to don't duplicate source code
(unless it has significant advantages).

In this case, we can see that we get a small improvement after
this patch.

===

See the sizes of the binary before and after this patch
(`size -B build/unitd > ...`):

$ cat sz.B.pow
   text	   data	    bss	    dec	    hex	filename
 434846	  29008	   1224	 465078	  718b6	build/unitd
$ cat sz.B.ext
   text	   data	    bss	    dec	    hex	filename
 434590	  29008	   1224	 464822	  717b6	build/unitd

'text' has decrased by 0.35 KB (0.08%).

===

See a more detailed difference of the sizes
(`size -A build/unitd | sed -E 's/^(\S+\s+\S+)\S+\s+/\1/' > ...`):

$ diff -u sz.A.{pow,ext}
--- sz.A.pow	2022-06-12 21:06:19.925747567 +0200
+++ sz.A.ext	2022-06-12 21:06:19.925747567 +0200
@@ -16,7 +16,7 @@
 .plt.got                   8
 .text                 264353
 .fini                      9
-.rodata                57882
+.rodata                57626
 .eh_frame_hdr           8220
 .eh_frame              49488
 .tbss                    560
@@ -29,12 +29,12 @@
 .data                  13520
 .bss                     664
 .comment                  30
-.debug_aranges          4656
-.debug_info          1610114
-.debug_abbrev         112065
-.debug_line           250579
-.debug_str             67547
-.debug_line_str         4061
-.debug_loclists       282985
+.debug_aranges          4688
+.debug_info          1611151
+.debug_abbrev         112343
+.debug_line           250702
+.debug_str             67555
+.debug_line_str         4095
+.debug_loclists       282939
 .debug_rnglists        12075
-Total                2809190
+Total                2810400

Ignoring debug sections, we see that:

.rodata decreased by 0.35 KB (6%).

Which is the exact amount that it increased in the previous patch
"Storing string literals as 'char []' instead of 'char *'.".

It's good that after the complete patch set transforming strings
no section (other than the debug ones) has seen a net increase.
This reduces the binary size, probably because the compiler can
derivate the address of one of the variables, from the address of
the other, with an offset from the same structure.

===

See the sizes of the binary before and after this patch
(`size -B build/unitd > ...`):

$ cat sz.B.ext
   text	   data	    bss	    dec	    hex	filename
 434590	  29008	   1224	 464822	  717b6	build/unitd
$ cat sz.B.struct
   text	   data	    bss	    dec	    hex	filename
 434574	  29008	   1224	 464806	  717a6	build/unitd

'text' has decreased by 16 B (0.004%).

This seems to confirm the prediction that the compiler derivates
one address from the other, since nxt_calendar.wday is used in
exactly two places (8 * 2 = 16).

===

See a more detailed difference of the sizes
(`size -A build/unitd | sed -E 's/^(\S+\s+\S+)\S+\s+/\1/' > ...`):

$ diff -u sz.A.{ext,struct}
--- sz.A.ext	2022-06-12 21:06:19.925747567 +0200
+++ sz.A.struct	2022-06-12 23:15:12.114790906 +0200
@@ -14,7 +14,7 @@
 .init                     23
 .plt                    2512
 .plt.got                   8
-.text                 264353
+.text                 264337
 .fini                      9
 .rodata                57626
 .eh_frame_hdr           8220
@@ -30,11 +30,11 @@
 .bss                     664
 .comment                  30
 .debug_aranges          4688
-.debug_info          1611151
-.debug_abbrev         112343
-.debug_line           250702
-.debug_str             67555
+.debug_info          1611450
+.debug_abbrev         112329
+.debug_line           250700
+.debug_str             67570
 .debug_line_str         4095
-.debug_loclists       282939
+.debug_loclists       282961
 .debug_rnglists        12075
-Total                2810400
+Total                2810704

Ignoring debug sections, we see that:

.text decreased by 16 B (0.006%).

===

I didn't move mday <src/nxt_time_parse.c> into the structure too,
because it increases the binary size, which looks like it removes
some optimization opportunity (which makes sense, because we're
changing a variable from being static to being extern).
@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 12, 2022

@hongzhidao @ac000

Aaand (suspense drums), yet another surprise: the struct helps remove two addresses from the .text section. So we shrink the binary by another 16 B; not much, but I like it, and it's for free. Thanks for the suggestion!

I didn't move mday into the structure, as that increases binary size (.rodata by 64 B (0.1%)), and makes me suspect that we would be disabling some optimization:

$ diff -u sz.A.{struct,mday}
--- sz.A.struct	2022-06-12 23:15:12.114790906 +0200
+++ sz.A.mday	2022-06-12 23:04:42.055173590 +0200
@@ -16,7 +16,7 @@
 .plt.got                   8
 .text                 264337
 .fini                      9
-.rodata                57626
+.rodata                57690
 .eh_frame_hdr           8220
 .eh_frame              49488
 .tbss                    560
@@ -30,11 +30,11 @@
 .bss                     664
 .comment                  30
 .debug_aranges          4688
-.debug_info          1611450
+.debug_info          1611657
 .debug_abbrev         112329
-.debug_line           250700
+.debug_line           250705
 .debug_str             67570
 .debug_line_str         4095
 .debug_loclists       282961
 .debug_rnglists        12075
-Total                2810704
+Total                2810980

@hongzhidao
Copy link
Contributor

hongzhidao commented Jun 13, 2022

Just a reminder.

There are only two places using week/month. The one is in nxt_router_access_log(), the other is in nxt_http_date().
We will introduce $time_local prepared for custom log format, and the code related to week/month in nxt_router_access_log() will be removed by using custom log format instsead. Anyway, there are two places using them.

But now, we introduced two more files nxt_calendar.h and nxt_calendar.c, we changed the files auto/source, nxt_main.h. And make the functions above depend on global variables instead of local static variables. Especially this place.

#include <nxt_calendar.h>
#include <nxt_unix.h>
#include <nxt_clang.h>
#include <nxt_types.h>

I like the design introduced by Igor, he puts the most basic dependencies at the top, and it has not been changed from the first version. But now the whole project has to depend on one more file. I'm not sure if it's a good idea.

@ac000
Copy link
Member

ac000 commented Jun 13, 2022

version. But now the whole project has to depend on one more file. I'm not sure if it's a good idea.

I'm not sure I see that as a big problem. As the project grows we are likely to add more files, either with new functionality or with existing functionality factored out. So I only see the number of source files increasing.

One thing I do notice is that nxt_main.h includes a lot of header files and then often nxt_main.h is the only included header file in a .c file.

Now I don't know if that's something that will continue or if the plan is to move to a more 'include what you use' style system?

@hongzhidao
Copy link
Contributor

if the plan is to move to a more 'include what you use' style system?

Sorry, I didn't get it, could you show more details?

@ac000
Copy link
Member

ac000 commented Jun 13, 2022

Sure and I think I get what you mean by

... But now the whole project has to depend on one more file.

In that if you change one header file you can end up rebuilding the whole project, perhaps unnecessarily and I suspect that's more a symptom of how the #includes are currently done which could be resolved by something along the lines of include what you use. You don't even need to use the tools, just follow the principles...

@hongzhidao
Copy link
Contributor

hongzhidao commented Jun 13, 2022

You don't even need to use the tools, just follow the principles...

Did you mean to do like below?

  1. split nxt_main.h into a few header files or even eliminate it.
  2. Make the .c files contain as few header files as possible.

Plz correct me if I don't understand correctly.

@alejandro-colomar
Copy link
Contributor Author

Re: @hongzhidao :

There are only two places using week/month. The one is in nxt_router_access_log(), the other is in nxt_http_date(). We will introduce $time_local prepared for custom log format, and the code related to week/month in nxt_router_access_log() will be removed by using custom log format instsead. Anyway, there are two places using them.

I've checked that $time_local is not using wday. So, when/if we remove nxt_router_access_log(), we will only have one place using it. So when we do that, we can check again if it's useful to bring it back to static.

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 13, 2022

Re: @hongzhidao :

You don't even need to use the tools, just follow the principles...

Did you mean to do like below?

1. split `nxt_main.h` into a few header files or even eliminate it.

2. Make the .c files contain as few header files as possible.

Plz correct me if I don't understand correctly.

Yes, that's the basics. Each file should include exactly the header files that provide the identifiers it uses. Not more; not less.

It's based on Google's C++ style guide https://google.github.io/styleguide/cppguide.html#Header_Files, and then a tool was developed around it, to check that a piece of code has the correct includes: iwyu(1) https://include-what-you-use.org/

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 13, 2022

@ac000

version. But now the whole project has to depend on one more file. I'm not sure if it's a good idea.

I'm not sure I see that as a big problem. As the project grows we are likely to add more files, either with new functionality or with existing functionality factored out. So I only see the number of source files increasing.

One thing I do notice is that nxt_main.h includes a lot of header files and then often nxt_main.h is the only included header file in a .c file.

Now I don't know if that's something that will continue or if the plan is to move to a more 'include what you use' style system?

There's no plan, but yes, I have the same feeling about the include system. As you can probably guess by my recent contributions to iwyu(1), I have in mind cleaning this up to at least follow the principles, and maybe even use it in our build system :)

@hongzhidao
Copy link
Contributor

@alejandro-colomar

I've checked that $time_local is not using wday. So, when/if we remove nxt_router_access_log(), we will only have one place using it. So when we do that, we can check again if it's useful to bring it back to static.

Did you mean you would do that by the steps?

  1. introduced nxt_calendar.h.
  2. check if nxt_router_access_log doesn't depend on it.
  3. if there is one place using nxt_calendar.h, then consider thinking of replaing global with static again?

Yes, that's the basics. Each file should include exactly the header files that provide the identifiers it uses. Not more; not less.

Do you plan to rewrite the design in how to use header files in the unit project?

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 13, 2022

@hongzhidao

I've checked that $time_local is not using wday. So, when/if we remove nxt_router_access_log(), we will only have one place using it. So when we do that, we can check again if it's useful to bring it back to static.

Did you mean you would do that by the steps?

1. introduced `nxt_calendar.h`.

2. check if `nxt_router_access_log` doesn't depend on it.

3. if there is one place using `nxt_calendar.h`, then consider thinking of replaing `global` with `static` again?

Yes. Unless you have a plan to clean nxt_router_access_log() soon, in which case I may delay this patch set to after that, and then we can skip some steps and start directly with step 3. What are your plans about it?

BTW, month would still be extern (global); I'd only consider changing wday to static.

Yes, that's the basics. Each file should include exactly the header files that provide the identifiers it uses. Not more; not less.

Do you plan to rewrite the design in how to use header files in the unit project?

I wouldn't call that a plan. But I'll have it floating in my mind. If some day I feel like attempting to do it, I'll write some proposal and show it for discussion. It has some benefits, as for example faster build and (especially) rebuild times, a better (more precise) dependency tree (which is the cause of the previous benefit), and also smaller preprocessed files (.i, which we don't use now, but it would be nice to be able to inspect those). But it has a clear drawback: a huge diff. So I'm not sure about it yet. The good thing is that the diff should only change #include lines, so as long as it doesn't touch anything else, and the final binary is identical, it should be easy to review; but it's still a considerable drawback.

If your question is if I'm planning to do it soon, then no.

@hongzhidao
Copy link
Contributor

Yes. Unless you have a plan to clean nxt_router_access_log() soon, in which case I may delay this patch set to after that, and then we can skip some steps. What are your plans about it?

It is three behind the patch you are reviewing now.

changeset:   2070:035b023cac7c
user:        Zhidao HONG <z.hong@f5.com>
date:        Thu Dec 02 12:59:32 2021 +0800
summary:     Log: customizable access log format.

changeset:   2069:81a3f35ef5b6
user:        Zhidao HONG <z.hong@f5.com>
date:        Tue Nov 30 13:22:19 2021 +0800
summary:     Log: split access log from nxt_router.c.

changeset:   2068:b8718527aa17
user:        Zhidao HONG <z.hong@f5.com>
date:        Tue Nov 30 12:57:31 2021 +0800
summary:     HTTP: added more variables.

changeset:   2067:f8bc0b6deed3
user:        Zhidao HONG <z.hong@f5.com>
date:        Fri Dec 03 11:04:13 2021 +0800
summary:     Var: dynamic variables support.

In short, we moved nxt_router_access_log_date() from nxt_router.c into nxt_http_variables.c.

@alejandro-colomar
Copy link
Contributor Author

Yes. Unless you have a plan to clean nxt_router_access_log() soon, in which case I may delay this patch set to after that, and then we can skip some steps. What are your plans about it?

It is three behind the patch you are reviewing now.

changeset:   2070:035b023cac7c
user:        Zhidao HONG <z.hong@f5.com>
date:        Thu Dec 02 12:59:32 2021 +0800
summary:     Log: customizable access log format.

changeset:   2069:81a3f35ef5b6
user:        Zhidao HONG <z.hong@f5.com>
date:        Tue Nov 30 13:22:19 2021 +0800
summary:     Log: split access log from nxt_router.c.

changeset:   2068:b8718527aa17
user:        Zhidao HONG <z.hong@f5.com>
date:        Tue Nov 30 12:57:31 2021 +0800
summary:     HTTP: added more variables.

changeset:   2067:f8bc0b6deed3
user:        Zhidao HONG <z.hong@f5.com>
date:        Fri Dec 03 11:04:13 2021 +0800
summary:     Var: dynamic variables support.

In short, we moved nxt_router_access_log_date() from nxt_router.c into nxt_http_variables.c.

Ahhh, that's great then! I'll hold this patch set to after that.

@andrey-zelenkov
Copy link
Contributor

Just a few thoughts from me to share:

  1. As far as I understood, patch is going to change existing code style. Some people prefer one code style, some another. So, from my point of view there is a need to provide strong arguments why it should be changed and all other team members should be happy with it (and also why this is more important than other tasks would be great too).
    Also, if you are not sure about current code style then I would recommend asking the authors why it was done that way. I am pretty sure that someone from the old guards such as Igor, Valentin or Maxim (by the way I consider Zhidao as one of them) will be happy to answer. Also, as far as I know nginx and njs projects have quite similar design and code style so it make sense to ask their authors too (as long as we borrow features from them from time to time).

  2. In the beginning of the thread I saw a statement that this patch will reduce start up time. And I expected to see numbers related to our project. Instead I can see many numbers and discussions about binary size. Why is binary size relevant? And why is start-up time a problem at all (or binary size)?

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 13, 2022

Just a few thoughts from me to share:

1. As far as I understood, patch is going to change existing code style. Some people prefer one code style, some another. So, from my point of view there is a need to provide strong arguments why it should be changed and all other team members should be happy with it (and also why this is more important than other tasks would be great too).
   Also, if you are not sure about current code style then I would recommend asking the authors why it was done that way. I am pretty sure that someone from the old guards such as Igor, Valentin or Maxim (by the way I consider Zhidao as one of them) will be happy to answer. Also, as far as I know nginx and njs projects have quite similar design and code style so it make sense to ask their authors too (as long as we borrow features from them from time to time).

I'd be very happy if @igorsysoev , @VBart , and @mar0x would like to comment. And it's one of the main reasons I moved my reviews from the internal review board to the public github (and in some cases the mailing list). I welcome anyone's review.

I think reducing some important sections of the binary, by the order of KB (some sections are reduced by up to 6% of their current size; check commit logs), is a strong argument by itself.

I also pointed to @drepper 's paper, who has an extensive experience optimizing glibc and gcc(1), so I also consider that a strong argument.

Also, it was my opinion (beforehand), that

const char *const  c = "ccc";
const char         *b = "bbb";
const char         a[] = "aaa";

b (the current style) and a (my proposal) are almost equivalent visually.
But of course that was my opinion, and I've seen after this discussion that some people find it uglier than I do.

My question now that I know your aesthetic opinion is: Does it really matter so much?

Let's summarize the net gains of the patch set, stated in the commit logs:

.rela.dyn decreased by 1.6 KB (6%).
.data.rel.ro decreased by 0.6 KB (4%).
.text decreased by 16 B (0.006%).

I didn't count the changes in the debug sections, which increased a bit, but they're debug. If someone wants the best performance, those can be stripped from the binary easily.

2. In the beginning of the thread I saw a statement that this patch will reduce start up time. And I expected to see numbers related to our project.

That's not so easy to measure precisely, and we don't have benchmarks already written, which would be great.

I can try to measure an estimate of program startup, as I'm also curious. That would be very much system-dependent, but still interesting, in lack of other timings.

Instead I can see many numbers and discussions about binary size. Why is binary size relevant?

Binary size, especially in initialized data sections, is directly translated into startup time.

And why is start-up time a problem at all (or binary size)?

It's not a huge problem, especially in a program expected to live for an extended period of time.

But it comes at the small cost of *s -> s[], which I considered negligible, and it has no performance drawbacks, at all.

Small systems such as raspberry pi, might also find it nice to have a slimmer binary and faster startup times.

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 13, 2022

@andrey-zelenkov

Okay, so here go some numbers for startup time performance.

Test specs:

CPU:		Intel i7-5775C
RAM:		32 GiB
Linux:		5.17.0-2-amd64 (5.17.6-1+b1)
glibc:		2.33-7
bash(1):	5.1.16(1)-release (5.1-6+b1)
gcc(1):         11.3.0 (11.3.0-3)
OS:		Debian GNU/Linux bookworm/sid
Moon:		98.3%
Temperature:	301 K (ambient)

I applied the following patch for testing only the startup time:

diff --git a/src/nxt_main.c b/src/nxt_main.c
index 26bee873..ef9f451d 100644
--- a/src/nxt_main.c
+++ b/src/nxt_main.c
@@ -16,6 +16,8 @@ main(int argc, char **argv)
 {
     nxt_int_t  ret;
 
+    exit(4);
+
     if (nxt_lib_start("unit", argv, &environ) != NXT_OK) {
         return 1;
     }

And I measured a loop of 1000 executions, 9 times before this patch set, and another 9 after this patch set:

Before:

alx@asus5775:~/src/nginx/unit$ git switch master 
M	src/nxt_main.c
Switched to branch 'master'
Your branch is up to date with 'alx/master'.
alx@asus5775:~/src/nginx/unit$ git clean -dffx >/dev/null; echo $?
0
alx@asus5775:~/src/nginx/unit$ ./configure >/dev/null; echo $?
0
alx@asus5775:~/src/nginx/unit$ make -j >/dev/null; echo $?
0
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.747s
user	0m0.544s
sys	0m0.276s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.751s
user	0m0.554s
sys	0m0.275s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.750s
user	0m0.550s
sys	0m0.277s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.751s
user	0m0.560s
sys	0m0.269s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.645s
user	0m0.470s
sys	0m0.240s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.738s
user	0m0.552s
sys	0m0.262s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.750s
user	0m0.556s
sys	0m0.270s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.655s
user	0m0.497s
sys	0m0.224s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.750s
user	0m0.574s
sys	0m0.253s

After:

alx@asus5775:~/src/nginx/unit$ git switch const_array 
M	src/nxt_main.c
Switched to branch 'const_array'
Your branch is up to date with 'alx/const_array'.
alx@asus5775:~/src/nginx/unit$ git clean -dffx >/dev/null; echo $?
0
alx@asus5775:~/src/nginx/unit$ ./configure >/dev/null; echo $?
0
alx@asus5775:~/src/nginx/unit$ make -j >/dev/null; echo $?
0
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.750s
user	0m0.568s
sys	0m0.260s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.537s
user	0m0.418s
sys	0m0.173s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.659s
user	0m0.498s
sys	0m0.229s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.638s
user	0m0.489s
sys	0m0.213s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.588s
user	0m0.457s
sys	0m0.192s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.726s
user	0m0.534s
sys	0m0.266s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.746s
user	0m0.557s
sys	0m0.264s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.559s
user	0m0.430s
sys	0m0.187s
alx@asus5775:~/src/nginx/unit$ time for i in {1..1000}; do ./build/unitd; done

real	0m0.748s
user	0m0.558s
sys	0m0.268s

For a simple visual difference, see the 'real' times, sorted, so that we can see the median easily:

alx@asus5775:~/tmp$ grep real <times_before.txt | cut -f2 | sort
0m0.645s
0m0.655s
0m0.738s
0m0.747s
0m0.750s
0m0.750s
0m0.750s
0m0.751s
0m0.751s
alx@asus5775:~/tmp$ grep real <times_after.txt | cut -f2 | sort
0m0.537s
0m0.559s
0m0.588s
0m0.638s
0m0.659s
0m0.726s
0m0.746s
0m0.748s
0m0.750s

The median before was 0.750 s, and after is 0.659 s. That's a drop of 12% startup time. YMMV, of course.

Worst case seems to be almost the same. Best case improved considerably. As always, these tests are not something perfect.

@ac000
Copy link
Member

ac000 commented Jun 13, 2022

Just a few thoughts from me to share:

1. As far as I understood, patch is going to change existing code style. Some people prefer one code style, some another. So, from my point of view there is a need to provide strong arguments why it should be changed and all other team members should be happy with it (and also why this is more important than other tasks would be great too).

I'm not sure I'd go as far as calling it a change in coding style. As

const char *const  c = "ccc";
const char         *b = "bbb";
const char         a[] = "aaa";

are not entirely equivalent (even less so when you remove the consts) and indeed you may end up using all three variants in a code base. The same goes for arrays of strings, sometimes arrays of pointers would be the right choice, sometimes arrays of arrays would be better.

Of course the other thing that this patch set does is remove code duplication.

@VBart
Copy link
Contributor

VBart commented Jun 13, 2022

@alejandro-colomar to evaluate any numbers it's better to use statistics methods, rather than visual difference as direct comparison of numbers doesn't take into account confidence range.

% ./ministat before.txt after.txt
x before.txt
+ after.txt
+----------------------------------------------------------------------------+
|+      +        +                + x  x+                     +   x +**      |
|            |__________________________MA______|_____________A______|M_____||
+----------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   9         0.645         0.751          0.75    0.72633333    0.04353734
+   9         0.537          0.75         0.659    0.66122222   0.085572744
No difference proven at 95.0% confidence

Here nothing can be concluded from the numbers. It may be possible to prove something if many more measurements are made, but with only 9 measurements, nothing is proven.

And this is pretty expected, I'd be surprised to see any measurable difference in start-up time with such small (in absolute numbers) reduction in size. I'm not even sure that it makes any difference till the relevant memory page is actually accessed (and if there any difference in number of memory pages read at all?). The little variations in time are likely caused by various other factors in your system like interference of different processes running in parallel.

Note also that the difference in size is smaller than the difference between any two versions of Unit.

So overall, since this patch doesn't cause any meaningful effect for users, I'd think it should be only evaluated from the code style, maintainability, and development costs points of view. The burden of these evaluations I'll prefer to leave on active colleagues. =)

That's my 2cents.

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 13, 2022

@VBart

I did a test with 10000 samples. The result confirms your theory. The time differences are non-existent:

  /home/alx/tmp/unitd-before /home/alx/tmp/unitd-after
N 10000 10000
Min 12 13
Max 94 81
Median 16 16
Avg 20.1779 20.0758
Stddev 10.2785902015622 10.0972793298488

The program to get the times is much more precise (and resistant to interferences, thanks to randomness) this time:

#include <err.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>


extern char  **environ;


static void alx_iterate(ptrdiff_t n, char *const prog[2]);


int
main(int argc, char *argv[])
{
	if (argc != 4)
		errx(EXIT_FAILURE, "Usage:  %s N PROG_A PROG_B", argv[0]);

	alx_iterate(atoi(argv[1]), &argv[2]);
	exit(EXIT_SUCCESS);
}


static void
alx_iterate(ptrdiff_t n, char *const prog[2])
{
	int        wstatus;
	char       *argv[2];
	pid_t      pid, w;
	ptrdiff_t  x;
	ptrdiff_t  i[2];
	clock_t    t[2][n][4];

	/* The first one is for warming up */
	for (int j = 0; j < 2; j++) {
		memset(t, 0, sizeof(t));
		memset(i, 0, sizeof(i));

		while (i[0] < n || i[1] < n) {
			if (i[0] == n) {
				x = 1;
			} else if (i[1] == n) {
				x = 0;
			} else {
				getrandom(&x, 1, 0);
				x &= 1;
			}

			argv[0] = prog[x];
			argv[1] = NULL;

			t[x][i[x]][0] = clock();

			pid = vfork();
			switch (pid) {
			case 0:
				execve(argv[0], argv, environ);
				err(EXIT_FAILURE, "execve(2)");
			case -1:
				err(EXIT_FAILURE, "vfork(2)");
			default:
				do {
					w = waitpid(pid, &wstatus, 0);
					if (w == -1)
						err(EXIT_FAILURE, "waitpid(2)");
				} while (!WIFEXITED(wstatus));

				t[x][i[x]][1] = clock();

				if (WEXITSTATUS(wstatus) != 4)
					errx(EXIT_FAILURE, prog[x]);
			}

			t[x][i[x]][2] = t[x][i[x]][1] - t[x][i[x]][0];
			i[x]++;
		}
	}

	printf("%s\t%s\n", prog[0], prog[1]);
	for (ptrdiff_t j = 0; j < n; j++) {
		printf("%ji\t%ji\n", (intmax_t) t[0][j][2],
		                     (intmax_t) t[1][j][2]);
	}
}

And then

./cmp-exec 10000 $HOME/tmp/unitd-before $HOME/tmp/unitd-after > times.tsv

The statistics were calculated with localc(1).

So, the only benefit right now would be having a smaller binary. I'd expect that when we continue adding features, we might arrive to a point where it does affect startup time. But there's the possibility that we won't arrive to that point ever. Right now, the only benefit is size, and the only drawback is readability/maintainability.


A second run of the whole experiment, with the order of the parameters reversed (just in case), shows more or less the same:

./cmp-exec 10000 $HOME/tmp/unitd-after  $HOME/tmp/unitd-before > times-rev.tsv
  /home/alx/tmp/unitd-after /home/alx/tmp/unitd-before
N 10000 10000
Min 13 13
Max 86 89
Median 16 16
Avg 21.2786 21.1663
Stddev 11.4729483958785 11.2317968124412

@alejandro-colomar
Copy link
Contributor Author

Since the size gainings are not great in absolute numbers (they improve important sections of the binary, but that would be relevant to performance), meaning that the absolute total binary size doesn't change by a meaningful amount, I'll consider this change superfluous.

And since some of us here tend to consider arrays uglier than pointers for storing string literals, I'll consider that to be more important in this case. So I'm closing this patch set for now.

Maybe we could revisit this in the future if our string literal usage increases.

@ac000
Copy link
Member

ac000 commented Jun 13, 2022

But the code de-duplication part is still valid right? Though that could maybe go in a new PR...

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 13, 2022

But the code de-duplication part is still valid right? Though that could maybe go in a new PR...

Wait a bit more. I did something really dumb. I compiled main() with a call to exit(3), and didn't consider that the optimizer would create an empty binary:

alx@asus5775:~/tmp$ size -B unitd-*
   text	   data	    bss	    dec	    hex	filename
   1827	    584	      8	   2419	    973	unitd-after
   1827	    584	      8	   2419	    973	unitd-before

I recompiled just now the binaries with an opaque call to exit(3), which now creates the right binaries:

alx@asus5775:~/tmp$ size -B unitd-*
   text	   data	    bss	    dec	    hex	filename
 372957	  29000	   1224	 403181	  626ed	unitd-after
 374629	  29640	   1224	 405493	  62ff5	unitd-before
diff --git a/src/nxt_main.c b/src/nxt_main.c
index 26bee873..2a00d478 100644
--- a/src/nxt_main.c
+++ b/src/nxt_main.c
@@ -11,11 +11,16 @@
 extern char  **environ;
 
 
+void alx_exit(int x);
+
+
 int nxt_cdecl
 main(int argc, char **argv)
 {
     nxt_int_t  ret;
 
+    alx_exit(4);
+
     if (nxt_lib_start("unit", argv, &environ) != NXT_OK) {
         return 1;
     }
diff --git a/src/nxt_time.c b/src/nxt_time.c
index dfead51c..a67a96ed 100644
--- a/src/nxt_time.c
+++ b/src/nxt_time.c
@@ -7,6 +7,16 @@
 #include <nxt_main.h>
 
 
+void alx_exit(int x);
+
+
+void
+alx_exit(int x)
+{
+    exit(x);
+}
+
+
 /* OS-specific real, monotonic, and local times and timezone update. */

I still wonder if the arrays are initialized at program startup or at first use (which would be much harder to measure), but I'll run the tests and see if we see something there.

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 13, 2022

Okay, I can confirm, now with the right binaries in place, that at program startup there's no performance improvement by this patch set. Can't talk about first use, though, which my experiment didn't test (and I don't even know how/if would we could test that):

  /home/alx/tmp/unitd-before /home/alx/tmp/unitd-after
N 100000 100000
min 14 14
max 132 144
median 40 40
avg 35.64983 35.58983
stddev 15.9537392541383 15.9523695842345

So, I'll keep this PR closed. I'll keep the patch set in my git repository, in case someone wants to reopen it in the future.

Regarding the de-duplication of code, I still think it's a good move. @hongzhidao would you like to apply that change?

@hongzhidao
Copy link
Contributor

I prefer not to do so at this stage, since there are only two places having same variables, compared to introducing more changes.

@andrey-zelenkov
Copy link
Contributor

I think reducing some important sections of the binary, by the order of KB (some sections are reduced by up to 6% of their current size; check commit logs), is a strong argument by itself.

Not for me to be honest. Never checked out binary size before this conversation or thought about it as a relevant metric that should be taken into account. In the case of infinite time and resources I probably could agree with you that keeping binary size/build time/configure time less may be a good idea. But still we will need to trade developer's time for a few KBs or speed up build for tenths of a second (and I am not sure that it is worth).

My question now that I know your aesthetic opinion is: Does it really matter so much?

I am just used to the existing code style. From my side the drawback I can see: if somebody wants to add or remove something from an array then length also has to be changed with increasing overall diff.

I'm not sure I'd go as far as calling it a change in coding style.

Sorry for incorrect terminology (I am really bad at C). The main idea I was trying to convey is that this project was developed by people with solid C expertise too, so before changing something it makes sense to understand or ask why it was done that way (to avoid several rewriting). Again, I am not addressing this statement to you personally, it's just my thoughts.

Yes, that's the basics. Each file should include exactly the header files that provide the identifiers it uses. Not more; not less.

Why? Nginx works fine without this principle (even on raspberry pi afaik).

@ac000
Copy link
Member

ac000 commented Jun 14, 2022

I think reducing some important sections of the binary, by the order of KB (some sections are reduced by up to 6% of their current size; check commit logs), is a strong argument by itself.

Not for me to be honest. Never checked out binary size before this conversation or thought about it as a relevant metric that should be taken into account. In the case of infinite time and resources I probably could agree with you that keeping binary size/build time/configure time less may be a good idea. But still we will need to trade developer's time for a few KBs or speed up build for tenths of a second (and I am not sure that it is worth).

I guess it can also help with CPU cache utilisation.

Yes, that's the basics. Each file should include exactly the header files that provide the identifiers it uses. Not more; not less.

Why? Nginx works fine without this principle (even on raspberry pi afaik).

Well, if you have a header file that's included in every .c file whether it needs to be or not, editing that header file will cause the whole project to be rebuilt, when maybe only one or two files need to be.

@hongzhidao
Copy link
Contributor

But the code de-duplication part is still valid right? Though that could maybe go in a new PR...

I'm sure there are many places with similar cases, not only happens in Unit. Just take unit as an example.

nxt_http_arguments_parse(nxt_http_request_t *r)

nxt_http_cookie_parse(nxt_array_t *cookies, u_char *start, u_char *end)

They are quite similar, they do parsing and return array-type with the type of the same elements: nxt_http_name_valule_t.
Currently, their parsing logic is independent of each other. If we refactor them out to a more generic API just because of duplicated code, we have to spend more time thinking about an ideal solution. I think we can do it but it will take more time.
Sometimes it's worth giving up the urge to do premature refactoring, especially if it's already working well and rarely changing.

But it doesn't mean that refactoring is not welcome. Every member of the team did a lot, including Alex. Take this patch as an example, they have already worked well for a long time without being changed.

And I think we can find a way to eliminate nxt_controller_date(). It needs more work.
That'll be for another topic :)

It's good to do such refactoring if we want to do such refactoring about these variables. And We encourage this because the value is greater and more obvious. But as Andrei said, we have no unlimited time and resources. There are times to prioritize, especially since unit has a lot of functionality to support. It's not clear enough to merge this patch at this stage from the value view of point.

Well, if you have a header file that's included in every .c file whether it needs to be or not, editing that header file will cause the whole project to be rebuilt, when maybe only one or two files need to be.

Then we will spend N times on the precise replacement of header files. unit like nginx and njs just includes everything to not care while programming, otherwise with any change we will need to find headers and include relevant... and remove not needed... and again and again. Note that the header files of router and http, which are more often changed are not in the nxt_main.h. Actually, from our experience of developing unit, we rarely change header files, especially the common ones.

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 14, 2022

Re: iwyu(1):

@andrey-zelenkov

Yes, that's the basics. Each file should include exactly the header files that provide the identifiers it uses. Not more; not less.

Why? Nginx works fine without this principle (even on raspberry pi afaik).

@ac000

Well, if you have a header file that's included in every .c file whether it needs to be or not, editing that header file will cause the whole project to be rebuilt, when maybe only one or two files need to be.

@hongzhidao

Then we will spend N times on the precise replacement of header files. unit like nginx and njs just includes everything to not care while programming, otherwise with any change we will need to find headers and include relevant... and remove not needed... and again and again. Note that the header files of router and http, which are more often changed are not in the nxt_main.h. Actually, from our experience of developing unit, we rarely change header files, especially the common ones.

In this specific case, where we have a really small project, since a complete build is relatively fast

$ time make -j >/dev/null

real	0m3.150s
user	0m9.785s
sys	0m1.092s

I think I could even agree with the simplicity of having a main header included everywhere. It's really simple, and reduces the time we have to think about updating header files.

Okay, that time would ideally be also 0 if iwyu(1) was part of the build system and iwyu(1) didn't have important bugs, and if iwyu(1) had full knowledge of glibc internals. But I've worked on iwyu(1), and I know its glibc support is close to 0 (I've been improving that, but mostly, it's far from usable), and it also has important bugs.

As long as there are no cyclic dependencies, the main header can be easily written, just by putting every include as high as it can be, and I don't expect it to cause much trouble.

3 seconds is more than 10 milliseconds for a rebuild, but I can live with that. The idea will still float in my mind, but I don't see this changing in the near future. Maybe when iwyu(1) improves considerably, I'll rethink.

@alejandro-colomar
Copy link
Contributor Author

alejandro-colomar commented Jun 14, 2022

@VBart @ac000

I think reducing some important sections of the binary, by the order of KB (some sections are reduced by up to 6% of their current size; check commit logs), is a strong argument by itself.

Not for me to be honest. Never checked out binary size before this conversation or thought about it as a relevant metric that should be taken into account. In the case of infinite time and resources I probably could agree with you that keeping binary size/build time/configure time less may be a good idea. But still we will need to trade developer's time for a few KBs or speed up build for tenths of a second (and I am not sure that it is worth).

I guess it can also help with CPU cache utilisation.

I've been thinking about this overnight. I still believe this optimization is meaningful; only that I can't demontrate it, and in this program, which is long lived, it's hard to test specifically for that. But let me summarize my thoughts.

In the worst case, which is a global extern const char *e[];, the compiler has to emit 2 things: the consecutive strings (unaligned), and an array of pointers. Each of those 2 objects will probably be far from the other in memory.

In case of an extern const char e[][N], there's just 1 object, aligned if we want (by choosing an appropriate array size).

So my expectations of performance would be that when we use the strings, we need to load 2 pages from memory (if they aren't hot in cache), one for the array of pointers, and one for the strings. Then there's the issue that each string will (probably) be unaligned, and operations such as memcpy() might have to do a few extra operations (not so bad, but it's a thing).

With arrays, we will only need to load the strings themselves (1 page); that's less opportunity for cache misses. And we can control the alignment of the strings, which as long as it's a power of 2 we should be more or less fine (multiples of 8 ideally).

However, I'm not able to prove any of this right now. So I'm keeping this closed. I suggested to some open source project, in which I actively participate in the mailing list, to apply this optimization on their program; since that program is a one-shot, we could measure things easily. We could then extrapolate those numbers to unit. So this is not (yet) a dead end, but I won't push on this until I have some proof for my theories.

@ac000
Copy link
Member

ac000 commented Jun 14, 2022

In this specific case, where we have a really small project, since a complete build is relatively fast

$ time make -j >/dev/null

real	0m3.150s
user	0m9.785s
sys	0m1.092s

Heh, my numbers look a little different...

$ time make -j >/dev/null

real    0m7.753s
user    0m23.475s
sys     0m3.381s

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicated strings, and arrays of pointers

5 participants