Skip to content

WIP: Add a stand-alone "timing" program#15088

Closed
richsalz wants to merge 1 commit intoopenssl:masterfrom
richsalz:temp-add-timing
Closed

WIP: Add a stand-alone "timing" program#15088
richsalz wants to merge 1 commit intoopenssl:masterfrom
richsalz:temp-add-timing

Conversation

@richsalz
Copy link
Contributor

There have been some discussions about how long decoding objects takes. I don't expect this to be merged and will keep this as a draft forever. :)

Right now it can load X509 certs or PrivateKeys. Happy to take additions.

@richsalz richsalz changed the title Add a stand-alone "timing" program WIP: Add a stand-alone "timing" program Apr 29, 2021
@richsalz richsalz marked this pull request as draft April 29, 2021 17:57
@richsalz
Copy link
Contributor Author

For master, with ./Configure -d --strict-warnings enable-md2 enable-ec_nistp_64_gcc_128 I get:

gcc -g -Iinclude test/timing.c libcrypto.a -ldl -lpthread
; ./a.out -wc test/timing-cert.pem 
user      0 sec 5066 microsec
sys       0 sec 8759 microsec
; ./a.out -wp test/timing-key.pem 
user      0 sec 220603 microsec
sys       0 sec 247751 microsec
; 

I just switched to OpenSSL_1_1_1-stable and tried ./Configure linux-x86_64 and it failed :( For another day I guess.

@richsalz
Copy link
Contributor Author

Here's the 1.1.1 version, Configured with ./Configure linux-x86_64:

; gcc -g -Iinclude test/timing.c libcrypto.a -ldl -lpthread
; ./a.out -wc test/timing-cert.pem
user      0 sec 2067 microsec
sys       0 sec 3704 microsec
; ./a.out -wp test/timing-key.pem 
user      0 sec 3865 microsec
sys       0 sec 5651 microsec
; 

Compared to master, 1.1.1 is twice as fast for loading a cert, and more than 40 times faster for loading a private key.

@richsalz
Copy link
Contributor Author

Updated to have same compiler on master:

; ./a.out -wc test/timing-cert.pem
user      0 sec 2729 microsec
sys       0 sec 5100 microsec
; ./a.out -wp test/timing-key.pem 
user      0 sec 127453 microsec
sys       0 sec 138800 microsec

So loading a cert is roughly the same and loading a key is 24 times slower.

@t8m
Copy link
Member

t8m commented Apr 30, 2021

You might want to try #15045
That should improve the key timing numbers.

@richsalz
Copy link
Contributor Author

With the non-SM2 parts merged, we get much better. Master is 28 times slower than 1.1.1 and #15045 is 18 times slower.

; ./a.out -wc test/timing-cert.pem  
user      0 sec 2616 microsec
sys       0 sec 4887 microsec
; ./a.out -wp test/timing-key.pem 
user      0 sec 78390 microsec
sys       0 sec 92085 microsec
; 
; echo -n '1.1.1    ' ; echo 3865 5651 + p | dc
1.1.1    9516
; echo -n 'mater    ' ; echo 127453 138800 + p | dc
mater    266253
; echo -n 'pr 15045 ' ; echo -n 78390 92085 + p | dc
pr 15045 170475
; echo 2 k 170475 9516 / p | dc
17.91
; echo 2 k 266523 9516 / p | dc
27.97

@richsalz
Copy link
Contributor Author

richsalz commented May 4, 2021

Re-ran tests. Added malloc counting. Ugh.

; CC='ccache gcc' ; mk -s clean ; ./config ; mk -sj4
; gcc -g -Iinclude test/timing.c libcrypto.a -ldl -lpthread
; ./a.out -wc test/timing-cert.pem
For 100 iterations:
 timing user       0 sec 2785 microsec
 timing sys        0 sec 5132 microsec
 mallocs 11300 frees 16200 reallocs 1900 bytes 1114400
; ./a.out -wp test/timing-key.pem
For 100 iterations:
 timing user       0 sec 150561 microsec
 timing sys        0 sec 168984 microsec
 mallocs 689200 frees 1758100 reallocs 12900 bytes 25561500

PR #15045
; gcc -g -Iinclude test/timing.c libcrypto.a -ldl -lpthread
; ./a.out -wc test/timing-cert.pem
For 100 iterations:
 timing user       0 sec 2444 microsec
 timing sys        0 sec 4523 microsec
 mallocs 11300 frees 16200 reallocs 1900 bytes 1114400
; ./a.out -wp test/timing-key.pem
For 100 iterations:
 timing user       0 sec 92763 microsec
 timing sys        0 sec 105526 microsec
 mallocs 319100 frees 580500 reallocs 10600 bytes 11844700

For 1.1.1
; ./a.out -wc /tmp/timing-cert.pem
For 100 iterations:
 timing user       0 sec 1492 microsec
 timing sys        0 sec 7203 microsec
 mallocs 9700 frees 14000 reallocs 1600 bytes 966400
; ./a.out -wp /tmp/timing-key.pem
For 100 iterations:
 timing user       0 sec 1368 microsec
 timing sys        0 sec 2151 microsec
 mallocs 5500 frees 6600 reallocs 200 bytes 1193900
;

Ping @DDvO, @t8m. Please look at the test/timing.c file and see if I am doing anything wrong.

@t8m t8m added branch: master Applies to master branch triaged: feature The issue/pr requests/adds a feature labels Aug 6, 2021
@t8m t8m added this to the Post 3.0.0 milestone Aug 6, 2021
@richsalz richsalz closed this Sep 7, 2021
@DDvO
Copy link
Contributor

DDvO commented Sep 8, 2021

Re-ran tests. Added malloc counting. Ugh.

Thanks for providing this! Results look similarly depressing to those I got.

Ping @DDvO, @t8m. Please look at the test/timing.c file and see if I am doing anything wrong.

Looks good. Sorry that I did not respond earlier - I had missed the whole thread until you closed it yesterday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

branch: master Applies to master branch triaged: feature The issue/pr requests/adds a feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants