Here is the test environment I used to compare boot time between 1.13.2 and 1.14.0 when the plan cache is hit (that is, the plan cache files already exist).
For 1.13.2: using TensorRT 8.5.2
For 1.14.0: using TensorRT 8.6.1
The weights loaded: 18b
On a machine with 5 RTX 3080 cards, 1.14.0 takes 40 seconds to boot to GTP ready, while 1.13.2 needs only 17 seconds.
On a machine with 8 RTX 4070 cards, 1.14.0 takes 63 seconds to boot to GTP ready, while 1.13.2 needs only 26 seconds.
I also tried different weights and different machines; 1.14.0 generally boots much slower than 1.13.2.
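For anyone who wants to reproduce the timing, here is a minimal sketch of how the time to "GTP ready" could be measured. It is not the exact method behind the numbers above. The helper name `time_until_marker`, the marker string, and the 300-second timeout are assumptions for illustration; the marker should be adjusted to whatever line the engine actually prints once startup completes.

```python
import subprocess
import sys
import time


def time_until_marker(cmd, marker, timeout=300.0):
    """Launch cmd and return the seconds elapsed until a line containing
    marker appears on its stdout or stderr.

    Hypothetical helper for illustration. The timeout is only checked when
    a new output line arrives, so a completely silent process is not
    interrupted.
    """
    start = time.monotonic()
    with subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,      # keep stdin open so a GTP engine keeps waiting
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # startup messages may go to stderr
        text=True,
    ) as proc:
        try:
            for line in proc.stdout:
                elapsed = time.monotonic() - start
                if marker in line:
                    return elapsed
                if elapsed > timeout:
                    raise TimeoutError(f"no {marker!r} within {timeout} s")
            raise RuntimeError(f"process exited before printing {marker!r}")
        finally:
            proc.kill()  # stop the engine once the measurement is done


# Example call (the model and config paths are placeholders):
# t = time_until_marker(
#     ["katago", "gtp", "-model", "model.bin.gz", "-config", "gtp.cfg"],
#     "GTP ready",
# )
# print(f"boot time: {t:.1f} s")
```

Running this once per version with the same weights and config, after the plan cache files already exist, should give numbers comparable to the ones above.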
@lightvector, @hyln9: have either of you observed this? Thanks!