Introduction
Depending on the application field, non-functional requirements like response time, throughput or availability may be as important as functional ones. Failing to fulfil them can make an application unfit for production.
It may therefore be critical to keep an eye on response times or throughput throughout the development and maintenance life cycle. This is best addressed by automated tests, which can be orchestrated by a continuous integration tool like Jenkins.
In this blog post I don't want to write about big designs, complex infrastructure or tool comparisons. My aim is more humble: I just want to mention two pitfalls you may hit when integrating performance or load tests into your favourite CI tool, and to give possible workarounds.
Be on time
You want to start your load tests when your application has fully started, but not before. Sure, you could simply wait long enough for your application to start before launching your tests, but it is not much more complicated to watch for an event stating that the application has finished starting. Most applications write something to their logs. On a Linux system like RHEL it is straightforward to watch the logs, so you may not even need to change your log monitoring setup if you happen to use something like Logstash or Graylog. This short shell script does the job:
```shell
#!/bin/bash

# Wait until the "server started" message shows up in the log file.
FILE=${1:-server.log}
SEMAPHORE=${2:-JBAS015874}

if [ ! -f "$FILE" ]; then
    echo "File could not be found!"
    echo "Usage: start-monitor.sh FILE SEMAPHORE"
    exit 1
fi

tail -f "$FILE" | while read -r LOGLINE
do
    # Kill tail once the semaphore string has been logged.
    [[ "${LOGLINE}" == *${SEMAPHORE}* ]] && pkill -P $$ tail
done

echo "Server started, tests can begin!"
```
It is self-explanatory: the trick is to read from tail and check whether the "start completed" event has been recorded. Note that with JBoss AS 7 / JBoss EAP 6 the message code is JBAS015874.
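To see the mechanism in isolation, here is a minimal, self-contained sketch: a background subshell plays the role of the application server by appending lines to a temporary log, and the same tail loop blocks until the semaphore appears. The log lines and timings are made up, only the JBAS015874 string is real:

```shell
#!/bin/bash
# Self-contained demo of the tail-and-watch trick. The log content
# below is simulated, not real server output.
LOG=$(mktemp)
SEMAPHORE="JBAS015874"

# Simulate a server writing its start-up log in the background.
(
    for i in 1 2 3; do echo "INFO boot step $i"; sleep 0.2; done
    echo "INFO [org.jboss.as] ${SEMAPHORE}: started"
) >> "$LOG" &

# Block until the semaphore shows up, exactly as in the script above.
tail -f "$LOG" | while read -r LOGLINE
do
    [[ "${LOGLINE}" == *${SEMAPHORE}* ]] && pkill -P $$ tail
done
STARTED=1

echo "Server started, tests can begin!"
rm -f "$LOG"
```

In a CI job you would simply run the real script as a build step before the step that launches the load tests.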
Quick but not too quick
So we are now able to start our load tests once we are sure that the application has started. That is a good first step. But some time ago I was doing performance testing at a customer, and it reminded me that, from a statistical point of view, in a decently sized population a few elements may have a significant impact on the overall outcome. Among telcos it is quite usual to monitor the 95th percentile of round-trip time and other metrics; by doing so, extreme values get filtered out. What does this have to do with our load tests? We are already taking care of the fact that the application is fully started. Fully started, yes, but initialised?
An application server like JBoss EAP may load modules on demand, depending on its configuration, to reduce start-up time. In the same way, thread or connection pools may only be filled when the need arises. Are you using a second-level cache or some other type of cache? Values first need to be populated into it before database round trips can be avoided.
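A simple workaround is an explicit warm-up phase before any measurement is recorded. The sketch below is an assumption-laden illustration: the service names, the number of calls and the commented-out curl endpoint are all placeholders to adapt to your own setup.

```shell
#!/bin/bash
# Hypothetical warm-up phase: call every service a few times before any
# measurement is recorded, so that pools and caches get populated.
# SERVICES and WARMUP_CALLS are assumptions - adapt them to your setup.
SERVICES="service1 service2 service3 service4"
WARMUP_CALLS=5

warm_up() {
    local service=$1
    for i in $(seq 1 "$WARMUP_CALLS"); do
        # Replace with your real client call, for instance:
        # curl -s -o /dev/null "http://localhost:8080/app/$service"
        echo "warm-up call $i to $service"
    done
}

for service in $SERVICES; do
    warm_up "$service"
done

echo "Warm-up done, measurements can start."
```

In a CI pipeline this would run between the start-up monitor and the actual load test step.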
Coming back to my customer: he was simulating 60 parallel user sessions, repeatedly calling 4 web services. Starting the tests right after server start-up gave the following results:
| | number of runs | max time (ms) | average time (ms) |
|---|---|---|---|
| Service 1 | ~50 | 31000 | 3750 |
| Service 2 | ~4000 | 23000 | 560 |
| Service 3 | ~2000 | 22000 | 310 |
| Service 4 | ~1500 | 11000 | 260 |
The second time, we started recording measurements only after having performed a few calls on each service, and the results looked quite different:
| | number of runs | max time (ms) | average time (ms) |
|---|---|---|---|
| Service 1 | ~50 | 1400 | 1150 |
| Service 2 | ~4000 | 2000 | 50 |
| Service 3 | ~2000 | 1600 | 50 |
| Service 4 | ~1500 | 850 | 45 |
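As a side note, the 95th percentile mentioned above is easy to compute from raw measurements. This sketch uses the nearest-rank method on a file containing one response time per line; the file name is hypothetical:

```shell
#!/bin/bash
# Nearest-rank 95th percentile of a column of numbers (one per line).
percentile95() {
    sort -n "$1" | awk '{ values[NR] = $1 }
        END {
            rank = int(NR * 0.95)
            if (rank * 100 < NR * 95) rank++   # round up (nearest rank)
            if (rank < 1) rank = 1
            print values[rank]
        }'
}
```

Called as `percentile95 times.txt`, it prints the value below which 95% of the measurements fall, filtering out the extreme outliers that dominate the "max time" column above.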
I am not saying that edge conditions should not be considered, but know what you are measuring; otherwise you may get surprised… as I did.