Jump to Category
- Test Strategy & Planning
- Scripting & Test Design
- Test Execution & Infrastructure
- Analysis & Metrics
- Advanced Concepts & Modern Practices
Test Strategy & Planning
1. Differentiate between Load, Stress, Soak, and Spike testing.
These are all types of performance testing, but they have different goals:
- Load Testing: To verify that the system can handle an expected, realistic load (e.g., normal peak traffic) while meeting performance requirements (SLOs).
- Stress Testing: To find the upper limit or breaking point of the system by gradually increasing the load beyond normal expectations. The goal is to see how the system fails and if it recovers gracefully.
- Soak (or Endurance) Testing: To check for issues that only appear over time by running a sustained, moderate load for an extended period (e.g., several hours or days). It’s used to find problems like memory leaks or resource exhaustion.
- Spike Testing: To see how the system responds to a sudden, massive burst of traffic. It tests the system’s elasticity and ability to scale rapidly.
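These profiles are usually expressed as staged load in the tool itself. As a rough sketch, here is how a spike profile might be modeled with Locust's `LoadTestShape` (a Python load testing tool); the user counts, durations, and endpoint are illustrative assumptions, not a prescription:

```python
from locust import HttpUser, LoadTestShape, constant, task

class BrowseUser(HttpUser):
    wait_time = constant(1)

    @task
    def home(self):
        self.client.get("/")  # hypothetical endpoint under test

class SpikeShape(LoadTestShape):
    """Baseline load, a sudden spike, then recovery - a classic spike profile."""

    def tick(self):
        run_time = self.get_run_time()
        if run_time < 120:
            return (50, 10)      # baseline: 50 users
        if run_time < 180:
            return (1000, 500)   # spike: jump to 1,000 users almost instantly
        if run_time < 300:
            return (50, 500)     # drop back: does the system recover?
        return None              # end of test
```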
2. How do you create a realistic workload model for a load test?
A realistic workload model is crucial for meaningful results. The process involves:
- Identify Key User Journeys: Determine the most critical and frequently used business transactions (e.g., login, search for product, add to cart, checkout).
- Analyze Production Data: Use analytics tools and server logs to determine the traffic mix. What percentage of users perform each journey during peak hours?
- Determine Throughput: Calculate the target throughput for each transaction (e.g., requests per second or transactions per hour) based on production metrics.
- Define User Behavior: Incorporate realistic think time (the time a user spends between actions) and pacing (the time between user iterations).
The goal is to simulate user behavior as closely as possible, not just to hammer a single endpoint with traffic.
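As a minimal sketch, the traffic mix above might translate into a Locust workload like this (the endpoints and the 60/30/10 split are illustrative assumptions standing in for the production analysis described above):

```python
from locust import HttpUser, task, between

class ShopperUser(HttpUser):
    # Think time: a real user pauses 2-8 seconds between actions
    wait_time = between(2, 8)

    @task(6)  # ~60% of iterations: search for a product
    def search(self):
        self.client.get("/search?q=widget")

    @task(3)  # ~30%: view a product and add it to the cart
    def add_to_cart(self):
        self.client.get("/products/123")
        self.client.post("/cart", json={"product_id": 123, "qty": 1})

    @task(1)  # ~10%: complete checkout
    def checkout(self):
        self.client.post("/checkout")
```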
Learn more about Workload Modeling.
3. What are Non-Functional Requirements (NFRs) and how do they guide your test design?
NFRs define the quality attributes of a system, specifying “how well” it should perform its functions. For performance testing, key NFRs are:
- Response Time: E.g., “95% of login requests must complete within 500ms.”
- Throughput: E.g., “The system must support 200 orders per second.”
- Concurrency: E.g., “The system must support 5,000 concurrent users.”
- Resource Utilization: E.g., “CPU utilization must remain below 80% under peak load.”
These NFRs form the pass/fail criteria for your load tests. Your test design (workload, duration) is built specifically to verify if the system meets these requirements.
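NFRs like these translate directly into automated pass/fail checks. A minimal sketch in Python, assuming the response-time SLO above (the 1% error budget is an added assumption):

```python
import statistics

def verify_nfrs(response_times_ms, error_count, total_requests):
    """Return a list of NFR violations from a test run's raw results."""
    failures = []
    # statistics.quantiles(n=100) yields 99 cut points; index 94 is the p95
    p95 = statistics.quantiles(response_times_ms, n=100)[94]
    if p95 > 500:
        failures.append(f"p95 response time {p95:.0f}ms exceeds the 500ms SLO")
    error_rate = error_count / total_requests
    if error_rate > 0.01:  # assumed 1% error budget
        failures.append(f"error rate {error_rate:.2%} exceeds 1%")
    return failures
```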
4. What is “shift-left” performance testing?
“Shift-left” is the practice of moving performance testing earlier in the development lifecycle. Instead of waiting for a late-stage, pre-production performance test, you integrate smaller, more frequent performance tests directly into the CI/CD pipeline. This could involve:
- Running component-level performance tests on every build.
- Automated load tests triggered on every pull request to a specific microservice.
- Integrating performance results into the PR review process.
The goal is to catch performance regressions early, when they are easier and cheaper to fix, rather than discovering them right before a release.
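As one hedged example, a shift-left check can be as small as a pytest-style latency smoke test that runs in CI on every pull request (the endpoint, sample count, and 300ms budget are illustrative):

```python
# test_perf_smoke.py - runs in the CI pipeline on every PR
import time
import statistics
import requests

def test_search_endpoint_latency_budget():
    samples_ms = []
    for _ in range(30):
        start = time.perf_counter()
        resp = requests.get("https://staging.example.com/api/search?q=widget",
                            timeout=5)
        samples_ms.append((time.perf_counter() - start) * 1000)
        assert resp.status_code == 200
    p95 = statistics.quantiles(samples_ms, n=100)[94]
    assert p95 < 300, f"p95 latency {p95:.0f}ms blew the 300ms budget"
```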
Read a guide on Shift-Left testing.
5. What is Little’s Law and how is it relevant to performance testing?
Little’s Law is a theorem from queueing theory that states a fundamental relationship: `L = λ * W`
- `L` = The average number of concurrent users in the system.
- `λ` (lambda) = The average arrival rate (throughput, e.g., requests per second).
- `W` = The average time a user spends in the system (response time).
It’s relevant because it allows you to sanity-check your assumptions and results. If you know two of the variables, you can calculate the third. For example, if your system is handling 100 requests/sec (`λ`) and the average response time is 0.5 seconds (`W`), you can calculate that there are, on average, 50 concurrent requests (`L`) active in your system.
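The arithmetic is trivial to script, which makes it easy to sanity-check any test plan:

```python
# Little's Law: L = lambda * W
throughput = 100       # lambda: requests per second
response_time = 0.5    # W: average seconds a request spends in the system
concurrency = throughput * response_time
print(concurrency)     # 50 requests in flight, on average

# Inverse check: to sustain 200 req/s at a 0.4s response time,
# the test must keep 200 * 0.4 = 80 requests concurrently active.
```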
Read a primer on Little’s Law.
Scripting & Test Design
6. What is correlation in load testing scripts and why is it mandatory?
**Correlation** is the process of capturing dynamic data from a server response and using it in subsequent requests. Modern web applications are stateful; a server will often return dynamic values (like session IDs, CSRF tokens, or resource IDs) that must be sent back in later requests to maintain a valid session.
It’s mandatory because if you simply record and replay a user session, you will be replaying static, expired session data, and the server will reject your requests. You must extract (correlate) these dynamic values from responses and use them as variables in your script.
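A minimal sketch of correlation using Python's `requests` (the CSRF field name and URLs are hypothetical; a load testing tool would do the same extraction with its own extractor syntax):

```python
import re
import requests

session = requests.Session()

# 1. Fetch the login page and CAPTURE the dynamic token from the response body
login_page = session.get("https://example.com/login")
token = re.search(r'name="csrf_token" value="([^"]+)"', login_page.text).group(1)

# 2. REPLAY the captured value in the next request; a hard-coded recorded
#    token would be stale and the server would reject the request
resp = session.post(
    "https://example.com/login",
    data={"username": "vuser01", "password": "secret", "csrf_token": token},
)
resp.raise_for_status()
```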
Learn about Correlation and Parameterization.
7. What is the purpose of “think time” and “pacing” in a script?
- Think Time: The delay that simulates the time a real user would spend thinking or reading between actions (e.g., the time between loading a product page and clicking “Add to Cart”). Omitting think time creates an unrealistically aggressive load.
- Pacing: The delay between the end of one full iteration of a user’s workflow and the start of the next. Pacing is used to control the overall throughput of your test and ensure you are hitting your target transactions per second without overwhelming the system instantly.
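Both concepts map directly onto tool configuration. A sketch of the distinction in Locust (the intervals and endpoints are illustrative):

```python
from locust import HttpUser, task, between, constant_pacing

class ThinkTimeUser(HttpUser):
    # Think time: pause 3-10 seconds between actions, like a human reading
    wait_time = between(3, 10)

    @task
    def browse(self):
        self.client.get("/products")

class PacedUser(HttpUser):
    # Pacing: start a new iteration every 30 seconds regardless of how long
    # the iteration itself took, which pins per-user throughput
    wait_time = constant_pacing(30)

    @task
    def order_flow(self):
        self.client.get("/products")
        self.client.post("/cart", json={"product_id": 1, "qty": 1})
```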
8. How would you handle OAuth 2.0 authentication in a load test script?
Handling OAuth 2.0 can be complex. The best approach depends on the grant type, but a common strategy is:
- Pre-generate tokens: If possible, generate a pool of valid access tokens and refresh tokens before the test starts. Each virtual user can then pick a token from a data file. This avoids putting the authentication server under load.
- Script the token flow: If pre-generation isn’t possible, you need to script the token acquisition flow. This might involve a one-time setup thread group to get initial tokens using a client credentials grant or by scripting the full redirect flow.
- Token Renewal: The script must be able to handle token expiration by calling the token refresh endpoint when it receives a 401 Unauthorized response.
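A hedged sketch of the scripted flow with renewal on 401 (the URL, client credentials, and field names are assumptions; real secrets should be injected from a secret store):

```python
import requests

TOKEN_URL = "https://auth.example.com/oauth/token"  # hypothetical

def fetch_token():
    # Client-credentials grant: the simplest flow to script for load tests
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "client_credentials",
        "client_id": "load-test-client",
        "client_secret": "<injected-from-secret-store>",
    })
    resp.raise_for_status()
    return resp.json()["access_token"]

def get_with_renewal(session, url, state):
    resp = session.get(url, headers={"Authorization": f"Bearer {state['token']}"})
    if resp.status_code == 401:
        # Token expired mid-test: renew once and retry the request
        state["token"] = fetch_token()
        resp = session.get(url,
                           headers={"Authorization": f"Bearer {state['token']}"})
    return resp
```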
9. How do you test a system that uses WebSockets or Server-Sent Events (SSE)?
Testing these protocols is different from standard HTTP request-response testing. Your load testing tool must have specific support for them.
- WebSockets: The script needs to establish a persistent connection, send and receive messages asynchronously, and keep the connection open for the duration of the test. You would measure things like message latency and the server’s ability to handle a large number of concurrent connections.
- SSE: The script establishes a connection and then listens continuously for messages pushed from the server. The focus is on the server’s ability to handle many open connections and push data efficiently.
Tools like Gatling, k6, and JMeter (with plugins) have support for these protocols.
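For illustration, a bare-bones WebSocket latency probe using Python's `websockets` package (it assumes the server echoes each message back, which your protocol may not do; the URL is hypothetical):

```python
import asyncio
import json
import time
import websockets  # pip install websockets

async def measure_message_latency(url: str, messages: int = 100):
    async with websockets.connect(url) as ws:
        latencies_ms = []
        for seq in range(messages):
            sent_at = time.perf_counter()
            await ws.send(json.dumps({"seq": seq}))
            await ws.recv()  # assumes an echo-style reply per message
            latencies_ms.append((time.perf_counter() - sent_at) * 1000)
        print(f"avg message latency: "
              f"{sum(latencies_ms) / len(latencies_ms):.1f}ms")

asyncio.run(measure_message_latency("wss://example.com/ws"))
```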
Test Execution & Infrastructure
10. What are the pros and cons of using cloud-based load generators vs. on-premises?
- Cloud-based (e.g., BlazeMeter, k6 Cloud, Azure Load Testing):
  - Pros: Easy to scale to massive loads, can generate traffic from different geographic regions, no infrastructure to manage.
  - Cons: Can be more expensive, might have less control over the test environment, traffic comes from outside your network which may not be suitable for testing internal systems.
- On-premises:
  - Pros: Full control over the hardware and network, no data transfer costs, better for testing internal applications behind a firewall.
  - Cons: Limited by your own hardware capacity, difficult to generate geographically distributed traffic, requires maintenance and setup.
11. How do you ensure your load generators themselves are not the bottleneck?
This is a critical aspect of test setup. You must monitor the health of the load generator machines during the test.
Key metrics to watch on the load generators are:
- CPU Utilization: If it’s consistently high (e.g., > 80-90%), the machine may be struggling to generate the requested load.
- Memory Usage: Ensure the test tool has enough heap space and is not constantly garbage collecting.
- Network I/O: Check for network saturation.
If a bottleneck is found, the solution is to use more powerful machines or to distribute the load across more generator instances.
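A small sidecar script with `psutil` is often enough to watch generator health during a run (the thresholds are illustrative):

```python
import time
import psutil  # pip install psutil

def watch_generator_health(interval_s: int = 5):
    psutil.cpu_percent()  # prime the counter; the first reading is meaningless
    last_net = psutil.net_io_counters()
    while True:
        time.sleep(interval_s)
        cpu = psutil.cpu_percent()  # CPU usage since the previous call
        mem = psutil.virtual_memory().percent
        net = psutil.net_io_counters()
        out_mbps = (net.bytes_sent - last_net.bytes_sent) * 8 / interval_s / 1e6
        last_net = net
        print(f"cpu={cpu:.0f}% mem={mem:.0f}% net_out={out_mbps:.1f} Mbit/s")
        if cpu > 85:
            print("WARNING: generator CPU is saturated; "
                  "results may understate the true load")
```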
12. What is a distributed load test setup?
A distributed load test setup is used when a single machine is not powerful enough to generate the required load. It typically follows a controller-agent (or master-slave) model:
- A single **Controller** node orchestrates the test, manages the test script, and aggregates the results.
- Multiple **Agent** nodes receive the script from the controller and are responsible for actually generating the virtual user traffic.
This allows you to scale your load test horizontally by simply adding more agent machines.
Analysis & Metrics
13. Why are percentiles (like 95th or 99th) more important than averages for response time analysis?
The **average** response time can be very misleading because it hides significant outlier behavior: a few extremely slow requests have little impact on the average, yet they can represent a miserable experience for the users who hit them.
Percentiles give a much better picture of the user experience.
- The **95th percentile (p95)** response time means that 95% of your users experienced this response time or better.
- The **99th percentile (p99)** shows the experience of your worst-case 1% of users.
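A tiny worked example shows how the average hides the tail while the high percentiles expose it:

```python
import statistics

# 99 fast responses (~100ms) and one 5-second outlier
samples = [100] * 99 + [5000]

print(statistics.mean(samples))                  # 149ms - looks healthy
print(statistics.quantiles(samples, n=100)[94])  # p95 = 100ms
print(statistics.quantiles(samples, n=100)[98])  # p99 ~ 4951ms - exposes the outlier
```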
14. You run a load test and see that as you increase the user load, throughput remains flat while response times increase linearly. What is likely happening?
This is a classic sign that the system has hit a **bottleneck** and is saturated. The increasing response time is due to requests spending more and more time waiting in a queue for a constrained resource.
The bottleneck could be:
- CPU Saturation: The server’s CPU is at 100%.
- Resource Pool Exhaustion: The system has run out of available resources, such as database connections or available threads in a thread pool.
- Lock Contention: Requests are getting stuck waiting for a lock on a shared resource in the code or database.
The next step is to analyze server-side metrics to identify which resource is saturated.
15. How do you correlate client-side metrics with server-side metrics to find a bottleneck?
The key is to have a single, unified view of both sets of metrics on a synchronized timeline. During a load test, you should be monitoring:
- Client-Side Metrics (from the load tool): Response time, throughput, error rate.
- Server-Side Metrics (from your APM or monitoring tool): CPU utilization, memory usage, disk I/O, network I/O, GC activity, database connection pool usage, etc., for every server in the application stack.
By placing these on the same graph, you can directly see the cause and effect. For example, you can see the exact moment that response times start to climb and correlate it with the CPU on the database server hitting 100%.
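If your tooling exports both series as CSV, aligning them is a few lines of pandas (the file names and column names are assumptions about your export format):

```python
import pandas as pd

client = pd.read_csv("load_tool.csv", parse_dates=["timestamp"]).sort_values("timestamp")
server = pd.read_csv("apm_export.csv", parse_dates=["timestamp"]).sort_values("timestamp")

# Align each client-side sample with the nearest server-side sample in time
merged = pd.merge_asof(client, server, on="timestamp",
                       direction="nearest", tolerance=pd.Timedelta("5s"))

# A strong correlation between p95 latency and DB CPU points at the culprit
print(merged[["p95_ms", "db_cpu_pct"]].corr())
```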
16. What is the “Coordinated Omission” problem in performance testing?
Coordinated Omission is a subtle but significant issue where a load testing tool unintentionally measures a system as performing better than it actually is. This happens when the tool’s measurement of response time does not account for the time the system was too busy to even accept the request.
For example, if a client is set to send a request every second but a response takes 1.5 seconds, the client is blocked and sends the next request 0.5 seconds late. The tool records only each request’s own service time and “omits” the extra time that request effectively spent queued behind the slow response. This hides the true latency experienced by the user. Advanced tools and analysis are needed to correct for this.
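The usual correction is to measure from the intended send time rather than the actual one, so time spent blocked behind a slow response counts as latency. A minimal sketch (the URL and schedule are illustrative):

```python
import time
import requests

URL = "https://example.com/api"   # hypothetical
INTERVAL = 1.0                    # intended schedule: one request per second

start = time.perf_counter()
latencies = []
for i in range(60):
    intended_send = start + i * INTERVAL
    now = time.perf_counter()
    if now < intended_send:
        time.sleep(intended_send - now)   # on schedule: wait for our slot
    requests.get(URL)
    # Measure from the INTENDED send time: if we started late because the
    # previous response blocked us, that queueing delay is real user latency
    latencies.append(time.perf_counter() - intended_send)
```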
Read about how not to measure latency.
Advanced Concepts & Modern Practices
17. How would you design a performance test for a microservices architecture?
Testing microservices requires a multi-faceted approach:
- Component-Level Tests: Isolate and run performance tests against each individual microservice, mocking its dependencies. This helps identify bottlenecks within a single service.
- End-to-End Tests: Run tests against the public-facing API Gateway, simulating a full user journey. This is crucial for understanding the overall system latency and identifying emergent bottlenecks caused by the interaction between services.
- Distributed Tracing: This is essential. During an end-to-end test, you must have distributed tracing enabled (e.g., with Jaeger or OpenTelemetry) to see how much time a request spends in each downstream service, allowing you to pinpoint which service is causing a slowdown.
18. What is the role of chaos engineering in performance testing?
Chaos engineering complements performance testing by focusing on resiliency rather than just raw performance. While a load test verifies how a system performs under normal conditions, chaos engineering involves intentionally injecting failures into a production or staging environment (while it is under load) to see how the system behaves.
For example, you could run a load test and then use a chaos engineering tool to terminate a database instance or inject network latency between two services. This helps you proactively discover weaknesses and verify that your resiliency patterns (like circuit breakers and retries) actually work as expected under stress.
19. How would you simulate different network conditions (e.g., high latency, packet loss) in a test?
Simulating realistic network conditions is important, especially for testing applications used by mobile clients. This can be done with:
- Network Emulation Tools: Using tools like `tc` (traffic control) on Linux to introduce latency, jitter, packet loss, and bandwidth limitations on the network interfaces of the load generators or the application servers.
- Service Mesh: Modern service meshes like Istio have built-in capabilities to inject network faults and delays for testing purposes.
- Third-Party Tools: Specialized proxy tools that can be placed between the client and server to manipulate network traffic.
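As a hedged example, a thin Python wrapper around `tc`/netem on Linux (requires root; the interface name and impairment values are assumptions to adjust for your environment):

```python
import subprocess

def degrade_network(iface="eth0", delay_ms=150, jitter_ms=30, loss_pct=1.0):
    """Add latency, jitter, and packet loss to an interface via netem."""
    subprocess.run([
        "tc", "qdisc", "add", "dev", iface, "root", "netem",
        "delay", f"{delay_ms}ms", f"{jitter_ms}ms",
        "loss", f"{loss_pct}%",
    ], check=True)

def restore_network(iface="eth0"):
    """Remove the netem qdisc and restore normal networking."""
    subprocess.run(["tc", "qdisc", "del", "dev", iface, "root"], check=True)
```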
20. What is a “performance test baseline”?
A performance test baseline is a set of performance metrics captured from a test run on a known, stable version of the application in a controlled environment. This baseline serves as the “gold standard” or point of comparison for all future tests. When you run a performance test on a new build, you compare its results (response time, throughput, etc.) against the baseline. Any significant deviation or regression from the baseline indicates a performance issue that was introduced in the new code.
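Baseline comparison is easy to automate. A minimal sketch, assuming each run exports a JSON summary and treating any regression beyond 10% as significant (both the format and the tolerance are assumptions):

```python
import json

TOLERANCE = 0.10  # flag anything more than 10% worse than the baseline

def compare_to_baseline(baseline_path, current_path):
    with open(baseline_path) as b, open(current_path) as c:
        baseline, current = json.load(b), json.load(c)
    regressions = []
    for metric in ("p95_ms", "p99_ms", "error_rate"):  # lower is better
        if current[metric] > baseline[metric] * (1 + TOLERANCE):
            regressions.append(f"{metric}: {baseline[metric]} -> {current[metric]}")
    if current["rps"] < baseline["rps"] * (1 - TOLERANCE):  # higher is better
        regressions.append(f"rps: {baseline['rps']} -> {current['rps']}")
    return regressions
```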


