Performance Testing Fallacies

When testing software performance, teams commonly make several erroneous assumptions about when and how to test, what to measure, and how to act on the results.

Performance, Load and Stress Testing Are Not Equivalent

Thinking that load and stress testing are the same as performance testing is a common fallacy, especially among developers. Failing to understand the distinctions leads to inefficient approaches to measuring a software system’s responsiveness, infrastructure needs and fragility.

  • Performance testing evaluates software runtime efficiency under moderate loads and should be performed as early as possible in the development cycle.
  • Load testing takes place after performance balancing is accomplished. It simulates real-world conditions to measure the software’s endurance and response to high volumes of I/O or concurrent users. It is most effectively performed using test automation tools.
  • Stress testing comes last. It applies extreme pressure in order to find the software’s breaking points and measure its ability to degrade gracefully and preserve data under crash conditions.
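
To make the distinction concrete, the following is a minimal sketch in Python. It assumes a hypothetical handle_request function standing in for the system under test: the performance test measures single-request latency serially under moderate load, while the load test drives many concurrent requests through a thread pool. A stress test would extend the load test by ramping the number of concurrent users upward until errors appear or latency becomes unacceptable.

    import time
    from concurrent.futures import ThreadPoolExecutor
    from statistics import median

    def handle_request(payload: int) -> int:
        """Hypothetical stand-in for one request against the system under test."""
        return sum(i * i for i in range(payload))

    def performance_test(runs: int = 50) -> float:
        """Performance test: single-request latency under a moderate, serial load."""
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            handle_request(10_000)
            timings.append(time.perf_counter() - start)
        return median(timings)

    def load_test(concurrent_users: int = 50, requests_per_user: int = 20) -> float:
        """Load test: many concurrent users; returns observed throughput (req/s)."""
        total = concurrent_users * requests_per_user
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
            for future in [pool.submit(handle_request, 10_000) for _ in range(total)]:
                future.result()
        return total / (time.perf_counter() - start)

    if __name__ == "__main__":
        print(f"median latency: {performance_test() * 1000:.2f} ms")
        print(f"throughput under load: {load_test():.0f} req/s")

In practice the load test would be driven by a dedicated test automation tool, but the shape of the measurement is the same.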

The Fallacy That Performance Testing Comes Later

The best time to apply true performance testing is during development. It should begin as soon as unit/regression tests are in place and full functionality of the software is nearing completion.

It should be performed by the developers with close guidance from testers. At this point you still have the developers’ full mind share, so any load balancing required as a result of performance testing can be accomplished quickly and efficiently. Throwing the task of performance testing over a virtual wall to the test team creates an unnecessary and expensive disconnect that lengthens the defect detect-and-repair cycle.
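
As a minimal sketch of what such an early, developer-run performance check might look like, the following assumes a hypothetical parse_records function and an illustrative time budget; the point is that the check lives alongside the unit tests, so regressions surface while the code is still fresh.

    import time
    import unittest

    def parse_records(lines):
        """Hypothetical function under test."""
        return [line.split(",") for line in lines]

    class ParseRecordsPerformance(unittest.TestCase):
        def test_parses_100k_lines_within_budget(self):
            lines = ["a,b,c"] * 100_000
            start = time.perf_counter()
            parse_records(lines)
            elapsed = time.perf_counter() - start
            # The 0.5 s budget is illustrative; tune it to your hardware and goals.
            self.assertLess(elapsed, 0.5, f"parse_records took {elapsed:.3f}s")

    if __name__ == "__main__":
        unittest.main()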

The One-For-All Testing Fallacy

Performance tests are expensive when they require testing a broad swath of functionality, many alternative flows and a full end-to-end infrastructure including client devices, servers and databases. Because of this, there is a temptation to limit the number of these tests and hope that a subset is able to shake out all the problems.

The better approach is to create a set of tests that exercise as much of the system as possible and prioritize them by the cost or risk of defects in the parts of the system they exercise. Secondarily, take into account the cost of the tests themselves.

Then build the highest-priority tests first, run them, and fix defects as you go. In this manner you are more likely to uncover the most serious defects first, and fixing them is likely to improve software performance in subsequent tests.
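
As a rough sketch of that prioritization, the following assumes illustrative risk, impact and cost scores for a few hypothetical candidate tests, and sorts them by expected defect cost with the cost of the test itself as a secondary, negative factor; the weights are assumptions to be tuned per project.

    from dataclasses import dataclass

    @dataclass
    class CandidateTest:
        name: str
        defect_risk: float   # likelihood of a defect in the covered area, 0..1
        defect_cost: float   # impact if such a defect ships, 1..10
        test_cost: float     # effort to build and run the test, 1..10

    def priority(t: CandidateTest) -> float:
        # Primary factor: expected cost of defects; secondary: cheaper tests first.
        return t.defect_risk * t.defect_cost - 0.1 * t.test_cost

    candidates = [
        CandidateTest("checkout end-to-end", defect_risk=0.6, defect_cost=9, test_cost=8),
        CandidateTest("search autocomplete", defect_risk=0.3, defect_cost=4, test_cost=3),
        CandidateTest("nightly report batch", defect_risk=0.5, defect_cost=6, test_cost=5),
    ]

    for t in sorted(candidates, key=priority, reverse=True):
        print(f"{t.name}: priority {priority(t):.2f}")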

Assuming Your Test Environment Is “Good Enough”

The larger the delta between your in-house performance testing environment and the actual production or deployment environment, the more likely you are to miss showstopper defects. The extreme case of this fallacy is using a different OS, or OS version, than the one on which the software is deployed.

Many other differences between the test and production environments are possible, including the type or version of hardware or database in use. The exact provisioning and configuration of the runtime systems, including resident applications and library versions, must also be accounted for. The most advanced testing environments keep all hardware and software elements under a single version control system.
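
A minimal sketch of guarding against such drift, assuming a hypothetical production manifest and illustrative package versions, is to compare the test machine’s OS, kernel release and installed library versions against what is recorded for production before trusting any results:

    import platform
    from importlib.metadata import PackageNotFoundError, version

    # Illustrative manifest; in practice this would be generated from production
    # and kept under version control alongside the rest of the environment.
    PRODUCTION_MANIFEST = {
        "os": "Linux",
        "os_release": "5.15.0-105-generic",
        "packages": {"requests": "2.31.0"},
    }

    def environment_drift(manifest):
        """Return a list of differences between this machine and the manifest."""
        problems = []
        if platform.system() != manifest["os"]:
            problems.append(f"OS is {platform.system()}, expected {manifest['os']}")
        if platform.release() != manifest["os_release"]:
            problems.append(f"kernel {platform.release()}, expected {manifest['os_release']}")
        for pkg, expected in manifest["packages"].items():
            try:
                installed = version(pkg)
            except PackageNotFoundError:
                problems.append(f"{pkg} not installed, expected {expected}")
                continue
            if installed != expected:
                problems.append(f"{pkg} is {installed}, expected {expected}")
        return problems

    if __name__ == "__main__":
        for issue in environment_drift(PRODUCTION_MANIFEST):
            print("DRIFT:", issue)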

The Extrapolation Fallacy

Drawing conclusions about how a software system’s performance will scale is fraught with risk. For one thing, even software that theoretically scales linearly may be undermined by non-linear behavior in the underlying infrastructure on which it runs.

For example, as memory or disk utilization on a particular hardware platform increases, there comes a point where swapping and thrashing create a bottleneck. Furthermore, as such limits are reached, the pressure they exert on one part of the software may lead to unexpected performance breakdowns in other components.

For software systems with a prominent networking component, the arithmetic behind Metcalfe’s law is particularly relevant: the number of potential connections grows in proportion to the square of the number of networking endpoints. In such a case, extrapolating from a test that uses an unrealistically small set of users could catastrophically miss defects.
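
A small worked example shows why: the number of potential pairwise connections among n endpoints is n(n-1)/2, so a thousandfold increase in endpoints produces roughly a millionfold increase in potential connections.

    def potential_connections(endpoints: int) -> int:
        """Potential pairwise connections among n endpoints: n * (n - 1) / 2."""
        return endpoints * (endpoints - 1) // 2

    for n in (10, 100, 10_000):
        print(f"{n:>6} endpoints -> {potential_connections(n):>12,} potential connections")
    # A test with 10 users exercises 45 potential connections;
    # production with 10,000 endpoints has 49,995,000.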

A corollary of the scaling fallacy is the assumption that such problems can be solved simply by adding more hardware or network capacity.

Conclusion

Testing resources are always at a premium. To use them effectively for performance testing, it is critical to understand the types of performance testing and when and where to apply each. Consistency and completeness in test environments are essential, as is accounting for real-world software and hardware scaling issues. Finally, whenever possible, take full advantage of test automation to increase testing efficacy.