
Performance Testing Fallacies

When testing software performance, several erroneous assumptions are commonly made about when and how to test, what to measure, and how to make improvements based on the results.

Performance, Load and Stress Testing Are Not Equivalent

Thinking that load and stress testing are the same as performance testing is a common fallacy, especially among developers. Not understanding the distinctions leads to inefficient approaches to measuring a software system’s responsiveness, infrastructure needs and its fragility.

  • Performance testing evaluates software runtime efficiency under moderate loads and should be performed as early as possible in the development cycle.
  • Load testing takes place after performance balancing is accomplished. It simulates real-world conditions to measure the software’s endurance and response to high volumes of I/O or concurrent users. It is most effectively performed using test automation tools.
  • Stress testing comes last. It applies extreme pressure in order to find the software’s breaking points and measure its ability to degrade gracefully and preserve data under crash conditions.

The Fallacy That Performance Testing Comes Later

The best time to apply true performance testing is during development. It should begin as soon as unit/regression tests are in place and full functionality of the software is nearing completion.

It should be performed by the developers with close guidance from testers. At this stage you still have the developers’ full mind share, so any performance rebalancing required as a result of the testing can be accomplished quickly and efficiently. Throwing the task of performance testing over a virtual wall to the test team creates an unnecessary and expensive disconnect that increases the defect detection-and-repair cycle time.

The One-For-All Testing Fallacy

Performance tests are expensive when they require testing a broad swath of functionality, many alternative flows and a full end-to-end infrastructure including client devices, servers and databases. Because of this, there is a temptation to limit the number of these tests and hope that a subset is able to shake out all the problems.

The better approach is to create a set of tests that exercise as much of the system as possible and prioritize these in terms of the cost or risk of defects in the parts of the system under test. Secondarily, take into account the cost of the tests themselves.

Then, create the highest priority tests, run them and find solutions to defects as you go along. In this manner, you are more likely to uncover the most serious defects that, once fixed, are likely to improve software performance in subsequent tests.
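As a rough illustration of that prioritization, the sketch below ranks a handful of hypothetical performance tests by the risk or cost of defects in the area they cover, with the cost of the test itself as a tie-breaker. The test names and weights are invented for the example.

    # Hypothetical test candidates: (name, defect risk/cost, cost of the test).
    candidate_tests = [
        ("checkout end-to-end", 9, 5),
        ("search under load", 7, 3),
        ("profile page render", 3, 1),
        ("admin report export", 5, 4),
    ]

    def priority(test):
        name, defect_risk, test_cost = test
        # Primary key: risk/cost of defects in the part of the system under test
        # (highest first). Secondary key: cost of the test itself (lowest first).
        return (-defect_risk, test_cost)

    for name, risk, cost in sorted(candidate_tests, key=priority):
        print(f"{name}: defect risk {risk}, test cost {cost}")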

Assuming Your Test Environment Is “Good Enough”

The larger the delta between your in-house performance testing environment and the actual production or deployment environment, the more likely you are to miss showstopper defects. An extreme case of this fallacy is using a different OS or OS version than the one used where the software is deployed.

There are many other differences possible between the test and production environments including the type or version of hardware or database in use. The exact provisioning and configuration of the runtime systems including resident applications and library versions must also be accounted for. The most advanced testing environments keep all hardware and software elements under a single version control system.
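One lightweight way to keep the environments honest, sketched below under assumptions (the manifest fields and the inlined production copy are illustrative, not a specific tool), is to generate an environment manifest on each host and diff the test copy against the production copy before a test run.

    import platform
    import sys

    def manifest():
        # In practice this would also record hardware, database versions,
        # resident applications and library versions.
        return {
            "os": platform.system(),
            "os_version": platform.version(),
            "python": sys.version.split()[0],
        }

    def diff(test_env, prod_env):
        keys = set(test_env) | set(prod_env)
        return {k: (test_env.get(k), prod_env.get(k))
                for k in keys if test_env.get(k) != prod_env.get(k)}

    # The production manifest would normally be generated on the production host
    # and kept under version control; a copy is inlined here for the sketch.
    production_manifest = {"os": "Linux", "os_version": "5.15.0", "python": "3.11.4"}
    print(diff(manifest(), production_manifest) or "environments match")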

The Extrapolation Fallacy

Drawing conclusions about how a software system’s performance will scale is fraught with risk. For one thing, even software that theoretically scales linearly may succumb to non-linear behavior in the underlying infrastructure on which it runs.

For example, as the load on a particular hardware platform’s memory or disk usage increases there comes a point where swapping and thrashing create a bottleneck. Furthermore, as such limits are reached, the pressure these exert on one part of the software may lead to unexpected breakdowns in performance for other components.

For software systems with a prominent networking component, Metcalfe’s law, which says that the number of potential network connections grows in proportion to the square of the number of endpoints, is particularly applicable. In such a case, extrapolating from a test that uses an unrealistically small set of users could catastrophically miss defects.
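To make the quadratic growth concrete, here is a quick back-of-the-envelope calculation (the endpoint counts are arbitrary): a test with tens of users says very little about behavior with tens of thousands.

    # Potential pairwise connections grow roughly with the square of the
    # number of endpoints: n * (n - 1) / 2.
    for endpoints in (10, 100, 1_000, 10_000):
        connections = endpoints * (endpoints - 1) // 2
        print(f"{endpoints:>6} endpoints -> {connections:>12,} potential connections")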

The corollary to scaling assumptions is the additional fallacy that such problems can be solved by simply adding more hardware or networking.

Conclusion

Testing resources are always at a premium. To take advantage of them effectively for performance testing, it is critical to understand the performance testing types and when and where to apply them. Consistency and completeness in test environments are essential, as is accounting for real-world software and hardware scaling issues. Finally, whenever possible, take full advantage of test automation to increase testing efficacy.

Using Testing to Improve API Performance

For application programmers, APIs are the user interfaces upon which they build their own APIs, services and applications. They have been in use almost since software programming began. Until recently, their main acceptance criteria were ease of use and functionality. In today’s world, however, they are on the critical path for determining application end-user performance, usually measured by how promptly the application responds to user actions and requests.

Thus, API testing must encompass more than correct functioning of the API in terms of inputs and outputs or whether it fails gracefully in the face of errors. It hardly matters to end-users if the functionality is right or wrong if they cannot access it in a reasonable amount of time. API performance under load, therefore, must be measured and fine-tuned to remove processing bottlenecks.

Ensure Functional Stability First

API functionality must be verified first before meaningful performance testing can progress. This includes evaluating the API and its documentation to ensure that API calls and their descriptions line up and are self-explanatory. Start with happy-path test scripts that take the documentation literally. These tests might be enhanced versions of developer unit tests.

Next, stress the functionality by exploring border conditions, passing random or missing parameter values, large amounts of data, non-ASCII character sets and so on. Working with code that has compile-time instrumentation built-in greatly reduces the amount of time spent tracking down bugs that appear as a result of these tests. Be sure to log all inputs and outputs during test runs also.
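As a sketch of what those functional checks might look like, the pytest-style tests below exercise a hypothetical /users endpoint with a happy path, a missing parameter, non-ASCII input and an oversized payload. The base URL, resource and expected status codes are assumptions, not taken from any particular API.

    import requests

    BASE_URL = "http://localhost:8000"  # hypothetical API under test

    def test_create_user_happy_path():
        resp = requests.post(f"{BASE_URL}/users", json={"name": "Ada"}, timeout=5)
        assert resp.status_code == 201
        assert resp.json()["name"] == "Ada"

    def test_create_user_missing_parameter():
        resp = requests.post(f"{BASE_URL}/users", json={}, timeout=5)
        assert resp.status_code == 400  # should fail gracefully, not crash with a 500

    def test_create_user_non_ascii_name():
        resp = requests.post(f"{BASE_URL}/users", json={"name": "Zoë 测试"}, timeout=5)
        assert resp.status_code in (201, 400)  # accepted or cleanly rejected

    def test_create_user_oversized_payload():
        resp = requests.post(f"{BASE_URL}/users", json={"name": "x" * 1_000_000}, timeout=5)
        assert resp.status_code in (400, 413)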

Performance Testing Preparation

Once functionality is stable and before designing performance, load or stress tests, be sure you know what you are testing for. Examine the software requirements to determine the important real-life performance metrics and the API’s performance expectations. Metrics to look for include request throughput, peak throughput, the distribution of throughput per API endpoint and the maximum number of concurrent users supported.

With these data in hand, start general load tests that progressively increase demands on the API. These may shake out bugs in the API as well as the test environment. They will provide early baselines of the maximum load the API can serve without breaking.
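A minimal version of such a progressive load test might look like the sketch below, which ramps up the number of concurrent workers against a single endpoint and reports latency percentiles at each step. The URL, step sizes and request counts are placeholders.

    import statistics
    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    BASE_URL = "http://localhost:8000/health"  # hypothetical endpoint
    REQUESTS_PER_STEP = 200

    def timed_get(_):
        start = time.perf_counter()
        requests.get(BASE_URL, timeout=10)
        return time.perf_counter() - start

    for workers in (5, 10, 20, 50, 100):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            latencies = list(pool.map(timed_get, range(REQUESTS_PER_STEP)))
        q = statistics.quantiles(latencies, n=100)  # percentile cut points
        print(f"{workers:>3} workers: p50={q[49]*1000:.1f} ms, p95={q[94]*1000:.1f} ms")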

At this point, apply tests that more accurately reflect the expected real-world usage of the API. If an existing version of the API is already in use, API call frequency distributions are invaluable in determining the most effective use of your testing resources. Alternatively, utilize previous production logs for the API and feed these to automated testing tools to recreate realistic scenarios.
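For the log-replay approach, a sketch along these lines (the log path and the common-log-format assumption are hypothetical) can derive a per-endpoint request mix from a previous production log and weight the replayed test traffic accordingly.

    import random
    import re
    from collections import Counter

    LOG_PATH = "access.log"  # assumed production access log in common log format
    LINE_RE = re.compile(r'"(?:GET|POST|PUT|DELETE) (\S+) HTTP')

    counts = Counter()
    with open(LOG_PATH) as log:
        for line in log:
            match = LINE_RE.search(line)
            if match:
                counts[match.group(1)] += 1

    total = sum(counts.values())
    print({path: round(n / total, 3) for path, n in counts.most_common(10)})

    # Choose endpoints for replay in proportion to their production frequency.
    endpoints, weights = zip(*counts.items())
    replay_sample = random.choices(endpoints, weights=weights, k=1000)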

Popular API Testing Tools

It is often useful to take advantage of cloud services, such as AWS, to provide a testing infrastructure that can expand and shrink according to the demands of your performance tests. Along with that, there are many testing tools available for quickly creating an effective testing environment, of which we cover a few here.

Vegeta

This is an open-source command-line tool whose main focus is to produce a steady requests-per-second rate specified by the tester. It is simpler to use than many of the other testing tools mentioned here and is ideal for establishing performance baselines.

Loader.io

The free version of this cloud testing service from SendGrid permits up to 10K requests per second, which is a meaningful load for most APIs. It is simple to set up and produces informative reports.

Wrk

This tool configures all tests via a command-line interface. It is relatively easy to use for generating any target rate of HTTP requests you need. Because it is multi-threaded, it has higher performance than many other tools, but its reporting is non-graphical.

JMeter

JMeter is probably the best-known open-source performance testing tool. It uses a full-featured GUI for creating detailed test plans. The number of execution threads, parameters for HTTP requests and listeners used to display results are some of the parameters that can be specified. Its downside is its complexity and steep learning curve.

API performance as an acceptance criterion has become equal in importance to API functionality and usability. This is due to increased expectations of end users who are intolerant of delays and have many similar application options to choose from in a growing market of applications.

Thus, it is critical that testing gives API performance its due as part of the overall test plan. Fortunately, there are clear best practices and many tools to assist in performance evaluation and to help developers and testers balance both performance and functionality of the APIs being produced.

Performance vs. Load vs. Stress Testing

The differences between performance, load and stress testing may seem subtle or even non-existent to non-testers or less experienced testers. There are significant conceptual overlaps between the three, which can make it difficult to differentiate each approach clearly.

All three assume that the software under test has passed functional testing. All three test approaches are seeking results based on behavioral measurements of the software at runtime on target hardware. Often, these measurements are taken when the combination of software and hardware is near or beyond its expected peak capacity.

The distinctions are in how each test approach is applied and the specific objectives behind each testing method.

Performance Testing

The main objective of performance testing is to evaluate the software’s runtime efficiency under moderate load conditions. It reveals imbalances in the software’s execution that result in poor application responsiveness or slow processing due to latency or resource hogging. Such characteristics are not functional bugs but rather defects in the software’s architecture, design or specific implementation. Under load, the software may become I/O bound, CPU bound or suffer from client-server latency, for example.

The response of the software developers, assuming the design is acceptable, is to optimize the code to an acceptable level of efficiency. As the software matures, performance testing eventually produces a firm baseline of software performance. At that point, the baseline becomes part of continuing regression tests to measure deviations in performance as software changes are added.
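A simple form of that baseline regression check, sketched here with invented metric names and thresholds, stores the agreed baseline figures and flags the run when a fresh measurement drifts too far from them.

    import statistics

    # In practice the baseline would live under version control and be updated
    # deliberately; it is inlined here for the sketch.
    BASELINE_MS = {"login_p95_ms": 180.0}
    TOLERANCE = 1.15  # allow 15% drift before flagging a regression

    def check_regression(measured_ms, baseline_ms=BASELINE_MS, tolerance=TOLERANCE):
        return {
            metric: (base, measured_ms[metric])
            for metric, base in baseline_ms.items()
            if metric in measured_ms and measured_ms[metric] > base * tolerance
        }

    # Hypothetical latency samples (ms) from the latest performance run.
    samples = [172.0, 181.5, 190.2, 230.7]
    current = {"login_p95_ms": statistics.quantiles(samples, n=20)[18]}
    print(check_regression(current) or "within baseline")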

Detecting performance bottlenecks may be challenging until a system is sufficiently loaded. Response times may be acceptable with 20 concurrent users but not with 200 users. An additional goal of performance testing is recording such pressure point limits, re-testing these limits and increasing the load as code development continues up to what is specified in the original software performance requirements.

Load Testing

Once the software has met its performance requirements, load testing begins. Performance balancing has been done, so the objective of load testing is more about endurance and volume. It attempts to simulate the real-world conditions the software will be subjected to after deployment. Usually, this is done with automated tools to achieve a realistic deployment scale.

Examples of volume testing include simulating thousands of users accessing the application or the addition of tens of thousands of new user accounts in a brief time period. Endurance testing, also known as soak testing, is the application of load testing continuously over a long period of time as might occur, for example, on a bank’s online service during semi-monthly paydays.

Load testing reveals software defects that might not occur during the shorter intervals of performance tests. Such defects include memory leaks, buffer overflows, failure to close network or inter-module connections and gradual degradations in response time.
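A bare-bones monitor for such a soak run might look like this sketch, which assumes the psutil package and a known process id for the service under test, and simply samples resident memory at intervals so a gradual leak shows up as a rising trend.

    import time

    import psutil  # assumed third-party dependency

    TARGET_PID = 12345        # hypothetical pid of the service under test
    SAMPLE_INTERVAL_S = 60    # one sample per minute
    DURATION_S = 8 * 60 * 60  # an eight-hour soak

    proc = psutil.Process(TARGET_PID)
    start = time.time()
    while time.time() - start < DURATION_S:
        rss_mb = proc.memory_info().rss / (1024 * 1024)
        print(f"{time.time() - start:8.0f}s  rss={rss_mb:.1f} MiB")
        time.sleep(SAMPLE_INTERVAL_S)
    # A steadily rising resident-set size across the soak points to a memory leak.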

Stress Testing

Performance testing tunes software performance, while load testing ensures the software can manage peak deployment requirements. Stress testing then allows testers to bang on the software with hammers. Its primary purpose is to apply maximum pressure to see where and how the system breaks. It measures the software’s responses under extreme conditions, especially its ability to degrade gracefully, retain data and recover its previous state.

Stress tests push the software far beyond normal load limits, restrict resources and create unexpected integrity issues in a variety of ways:

  • Doubling the maximum limit on users
  • Closing and restarting database access suddenly
  • Randomly shutting down network connections
  • Removing runtime libraries upon which the software depends
  • Severely reducing physical resources such as disk space, memory or network bandwidth

If, after such conditions are applied, the system is able to recover without loss of data and without creating security leaks, it has successfully passed the stress test. It is also desirable that during failure the system issued meaningful error messages and a crash trail that assists operators in understanding the root cause of failures.
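As one illustrative fragment of such a stress run (the URL, burst size and concurrency are invented), the sketch below pushes far beyond normal concurrency and tallies how requests fail, since the interesting question is whether the failures are clean errors or raw connection drops and hangs.

    from collections import Counter
    from concurrent.futures import ThreadPoolExecutor

    import requests

    BASE_URL = "http://localhost:8000/orders"  # hypothetical endpoint
    BURST = 5000        # e.g. several times the documented maximum user count
    CONCURRENCY = 500

    def hit(_):
        try:
            return str(requests.get(BASE_URL, timeout=5).status_code)
        except requests.RequestException as exc:
            return type(exc).__name__  # Timeout, ConnectionError, ...

    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        outcomes = Counter(pool.map(hit, range(BURST)))

    print(outcomes)  # after the burst, verify recovery: data intact, clean error logs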

Naturally, there is much more that can be written on this topic including specific tools and techniques employed for each testing type and how these tests integrate depending on development methodologies. For instance, agile development teams may desire to apply these three test methodologies during sprints rather than deferring them to a point when the software is nearing release. In any case, this discussion has drawn sufficient distinctions between performance, load and stress testing to alleviate confusion regarding their specific methodologies and objectives.

Integrating Software Testing and Development


Don’t flush money down the drain. Coordinate your development and testing cycles.

In any software product, quality is commonly understood as the absence of bugs, expressed as software reliability or defect rate.

To survive in a hyper competitive environment, many software organizations are now focused on integrating software testing and development through a Quality of Service based approach towards the development and testing process.

Benefits of the Quality of Service approach:

  • Save time and money by identifying defects early
  • Identify and catalog reusable modules and components
  • Avoid or reduce development downtime
  • Provide better customer service by building a better application
  • Build a list of desired modifications and enhancements for later versions
  • Identify areas where programmers and developers need training
  • Enhance user satisfaction based on their requirements

How does one ensure that the development and testing process is both cost and time effective?

Optimization is the Key!

The product development life cycle consists of multiple complex phases. The output of each phase is the input to the next, and every deliverable has certain quality attributes; therefore, the testing process holds the key to the product’s success in the market.

Since time is always in short supply, optimized testing is paramount. As shown in Figure 1 below, every phase in the software development process is accompanied by elements of software testing.

For example, in the Detailed Requirements phase of the development process, one should also design a testing strategy, test analysis and design plan. These should be derived from user interviews and followed by a requirements testability review. There are many software testing types that help companies get through the software testing process.


Figure 1 – Software Development and Testing process done in parallel.

Ultimately the purpose of implementing the testing process along with the development process is to save the team effort, time and money. During this process, it is important to identify someone to drive the implementation, identify the scope, define an implementation plan and monitor the roll out.

All of this is usually best covered with a small software testing pilot project before any organization wide roll outs are considered. This enables a company to test this approach on a small scale, so adjustments can be made where necessary before rolling out the process across the company.

At OptimusQA, we help companies with their software testing needs. For more information on our services, please contact us at rupmeet.singh@optimusinfo.com

Performance Testing Tips and Tools with HP LoadRunner


Performance testing in Vancouver

Last night at VanQ, our technical manager Larry Ng presented on performance testing tips and tools using HP LoadRunner (full video embedded below). He explained the performance testing process from design to execution and reporting and how HP LoadRunner is utilized for performance testing.

From his experience he found that although automation tools like HP LoadRunner are expensive, they are worth the money in time saved. Specifically, he noted that HP LoadRunner has drastically shifted how his time is spent, toward designing and running tests rather than scripting them.

Here are a few key tips for running successful performance tests:

  1. Schedule a dry-run prior to running tests so that all participants are coordinated and familiar with the entire process.
  2. Take a snapshot of databases prior to running the tests so it’s easy to start fresh for the next test.
  3. Reboot the servers between tests and clear the cache in browsers (if applicable).
  4. Use an instant messenger client to keep everyone organized and more importantly, keep a timed log of events.
  5. Use Perfmon to monitor server health throughout the test. Use the IM log to match up the times.
  6. Choose test scenarios that are stable and heavily used.
  7. Group your test scripts into transactions so that the entire process is measured (i.e., the whole login process instead of each step).
  8. Use a naming convention to keep your transactions organized. This helps with both reporting and script maintenance.

Here are a few helpful links that we discussed for those that want to learn more about HP LoadRunner:

At Optimus we work with clients on a variety of platforms. Here are a few of the other leading performance testing solutions:

Commercial Tools:

Open-Source Tools:

And of course, as promised, here are the slides from Larry’s presentation. We are uploading the video as well – I will post it when it’s available.

Visit the OptimusQA performance testing page for more on our testing services.

Run Windows Applications on an iPad

The proliferation of mobile devices is pushing IT departments to enable mobile access to enterprise applications. One of the most popular devices is the iPad, and there is a lot of demand to run Windows applications on an iPad.

To do this, instead of attempting to rebuild applications for each native mobile platform (i.e., iOS, Android, Symbian), enterprises are increasingly enabling mobile access through the use of thin clients such as Citrix’s Receiver and VMware’s View client.

This method of delivering applications to mobile devices has two main advantages:

  1. The existing applications don’t have to be rebuilt for each mobile device (although they may need a redesign in order to optimize the user experience on a smaller screen, often without a keyboard).
  2. The application can be much more powerful than any mobile app because it runs on a server and only sends the image to the mobile device. For example, a stock market analysis program could crunch the numbers and simply send the results to the iPad.

Below I have embedded videos from two leading companies that deliver this capability.

Here’s an introduction to the Citrix Receiver:

And here’s a demo of VMWare’s new iPad app:

We have been working with clients to test application performance when delivered to mobile devices (specifically iPads and BlackBerrys) and have been impressed with the results. The performance has been great, as it relies primarily on a strong network, but the user experience can be hampered if the application has a design or workflow that is not easy to interact with on a smaller screen without a mouse and keyboard.

(image credit: ChrisDag)

Software Testing: 5 Tips for Writing LoadRunner Scripts

Our software testing team in Vancouver has been busy writing LoadRunner scripts. LoadRunner is one of several tools we use at Optimus. It’s a powerful application that enables you to create automated test scripts that measure end-to-end application performance. By automating performance testing, you can conduct reliable, consistent benchmarking.

Here are 5 tips for writing LoadRunner scripts.

  1. Although LoadRunner’s HTTP/HTML protocol has been reliable and flexible, it’s always good to try the other protocols available in LoadRunner to identify which ones are the most compatible with the application under test. LoadRunner’s Protocol Analysis is a good start.
  2. Transaction names should use a naming convention that allows sorting and ordering, so that the transaction summary graphs can easily show transactions in the proper order.
  3. After recording a script, make sure to rearrange think times so they are placed outside of transactions, to avoid including think time as part of the actual transaction time.
  4. Changing the run-time settings for browser cache handling and the browser’s emulation of a new user can help resolve some playback issues.
  5. Making use of the various description properties of functions like web_button can help in identifying the correct component on which to perform an action.

If you’d like to learn more about how Optimus can help your team setup or improve their performance testing, email us at info@optimusinfo.com

Have any tips of your own? Leave them in the comments!

Record Twitter Usage During World Cup


Twitter is having a difficult time keeping up with increased usage caused by the World Cup. The online social network has over 100 million users worldwide and this year’s World Cup is a popular topic.

Twitter is working hard to resolve the issues but there has been an increase in downtime and frequent appearances by the infamous fail-whale. On Twitter’s engineering blog they have explained what caused the problem and how they’re working to resolve it.

Twitter has also created a page dedicated to the World Cup. Their coverage highlights messages sent by Twitter users about teams, matches, players, and news.

We are sure that Twitter conducted extensive performance testing prior to the World Cup, but they are still finding it challenging to maintain their network. Optimus works with clients to predict application usage and to test performance in order to identify and remove potential bottlenecks.

To learn how Optimus can help your company conduct thorough software testing, email us at info@optimusinfo.com