Microsoft Transaction Cost Analysis (TCA)
Posted on Feb, 2004 by Admin
Note: This article was originally posted on Loadtester.com, and has been migrated to the Northway web site to maintain the content online.
For some time, Microsoft has published its own capacity planning model called Transaction Cost Analysis (TCA). It also offers its own stress testing tools (WAST, ACT, etc.) that can be used as add-ons to Visual Studio or as standalone tools for testing Microsoft applications. I was recently asked why I don’t recommend this method and these tools for meeting performance goals instead of buying third-party commercial tools that are much more expensive. This was not the first time I had been asked. The question usually comes up when the CIO finds out how much of the budget these third-party tools will eat up, or after a sales presentation by one of the vendors. In this article, we will look into TCA to find out whether it is a better way of handling performance testing, and not just a cheaper one.
What is TCA?
TCA was developed around 1999 by Microsoft engineers including Hilal Al-Hilali, Morgan Oslake, David Guimbellot, Perry Clarke, and David Howell. It is an approach for estimating capacity without having to run a test for every single scenario. Suppose you have run a baseline test with 1000 concurrent users and want to know what would happen with 5000. You could use the TCA calculations to determine this without running another test. There are five major steps to TCA:
1. Create a User Profile
2. Stress discrete operations to determine maximum throughput and CPU utilization
3. Calculate the Cost per User Operation
4. Estimate Site Capacity – using TCA formulas
5. Verify Site Capacity – run scripts that have all the operations as a user would execute
The user profile is usually built from existing IIS logs. For a new application with no legacy application behind it, the profile is mostly guesswork. You would normally base your profile on the top web pages (or code pages) that make up 80-90% of the application’s traffic.
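As a rough illustration of the 80-90% rule, the Python sketch below counts page hits in an IIS log and keeps the most-requested pages until they cover 90% of the traffic. It assumes the log is in W3C extended format with the cs-uri-stem field enabled; the function name top_pages, the .asp/.aspx filter, and the sample log lines are hypothetical choices for illustration, not part of TCA itself.

```python
from collections import Counter


def top_pages(log_lines, coverage=0.9, suffixes=(".asp", ".aspx")):
    """Return (page, hits, share) for the most-requested pages that together
    reach `coverage` of all code-page hits.

    Assumes IIS logs in W3C extended format with the cs-uri-stem field enabled.
    """
    fields = []
    hits = Counter()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns of the data lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or "cs-uri-stem" not in fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines
        page = values[fields.index("cs-uri-stem")].lower()
        if page.endswith(suffixes):  # count code pages only, not images
            hits[page] += 1

    total = sum(hits.values())
    profile, covered = [], 0
    for page, count in hits.most_common():
        if covered / total >= coverage:
            break  # the pages collected so far already cover enough traffic
        covered += count
        profile.append((page, count, count / total))
    return profile


# Hypothetical log excerpt: field layout and URLs are made up for the example.
sample = ["#Fields: date time cs-method cs-uri-stem sc-status"]
sample += ["2004-02-01 10:00:00 GET /homepage.asp 200"] * 6
sample += ["2004-02-01 10:00:01 GET /search.asp 200"] * 3
sample += ["2004-02-01 10:00:02 GET /about.asp 200"]
sample += ["2004-02-01 10:00:03 GET /logo.gif 200"] * 4  # filtered out
profile = top_pages(sample)
```

On a real site you would feed in the lines of one or more log files; the pages returned become the operations you stress individually in the next step.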
I don’t have the space to list all of the formulas and techniques with examples in this article. I’ve provided some links at the bottom where you can find these. But I do want to mention the main formula:
C = (U * N * S) / (A * B)
C is the cost of each operation, in megacycles (millions of CPU cycles)
U is the target CPU utilization, expressed as a fraction (for example, 0.85 for 85%)
N is the number of processors
S is the clock speed of each CPU in MHz
A is the number of ASP pages per second measured while stressing the ASP page
B is the number of operations (ASP pages) called, or used, by the ASP page measured in A. This means that if the ASP page you are stressing calls another ASP page every time, this counts as 2 operations, not 1.
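Expressed in code, the formula might look like the following Python sketch. It reads the formula as (U * N * S) divided by (A * B), which is how the worked example below applies it; the function and parameter names are hypothetical. Note the units: U * N * S is a CPU budget in megacycles per second, and A * B is operations per second, so C comes out in megacycles per operation.

```python
def cost_per_operation(u, n_cpus, mhz, asp_per_sec, ops_per_page):
    """TCA cost per operation in megacycles: C = (U * N * S) / (A * B).

    u            -- target CPU utilization as a fraction (U), e.g. 0.85
    n_cpus       -- number of processors (N)
    mhz          -- clock speed of each processor in MHz (S)
    asp_per_sec  -- peak ASP pages/sec measured under stress (A)
    ops_per_page -- operations (ASP pages) invoked per request (B)
    """
    if asp_per_sec <= 0 or ops_per_page <= 0:
        raise ValueError("A and B must be positive")
    # U * N * S: CPU budget in megacycles per second.
    budget = u * n_cpus * mhz
    # A * B: operations per second, so the quotient is megacycles per operation.
    return budget / (asp_per_sec * ops_per_page)


# Hypothetical server: one 1000 MHz CPU, 75% ceiling, 60 pages/sec, 1 operation.
c = cost_per_operation(0.75, 1, 1000, 60, 1)
```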
The key to this formula is A. You cannot obtain A without running a stress test that isolates that ASP page. You run the test until you reach your maximum “ASP pages per second” number or the maximum CPU threshold set by U. This is where Microsoft tools like ACT or WAST come into play. You could also use a third-party tool such as LoadRunner or QARun.
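Once the stress run is done, picking A can be as simple as taking the highest throughput observed while CPU utilization stayed at or under U. The sketch below assumes you have exported one (pages per second, CPU utilization) pair per load step from whatever tool you used; that export format, the function name peak_throughput, and the sample figures are hypothetical.

```python
def peak_throughput(samples, max_cpu):
    """Return A: the highest pages/sec seen while CPU utilization <= max_cpu.

    samples -- iterable of (pages_per_sec, cpu_fraction) pairs, one per load step
    max_cpu -- the CPU ceiling U, as a fraction (e.g. 0.85)
    """
    eligible = [pps for pps, cpu in samples if cpu <= max_cpu]
    if not eligible:
        raise ValueError("no load step stayed under the CPU threshold")
    return max(eligible)


# Hypothetical load steps from a single-page stress run.
steps = [(20, 0.35), (38, 0.62), (50, 0.84), (53, 0.97)]
a = peak_throughput(steps, max_cpu=0.85)  # the 53 pages/sec step broke the ceiling
```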
Example: Suppose we have a server with two 500 MHz processors, and the Service Level Agreement (SLA) states that the application should not push CPU utilization above 85% during peak times. We want to find out the cost of homepage.asp. Multiply 85% by the speed of the processors and by the number of processors:
0.85 * 500 * 2 = 850
That gives us 850 megacycles per second to work with. We run homepage.asp through the WAST stress tool and find that it peaks at 50 ASP pages per second. We also know that the page calls an additional header (or include) file (header.asp), so we actually have 2 operations:
50 * 2 = 100
850 megacycles divided by 100 is 8.5 megacycles, so homepage.asp costs 8.5 megacycles per operation. This is the magic number used to work out the cost of the operation for each user. Take your user profile and the 10 or 12 pages that make up 90% of your traffic, and you can determine how many megacycles per second your users consume. If that total exceeds 850 before you reach the number of users you need to support, you need more hardware. That is the simple version of how it works, anyway.
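The last step, adding up per-user costs across the profile, could be sketched in Python as follows. It assumes the profile records, for each page, its cost per operation and how often a typical user requests it, in operations per user per second. The homepage cost is computed from the example’s figures (850 megacycles per second, 50 pages per second, 2 operations); the search.asp page, its cost, and all request rates are hypothetical.

```python
import math


def demand(profile, users):
    """CPU demand in megacycles/sec for `users` concurrent users.

    profile -- {page: (cost_mcycles_per_op, ops_per_user_per_sec)}
    """
    per_user = sum(cost * rate for cost, rate in profile.values())
    return users * per_user


def max_users(profile, budget):
    """Largest user count whose demand stays within `budget` megacycles/sec."""
    return math.floor(budget / demand(profile, 1))


BUDGET = 0.85 * 500 * 2            # megacycles/sec available, from the example
HOMEPAGE_COST = BUDGET / (50 * 2)  # megacycles per operation for homepage.asp

# Hypothetical profile: rates are operations per user per second.
profile = {
    "/homepage.asp": (HOMEPAGE_COST, 0.02),
    "/search.asp": (20.0, 0.01),
}

print(max_users(profile, BUDGET))      # users one server can carry
print(demand(profile, 5000) > BUDGET)  # would 5000 users need more hardware?
```

Keeping the per-page cost and the request rate separate makes it easy to rerun the estimate when either the code (cost) or the expected user behavior (rate) changes.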
Benefits of TCA
Developers can use TCA to keep tabs on the performance of individual ASP pages during development. A spreadsheet can track the operational costs of the important pages, the ones that do the heavy lifting in the application; those pages are the key indicators for the application. A new set of numbers should be produced for each build and compared with the previous one. If operational costs rise dramatically from build to build, the code may need to be cleaned up or optimized. That catches the problem at the creation stage instead of in the results of an integrated load test right before production rollout. In my opinion, this is the best use of TCA.
A standard for acceptable operational cost can be documented and used as a guideline by development staff. All pages that are exceptions (over the acceptable threshold) should be documented with the build and used as a reference by the engineer who does the integrated performance testing.
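The build-to-build spreadsheet and the documented standard can also be checked with a few lines of code. The Python sketch below flags pages whose cost rose by more than a chosen percentage since the previous build, and pages over the documented threshold. The function name review_build, the 20% increase limit, the 30-megacycle standard, and all page costs shown are hypothetical values chosen for illustration.

```python
def review_build(previous, current, max_cost, max_increase=0.20):
    """Compare per-page operational costs (megacycles) between two builds.

    previous, current -- {page: cost_in_megacycles}
    max_cost          -- documented standard for acceptable cost per operation
    max_increase      -- allowed growth since the previous build, as a fraction
    Returns (regressions, exceptions).
    """
    regressions = {}
    for page, cost in current.items():
        old = previous.get(page)
        # Pages new in this build have no baseline, so they cannot regress.
        if old is not None and old > 0 and (cost - old) / old > max_increase:
            regressions[page] = (old, cost)
    # Exceptions are judged against the absolute standard, old and new pages alike.
    exceptions = {page: cost for page, cost in current.items() if cost > max_cost}
    return regressions, exceptions


build_41 = {"/homepage.asp": 8.5, "/search.asp": 20.0}
build_42 = {"/homepage.asp": 8.7, "/search.asp": 31.0, "/checkout.asp": 40.0}
regressions, exceptions = review_build(build_41, build_42, max_cost=30.0)
```

The exceptions dictionary is what would be handed to the engineer running the integrated load test.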
Weaknesses of TCA
TCA assumes you are measuring on the highest-end platform, so its results can only be scaled down, not up. This presents a problem, since capacity planning is usually done before deployment, when the development and/or QA test environments are not scaled the same as production.
TCA was created to tackle client/server systems (and classic ASP), not necessarily web applications that leverage web services (.NET). What happens when an ASPX page calls web services that in turn call other ASPX pages? How do you calculate the cost when you may not even know how many calls are made inside external web services beyond your control?
The TCA methodology looks only at issues related to code execution. What if the bottleneck lies in the number of calls to Active Directory (AD), or the images or pages are too large? What if elements of the page request other than the ASP code are the problem? TCA would not catch this.
It assumes Windows-based systems only, using Perfmon and the Windows stress tools.
Most importantly, it is a predictive method, and not based on actual tests executed on the system. There is no way to actually know what will happen when a system is loaded with 1000 users unless you actually load 1000 users on the system.
TCA is a great methodology in the development phase and allows performance to be self-governed during that phase of the lifecycle. If all I need is a good estimate of the number of users per server, I would rather spend 15 minutes on some predictive calculations than the time required to set up a load test. It is best used for capacity planning purposes only; TCA does not replace integrated load and stress testing. The only way to determine the end-user experience is to actually create that experience on a system matched to production, and that is what integrated load testing lets me do. In addition, it puts correlated metrics for transaction times, system resources (Perfmon), and network utilization all in one place and one format.
Links About TCA:
Some of the information here is covered in greater detail in the book “Performance Testing Microsoft .NET Web Applications”, which I do recommend. Here is a link:
Can you think of other uses of TCA? As always, your comments are welcome.