Category: Tools


Have you noticed that when ASP.NET web application is accessed for the first time the response is slow? The reason for such behavior is batch compilation that occurs on the first hit.

ASP.NET batch compilation is the process of compiling ASP.NET markup (content of aspx files) into temporary dll’s. Compilation requires invoking compiler (csc.exe for C#) – that is pretty heavy activity. Process Explorer shows it clearly:

ASP.NET Batch compilation

ASP.NET batch compilation occurs on per folder basis. Said that, if your application divided into multiple sub-folders that contain ASP.NET pages each time any of the folders accessed for the first time the batch compilation is invoked.

Note that starting with ASP.NET 2.0 compilation model has changed. Also, there is a toolAspnet_compiler.exe that allows pre-compile your ASP.NET web application to improve performance.

Customer’s case study

Customer’s web application is built with ASP.NET 1.1. It is divided into multiple subfolders reflecting logical modules that are hosted across about 20 application pools. The application connects to Oracle database.

QA team complains that the application responds slowly each time any of the modules (subfolders) accessed for the first time.

Using Process Explorer and profiler we identified three main latency points:

  • Creating the application pool – w3wp.exe.
  • Batch compiling the application for each subfolder.
  • Creating Oracle connection pool when Oracle is accessed for the first time.

The solution

We decided to create a Warmer – solution that will hit each subfolder’s page in unattended manner thus warming up the application before the first user hits it.

For the solution we used free tools from IIS resource kit:

  • LogPrser.exe to identify the URL’s of the pages to hit.
  • TinyGet.exe to actually hit the pages identified by LogParser.

To identify what pages to hit we took IIS log files from QA environment and than we ran the following query using LogParser:

LogParser.exe “SELECT DISTINCT STRCAT(‘XXX’, cs-uri-stem) AS cs-uri-stem-strcat INTO ‘C:\result.txt’ FROM ‘C:\yourIISlogFile.log’ WHERE INDEX_OF(cs-uri-stem, ‘aspx’) > 0″ -o:w3c

Notice XXX – it has nothing to do with XXX rated content rather it is a placeholder to replace it with tinyget command.

Open resulting yourIISlogFile.log file in Notepad, hit Ctrl+H for “Replace” and replace all occurrences of XXX with the following command:

tinyget -srv:www.YourServer.com -uri:

ASP.NET Batch Compilation

yourIISlogFile.log before the Replace:

image

yourIISlogFile.log after the Replace:

image

Remove the header and save the file with BAT extension  – your Warmer is ready for action. Run it each time you deploy new version.

Do not forget to remove old temporary files in ASP.NET temporary folder:

C:\Windows\Microsoft.NET\Framework\<<NET FX VERSION>>\Temporary ASP.NET Files\

CAUTION. This action may potentially corrupt your application if you do not provide proper exception handling. On one hand it is good check to make. on other hand – be aware of it and do not do it on production sites unless you are completely sure it will not corrupt the application.

Measuring ASP.NET Performance Using Counters

Following is a list of performance counters I am usually taking to spot low hanging fruits when measuring ASP.NET performance:

Resource utilization

\.Processor\%Processor Time
\.NET CLR Memory(*)\Allocated Bytes/sec
\.NET CLR Memory(*)\% Time in GC
\.NET CLR Exceptions(*)\# of Exceps Thrown / sec
\.NET CLR Loading(*)\Current Assemblies

Throughput

\.NET CLR LocksAndThreads(*)\Contention Rate / sec
\.NET CLR LocksAndThreads(*)\Current Queue Length
\ASP.NET\Requests Queued
\ASP.NET\Request Wait Time
\ASP.NET\Requests Current
\ASP.NET Applications\Requests In Application Queue
\ASP.NET Applications\Pipeline Instance Count
\ASP.NET Applications\Requests Executing
\ASP.NET Applications\Requests/Sec
\Web Service\Current ISAPI Extension Requests

Response time

\ASP.NET\Request Execution Time

SQL Server

SQL Server: General Statistics\Logins/sec
SQL Server: General Statistics\Logouts/sec
SQL Server: General Statistics\User Connections

Detailed explanation about each counter and its significance can be found here: Chapter 15 — Measuring .NET Application Performance

I particularly love the following diagram from the guide. The diagram helps brainstorming while identifying root cause of the performance issues identified using the counters:

clip_image001

Related Book

My Related Posts

Free Web Performance Tools From Microsoft, Google, Yahoo, And IBM

This post is a quick overview of free performance tools available from Microsoft, Yahoo, Google, and IBM. It also contains a pointers to related articles that go deeper regarding the best practices and how the tools can help in identifying compliance to the best practices.

Quick Resource Box:

Microsoft’s Fiddler

Fiddler is a free web performance tool, it is not really a property of Microsoft rather a side project by Eric Lawrence, a PM with Microsoft. I used Fiddler for both security testing and now for performance. I love it a lot. Must mention it requires Net Fx 2.0 as a prerequisite so it is limited to Windows OS. Recently Eric added support to Firefox – Fiddler Hook For Firefox, so the tool is great for both IE and FF. My related posts:

Microsoft’s VRTA

VRTA is a free web performance tool and it stands for Visual Round Trip Analyzer created by Microsoft’s Jim Pierson and used internally for sometime. It was made available for public use during last PDC 2008. Jim has written very detailed article about the tool and how it solves performance problems - 12 Steps To Faster Web Pages With Visual Round Trip Analyzer. VRTA installs and uses under the hood free Microsoft Network Monitor (Netmon) to capture and analyze network captures.

Yahoo’s YSlow

YSlow is a free performance analysis tool created by Steve Souders when he was with Yahoo. Steve created another good tool called Cuzilion. YSlowl comes with extremely good set of performance guidance that can be found here - rules for high performance web pages. YSlow requires Firebug as a prerequisite, meaning it is restricted to Firefox only.

IBM’s Page Detailer

Page Detailer is a free web performance tool from IBM. I was not able to identify any good articles that cover it – if you share with me please, or better off publish one. It does not have any prerequisites, consider it as an advantage.

Google’s Page Speed

Recently I stumbled on Page Speed from Google. It reminds me a Yahoo’s YSlow lot  that makes me believe it comes from Steve Souders that works now for Google. It also requires Firebug as a prerequisite and works with FireFox only. It comes with nice guidance found here - Web Performance Best Practices. Must admit – I adore the concept of the tool although in most cases I cannot use it as I work for customers that IE is their target browser. Nevertheless the guidance is tool agnostic and I recommend bookmarking it for quick reference.

The Case of the Slooooow System

A few weeks ago my wife complained that her Vista desktop was not responding to her typing or mouse clicks. Given the importance of the customer, I immediately sat down at the system to troubleshoot.  It wasn’t completely hung, but extremely sluggish. For example, the mouse moved and when I clicked on the start button the start menu opened after about 30 seconds. I suspected that something was hogging the CPU and likely could have resolved the problem simply by logging off or rebooting, but knew that if I didn’t determine the root cause and address it, she’d likely be calling on my technical support services again in the near future. In any case, stooping to that kind of troubleshooting hack is beneath my dignity. I therefore set out to investigate.

My first step was to run Process Explorer to see which process was using the CPU. After a few minutes Process Explorer finally appeared and showed that not one, but two processes were involved, each consuming 50% of the CPU: Iexplore.exe and Dllhost.exe. Iexplore is Internet Explorer (IE) and I suspected that IE itself wasn’t the problem, but that it was a browser helper object (BHO), ActiveX control, or some other plugin loaded into IE. Similarly, Dllhost.exe is the host process for out-of-process COM server DLLs, so it was probably not at fault, but the COM server loaded into it. Both required digging deeper and I decided to tackle IE first.

In order to try and get some CPU headroom in which to operate, I suspended the Dllhost process by selecting it in Process Explorer, right-clicking to open the process context menu, and selecting the Suspend entry:

image

That put the Dllhost process to sleep and, as I expected, that freed up 50% of the CPU. That’s because the computer was a dual-core system and so to consume 100% of the available CPU cycles a process would have to have two threads, each hogging one of the cores. Most bugs I’ve seen that result in the CPU being pegged are caused by a single thread.

Processes don’t execute code, threads do, so I needed to look inside the IE process to see what thread or threads were running. I double-clicked on Iexplore.exe in Process Explorer to open its process properties dialog and switched to the Threads page. Several threads were running, but one was dominating the CPU:

image

From past experience I knew that Ieframe.dll was part of IE, but to be sure I clicked on the modules button on the Threads tab of the Properties dialog and switched to the Details page of the resulting Shell properties dialog:

image

The description didn’t give me a clue as the thread’s specific purpose, so I moved to the second clue about the thread, its start function. Because I had configured Process Explorer to retrieve symbols for Windows images from the Microsoft symbol server in Options->Configure Symbols, Process Explorer showed the name of the function where each thread began executing. Sometimes the DLL or function where a thread starts executing is enough to identify the thread’s purpose or the software causing a problem. In this case, the thread began in a function named CTablWindow::_TabWindowThreadProc. The function name hints that it’s the one in which the main thread of a tab starts running, but that still wasn’t enough to tell me why the thread was running so much; I needed to dig even deeper and look inside the thread to see where it was executing.

To look at what the thread was up to, I double-clicked on it in the Threads list to open the Thread Stack dialog, which shows the functions on the thread’s stack. A stack is essentially an execution history, where each function listed called the one above it on the list and the function at the top of the list is the one most recently executed by the thread at the time of Process Explorer looks at the stack. I scrolled through the list, looking for frames that referenced 3rd-party DLLs or Microsoft IE plugins, since they would be far more likely to have a bug than IE’s own code. Sure enough, I found frames pointing at a popular 3rd-party ActiveX control, Adobe Flash:

image

Just to be sure that I hadn’t happened to catch Flash running when a different component was using most of the CPU time, I closed and reopened the stack dialog several times, but all of them pointed at Flash.

The first thing I do when I suspect that some software is causing a problem is to check the vendor’s web site to make sure that I have the latest version. I opened the Process Explorer DLL view and looked at Flash.ocx’s version, went to Adobe’s site and looked at the version of the current Flash download, and they were the same.

I was at a dead end. I couldn’t know for sure if Flash had a bug or, more likely, there was a Flash application that had a bug, nor could I be sure that the problem wouldn’t recur. I tried to determine which site was hosting the Flash content by closing tabs one by one, but when I had close them all the thread was still running.

At this point the only options I had were to uninstall Flash and leave my wife with a degraded web experience, or terminate IE to stop the current CPU usage and hope that it wouldn’t happen again. I chose the latter and the case remains open. Since investigating this I’ve seen the same Flash behavior again on my wife’s system and on my own, so have been vigilantly watching the Adobe site for a new version just in case its due to a bug in Flash itself. I was disappointed that there was no actionable result of the investigation, but at least I knew what had caused the CPU usage.

I now turned my attention the Dllhost problem with the hope that I’d meet with better success. Process Explorer lists in a tooltip the component or components loaded into hosting processes like Svchost.exe (the Windows service host process), Rundll32 (the Control Panel applet hosting process), Taskeng.exe (the scheduled task hosting process on Vista and Server 2008), and Dllhost.exe. I moved the mouse over Dllhost.exe to see what COM server it was running:

image

It was running the Thumbnail Cache COM server, whose job it is to create Explorer thumbnails for image and media files. It is part of Windows, so once again I had to look inside the process for more clues. I resumed the Dllhost process I had suspended earlier and opened the process properties threads page:

image

The thread consuming the most CPU in this case started in Quartz.dll’s ObjectThread function. I looked at its properties and saw that it was another Windows DLL, the DirectShow Runtime, with a generic function name:

image

Next, I double-clicked to look at the thread stack:

image

The first few frames were in User32.dll and Ntdll.dll, core Windows system DLLs, but frames 4-7 are in the Sonicmp4demux.ax (“.ax” is an extension commonly used for DirectShow filters), a 3rd-party component. The function names for those frames were the same and didn’t make sense because the Microsoft symbol server only stores symbols for software included in Windows. Several more stack snapshots confirmed that it was the code causing the CPU usage.

Now that I had my suspect, the next step was to check for a newer version. But first I had to figure out what software the DLL came with, which was harder than it seemed. I opened the DLL view to take a closer look at the version information, but the description didn’t reveal anything:

image

There were no folders in the Start menu or items in the Add/Remove Programs list with Sonic in the name. I Windows-Live-searched (I expect that word to be added to Webster’s any day now) for Sonic and found that it’s part of the Roxio’s CD and DVD authoring software suites. I looked in the start menu and sure enough, found a Roxio folder:

image

I ran the Roxio software to check its version number and discovered that the Creator application includes a built-in facility to check for updates. I ran it, but it came up empty:

image

I checked the Roxio web site just to be sure and it turned out there was a newer version that the built-in updater hadn’t offered, perhaps because the update, according to the page, didn’t offer anything new:

image

I downloaded it anyway (all 640MB of it!) and waited the 15 or so minutes for it to install. Then I checked the version information of Sonicmp4demux.ax to see if it was newer, but its version number, 1.4.402.60802, was the same as the one I’d seen in the DLL view and the file was two years old:

image

I could have uninstalled the software, which would ensure that the problem wouldn’t return, but I wanted to keep Roxio for its DVD authoring functionality. I didn’t care if I didn’t get thumbnails for Roxio-specific image formats – I wasn’t even sure there were any I’d ever see in Explorer – so I set out to see if I could disable just the Sonic demultiplexer. I could have searched the Registry for the DLL name, which is surely where it was registered, but that’s a brute-force approach and if there were indirect or multiple references I could easily end up disabling more than just its thumbnail generation and possibly breaking something in Windows.

Process Monitor was the perfect tool for the job. Because I didn’t know when the problem might reoccur – it might takes days to reproduce – I didn’t want to just run it and let it consume all available virtual memory or disk space, so I set the History Depth in the Options menu to have Process Monitor retain only the most recent 1 million events:

image

I also set an Include filter for paths matching C:\Windows\System32\Dllhost.exe, minimized it, and let my wife have the system back.

The next day I came home from work, sat down at the computer and saw from Process Explorer that Dllhost.exe was back at it, consuming 50% of the CPU. I suspect that because it’s a dual-core system, the problem had been showing up regularly, but my wife hadn’t noticed it because the remaining CPU capacity was enough to mask it (another good reason to buy multi-core processors!). I brought Process Monitor to the foreground and noted it had seen 114,000 Dllhost operations, which was obviously way too many to scan through individually. I searched for “sonicmp4″ and found a reference in a Registry query near the end of the trace:

image

The query is of a COM object registration for the demultiplexer. Because the COM object is a 3rd-party DLL, I was certain that that COM Class ID (CLSID) isn’t hard-coded into Windows, so I went back to the first entry in the trace and searched for “A7DD215″, the first few characters of the CLSID. The search found a match a few thousand operations earlier:

image

The CLSID was in the name of a Registry key under another COM object registration. I Windows-Live-searched (that just rolls off the tongue, doesn’t it?) for the parent CLSID and found this KB article that explains that the registry key is where DirectShow filters register: http://msdn.microsoft.com/en-us/library/ms787560(VS.85).aspx I took a look at the stack for the particular query to confirm that’s the reason Dllhost was reading from there:

image

I was now confident that I could simply rename the Sonic filter registration key to prevent its use. I never delete registry keys when performing this kind of troubleshooting just in case the change disables important functionality or somehow breaks something else. I had seen from the traces that the thumbnail cache generator had come across an AVI file that caused it to load the Sonic demultiplexer, a format Windows is obviously able to handle on its own, so I was pretty sure things would continue to work. After terminating the Dllhost and making the change, I browsed to the same folder, deleted the thumbnails, and confirmed that there was no reduced functionality as far as I could tell. I then used Roxio to successfully burn a DVD with a number of AVI files. This case was closed.

My wife’s system was now usable again, and though I wasn’t able to close the Flash-related part of the case, at least I knew the cause and could keep an eye out for updates. More importantly, by solving the Dllhost part of the case, even if Flash went crazy again, her system would still be usable and she wouldn’t be filing a critical support incident for it with me – thanks to Process Explorer and Process Monitor.

Field experience proves – the earlier performance is tackled in development lifecycle the better results achieved. Below are most frequent practices that were most helpful in my engagement with the customers.

Architecture/Design phase

How to find a memory leak

I got the following message in the rhino tools mailing list:

I am looking into Rhino-esb and NServiceBus for a smart client application. I was doing some stress testing lately and I noticed some very strange behavior with rhino-esb. I tried to send large numbers of requests at the same time (6000-10000) and the memory of my back-end when using rhino-esb was continuously rising.

Since RSB is in production for the last two or three years, that seemed suspicious. Luckily, there was a reproduction that I could run. I tried it out, and indeed, memory seems to be taken, in proportion to the number of messages sent. That had me worried, really worried.

I run the application under memory profiling (using JetBrains dotTrace), and tried it. Which gave me this:

image

I went Ouch! and Huh?! at the same time. The next step was to find who was holding those. Luckily, that was as easy as asking the profiler.

image

And a short hop to the code explained what was actually going on.

There is a LRU buffer there to prevent duplicate messages from being sent, and the default limit for the buffer is 10,000. And since the buffer is swept once every 3 minutes. It would look like a memory leak.

But what really pleased me wasn’t so much the answer, but how easy it was to figure it out.

Powered by WordPress | Theme: by 85ideas. Editor by Khoanguyen