Summary
Recently we had several IIS application hang crashes caused by thread locks. The threads were locked by calls to Sytem.Diagnostic.PerformanceCounter.NextValue. What we found was that under certain circumstances calls into this routine can block a thread for up to 21 minutes. The circumstances all deal with failures to read the registry due to contention or IO failures.
The root cause of our problem (why are certain performance registry keys unavailable for read/update) is still unknown. So as a temporary measure we removed calls to this method from our IIS application.
Analyzing Thread Blocks
We were really fortunate to have people with the technical skills in house that were comfortable using tools like IISDiag and WinDbg. This saved us a call to Microsoft Technical support.
We were able to capture some process dumps using IISDiag. We got lucky here because we were doing some work on a server when the problem occurred. This allowed us to manually dump the W3WP processes.
Once we had the processes dumped we could use WinDbg to analyze the threads. We found the following pattern with a number of our threads as highlighted by the stack dump below.
00000000037a7768 0000000077d70616 ntdll!NtDelayExecution+0xa
00000000037a7770 000006427f877b4d kernel32!SleepEx+0xaf
00000000037a7810 000006427f540cc1 mscorwks!EESleepEx+0x2d
00000000037a7890 000006427fa2f399 mscorwks!Thread::UserSleep+0x71
00000000037a78f0 00000642751b67a0 mscorwks!ThreadNative::Sleep+0xf9
00000000037a7aa0 000006427f50dfb0 System_ni!System.Diagnostics.PerformanceMonitor.GetData(System.String)+0x1b28f0
00000000037a7af0 000006427f4360c2 mscorwks!ExceptionTracker::CallHandler+0x158
00000000037a7bf0 000006427f48e18a mscorwks!ExceptionTracker::CallCatchHandler+0x9e
00000000037a7c80 0000000077ee461d mscorwks!ProcessCLRException+0x25e
00000000037a7d20 0000000077ee650c ntdll!RtlpExecuteHandlerForUnwind+0xd
00000000037a7d50 000006427f550eea ntdll!RtlUnwindEx+0x238
00000000037a83d0 000006427f48e139 mscorwks!ClrUnwindEx+0x36
00000000037a88e0 0000000077ee459d mscorwks!ProcessCLRException+0x20d
00000000037a8980 0000000077ee60a7 ntdll!RtlpExecuteHandlerForException+0xd
00000000037a89b0 0000000077ef31ed ntdll!RtlDispatchException+0x1b4
00000000037a9060 0000000077d4dd50 ntdll!KiUserExceptionDispatch+0x2d
00000000037a9600 000006427f44e6e3 kernel32!RaiseException+0x5c
00000000037a96d0 000006427fa42e50 mscorwks!RaiseTheExceptionInternalOnly+0x2ff
00000000037a97c0 0000064278d7e0a6 mscorwks!JIT_Throw+0x130
00000000037a9970 0000064278cec5d0 mscorlib_ni!Microsoft.Win32.RegistryKey.Win32Error(Int32, System.String)+0x3a14a6
00000000037a99c0 00000642782f511d mscorlib_ni!Microsoft.Win32.RegistryKey.InternalGetValue(System.String, System.Object, Boolean, Boolean)+0x9f0130
00000000037a9a60 0000064275003f46 mscorlib_ni!Microsoft.Win32.RegistryKey.GetValue(System.String)+0x2d
00000000037a9ab0 0000064275003d25 System_ni!System.Diagnostics.PerformanceMonitor.GetData(System.String)+0x96
To the untrained eye this looks like a bunch of garbage (at least it did to me until I took the time to learn about WinDbg and stack dumps). What it is saying is that the thread has called the Win32 function Sleep from a try/catch exception handler in the System.Diagnostics.PerformanceMonitor.GetData routine. The exception originally occurred in the Microsoft.Win32.RegistryKey.InternalGetValue routine.
Putting the Thread to Sleep
So now we could see that System.Diagnostics.PerformanceMonitor.GetData was putting the thread to sleep. So with a little help from Red-Gate Reflector (this is one of those tools you must have in your bag as a .Net developer) I could see what is happening in that routine.
Below is a snipit from that routine as decompiled by Red-Gate Reflector.
catch (IOException exception) {error = Marshal.GetHRForException(exception);
switch (error) { case 6: case 0x6ba: case 0x6be: this.Init(); break; case 0x15: case 0xa7: case 170: case 0x102: break; default: throw SharedUtils.CreateSafeWin32Exception(error);}
num--;
if (millisecondsTimeout == 0) {millisecondsTimeout = 10;
}
else {Thread.Sleep(millisecondsTimeout);
millisecondsTimeout *= 2;
}
continue;}
So if one of the following Win32 Errors is the root cause for the IOException then the thread will go to sleep for some number of milliseconds.
| Error | Description |
| 6 | The handle is invalid. |
| 1726 | The remote procedure call failed. |
| 1722 | The RPC server is unavailable. |
| 21 | The device is not ready. |
| 167 | Unable to lock a region of a file. |
| 170 | The requested resource is in use. |
| 258 | The wait operation timed out. |
The thread is being put to sleep because the operation will be tried again (and again and again). That is because the routine has a “while” loop constructed.
int num = 0x11; int millisecondsTimeout = 0; int error = 0; new RegistryPermission(PermissionState.Unrestricted).Assert(); while (num > 0)So what we discovered is this routine will try up to 17 times before it will finally just give up and throw an exception back up the call stack.
Well 17 tries is not so bad, but what is bad is the bit of code that doubles the sleep timeout every time the while loop is executed. The first puts the thread to sleep for 2 milliseconds, not bad. But if you execute all 17 iterations the thread will block for a little over 21 minutes. Very bad.
Final Word
Ultimately we will find some sort of issue with our server that results in the registry keys or RPC calls to fail… so it will be on us. But whoever in the .Net Framework BCL team coded this routine to loop for 17 times and wait up to 21 minutes needs to be publicly flogged with hurtful words (which is what I am attempting to do in a pathetic way with this post). Hopefully this gets corrected in future versions.