Here’s a fun one that’s been sitting in draft status since 2013. I now vaguely recall the incident, but was apparently too tired from the evening’s work to complete the telling. The great part is that it leaves off with a cliffhanger ending. I’m tired of seeing it in my drafts, but there might be useful info for someone in here. Maybe you can help complete the story in the comments!
The original post…
(The exciting tale of our Windows 2003 R2, Exchange 2007 Hub Transport/CAS)
Finishing up Windows updates on the servers. Doing a last round of checks on all services. Hmmm… Can’t log into Outlook Web Access. Wow! vSphere shows the CPU has been pegged non-stop since it was rebooted.
That realization came at about five o’clock in the morning. OWA’s login screen loaded slowly, but would just spin and spin on the actual authentication process. As mentioned above, the CPU was pegged, and the culprit was countless DW20.exe processes. I was not familiar with the process name but quickly learned it was Windows Error Reporting. That meant something was crashing so often and so quickly that Windows was using all of its resources reporting the errors. Unfortunately, this left no resources for me to actually troubleshoot.
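If you want to confirm that one process name is hogging the box, a quick tally helps. Here is a minimal Python sketch, assuming you can capture the output of `tasklist /FO CSV /NH` from the affected server. The `count_processes` helper and the sample rows are made up for illustration; they are not the real output from that morning.

```python
import csv
import io
from collections import Counter


def count_processes(tasklist_csv):
    """Count running processes per image name.

    Expects text in the format produced by `tasklist /FO CSV /NH`,
    where the first column of each row is the image name.
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row:  # skip blank lines
            counts[row[0].lower()] += 1
    return counts


# Hypothetical sample rows: image name, PID, session name, session#, memory
SAMPLE = (
    '"System Idle Process","0","Console","0","28 K"\n'
    '"w3wp.exe","2112","Console","0","45,300 K"\n'
    '"DW20.exe","3020","Console","0","5,120 K"\n'
    '"dw20.exe","3044","Console","0","5,096 K"\n'
    '"DW20.exe","3108","Console","0","5,132 K"\n'
)

counts = count_processes(SAMPLE)
# Show the busiest image names first
for name, total in counts.most_common(3):
    print(name, total)
```

Image names are lowercased before counting so that DW20.exe and dw20.exe land in the same bucket.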
Step 1: boot into safe mode and disable error reporting. I found several suggestions for disabling error reporting with registry values, but none of these worked. Thanks to http://blogs.msdn.com/b/rahulso/archive/2007/03/29/dw20-exe-was-stopping-us-from-taking-the-crash-dumps-in-w3wp-exe.aspx, I found the easiest way to stop reporting errors is this:
- Open Advanced System Settings
- Go to Advanced > Error Reporting
- Uncheck Programs > OK > OK
With error reporting turned off and the server back in normal mode, I could actually start troubleshooting. The Application log had recorded errors something like this (Sorry, I could remote into work to get the exact error, but that would kill the precious bandwidth being consumed by Netflix right now):
.NET Runtime version 2.0.50727.3053 – Fatal Execution Engine Error
http://support.microsoft.com/kb/2540222 seemed to describe the problem, so I installed the hotfix. Unfortunately, this did not stop the errors. Next step: install .NET 3.5. Still no help. Business hours were getting closer. Finally, I decided to uninstall all versions of .NET and start over. It is helpful to know that you need to uninstall them in reverse order, starting with the latest version installed. Otherwise, you will get a dependency error. Once all versions were removed, I reinstalled 2.0, followed by 3.5 SP1.
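The reverse-order rule is easy to get wrong when you are working from a long Add or Remove Programs list at six in the morning. As a rough sketch, here is one way to work out the order in Python. The `INSTALLED` list is a made-up example rather than the exact set on my server, and `version_key` and `uninstall_order` are hypothetical helper names.

```python
import re


def version_key(name):
    """Return the first dotted version number in a product name as a tuple.

    For example, 'Microsoft .NET Framework 3.5 SP1' gives (3, 5).
    """
    match = re.search(r"\d+(?:\.\d+)*", name)
    if match is None:
        raise ValueError("no version number found in %r" % name)
    return tuple(int(part) for part in match.group(0).split("."))


def uninstall_order(installed):
    """Sort product names so the latest version is removed first."""
    return sorted(installed, key=version_key, reverse=True)


# Hypothetical list of installed frameworks (an assumption for illustration)
INSTALLED = [
    "Microsoft .NET Framework 2.0 Service Pack 2",
    "Microsoft .NET Framework 3.5 SP1",
    "Microsoft .NET Framework 3.0 Service Pack 2",
]

ORDER = uninstall_order(INSTALLED)
for product in ORDER:
    print(product)
```

Reinstalling goes the other way, oldest first, which is just `reversed(ORDER)`.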
Finally, a little success! OWA loaded at a normal pace. But authentication attempts immediately timed out. Evidently, the reinstallation of .NET caused IIS to turn off ASP.NET scripting. This (http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/9fc367dd-5830-4ba3-a3c9-f84aa08edffa.mspx?mfr=true) was a quick fix. However…
And that’s where the draft left off. I remember getting a call late in the afternoon while trying to rest up because the CAS was acting up again. I believe a restart took care of it that time. This was all essentially the result of a corrupt .NET Windows update. But, what came after the “However…”? Have an idea? Throw it in the comments, because I haven’t the foggiest.