When a Hub Transport/CAS Blows Up

Here’s a fun one that’s been sitting in draft status since 2013. I now vaguely recall the incident, but was apparently too tired from the evening’s work to complete the telling. The great part is that it leaves off with a cliffhanger ending. I’m tired of seeing it in my drafts, but there might be useful info for someone in here. Maybe you can help complete the story in the comments!

The original post…

(The exciting tale of our Windows 2003 R2, Exchange 2007 Hub Transport/CAS)

Finishing up Windows updates on the servers. Doing a last round of checks on all services. Hmmm… Can’t log into Outlook Web Access. Wow! vSphere shows the CPU has been pegged non-stop since the server was rebooted.

That realization came at about five o’clock in the morning. OWA’s login screen loaded slowly, but would just spin and spin on the actual authentication process. The CPU was pegged by countless DW20.exe processes. I was not familiar with the process name but quickly learned it was Windows Error Reporting, meaning something was crashing so often and so quickly that Windows was using all of its resources collecting and logging crash reports. Unfortunately, this meant there were no resources left for me to actually troubleshoot.
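If the box is responsive enough to capture the output of `tasklist /FO CSV /NH`, a small script can confirm which process is piling up. This is only a rough sketch: the `count_processes` helper and the sample rows are made up for illustration, not output saved from the incident.

```python
import csv
import io
from collections import Counter


def count_processes(tasklist_csv: str) -> Counter:
    """Count running processes per image name.

    Expects the text produced by `tasklist /FO CSV /NH`, where the
    first column of each row is the image name.
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row:  # skip blank lines
            counts[row[0].lower()] += 1
    return counts


# Hypothetical sample data, not the real process list from that morning.
sample = '''"DW20.exe","1234","Console","0","5,120 K"
"DW20.exe","1240","Console","0","5,096 K"
"w3wp.exe","2200","Console","0","48,300 K"
"DW20.exe","1252","Console","0","5,101 K"
"svchost.exe","800","Console","0","3,400 K"
'''

# Show the image name with the most running instances.
print(count_processes(sample).most_common(1))
```

Image names are lowercased so that `DW20.exe` and `dw20.exe` land in the same bucket.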

Step 1, boot in safe mode and disable error reporting. I found several suggestions for disabling error reporting with registry values, but none of these worked. Thanks to http://blogs.msdn.com/b/rahulso/archive/2007/03/29/dw20-exe-was-stopping-us-from-taking-the-crash-dumps-in-w3wp-exe.aspx, I found the easiest way to stop reporting errors is this:

  • Open Advanced System Settings
  • Go to Advanced > Error Reporting
  • Uncheck Programs > OK > OK

With error reporting turned off and back in normal mode, I could actually start troubleshooting. The Application log had recorded errors something like this (Sorry, I could remote into work to get the exact error, but that would kill the precious bandwidth being consumed by Netflix right now):

.NET Runtime version 2.0.50727.3053 – Fatal Execution Engine Error

http://support.microsoft.com/kb/2540222 seemed to describe the problem, so I ran the hotfix. Unfortunately, this did not stop the errors. Next step, install .NET 3.5. Still no help. Business hours were getting closer. Finally, I decided to uninstall all versions of .NET and start over. It is helpful to know that you need to uninstall them in reverse order, starting with the latest version installed. Otherwise, you will get a dependency error. Once all versions were removed, I reinstalled 2.0, followed by 3.5 SP1.
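The reverse-order rule is easy to get wrong when sorting version strings as plain text, so here is a minimal sketch of working out the uninstall order. The `uninstall_order` helper and the list of installed versions are hypothetical. The only thing taken from the story is the rule itself: newest first.

```python
def uninstall_order(versions):
    """Return version strings sorted newest first.

    Versions are compared numerically, part by part, so "3.5" sorts
    ahead of "3.0" and "10.0" would sort ahead of "2.0".
    """
    def key(version):
        return tuple(int(part) for part in version.split("."))

    return sorted(versions, key=key, reverse=True)


# Hypothetical set of installed .NET versions, for illustration only.
installed = ["2.0", "3.5", "3.0", "1.1"]
print(uninstall_order(installed))  # ['3.5', '3.0', '2.0', '1.1']
```

Reinstalling then goes the other way, oldest first, which matches the 2.0 followed by 3.5 SP1 sequence above.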

Finally, a little success! OWA loaded at a normal pace, but authentication attempts immediately timed out. Evidently, the reinstallation of .NET caused IIS to turn off ASP.NET scripting. This (http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/9fc367dd-5830-4ba3-a3c9-f84aa08edffa.mspx?mfr=true) was a quick fix. However…

And that’s where the draft left off. I remember getting a call late in the afternoon, while trying to rest up, because the CAS was acting up again. I believe a restart took care of it that time. This was all essentially the result of a corrupt .NET Windows update. But what came after the “However…”? Have an idea? Throw it in the comments, because I haven’t the foggiest.
