Two weeks ago, the morning after my biggest client (a large mining company) applied patches to all their servers, one of them failed to come back up. It would boot to the Windows logo screen and just keep churning… I booted into safe mode using their iLO, and in the list of files being loaded it stopped after ACPITABL.DAT. Naturally I googled and found quite a few references to this type of problem. Most of them related to USB devices or hardware failure. We verified that there were no USB devices connected to the server, which ruled that out. Nothing we did remotely resolved it. In the end we sent one of the client's techs to site: a 5-hour drive from Brisbane after a 1.5-hour flight from Sydney. Once the tech was on site, we went through a DR where the first step was to reinstall the base Windows Server 2003 OS, and it failed during that process. OK, we figured hardware failure, so the tech grabbed the server box (an HP ML350G4) and brought it back to Brisbane, where there is a better range of HP techs to help and a more plentiful supply of parts.
Once back in Brisbane, they installed the base Windows Server 2003 environment and it all worked fine. They ran HP diagnostics on it for a day without fault. They then restored the tape from the night before the patches, and everything still worked. OK, very strange: no obvious hardware issue, but given it passed diagnostics and ran flawlessly through many reboots, we had no choice but to call it good.
They packed the server box up and took it back to site, a 5-hour drive away. Once on site they reconnected everything they had left there and turned on the power… and it hung at exactly the same spot as before. At that point the tech was about ready to go and jump off the mine's cliffs. 🙂
They stood back and looked at the situation from a different perspective. What had changed? They'd taken the server box to Brisbane and back… but not the monitor, keyboard, mouse, UPS, network switch or anything else not directly needed to get the server running again. CLICK… a light bulb went on in the tech's head. He disconnected the UPS, network switch, keyboard and mouse, connected a different keyboard and mouse from another computer, and the server booted. Huh! He ultimately found that the mouse was causing the server to stall during the boot process. It's a normal HP PS/2 mouse. I would NEVER have picked that. What's more, this only started with the reboot after the April patches.
Can anyone explain this?