Web226 has been taken offline for emergency maintenance. The server was experiencing severely high load, which our monitors suggest may be related to an intermittent hardware failure (RAM- or motherboard-related). We are investigating this now.
2012-06-22 03:09 UTC: The server is back online temporarily, but we will be taking it offline again for further investigation.
2012-06-21 05:34 UTC: The server has now been taken offline to replace the RAM.
2012-06-21 05:57 UTC: After replacing the failed RAM module, we’ve found several other RAM modules that are now failing but were not reporting as failed before the replacement. We’re now working to take the server back offline to replace all of the RAM modules.
2012-06-22 06:08 UTC: The server is now back online and functioning normally.
Web65 has been taken offline for emergency maintenance. The server was experiencing severely high load and was not using any swap space even though the OOM killer was running. We’re now investigating possible hardware problems.
2012-06-13 02:34 UTC: The server is back online, but our testing and investigation aren’t finished, so the server may go offline multiple times before the maintenance is complete.
2012-06-13 03:06 UTC: We’ve taken the server offline again to replace all of its RAM and rule out RAM as the cause of the problem.
2012-06-13 03:43 UTC: The server is back online now, and we’re monitoring it closely to verify that the RAM swap has fixed the problems we were seeing.
2012-06-13 04:00 UTC: We’ll continue to monitor the server closely throughout the night, but it looks like the RAM swap has fixed the hardware problems we found.
2012-06-13 04:32 UTC: The problems seem to have returned and we have had to reboot the server. We are still monitoring the server to find out the exact cause.
2012-06-14 12:49 UTC: The problem was with one of our backup subsystems. We have corrected it now and the server is stable.