Web65 has been taken offline for emergency maintenance. The server was under extremely high load and was not using any swap space even though the OOM killer was running. We’re now checking for hardware problems.
2012-06-13 02:34 UTC: The server is back online, but our testing and investigation aren’t finished, so the server may go offline several more times before the maintenance is complete.
2012-06-13 03:06 UTC: We’ve taken the server offline again to replace all of its RAM and rule out faulty memory as the cause.
2012-06-13 03:43 UTC: The server is back online, and we’re monitoring it closely to verify that the RAM replacement has fixed the problems we were seeing.
2012-06-13 04:00 UTC: We’ll continue to monitor the server closely throughout the night, but it looks like the hardware problems we found have been fixed by the RAM replacement.
2012-06-13 04:32 UTC: The problems seem to have returned and we have had to reboot the server. We are still monitoring the server to find out the exact cause.
2012-06-14 12:49 UTC: The problem was with one of our backup subsystems. We have corrected it now and the server is stable.