Servers down after running up2date (fixed)

Posted Nov 20 at 04:01 CDT by Remi in Downtime

For an unknown reason, a dozen of our servers went down after we applied the latest up2date patches. We're currently working on getting these servers back up ASAP and will update this post as soon as the problem is fixed.

The update went well on all the other servers and our test servers.

Update 1: Several of the servers are back up now. We're working our way through the rest.

Update 2: All servers apart from Krait are back to normal. Downtime ranged from 30 minutes to 2 hours depending on the server. The cause was that up2date left sshd misconfigured, so sshd did not come back up after the reboot. In the future we will apply up2date patches to servers gradually to avoid this kind of problem. Krait runs an older version of sshd, so the fix is taking longer there. We will update this post when Krait is back to normal.
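
For readers wondering what applying patches "gradually" could look like, here is a minimal sketch of a batched rollout. It is not our actual tooling: the function name `rollout_in_batches` and the `apply_patch` and `is_healthy` callables are hypothetical placeholders that you would supply yourself.

```python
from typing import Callable, Iterable, List


def rollout_in_batches(
    hosts: Iterable[str],
    apply_patch: Callable[[str], None],   # hypothetical: patches one host
    is_healthy: Callable[[str], bool],    # hypothetical: True if the host is reachable again
    batch_size: int = 2,
) -> List[str]:
    """Patch hosts one batch at a time and stop after the first bad batch.

    Returns the hosts that were patched and passed the health check.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    pending = list(hosts)
    patched_ok: List[str] = []

    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]

        # Patch every host in the current batch.
        for host in batch:
            apply_patch(host)

        # Check the batch before touching any further hosts.
        failed = [host for host in batch if not is_healthy(host)]
        patched_ok.extend(host for host in batch if host not in failed)

        if failed:
            # Stop here so the remaining hosts stay on the old, working setup.
            break

    return patched_ok
```

The point of the design is that a bad patch only affects one batch instead of a dozen servers at once. For a problem like this one, `is_healthy` could be something as simple as checking that an SSH login to the host still works after the reboot.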

Update 3: Krait is now back to normal.