[Fixed] Web207 down

Posted in Downtime

Web207 started having problems several minutes ago and has not come back online after a reboot.
Our system administrators are currently working on it.

Update [2011-07-30 12:16 p.m. UTC] – The problems on Web207 have escalated to the point that we are now reloading the operating system and restoring the data from backup. We’ll keep this post updated with our progress.

Update [2011-07-30 1:31 p.m. UTC] – We are now restoring customer data to the server.

Update [2011-07-30 2:00 p.m. UTC] – We are still restoring customer data to the server. We will update this post as the progress continues.

Update [2011-07-30 3:10 p.m. UTC] – The machine is back to its normal operation.

-
-

New IP address for mx10.webfaction.com

Posted in Service change

As of today our incoming email server mx10.webfaction.com has a new IP address: 67.228.154.2

-
-

[Done] Scheduled maintenance for Web192

Posted in Scheduled downtime

Web192 will be taken down on July 19th at 07:00AM (UTC) for a disk replacement. Downtime is expected to be less than one hour. We will update this post as the maintenance progresses.

2011-07-19 7:34AM UTC – The disk has been replaced and the server is back to normal operation.

-
-

[Done] Scheduled maintenance for Web192

Posted in Scheduled downtime

Our monitors have alerted us to a failed hard drive on Web192, and we’ve scheduled a time to have it replaced. The maintenance will take place on July 15th at 11:30 a.m. UTC. We will update this post as the maintenance progresses.

2011-07-15 12:56PM UTC – The server is back online. The disk has been replaced and the RAID array is syncing.

-
-

[Fixed] Web192 down

Posted in Downtime

2011-07-14 09:57 UTC – Web192 stopped responding several minutes ago and has not come back online after a reboot. We’re investigating the problem and hope to have normal service restored shortly. We’ll update this post when we have more information.

2011-07-14 11:45AM UTC – We have taken the server offline and we are now running FSCK.

2011-07-14 11:56AM UTC – FSCK is now at 60%

2011-07-14 12:11PM UTC – FSCK is now at 90%

2011-07-14 12:25PM UTC – FSCK has finished. The machine has rebooted and is back to normal operation.

-
-

[Fixed] Web192 down

Posted in Downtime

Web192 stopped responding several minutes ago and has not come back online after a reboot. We’re investigating the problem and hope to have normal service restored shortly. We’ll update this post when we have more information.

2011-07-13 21:26 UTC – Web192 is not recognizing its boot device. We’ll continue to troubleshoot the problem.

2011-07-13 21:38 UTC – We’ve determined that the RAID controller on Web192 has failed – we’ll have it replaced ASAP.

2011-07-13 23:16 UTC – Web192’s RAID controller has been replaced and the server is now back online.

-
-

[Fixed] Disk problems on Web194

Posted in Problems

Web194 currently has a failing disk. We are scheduling a replacement now and will take the machine down shortly for repair. We will update this post when we have more information.

Update [2011-07-11 23:23 UTC] – The server’s file system has entered a read-only state. We will update this post when we have more information.

Update [2011-07-11 23:46 UTC] – We are bringing the server down to verify the file system so that the read-only state can be cleared. We will keep this post updated.

Update [2011-07-12 01:26 UTC] – The FSCK is still running.

Update [2011-07-12 02:15 UTC] – The FSCK is still running.

Update [2011-07-12 05:27 UTC] – The FSCK is still running.

Update [2011-07-12 07:05 UTC] – The FSCK is still running.

Update [2011-07-12 11:00 UTC] – Unfortunately, we had to reboot the server into a rescue environment and restart the FSCK from there. FSCK is currently at 25%.

Update [2011-07-12 15:11 UTC] – FSCK is still running. It was very slow because the RAID array was being rebuilt at the same time on the machine. The RAID array is now done rebuilding so FSCK should get much faster.

Update [2011-07-12 17:11 UTC] – Unfortunately, FSCK was still slow, so we have decided to re-install the machine and restore all the data from backup.

Update [2011-07-12 19:11 UTC] – The operating system has been re-installed and we are in the process of setting the server up now.

Update [2011-07-12 20:36 UTC] – The server has been set up completely. We are now starting to restore customer data to the machine.

Update [2011-07-12 22:27 UTC] – We have restored all databases on the server. We are still restoring customer data to the server.

Update [2011-07-12 23:45 UTC] – We are still restoring customer data to the server.

Update [2011-07-13 01:25 UTC] – We are still restoring customer data to the server.

Update [2011-07-13 04:01 UTC] – We are still restoring customer data to the server. The first pass has finished and we are now verifying the integrity of the files.

Update [2011-07-13 05:25 UTC] – We are still restoring customer data to the server. The second pass has finished and we are now verifying the integrity of those files.

Update [2011-07-13 07:09 UTC] – Most users’ sites are now online.

Update [2011-07-13 07:28 UTC] – User logins are enabled and working.

Update [2011-07-13 08:42 UTC] – The server is now back to normal. In 8 years of business this is the first time we have had such a long downtime on a server, and we would like to apologize for it. The problem was a combination of a corrupted filesystem, a degraded RAID array and FSCK taking many times longer than usual. We will update our procedures to greatly reduce the downtime if this happens again. First, we’ll run FSCK before trying to rebuild the array (the array can be rebuilt once the server is back online). Second, if FSCK is taking too long, we’ll stop it much sooner and start re-installing the server and restoring the data from backups straight away.
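
The revised procedure described in the update above could be sketched roughly as follows. This is only an illustration, not the actual tooling used on the servers: the `ServerState` class, the `plan_recovery` function, the step labels and the four-hour FSCK budget are all assumptions made for the example (the post does not state a specific cutoff).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerState:
    """Hypothetical snapshot of a server's condition after a failure."""
    filesystem_corrupted: bool
    raid_degraded: bool


def plan_recovery(state: ServerState,
                  fsck_hours_elapsed: float = 0.0,
                  fsck_budget_hours: float = 4.0) -> List[str]:
    """Return the ordered recovery steps under the revised procedure.

    fsck_budget_hours is an assumed cutoff, not a value from the post.
    """
    steps: List[str] = []
    if state.filesystem_corrupted:
        if fsck_hours_elapsed >= fsck_budget_hours:
            # FSCK has overrun its budget: give up on it early and
            # rebuild the machine from backups instead.
            steps += [
                "stop fsck",
                "reinstall operating system",
                "restore data from backups",
            ]
        else:
            # FSCK runs first, before any RAID rebuild competes for disk I/O.
            steps.append("run fsck")
    steps.append("bring server back online")
    if state.raid_degraded:
        # The array rebuild is deferred until the server is serving again.
        steps.append("rebuild RAID array")
    return steps
```

For a server with both a corrupted filesystem and a degraded array, `plan_recovery(ServerState(True, True))` puts the FSCK step first and the RAID rebuild last, which is the ordering the update commits to.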

-
-

[Done] Scheduled maintenance on Web26

Posted in Downtime

Web26 will be taken down at 11:30 AM UTC on Tuesday, July 12th 2011 to repair its RAID controller. Downtime is expected to be less than one hour. We will update this post as the maintenance progresses.

[Update] 11:48 UTC – We are now taking the server offline to fix the controller.
[Update] 12:06 UTC – The server is back online and responding to requests.

-
-

[Fixed] Web20 down

Posted in Downtime

Web20 is currently down and unreachable after failing to come back from a restart. We are working to bring the server back online.

2011-07- 20:43 UTC: The server is now back online and responding to requests.

-
-