[Fixed] Web348 down

Posted in Downtime by

Web348 stopped responding and has not come back online following several attempted reboots. We’re working to restore service at this time and will update this post when we have more information.

2012-12-29 23:12 UTC: There is a hardware issue on the server preventing a successful boot. We’re currently performing a full scan of all the hardware to determine the exact cause.

2012-12-30 01:00 UTC: The hardware issue has been resolved and the server is back online.

-
-

[Done] Web225 being migrated to new hardware

Posted in Downtime by

The filesystem on Web225 went read-only again. We’re now finishing copying its data to a spare server, which will become the new Web225. We’ll update this post when the switch is done.

Note that the server’s main IP will change from 62.212.65.164 to 5.153.9.56. If you’re using external DNS servers for some of your domains, you will have to update your records. If you’re using WebFaction’s DNS servers, you don’t need to do anything.

[Update 11:13 UTC] The migration is over and the new machine is live at 5.153.9.56. If you experience any problems with your sites on Web225, please open a support ticket.

-
-

[Fixed] Web225 down

Posted in Downtime by

Web225 stopped responding several minutes ago and has not come back online following several attempted reboots. We’re working to restore service at this time and will update this post when we have more information.

2012-12-22 02:33 UTC: The server is still offline and not responding.

2012-12-22 03:51 UTC: We’ve booted the server into a rescue environment and we’re checking the state of the RAID, disks, and filesystems.

2012-12-22 05:01 UTC: We’re currently checking the filesystem for errors by running the fsck command.

2012-12-22 07:20 UTC: The file system checks have finished and the file systems are now in a clean state. We’re now working to get the RAID controller card replaced.

2012-12-22 10:14 UTC: At this time the server will not boot because of a bad hard drive. We’re replacing the hard drive that is causing the boot errors and we will continue working to bring the server back online as soon as possible.

2012-12-22 11:38 UTC: The hard drive has been replaced and the server is now booting. However, the boot process stops at a kernel panic. We are working to resolve the cause of the panic now.

2012-12-22 12:39 UTC: The server is now booting correctly and the kernel panics have been resolved. We’re now working to restore full access to the machine and verify its functionality.

2012-12-22 14:17 UTC: In verifying the functionality of the server we’ve found that there was widespread corruption in both the PostgreSQL and MySQL databases. We’ve reinstalled and reinitialized both database servers and we’re now working to restore the databases from backups.

2012-12-22 15:20 UTC: We are still restoring the databases from backups.

2012-12-22 17:11 UTC: The MySQL databases have been mostly restored and we’re beginning to work on the PostgreSQL databases.

2012-12-22 19:11 UTC: The MySQL and PostgreSQL databases have been restored and the server is back online.

2012-12-23 10:22 UTC: The server’s disks have gone read-only again and we are working on the server now.

2012-12-23 12:23 UTC: The server is back online now, but in case the problem returns we have started a full copy of the data to a spare server.

-
-

[Fixed] Emergency filesystem check on web237 since it went read-only

Posted in Downtime by

Web237’s disk went read-only shortly after the scheduled maintenance, so we have booted it into rescue mode for an immediate fsck.

We will update this post as the fsck progresses.

[2012-12-19 10:40 UTC] The fsck is up to 10% now.

[2012-12-19 13:06 UTC] The fsck is still running.

[2012-12-19 15:58 UTC] The fsck is at pass 1C.

[2012-12-19 18:12 UTC] Unfortunately, the fsck failed to complete. We are now going to copy the data to a spare server and will update this post as the copy progresses.

[2012-12-19 20:06 UTC] The copy of the data to a new server is at 15%.

[2012-12-19 21:18 UTC] The copy of the data to a new server is at 25%.

[2012-12-19 22:59 UTC] The copy of the data to a new server is still ongoing.

[2012-12-20 00:50 UTC] The copy of the data to a new server is at 35%.

[2012-12-20 02:26 UTC] The copy of the data to a new server is at 45%.

[2012-12-20 03:14 UTC] The copy of the data to a new server is at 50%.

[2012-12-20 05:02 UTC] The copy of the data to a new server is at 60%.

[2012-12-20 09:38 UTC] The copy of the data is still ongoing.

[2012-12-20 10:35 UTC] After stalling for a while, the copy is running at full speed again. It’s at 70%.

[2012-12-20 11:33 UTC] We are going to start re-enabling websites for accounts whose data has been copied onto the new server. If you would like your websites re-enabled with higher priority, please open a support ticket.

[2012-12-20 13:37 UTC] Some sites have been re-enabled on the new Web237 machine and we’re working through the rest. Please note that the machine’s main IP changed from 108.59.11.111 to 75.126.24.82, so if you’re using external DNS servers you will have to update your DNS records. If you’re using WebFaction’s DNS servers, you don’t need to do anything.

[2012-12-20 16:49 UTC] Most websites have now been re-enabled. We’re working on re-enabling the last few.

[2012-12-20 22:19 UTC] All websites were re-enabled several hours ago. We’re marking this issue as fixed and we’ll email a post-mortem to all Web237 users tomorrow.

-
-

[Done] Scheduled Web305 migration

Posted in Downtime by

As scheduled, we are starting the migration of Web305 to new hardware. During the migration, services on the machine will be unavailable. We will update this post once the migration is over.

2012-12-18 23:48 UTC: The migration is now finished and the server is back online and functioning normally.

-
-

[Fixed] Web28 offline for filesystem repairs

Posted in Downtime by

The filesystem on Web28 keeps going into a read-only state, so we’re taking the machine offline for emergency maintenance. We’ll update this post as soon as we have more information.

2012-12-12 00:05 UTC: The filesystem repair is complete and Web28 is back online.

2012-12-12 09:53 UTC: The filesystem is still going read-only. We think we have now isolated the problem that persisted between fscks, so we are taking the server down to fix it and run fsck again.

2012-12-12 11:47 UTC: The filesystem has been fscked once again and the server is back.

-
-

[Done] Filesystem Corruption on Web28

Posted in Scheduled downtime by

Web28 has a dirty filesystem, so we have scheduled a filesystem check for it at 2012-12-11 06:00 UTC.

We will update this post as the check progresses.

[2012-12-11 06:04 UTC] We have taken the server down into rescue mode to start the fsck.

[2012-12-11 07:14 UTC] The second pass of fsck is running now.

[2012-12-11 07:53 UTC] The fsck is over and the server is back.

-
-

[Done] IP address change for mx8.webfaction.com on December 14th

Posted in Downtime by

On Friday, December 14th, 2012, the IP address of mx8.webfaction.com (one of our inbound mail servers) will change from 174.133.21.100 to 75.126.24.68. The old IP will continue to work for several days afterward while the change propagates.

-
-

[Completed] Scheduled Web304 migration

Posted in Downtime by

As scheduled, we are starting the migration of Web304 to new hardware. During the migration, services on the machine will be unavailable. We will update this post once the migration is over.

2012-12-04 01:57 UTC: The Web304 migration is over and the server is working normally.

-
-

[Complete] Scheduled Web303 migration

Posted in Downtime by

As scheduled, we are starting the migration of Web303 to new hardware. During the migration, services on the machine will be unavailable. We will update this post once the migration is over.

2012-12-04 00:48 UTC: The migration of Web303 has finished and the server is now back online and functioning normally.

-
-