The filesystem on Web225 went read-only again. We’re now finishing the copy of its data to a spare server, which will become the new Web225. We’ll update this post when the switch is done.
Note that the server’s main IP will change from 220.127.116.11 to 18.104.22.168. If you’re using external DNS servers for some of your domains, you will have to update your records. If you’re using WebFaction’s DNS servers, you don’t need to do anything.
[Update 11:13 UTC] The migration is over and the new machine is live at the IP 22.214.171.124. If you experience any problems with your sites on Web225, please open a support ticket.
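If you manage your own external DNS, one way to confirm that a record has been updated is to compare what your resolver returns against the old and new IPs. The following is a minimal sketch only, assuming Python 3. The helper names `resolve_ipv4` and `dns_status` are made up for illustration, and the hostname and addresses in the usage note are placeholders from the documentation range, not the real server IPs.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return {info[4][0] for info in infos}


def dns_status(addresses, old_ip, new_ip):
    """Classify a set of resolved addresses against the old and new server IPs.

    Returns "stale" if the old IP is still present, "updated" if only the
    new IP is seen, and "other" if neither IP appears.
    """
    if old_ip in addresses:
        return "stale"
    if new_ip in addresses:
        return "updated"
    return "other"
```

For example, `dns_status(resolve_ipv4("www.example.com"), "192.0.2.10", "192.0.2.20")` would report `"stale"` for as long as the placeholder domain still resolves to the placeholder old address. Cached answers can linger until the record’s TTL expires, so a "stale" result right after a change does not necessarily mean the record is wrong.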
Web225 stopped responding several minutes ago and has not come back online following several attempted reboots. We’re working to restore service at this time and will update this post when we have more information.
2012-12-22 02:33 UTC: The server is still offline and not responding.
2012-12-22 03:51 UTC: We’ve booted the server into a rescue environment and we’re checking the state of the RAID, disks, and filesystems.
2012-12-22 05:01 UTC: We’re currently checking the filesystem for errors by running the fsck command.
2012-12-22 07:20 UTC: The file system checks have finished and the file systems are now in a clean state. We’re now working to get the RAID controller card replaced.
2012-12-22 10:14 UTC: At this time the server will not boot because of a bad hard drive. We’re replacing the hard drive that is causing the boot errors and we will continue working to bring the server back online as soon as possible.
2012-12-22 11:38 UTC: The hard drive has been replaced and the server is now booting. However, the boot process stops at a kernel panic. We are working to resolve the cause of the panic now.
2012-12-22 12:39 UTC: The server is now booting correctly and the kernel panics have been resolved. We’re now working to restore full access to the machine and verify its functionality.
2012-12-22 14:17 UTC: In verifying the functionality of the server we’ve found that there was widespread corruption in both the PostgreSQL and MySQL databases. We’ve reinstalled and reinitialized both database servers and we’re now working to restore the databases from backups.
2012-12-22 15:20 UTC: We are still restoring the databases from backups.
2012-12-22 17:11 UTC: The MySQL databases have been mostly restored and we’re beginning to work on the PostgreSQL databases.
2012-12-22 19:11 UTC: The MySQL and PostgreSQL databases have been restored and the server is back online.
2012-12-23 10:22 UTC: The server’s disks have gone read-only again and we are working on the server now.
2012-12-23 12:23 UTC: The server is back online, but in case the problem returns we have started a full copy of the data to a spare server.
Web237’s disk went read-only shortly after the scheduled maintenance so we have taken it into rescue for an immediate fsck.
We will update this post as the fsck progresses.
[2012-12-19 10:40 UTC] The fsck is up to 10% now.
[2012-12-19 13:06 UTC] The fsck is still running.
[2012-12-19 15:58 UTC] fsck is at pass 1C.
[2012-12-19 18:12 UTC] Unfortunately fsck failed to complete. At this point we are going to copy the data to a spare server. We will update this post as the copy progresses.
[2012-12-19 20:06 UTC] The copy of the data to a new server is at 15%.
[2012-12-19 21:18 UTC] The copy of the data to a new server is at 25%.
[2012-12-19 22:59 UTC] The copy of the data to a new server is still ongoing.
[2012-12-20 00:50 UTC] The copy of the data to a new server is at 35%.
2012-12-20 02:26 UTC: The copy of the data to a new server is at 45%
2012-12-20 03:14 UTC: The copy of the data to a new server is at 50%
2012-12-20 05:02 UTC: The copy of the data to a new server is at 60%
2012-12-20 09:38 UTC: The copy of the data is still going on.
2012-12-20 10:35 UTC: After hanging for a while, the copy is now going at full speed again. It’s at 70%.
2012-12-20 11:33 UTC: We are going to start re-enabling websites for accounts whose data has been copied onto the new server. If you would like your websites to be re-enabled as a higher priority, please open a support ticket.
2012-12-20 13:37 UTC: Some sites are re-enabled on the new Web237 machine and we’re working through the other ones. Please note that the machine’s main IP changed from 126.96.36.199 to 188.8.131.52 so if you’re using external DNS servers you will have to update your DNS records. If you’re using WebFaction’s DNS servers you don’t need to do anything.
2012-12-20 16:49 UTC: Most websites have now been re-enabled. We’re working on re-enabling the last few.
2012-12-20 22:19 UTC: All websites have now been re-enabled for several hours. We’re marking this issue as fixed and we’ll email a post-mortem to all Web237 users tomorrow.
The filesystem on Web28 has continued to go into a read-only state, so we’re taking the machine offline for emergency maintenance. We’ll update this post as soon as we have more information.
2012-12-12 00:05 UTC: The filesystem repair is complete and Web28 is back online.
2012-12-12 09:53 UTC: The filesystem is still going read-only. We think we have now isolated the problem, which persists between fscks, so we are taking the server down to fix it and run another fsck.
2012-12-12 11:47 UTC: The filesystem has been fscked once again and the server is back online.
On Friday December 14th, 2012 the IP address of mx8.webfaction.com (one of our inbound mail servers) will change from 184.108.40.206 to 220.127.116.11. The old IP will still be working for several days after that while the IP change propagates everywhere.