[Fixed] Emergency maintenance on Web385

Posted in Downtime by

Web385 stopped responding, so we are booting into a rescue environment and will soon perform a filesystem check. We will update this post as maintenance progresses.

Fixed: The file system check is complete and the server is back online.

-
-

[Fixed] Web311 downtime

Posted in Downtime by

Web311 stopped responding and did not come back as expected following a reboot. We are working on bringing it back ASAP.

Update: Web311 is undergoing an fsck now.

Fixed: Normal operation has been restored.

-
-

[Fixed] Web303 down due to DDoS

Posted in Problems by

Web303 is experiencing a massive DDoS attack which our DDoS protection system is having trouble mitigating.

We’re doing everything we can to restore service at this time, and will update this post as soon as we have more information.

Update 23:27 UTC: The DDoS attack is back. We are working to restore service.

Update 03:25 UTC: Conditions have improved. We are still investigating and working with the upstream provider to stop the attack completely.

Update Sep 29 11:12 UTC: The attack has been mitigated and service restored. The server has been stable for the last hour. We are still monitoring the server.

-
-

[Done] Web391 down for RAID cable replacement

Posted in Downtime by

Web391 has been taken down for a RAID cable replacement by the datacenter due to problems we detected the last time it was rebooted. We will update this post as the maintenance progresses.

[Sep 28 04:21] The server is back and working correctly now.

-
-

[Done] Memory upgrades on Web354, Web357 and Web360 on Saturday and Sunday

Posted in Scheduled downtime by

Web354, Web357 and Web360 will be taken down for a memory upgrade. The downtime for each upgrade should be less than 30 minutes. The upgrades will happen at the following dates / times:

Web354: Saturday, September 28th at 22:00 UTC

Web357: Saturday, September 28th at 23:00 UTC

Web360: Sunday, September 29th at 01:00 UTC

Update: The upgrades are now done.

-
-

[Fixed] Emergency maintenance on Web233

Posted in Downtime by

Web233 stopped responding, so we have rebooted into a rescue console and are performing a filesystem check. We will update this post as maintenance progresses.

Update 23:09 UTC: Some of the data on the server appears to be corrupted, so we are migrating the server to new hardware. The new server is already set up and we have started transferring data to it. The new server’s main IP is 108.168.213.86.

Update 01:00 UTC: The data transfer is still ongoing.

Update 03:45 UTC: The data transfer is still ongoing.

Update 06:35 UTC: The transfer of the home directories is still ongoing.

Update 10:07 UTC: All MySQL and PostgreSQL databases have now been recovered on the new server. Home directories for usernames starting with the letters “a” to “q” have been recovered and we are working on recovering the rest of the home directories. The main services (MySQL, PostgreSQL, Apache, Nginx) have been started on the new server.

Update 16:05 UTC: We are finishing the restore of the last few home directories (starting with “w”).

Update 17:46 UTC: All data has been recovered and service is back to normal. If you notice any problems with your sites, open a ticket and we’ll look into it ASAP. Note that if you use non-WebFaction DNS servers for your domains, you will have to update them to point your domains at the new IP (108.168.213.86).
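If you manage your own DNS, a quick way to confirm that a domain now points at the new IP is to resolve it and compare the result. The following is only a minimal sketch, not an official tool: the function names are ours, example.com stands in for your own domain, and the answer reflects whatever your local resolver has cached, so a recent DNS change may take a while to show up.

```python
import socket

# New main IP for Web233, as given in the update above.
NEW_IP = "108.168.213.86"


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver reports for hostname."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)


def points_at(hostname, expected_ip=NEW_IP, resolver=resolve_ipv4):
    """Return True if hostname resolves to expected_ip.

    `resolver` is a hypothetical hook so the lookup can be replaced in tests;
    by default it uses the system resolver. Lookup failures count as False.
    """
    try:
        return expected_ip in resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    # Replace example.com with a domain hosted on Web233.
    for domain in ["example.com"]:
        status = "OK" if points_at(domain) else "still needs updating (or not yet propagated)"
        print(f"{domain}: {status}")
```

A domain reported as needing an update may simply be waiting on its old record's TTL to expire, so it is worth checking again later before opening a ticket.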

-
-

Connectivity problems on Web249

Posted in Downtime by

Web249 is currently experiencing intermittent network connectivity. The cause is a denial-of-service attack against one of the sites hosted on that server. We have activated upstream DDoS mitigation, which has improved connectivity, but intermittent outages may still happen. We will continue to monitor and update this post as conditions warrant.

-
-

Intermittent outages due to load spikes on Web23

Posted in Downtime by

Web23 is experiencing a series of intermittent load spikes that are causing the machine to become unresponsive. We’re working to resolve the issue and will update this post when we have more information.

-
-

[Done] Emergency maintenance on Web309

Posted in Downtime by

Web309’s filesystem is showing errors. We’re taking the server down to run an fsck. We will update this blog post as maintenance progresses.

2013-09-18 15:49 UTC: While checking the status of the server’s filesystem, we found hardware issues that we are currently investigating.

2013-09-18 16:05 UTC: The hardware issue is solved and fsck is now running.

2013-09-18 16:48 UTC: We are 46% of the way through the first pass of the file system check.

2013-09-18 17:49 UTC: The file system check is now at 54.7% of the first pass.

2013-09-18 19:38 UTC: The file system check failed to return a clean file system. At this time we’ve decided to migrate the good data from the old machine to a new server that is in the process of being set up.

2013-09-18 21:23 UTC: The data from the old server is now being transferred to the new server.

2013-09-19 00:58 UTC: We’re still transferring the data from the old server to the new server.

2013-09-19 05:46 UTC: MySQL, PostgreSQL and cron jobs have already been restored, so sites should come back up as the home directory restore progresses; it is at 30% now. Please file a support ticket if you notice any problems with the restored data.

2013-09-19 10:02 UTC: 53% of the data has been transferred. Please note that the data is being restored alphabetically and is up to “J” now. If your username comes late in the alphabet, please file a support ticket to jump ahead in the queue.

2013-09-19 15:14 UTC: Approximately 92% of the data has been transferred and it is up to “P” now.

2013-09-19 18:19 UTC: The transfer of data from the old server has finished.

-
-

[Fixed] Drive cable replacement on Web392

Posted in Downtime by

Web392 has been taken down for an emergency drive cable replacement. We will update this post as the maintenance progresses.

[Sep 18 07:04] The server is back and is OK now.

-
-