[Fixed] Web334 is offline

Posted in Downtime by

Web334 is offline due to a kernel panic. Our sysadmins are looking into this now.

[27/02 20:57 UTC] The kernel has been upgraded and the server rebooted. We are monitoring for any issues.

-
-

[Fixed] Web34 is read-only

Posted in Downtime by

The filesystem on Web34 has gone read-only, so we are booting the server into rescue mode and running a full filesystem check.

[27/02 03:45 UTC] The fsck is running and is at 20% now.

[27/02 05:23 UTC] The fsck has finished and the server is back online.

-
-

[Done] Emergency maintenance on Web178, 26 February 2013

Posted in Downtime by

The filesystem on Web178 went read-only, so we’re about to take the server down to run an fsck on it. We will update this post as maintenance progresses.

2013-02-26 13:20 UTC: FSCK is now running; progress is at 22%.

2013-02-26 13:47 UTC: FSCK is at 50%

2013-02-26 14:54 UTC: FSCK completed and the server is back to operational status.

2013-02-26 15:22 UTC: The filesystem went read-only again. We’re currently investigating the issue and will report progress in this post.

2013-02-26 16:07 UTC: The server is back at operational status.

2013-02-26 17:14 UTC: The filesystem has gone read-only again. We’re working to restore service and will update this post when we have more information.

2013-02-26 18:15 UTC: We are now running fsck on the machine again and it is at 57%.

2013-02-26 20:03 UTC: The initial fsck has finished, but our checks still do not report a clean filesystem, so we are running further filesystem checks.

2013-02-26 21:51 UTC: Our further checks have shown some filesystem inodes that were multiply referenced and need to be manually removed. We are working on this now.

2013-02-26 23:53 UTC: The inodes have been removed and we’re now running a final fsck on the filesystem to ensure it is free of errors before booting.

2013-02-27 00:07 UTC: The server is now back online and functioning normally. We’ll continue to watch the server and its filesystem very closely to ensure that all errors were resolved.

-
-

[Fixed] Network Problems with Web346

Posted in Downtime by

Web346 is currently unresponsive because its network is affected by what looks like a bandwidth saturation attack.

We are working to mitigate this problem with our datacenter now.

2013-02-26 13:22 UTC: We are still working on this issue.

2013-02-26 15:07 UTC: We have blocked all malicious traffic to Web346 and all systems are back online at the moment.

2013-02-27 00:16 UTC: The attack on Web346 has resumed, and service is intermittent at this time. We are working to mitigate its effects.

2013-02-27 01:06 UTC: The attack has subsided and Web346 is stable at this time. We’ll leave this post open until this issue is fully resolved.

2013-02-27 06:25 UTC: The attack is back again and we are working on it.

2013-02-27 10:50 UTC: The attack is still going on and the datacenter has not been able to stop it, so we have decided to move this server to a datacenter which has better protection against denial of service attacks like this one. We are backing up the data and prepping the new machine now.

2013-02-27 11:00 UTC: There is a lot of data to migrate to the new server. If you would like your account to be migrated as a higher priority, please open a support ticket.

2013-02-27 17:07 UTC: Some accounts have been re-enabled and we’re working on the other ones.

2013-02-27 18:12 UTC: The new server is now online and fully enabled.

-
-

[Fixed] Network outage affecting multiple servers

Posted in Downtime by

The following servers are currently offline due to a networking issue with our upstream provider: web245, web246, web247, web345, web346, web347, web348, web349.

We’ll update this post when we have more information.

2013-02-25 22:31 UTC: web245, web246, web247, web345, web347, web348, and web349 are back online. Web346 remains offline at this time.

2013-02-25 23:55 UTC: web346 is booting to a kernel panic, so our system administrators are booting it to a rescue environment for further troubleshooting.

2013-02-26 00:57 UTC: web346 is back online with a new kernel.

-
-

[Fixed] MySQL Down on Web327

Posted in Downtime by

The MySQL service is currently down on Web327. We’re working to restore service and will update this post when we have more information.

2013-02-23 19:23 UTC: Users may see intermittent issues with the web server in general, as we need to stop web services to bring MySQL back online.

2013-02-23 23:08 UTC: Web327 is back online and MySQL service has been restored. The machine is functioning normally.

-
-

[Fixed] Web383 down due to kernel panic

Posted in Downtime by

Web383 went down due to a kernel panic and is not coming back up properly, so we have booted it into rescue mode and are working on it.

We will add updates here as the work progresses.

[Feb 20 07:11 UTC 2013] The issue seems to be with the disks in the RAID array, and we are working with the datacenter to get it fixed now.

[Feb 20 10:15 UTC 2013] The disk is fine, but the server goes into a kernel panic on every reboot, so we have booted it into rescue mode and are making an off-site backup of the data now.

[Feb 20 11:53 UTC 2013] The backups are at 44% now.

[Feb 20 12:38 UTC 2013] The backups are at 70% now.

[Feb 20 15:12 UTC 2013] The backups are complete. We’re working on restoring the server now.

[Feb 20 18:16 UTC 2013] We are still restoring user data to the server.

[Feb 20 19:56 UTC 2013] All data has been restored to the machine and the machine is now online and functioning normally.

-
-

[Fixed] Web55 and Web175 offline

Posted in Downtime by

Web55 and Web175 are offline. We’re working to restore service and will update this post when we have more information.

2013-02-18 21:20 UTC: Both servers booted to a kernel panic, so we restarted them in rescue mode and are running filesystem checks on them at this time.

2013-02-19 00:35 UTC: Web175 is back online after a full filesystem check and the re-installation of many system packages. We’re still working to restore service on Web55.

2013-02-19 04:41 UTC: Web175 has experienced a kernel panic again and we are booting it back into rescue mode for repair. We’re still working to restore service on Web55.

2013-02-19 05:19:42 UTC: Web55 is back after re-installation of several core packages.

-
-

[Done] Time zone change on multiple web servers on Friday, 8th March 2013

Posted in Downtime by

In an effort to standardize the time zones used on all of our servers, we will be changing the time zone on the following servers to UTC on Friday, 8th March 2013 at 9am UTC:

  • web314
  • web315
  • web316
  • web317
  • web318
  • web319
  • web320
  • web321
  • web322
  • web323
  • web324
  • web325
  • web326
  • web327
  • web328
  • web329
  • web330
  • web331
  • web332
  • web333
  • web334
  • web335
  • web336
  • web337
  • web338
  • web339
  • web340

Note that, depending on which applications you use and how they are configured, the time zone change may affect them. We have created some documentation on the change, which is available here:

2013-03-08 09:05 UTC: The time zone switch has been completed.
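
As an illustration of the kind of application impact mentioned in the note above, here is a minimal Python sketch. It assumes an application stored naive timestamps in the server's previous local time and now needs to compare them with UTC values. The helper name `to_utc` and the UTC-5 offset are hypothetical, chosen only for the example; this post does not state which time zone the servers used before the change.

```python
from datetime import datetime, timedelta, timezone


def to_utc(naive_local, old_offset_hours):
    """Interpret a naive timestamp as being in the old server time zone
    (a fixed offset from UTC, in hours) and return it as an aware UTC value."""
    old_tz = timezone(timedelta(hours=old_offset_hours))
    return naive_local.replace(tzinfo=old_tz).astimezone(timezone.utc)


# Hypothetical example: a timestamp written while the server ran at UTC-5.
stamp = datetime(2013, 3, 8, 4, 0, 0)
print(to_utc(stamp, -5).isoformat())  # 2013-03-08T09:00:00+00:00
```

A fixed offset ignores daylight saving rules, so this only suits simple cases. Applications that store and compare timestamps in UTC throughout should not need a conversion like this.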

-
-

[Done] Time zone change on multiple web servers on Monday, 4th March 2013

Posted in Downtime by

In an effort to standardize the time zones used on all of our servers, we will be changing the time zone on the following servers to UTC on Monday, 4th March 2013 at 9am UTC:

  • web219
  • web220
  • web223
  • web224
  • web225
  • web226
  • web300
  • web301
  • web302
  • web303
  • web304
  • web305
  • web306
  • web307
  • web308
  • web309
  • web310
  • web311
  • web312
  • web313

Note that, depending on which applications you use and how they are configured, the time zone change may affect them. We have created some documentation on the change, which is available here:

2013-03-04 09:05 UTC: The time zone switch has been completed.

-
-