[Fixed] Network problems affecting multiple servers

Posted in Problems by

Network problems in the data center are currently causing degraded performance and intermittent outages on multiple servers. We’re looking into the problem and hope to have it resolved soon. We will update this post as the situation develops.

Update (7.13pm CST): Servers currently affected are: Web32, Web12, Dweb21, Dweb28, Web14, Dweb52, Web24, Web54, Dweb19, Web13, Mail4.

Update (7.30pm CST): Servers are now back online.

-
-

[Done] Scheduled work on Mail5 and Web11

Posted in Scheduled downtime by

We will be investigating the cause of visual alarms on mail5.webfaction.com and web11.webfaction.com at 30 Dec 2009 0400 GMT-6.

Depending on the severity, we may have to take the servers down. If so, they will be down for anywhere from a few minutes to a few hours. We will update this post as soon as we have more information tomorrow.

-
-

[Reminder] Kernel upgrade tomorrow

Posted in Scheduled downtime by

See http://statusblog.webfaction.com/2009/12/14/kernel-upgrade/.

-
-

[Done] Kernel Upgrade

Posted in Scheduled downtime by

On Sunday December 20 at 11 AM CST we will begin rebooting the following servers to install a kernel upgrade:

  • Dweb19
  • Dweb21
  • Dweb23
  • Dweb26-Dweb30
  • Dweb32
  • Dweb33
  • Dweb39
  • Dweb42-Dweb47
  • Dweb49-Dweb54
  • Mail1
  • Mail5-Mail8
  • Mailbox1-Mailbox4
  • Mamba
  • panel.webfaction.com
  • Taipan
  • Viper
  • Web21-Web34
  • Web36-Web43
  • Web46-Web48
  • Web50
  • Web52
  • Web54
  • Web56-Web66
  • Web68
  • Web69
  • Web71-Web83
  • Web85-Web89
  • Web91-Web100
  • Web102-Web108
  • Web110

We expect to be done by 7 PM CST, and we do not expect any server to be down for more than 10 minutes.

Update [06:35 PM CST]: Done

-
-

[Fixed] Web51 is currently down

Posted in Downtime by

We are looking into the issue and will update this post when we get more info.

Update 1: We rebooted the server and it is currently running fsck.

Update 2: fsck is currently on a second pass.

[09:10 PM CST] Update: The HD appears to be failing, so we are going to back up what we can and replace it.

[10:50 PM CST] Update: The HD backup is underway.

[1:25 AM CST] Update: The HD backup is still underway.

[3:13 AM CST] Update: We are now going to reinstall the server and restore the data.

[7:54 AM CST] Update: The restoration of the data is still ongoing.

[1:17 PM CST] Update: The restoration of the data is still ongoing, but some network issues between datacenters are slowing it down. If you want to set up your site on another server in the meantime, just open a ticket and we’ll give you a free extra plan on another machine.

[3:52 PM CST] Update: The network issues between datacenters have been fixed and data restoration is now happening at full speed. Data restoration is about 50% complete.

[6 PM CST] Update: The server is now back online. Open a ticket if you notice any problems with your account. We would like to apologize for the extended downtime. We are working hard to improve our procedures and our setup to minimize downtime when file system corruption occurs.

-
-

[Fixed] Slow performance, intermittent SSH and Python problems on Web44

Posted in Problems by

Web44 is currently experiencing very high load as we are restoring one final, very large database to that machine. The high load is causing slow performance and intermittent problems with SSH logins. We expect the restore to be complete between 5 PM and 6 PM US Central time, at which point normal service should be restored.

We’re aware that there are some missing Python modules on Web44. Once the DB restore is complete, we’ll look into that issue.

Update 1: Web44 is back online and is functioning normally.

-
-

[Fixed] Web44 down

Posted in Downtime by

Web44 is currently down and we’re looking into it. We’ll update this post when we have more information.

Update 1: The problem appears to be related to the filesystem. We are currently running fsck on the server.

Update 2: Unfortunately, fsck didn’t fix the issue, so we are going to reinstall the machine and restore all data from backup.

Update 3: We are still recovering all the data from our backup servers. We are using our latest backup, which was taken less than 24 hours before Web44 went down.

Update 4: The machine is now back to normal, apart from a few large MySQL databases that are still being imported. Open a ticket if you notice any problems with your account.

-
-

[Done] Kernel upgrades on Red Hat servers tomorrow

Posted in Scheduled downtime by

We will be upgrading the kernels on all Red Hat Enterprise Linux 4 servers tomorrow between 3 PM GMT and 7 PM GMT.

The downtime on each server should only be a few minutes.

The servers are Mamba, Krait, Web1 to Web20, Dweb3 to Dweb14, and Mail1 to Mail4.

We’ll update this post once the work is done.

Update [12:58 PM CST]: Done

-
-

[Done] Scheduled downtime on Web104

Posted in Scheduled downtime by

We will be taking down Web104 tomorrow at 4 AM CST to replace a failing drive in the RAID array.

The downtime should only last a few minutes. We’ll update this post once the work is completed.

Update 11.50am GMT: The drive has been replaced and the server is now back online.

-
-

[Fixed] Web5 down

Posted in Downtime by

Web5 is currently down and we’re looking into the issue.

We’ll update this post when we have more information.

Update: All websites on the server are now working, but some people are still unable to access the server via SSH. We’re working on fixing the issue.

2009-11-07 14:33 CST: SSH services should now be working for all users on Web5. Please open a support ticket if you have problems accessing Web5.

-
-