[Fixed] Five servers inaccessible.

Posted in Downtime by

Dweb 13, Dweb 30, Dweb 33, Web 8, and Web 16 are currently inaccessible.

Update [05:05 PM CST]: The problem has been tracked down to The Planet’s Dallas 5 data center.

Update [06:15 PM CST]: The problem has been fixed.

-
-

Outage in H2 (Houston) data center (fixed)

Posted in Downtime by

2009-08-01 17:15 CDT – A network outage is affecting several servers in our Houston #2 data center.

The servers currently affected are: web61, web64, web65, web71, web72, web73, web75, web76, web77, web78, web82, web83 and mailbox2.

The support system is also affected.

We’ll post more information as it becomes available.

2009-08-01 17:45 CDT – The network issue has been resolved and all affected servers are back online.

-
-

Mail server refusing connections (fixed)

Posted in Downtime by

One of our mail servers is currently refusing connections. This is affecting the ability to log in to POP, IMAP and Webmail for some of our users. We are looking into the problem and hope to have it resolved soon.

2009-07-06 10:19 CDT – the mail login issue has been resolved.

-
-

Web 35 Down (fixed)

Posted in Downtime by

Web 35’s root partition has gone read-only. We are looking into it now.

[06:16 PM CST] Update: Web 35 is up and running again. There do not appear to be any software problems so we are running a diagnostic test on the server’s hardware.

[08:58 PM CST] Update: We are currently running fsck on Web 35.

[10:48 PM CST] Update: The fsck is complete and the server is back online.

-
-

Network issue in our datacenters

Posted in Downtime by

Several of the datacenters hosting our servers are having network issues, and many of our servers are currently unreachable.

Update: the network issues are now fixed and all servers are back online.

Update 2009-05-13 10:38 CDT: the datacenters just experienced another brief network outage, but all servers are back online at this time.

-
-

Read-only filesystem on Web71 (fixed)

Posted in Downtime by

The filesystem on Web71 went into a read-only state a few minutes ago. We are currently rebooting the server and will investigate the problem further when it is back online.

Update 2009-05-07 08:43 CDT – Web71 is back online.
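
For readers who want to check their own Linux machines for this kind of read-only remount, here is a minimal sketch. It assumes the standard `/proc/mounts` layout (device, mount point, filesystem type, options); the function name and the sample data are illustrative only and do not come from Web71.

```python
def read_only_mounts(mounts_text):
    """Return mount points whose option list contains 'ro'.

    Expects text in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    read_only = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Match the exact option 'ro', not substrings such as
        # 'errors=remount-ro'.
        if "ro" in options:
            read_only.append(mount_point)
    return read_only


# Hypothetical sample data, not taken from any real server.
SAMPLE = """\
/dev/sda1 / ext3 ro,relatime 0 0
/dev/sda2 /home ext3 rw,errors=remount-ro 0 0
proc /proc proc rw,nosuid 0 0
"""

if __name__ == "__main__":
    print(read_only_mounts(SAMPLE))
```

On a live system the same function could be fed the contents of `/proc/mounts`, for example `read_only_mounts(open("/proc/mounts").read())`.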

-
-

Problem on Web54 (fixed)

Posted in Downtime by

We’re currently investigating a problem that is affecting sites hosted on Web54. We’ll update this entry when we have more information.

Update 2009-05-07 11:33 CDT – Web54 is back online. The problem was high load due to excessive CPU utilization by our backup script. The problem with the script has been resolved.
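
The post does not say how the backup script was changed. As a general illustration only, one common way to keep a backup job from starving other processes of CPU is to run it at the lowest scheduling priority. The sketch below assumes the POSIX `nice` utility is available; the helper names and the backup command path are hypothetical.

```python
import subprocess


def low_priority(command, niceness=19):
    """Prefix a command so that it runs under the POSIX `nice` utility."""
    if not 0 <= niceness <= 19:
        raise ValueError("niceness must be between 0 and 19")
    return ["nice", "-n", str(niceness), *command]


def run_backup(command):
    """Run the command at the lowest CPU priority and return its exit code."""
    return subprocess.run(low_priority(command)).returncode


if __name__ == "__main__":
    # Hypothetical backup command, shown only to illustrate the wrapping.
    print(low_priority(["/usr/local/bin/backup.sh", "--all"]))
```

Lowering the priority does not reduce the total work the script does; it only lets web requests win when both compete for the CPU.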

-
-

Services down on several servers (fixed)

Posted in Downtime by

We are currently investigating an issue following an upgrade to CentOS-5.3. Services are down on several servers and we are currently working on restoring them. The servers are:

  • web23
  • web25
  • web27
  • web31
  • web32
  • web33
  • web34
  • web35
  • web36
  • web38
  • web39
  • web43
  • web46
  • web49
  • web52
  • web53
  • web55
  • web57
  • web58
  • web68
  • web75
  • dweb40

Update 2009-05-06 11:10 CDT: We are in the process of reinstalling some RPMs on the servers in order to bring them back online.

Update 2009-05-06 11:50 CDT: Two servers are back online (web36 and web52). We are continuing to work on the other servers.

Update 2009-05-06 12:39 CDT: Web32, Web35 and Web39 are back online. We are continuing to work on the other servers.

Update 2009-05-06 12:55 CDT: Web52 is down again.

Update 2009-05-06 13:23 CDT: Web25, Web27 and Web58 are back online. We are continuing to work on the other servers.

Update 2009-05-06 13:28 CDT: Web55 and Web68 are back online. We are continuing to work on the other servers.

Update 2009-05-06 14:06 CDT: Web38 and Web57 are back online. We are continuing to work on the other servers.

Update 2009-05-06 14:12 CDT: Web52 is back online. We are continuing to work on the other servers.

Update 2009-05-06 16:11 CDT: Web31, Web34 and Web46 are back online. We are continuing to work on the other servers.

Update 2009-05-06 16:48 CDT: Dweb40 and Web43 are back online. We are continuing to work on the other servers.

Update 2009-05-06 16:55 CDT: Web23 and Web53 are back online. We are continuing to work on the other servers.

Update 2009-05-06 17:04 CDT: Web33 and Web75 are back online. We are continuing to work on Web49 (the last server!).

Update 2009-05-06 17:26 CDT: Web49 is back online – that was the last server affected by the problem, so we’re calling this one fixed. Sorry for the trouble, folks!

-
-

Web73 down (fixed)

Posted in Downtime by

Web73 is currently down while we investigate some filesystem errors. We’ll update the post as soon as we have more information.

Update (12.40pm GMT): The filesystem on the server is corrupted beyond recovery so we’re going to do an OS reload and restore the data from backup. We’ll update this post with our progress.

Update (3.30pm GMT): We have now moved the server onto new hardware (in case the filesystem errors were hardware-related) and we have started copying all the data from backup.

Update (5.30pm GMT): The server is now back up with new hardware and the data from yesterday’s backup. Note that the RSA host key has changed so your SSH client may display a warning about it.
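
If your SSH client refuses to connect because of the changed host key, the stale entry has to be removed from your `known_hosts` file. The usual tool for this is `ssh-keygen -R <hostname>`. The sketch below shows the same idea in Python for plain (unhashed) entries; the function name and the `example.com` hostnames are placeholders, not our real server names.

```python
def drop_host_keys(known_hosts_text, hostname):
    """Return known_hosts text without the entries that list `hostname`.

    Only plain entries are handled. Hashed entries (starting with '|1|')
    are left untouched; use `ssh-keygen -R` for those. An entry that lists
    several names is dropped as a whole if any of them matches.
    """
    kept = []
    for line in known_hosts_text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            # An optional marker such as @revoked precedes the host list.
            if fields[0].startswith("@") and len(fields) > 1:
                hosts_field = fields[1]
            else:
                hosts_field = fields[0]
            if hostname in hosts_field.split(","):
                continue  # stale entry for this host, skip it
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Hypothetical known_hosts content with placeholder keys.
SAMPLE_KNOWN_HOSTS = """\
web73.example.com ssh-rsa AAAAexampleOLD
web42.example.com ssh-rsa AAAAexampleOTHER
"""

if __name__ == "__main__":
    print(drop_host_keys(SAMPLE_KNOWN_HOSTS, "web73.example.com"), end="")
```

After the old entry is gone, the next connection will prompt you to accept the new key; check its fingerprint before accepting.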

-
-

Drive replacement on web42 (fixed)

Posted in Downtime by

One of the drives on web42 died and we are currently rebuilding the RAID with the new drive. We will update this post once the server is back online.

2009-04-24 12:26 CDT – the drive rebuild is complete and Web42 is back online.

-
-