[Done] Scheduled Maintenance on Mail4, June 12th 2012.

Posted in Scheduled downtime by

Mail4 will be taken down for a RAM swap Tuesday June 12 between 08:00 UTC and 11:00 UTC. We will update this post as the maintenance progresses.

2012-06-12 08:29 UTC The RAM has been swapped and the server is back at operational status.

-
-

[Done] Scheduled Maintenance on Web134, June 11th 2012.

Posted in Scheduled downtime by

Web134 will be taken down for a disk drive replacement Monday June 11th at 13:00 UTC. We will update this post as maintenance progresses.

2012-06-11 13:04 UTC The disk has been swapped and the RAID is rebuilding. The server is back at operational status.

-
-

[Fixed] Web320 sites offline

Posted in Problems by

Websites hosted on Web320 are currently offline while we troubleshoot a networking problem. We hope to have service restored shortly and will update this post when we have more information.

2012-06-09 11:27 UTC: Web320 sites are back online.

-
-

[Done] Emergency maintenance on Web140

Posted in Downtime by

Web140’s file system has gone read-only. We’re taking the server down to perform a full FSCK. We’ll update this post as the FSCK progresses.

[2012-06-07 07:45 UTC] The FSCK is at 80%
[2012-06-07 08:20 UTC] The server is back online
-
-

[Done] Scheduled Maintenance on Web166, June 7th 2012

Posted in Scheduled downtime by

Web166 will be taken down for a disk drive replacement Thursday June 7th between 08:00 UTC and 08:30 UTC. We will update this post as maintenance progresses.

2012-06-07 08:40 UTC The disk has been replaced and RAID is rebuilding. The server is back at operational status.

-
-

[Done] Web143 SSH problems

Posted in Problems by

Web143 is currently not accessible via SSH. We’re looking into the problem and will update this ticket as we have more information.

2012-06-05 00:00 UTC: Upon further inspection, the RAID controller and/or a hard drive is causing the server to be offline. We’re investigating further and will update this post as more information becomes available.

2012-06-05 01:33 UTC: We’ve replaced the failing hard drive and we’re now replacing the RAID controller and updating its firmware.

2012-06-05 02:14 UTC: The RAID controller and hard drives are now functioning correctly. We’re now running a FSCK on the machine to correct file system errors.

2012-06-05 03:45 UTC: The FSCK is still running. We’ll post more information as the FSCK progresses.

2012-06-05 04:56 UTC: The FSCK was unable to complete because it ran into an infinite loop. We’ve now brought the server back online with a read-only file system so we can take as complete a backup as possible and minimize the chance of data loss.

2012-06-06 07:24 UTC: We’re still retrieving files from the machine.

2012-06-06 08:25 UTC: We’ve requested a full chassis swap and an OS reload on the new equipment.

2012-06-06 11:21 UTC: The OS reload is currently running.

2012-06-06 12:27 UTC: The OS is still being reloaded onto the machine.

2012-06-06 14:25 UTC: The OS installation is on its last steps. Once it’s finished we’ll begin installing our platform tools and transferring user data back to the machine.

2012-06-06 15:31 UTC: The OS installation is finished and we are installing our platform tools.

2012-06-06 16:50 UTC: Our setup has finished and we’re now transferring user data back to the machine.

2012-06-06 18:35 UTC: The user data is still transferring. The MySQL databases have been restored, and we’re working on the PostgreSQL databases while the other data transfers.

2012-06-06 18:57 UTC: The PostgreSQL databases have now been restored and we’re close to finishing the transfer of user files.

2012-06-06 20:13 UTC: All user files have been transferred to the server. We’re now verifying that all files transferred off the machine have been transferred back to the machine.

2012-06-06 21:12 UTC: The files have been verified. We found some corrupted files and fixed them. We’re now correcting file system permissions.

2012-06-06 21:42 UTC: The server is now back online and resuming normal function. Please log into the server and verify that your apps and files are working as expected.

-
-

[Done] Emergency maintenance on Web223, June 5th 2012.

Posted in Downtime by

One of Web223’s disks was failing. After we replaced the drive, the filesystem needed a fsck, which is now 70% done. We will update this post as maintenance progresses.

2012-06-05 15:32 UTC: The RAID controller is showing further issues which we’re now investigating. We will update this post as the situation progresses.

2012-06-05 18:59 UTC: Web223 is back online. The disk array is rebuilding, so load is somewhat elevated at this time, but customer sites should be working (albeit a bit slowly). We’ll update this post when the rebuild is complete and load is normal.

2012-06-05 20:41 UTC: Web223’s RAID array is not rebuilding as normal. We’ll take the server offline again to update/change the RAID controller and firmware. We’ll update this post when the server goes offline.

2012-06-05 23:46 UTC: We’ll take the server offline in the next few minutes. We’ll update this post as maintenance progresses.

2012-06-06 00:09 UTC: The server is back online and the RAID is rebuilding properly. We’ll keep this post open until the RAID is rebuilt.

2012-06-06 00:26 UTC: The server’s RAID is now rebuilt and the server is working normally.

-
-

[Done] Scheduled Maintenance on Web140 on June 5th 2012.

Posted in Scheduled downtime by

Web140 will be taken down for a RAM swap on Tuesday June 5th between 10:00 UTC and 13:00 UTC. We will update this post as maintenance progresses.

2012-06-05 10:20 UTC The RAM has been swapped and the server is back at operational status.

-
-

[Done] Emergency maintenance on Web140

Posted in Downtime by

Web140 is showing some filesystem errors. We’re taking the server down to perform a full FSCK. We’ll update this post as the FSCK progresses.

2012-06-01 11:23 GMT: The first pass is currently at 30%
2012-06-01 11:32 GMT: The first pass is currently at 60%
2012-06-01 12:10 GMT: The FSCK is complete and the server is being brought back online now
2012-06-01 12:20 GMT: The server is back online

-
-

[Done] Reboots on various Web and Dweb servers on Sunday, 3 June 2012

Posted in Downtime by

On June 3rd we will be rebooting the following servers for routine kernel updates between 21:30 and 22:00 UTC:

  • dweb95
  • dweb96
  • dweb97
  • dweb98
  • dweb100
  • dweb101
  • dweb102
  • dweb104
  • dweb105
  • dweb110
  • dweb111
  • web310
  • web312
  • web315
  • web317

Downtime on each server is expected to be less than 20 minutes. We will update this post as maintenance progresses.

2012-06-03 21:43 UTC: All servers are now back online and working normally.

-
-