[Done] Time zone change on multiple web servers on Sunday, 3rd March 2013

Posted in Downtime by

In an effort to standardize the time zones used on all of our servers, we will be changing the time zone on the following servers to UTC on Sunday, 3rd March 2013 at 9am UTC:

  • web162
  • web174
  • web175
  • web178
  • web180
  • web182
  • web183
  • web186
  • web187
  • web198
  • web199
  • web200
  • web208
  • web209
  • web210
  • web211
  • web213
  • web216
  • web217
  • web218

Note that, depending on which applications you use and how they are configured, the time zone change may affect your applications. We have created some documentation on the change which is available here:

2013-03-03 09:05 UTC The timezone switch has been completed.
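
As a rough illustration of how a server time zone switch might affect an application, the sketch below contrasts code that formats times in the server's local zone with code that formats them explicitly in UTC. It is not taken from our documentation. The previous zone (a fixed UTC-6 offset) and the helper names `wall_clock` and `utc_stamp` are assumptions made purely for this example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical previous server zone (fixed UTC-6 offset), for illustration only.
OLD_SERVER_ZONE = timezone(timedelta(hours=-6))


def wall_clock(instant: datetime, server_zone: timezone) -> str:
    """Format an instant the way code relying on server-local time would see it."""
    return instant.astimezone(server_zone).strftime("%Y-%m-%d %H:%M")


def utc_stamp(instant: datetime) -> str:
    """Format an instant explicitly in UTC, independent of the server's zone."""
    return instant.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


# The moment of the switch announced in this post.
switch = datetime(2013, 3, 3, 9, 0, tzinfo=timezone.utc)

before = wall_clock(switch, OLD_SERVER_ZONE)  # local-time view before the switch
after = wall_clock(switch, timezone.utc)      # local-time view after the switch
stable = utc_stamp(switch)                    # same value before and after

print(before, "|", after, "|", stable)
```

Applications that store or display server-local times (the first helper) will see their values shift by the old offset, while applications that already work with explicit UTC timestamps (the second helper) should not notice the change.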

-
-

[Done] Time zone change on multiple web servers on Saturday, 2nd March 2013

Posted in Downtime by

In an effort to standardize the time zones used on all of our servers, we will be changing the time zone on the following servers to UTC on Saturday, 2nd March 2013 at 9am UTC:

  • web75
  • web80
  • web83
  • web91
  • web95
  • web99
  • web102
  • web105
  • web106
  • web108
  • web114
  • web117
  • web119
  • web122
  • web126
  • web129
  • web143
  • web148
  • web151
  • web155

Note that, depending on which applications you use and how they are configured, the time zone change may affect your applications. We have created some documentation on the change which is available here:

2013-03-02 09:10 UTC The timezone switch has been completed.

-
-

[Done] Time zone change on multiple web servers on Friday, 1st March 2013

Posted in Downtime by

In an effort to standardize the time zones used on all of our servers, we will be changing the time zone on the following servers to UTC on Friday, 1st March 2013 at 9am UTC:

  • web27
  • web28
  • web30
  • web31
  • web34
  • web35
  • web37
  • web39
  • web40
  • web42
  • web48
  • web49
  • web55
  • web57
  • web65
  • web69
  • web70
  • web72
  • web74

Note that, depending on which applications you use and how they are configured, the time zone change may affect your applications. We have created some documentation on the change which is available here:

2013-03-01 09:15 UTC The timezone switch has been completed.

-
-

[Done] Scheduled maintenance on Web381, February 13th, 2013

Posted in Scheduled downtime by

Web381 will be taken down on Wednesday, February 13th, between 09:00 UTC and 12:00 UTC for a RAID adapter swap. We will update this post as maintenance progresses.

2013-02-13 12:24 UTC: We had to start an offline RAID rebuild; it is now at 72%.

2013-02-13 15:45 UTC: After the RAID was rebuilt, the OS was not able to boot. We’re currently investigating the issue and considering a hardware replacement.

2013-02-13 17:24 UTC: The RAID controller is causing the kernel to panic on boot. We’ve currently got the server in a rescue environment and we are copying all data off of the current hard drives. Once this is finished we will have the entire server chassis swapped and begin restoring the data to the machine.

2013-02-13 18:33 UTC: The server chassis is currently being swapped.

2013-02-13 19:33 UTC: The relevant hardware has been swapped and we are working to bring the server back online in order to verify the hardware swap has corrected the problem.

2013-02-13 21:58 UTC: The new hardware has not resolved the kernel panics. We’ve decided to bring the machine back online in a rescue environment to pull as much data off the machine as possible, then have all of the hardware switched and restore from the backups we are taking now.

2013-02-14 00:14 UTC: We’ve replaced the entire machine including the hard disks. We are now preparing an OS reload.

2013-02-14 03:06 UTC: We’ve restored the MySQL and PostgreSQL databases on the machine and all $HOME directories. We’re now vigorously testing the machine to verify that the complete hardware swap has corrected the problem we were seeing.

2013-02-14 04:02 UTC: We’ve restored all cron jobs and the SSH fingerprint information so you will not see any warnings that the host key has changed. We’re taking the machine through one final test before allowing logins.

2013-02-14 04:05 UTC: The final test has completed and the server is now back online and functioning normally. We’ll continue to closely monitor the machine throughout the next few hours.

-
-

[Done] Web381 down

Posted in Downtime by

Web381 stopped responding and has not come back online following several attempted reboots. We’re working to restore service at this time and will update this post when we have more information.

2013-02-08 14:52 UTC: We’re still working on the server to bring it online.

2013-02-08 16:04 UTC: We’re reloading the server in order to bring it online. We were able to get all data from the server prior to the reload, so we don’t expect any data loss.

2013-02-08 16:40 UTC: The server is now back online and we are monitoring it closely.

-
-

[Done] RAID cache problem on Web129

Posted in Downtime by

The cache module of the RAID card on Web129 needs to be replaced, and we will be taking the server down for that emergency replacement soon.

We will update this post with more information as the replacement progresses.

2013-02-06 08:43 UTC: There was a delay in finding the appropriate part at the datacenter, but it has now been replaced and the server is OK.

-
-

[Done] Read-only filesystem on Web129

Posted in Downtime by

The filesystem on Web129 is currently read-only. We’re working to restore service and will update this post when we have more information.

2013-02-06 01:03 UTC: We’ve taken the server offline to perform a full file system check. That process is ongoing and multi-stage. We are currently at 15% of the first pass.

2013-02-06 02:07 UTC: The first full pass has completed and we are now on the second pass.

2013-02-06 02:53 UTC: The server is now back online. We’ll continue to monitor it closely to ensure that the file system is stable.

-
-

[Done] Scheduled maintenance on Web360, February 6th, 2013

Posted in Downtime by

Web360 will be taken down for a firmware upgrade on Wednesday, February 6th, at 09:00 UTC. We will update this post as maintenance progresses.

2013-02-06 10:54 UTC While its firmware was being flashed, the IPMI showed an error that we’re currently investigating.

2013-02-06 11:20 UTC The server finished flashing the IPMI and it’s now setting the BIOS.

2013-02-06 12:09 UTC After all firmware upgrades finished, the server was not booting up. We’re now swapping the server hardware.

2013-02-06 13:56 UTC The server hardware has been swapped and the server is back to operational status.

-
-