[Fixed] Web28 offline for filesystem repairs

Posted in Downtime by

The filesystem on web28 has continued to go into a read-only state, so we’re taking the machine offline for emergency maintenance. We’ll update this post as soon as we have more information.

2012-12-12 00:05 UTC: The filesystem repair is complete and Web28 is back online.

2012-12-12 09:53 UTC: The filesystem is still going read-only. We think we have now isolated the problem, which persists between fscks, so we are taking the server down to fix it and run fsck again.

2012-12-12 11:47 UTC: The filesystem has been checked with fsck once again and the server is back online.

-
-

[Done] Filesystem Corruption on Web28

Posted in Scheduled downtime by

Web28 has a dirty filesystem, so we have scheduled a filesystem check for it at 2012-12-11 06:00 UTC.

We will update this post as the check progresses.

[2012-12-11 06:04 UTC] We have booted the server into rescue mode to start the fsck.

[2012-12-11 07:14 UTC] The second pass of fsck is now running.

[2012-12-11 07:53 UTC] The fsck is complete and the server is back online.

-
-

[Done] Emergency maintenance on Web28, 14 November 2012

Posted in Downtime by

The filesystem on Web28 went into read-only mode. We're taking the server down to run fsck and will update this post as maintenance progresses.

2012-11-14 16:32 UTC: The fsck has finished and the server is operational again.

-
-

[Fixed] Intermittent network routing issues affecting several US servers

Posted in Problems by

Several of our US servers are currently experiencing intermittent network routing issues. Affected servers may include: Dweb100 Dweb101 Dweb102 Dweb104 Dweb105 Dweb110 Dweb111 Dweb112 Dweb113 Dweb114 Dweb115 Dweb116 Dweb117 Dweb118 Dweb119 Dweb120 Dweb121 Dweb122 Dweb123 Dweb124 Dweb125 Dweb126 Dweb127 Dweb128 Dweb129 Dweb130 Dweb133 Dweb134 Dweb135 Dweb137 Dweb140 Dweb141 Dweb142 Dweb143 Dweb144 Dweb145 Dweb146 Dweb147 Dweb149 Dweb150 Dweb151 Dweb152 Dweb153 Dweb154 Dweb158 Dweb160 Dweb161 Dweb162 Dweb163 Dweb164 Dweb91 Dweb92 Dweb93 Dweb94 Dweb95 Dweb96 Dweb97 Mailbox8 Web102 Web105 Web106 Web108 Web11 Web114 Web117 Web119 Web12 Web122 Web126 Web129 Web143 Web148 Web15 Web151 Web155 Web162 Web174 Web175 Web178 Web180 Web182 Web183 Web186 Web187 Web198 Web199 Web200 Web213 Web219 Web220 Web226 Web227 Web228 Web229 Web230 Web231 Web232 Web233 Web234 Web235 Web236 Web237 Web238 Web239 Web24 Web240 Web241 Web243 Web244 Web245 Web246 Web247 Web25 Web27 Web28 Web30 Web300 Web301 Web302 Web307 Web308 Web309 Web31 Web310 Web311 Web312 Web313 Web318 Web319 Web320 Web324 Web328 Web329 Web330 Web335 Web336 Web337 Web338 Web34 Web341 Web342 Web343 Web344 Web345 Web346 Web347 Web348 Web349 Web35 Web37 Web39 Web4 Web40 Web42 Web48 Web49 Web5 Web55 Web57 Web65 Web69 Web70 Web72 Web74 Web75 Web80 Web83 Web91 Web95 Web99

We’re working to resolve this issue and will update this post when we have more information.

2012-10-15 06:07 UTC: The problem was an issue with an upstream network carrier and has been resolved.

-
-

Emergency maintenance on web28

Posted in Downtime by

For the last few hours, the server has been intermittently becoming unresponsive.

We believe the problem is caused by faulty RAM and have scheduled an immediate RAM replacement to fix it.

We will update this post regularly with more information.

2012-02-20 19:44 UTC: The server's RAM was replaced, but the problem persists. We'll continue to troubleshoot and will update this post when we have more information.

2012-02-20 01:16 UTC: We've narrowed the problem down to an out-of-memory (OOM) killer error that is being triggered by numerous processes (a different process each time). We've disabled all non-essential services on the machine. The server seems stable for now, and we're continuing to monitor it.

-
-

[Fixed] Multiple servers inaccessible

Posted in Problems by

Due to an apparent network outage, all servers in our data centers in Houston are currently inaccessible.

We are looking into the problem and will update this post when more information is available.

Affected servers are:
dweb23, dweb27, dweb32, dweb45, dweb46, dweb47, dweb49, dweb54, dweb60, dweb63, dweb64, dweb66, dweb67, krait, mail1, mail2, mail6, mail7, mail8, mamba, web1, web2, web3, web4, web5, web6, web7, web15, web17, web19, web28, web29, web30, web31, web33, web34, web35, web37, web38, web40, web41, web42, web44, web45, web46, web47, web48, web49, web50, web51, web53, web55, web56, web57, web58, web60, web61, web62, web63, web64, web65, web66, web67, web68, web69, web70, web71, web72, web73, web74, web75, web76, web77, web78, web79, web80, web81, web82, web83, web134, web135 and web136

[Update May 3, 2010 – 06:08 UTC] The network problems have been fixed and all servers are now accessible.
[Update May 3, 2010 – 08:32 UTC] ThePlanet have confirmed that their network team is still working on the problem. Intermittent outages are possible until the issues are completely resolved.
[Update May 3, 2010 – 10:45 UTC] ThePlanet reported that they tracked the problem to a border router on their network. The defective device has now been removed from the network. No further connectivity interruptions are expected at this point.
[Update May 3, 2010 – 13:00 UTC] The network at ThePlanet is unreachable once again. We are investigating further with the data center specialists.
[Update May 3, 2010 – 14:05 UTC] Connectivity has just been restored. ThePlanet are still working on the underlying problems. Further connectivity interruptions may occur.
[Update May 3, 2010 – 15:35 UTC] There have been no connectivity problems during the past hour and a half. ThePlanet confirmed that they are still investigating the issue, but they do not expect further interruptions.

-
-