[Fixed] Intermittent network routing issues affecting several US servers

Posted in Problems by

Several of our US servers are currently experiencing intermittent network routing issues. Affected servers may include:

Dweb100 Dweb101 Dweb102 Dweb104 Dweb105 Dweb110 Dweb111 Dweb112 Dweb113 Dweb114 Dweb115 Dweb116 Dweb117 Dweb118 Dweb119 Dweb120 Dweb121 Dweb122 Dweb123 Dweb124 Dweb125 Dweb126 Dweb127 Dweb128 Dweb129 Dweb130 Dweb133 Dweb134 Dweb135 Dweb137 Dweb140 Dweb141 Dweb142 Dweb143 Dweb144 Dweb145 Dweb146 Dweb147 Dweb149 Dweb150 Dweb151 Dweb152 Dweb153 Dweb154 Dweb158 Dweb160 Dweb161 Dweb162 Dweb163 Dweb164 Dweb91 Dweb92 Dweb93 Dweb94 Dweb95 Dweb96 Dweb97

Mailbox8

Web102 Web105 Web106 Web108 Web11 Web114 Web117 Web119 Web12 Web122 Web126 Web129 Web143 Web148 Web15 Web151 Web155 Web162 Web174 Web175 Web178 Web180 Web182 Web183 Web186 Web187 Web198 Web199 Web200 Web213 Web219 Web220 Web226 Web227 Web228 Web229 Web230 Web231 Web232 Web233 Web234 Web235 Web236 Web237 Web238 Web239 Web24 Web240 Web241 Web243 Web244 Web245 Web246 Web247 Web25 Web27 Web28 Web30 Web300 Web301 Web302 Web307 Web308 Web309 Web31 Web310 Web311 Web312 Web313 Web318 Web319 Web320 Web324 Web328 Web329 Web330 Web335 Web336 Web337 Web338 Web34 Web341 Web342 Web343 Web344 Web345 Web346 Web347 Web348 Web349 Web35 Web37 Web39 Web4 Web40 Web42 Web48 Web49 Web5 Web55 Web57 Web65 Web69 Web70 Web72 Web74 Web75 Web80 Web83 Web91 Web95 Web99

We’re working to resolve this issue and will update this post when we have more information.

2012-10-15 06:07 UTC: The problem was caused by an upstream network carrier and has been resolved.


[Done] Emergency maintenance on Web106, July 25 2012

Posted in Downtime by

Web106’s file system went read-only. We are currently running fsck to bring the server back up.
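For readers unfamiliar with the symptom, a read-only remount shows up as the `ro` option in the kernel's mount table. The following is a minimal sketch, not the tooling used during this incident: it assumes a Linux system exposing `/proc/mounts`, and the function name `read_only_mounts` is hypothetical. Some pseudo-filesystems are mounted read-only by design, so the output needs to be compared against what is expected on a given machine.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose mount options include 'ro'.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        # Options are comma-separated; match 'ro' exactly, not as a substring.
        if "ro" in options.split(","):
            result.append(mount_point)
    return result


if __name__ == "__main__":
    # Assumption: running on Linux, where /proc/mounts is available.
    with open("/proc/mounts") as f:
        for mount_point in read_only_mounts(f.read()):
            print(mount_point)
```

Run as a script, it prints one read-only mount point per line; a data partition appearing in that list unexpectedly is the condition described above.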

2012-07-25 15:39 UTC: Web106 is back up and functioning properly.

2012-07-25 16:58 UTC: The filesystem has gone read-only again. We’re working to resolve the issue.

2012-07-25 18:21 UTC: The filesystem check is complete, and we’re running other hardware diagnostic tests at this time.

2012-07-25 20:33 UTC: The filesystem continues to go into a read-only state, even after successful checks. Since the disks seem to be OK, we’re arranging for a full chassis swap at this time.

2012-07-25 22:34 UTC: The chassis has been swapped. The filesystem still contains errors after the chassis swap; we’re running a filesystem check.

2012-07-26 00:04 UTC: The filesystem check is still in progress, 45% complete.

2012-07-26 01:20 UTC: The filesystem check is still in progress, 80.5% complete.

2012-07-26 03:25 UTC: The file system check is finished and the file system seems to be stable. We’re now working to bring the server back up on the network.

2012-07-26 04:26 UTC: The server is now back online. We’ll continue to monitor the server closely to make sure that no additional filesystem, hardware, or network errors are left unresolved.

2012-07-26 07:13 UTC: The server is down again. We suspect that a failing drive in the RAID array is causing the read-only condition. We need to run fsck, back up the data, and replace that drive.

2012-07-26 10:14:41 UTC: There were problems getting the server into rescue mode, but the fsck has now started.

2012-07-26 13:48 UTC: The fsck has completed, the RAID firmware has been upgraded, and the array is now rebuilding. The server is back to operational status. We will keep monitoring this machine closely.