[Fixed] Intermittent network routing issues affecting several US servers

Posted in Problems by

Several of our US servers are currently experiencing intermittent network routing issues. Affected servers may include: Dweb100 Dweb101 Dweb102 Dweb104 Dweb105 Dweb110 Dweb111 Dweb112 Dweb113 Dweb114 Dweb115 Dweb116 Dweb117 Dweb118 Dweb119 Dweb120 Dweb121 Dweb122 Dweb123 Dweb124 Dweb125 Dweb126 Dweb127 Dweb128 Dweb129 Dweb130 Dweb133 Dweb134 Dweb135 Dweb137 Dweb140 Dweb141 Dweb142 Dweb143 Dweb144 Dweb145 Dweb146 Dweb147 Dweb149 Dweb150 Dweb151 Dweb152 Dweb153 Dweb154 Dweb158 Dweb160 Dweb161 Dweb162 Dweb163 Dweb164 Dweb91 Dweb92 Dweb93 Dweb94 Dweb95 Dweb96 Dweb97 Mailbox8 Web102 Web105 Web106 Web108 Web11 Web114 Web117 Web119 Web12 Web122 Web126 Web129 Web143 Web148 Web15 Web151 Web155 Web162 Web174 Web175 Web178 Web180 Web182 Web183 Web186 Web187 Web198 Web199 Web200 Web213 Web219 Web220 Web226 Web227 Web228 Web229 Web230 Web231 Web232 Web233 Web234 Web235 Web236 Web237 Web238 Web239 Web24 Web240 Web241 Web243 Web244 Web245 Web246 Web247 Web25 Web27 Web28 Web30 Web300 Web301 Web302 Web307 Web308 Web309 Web31 Web310 Web311 Web312 Web313 Web318 Web319 Web320 Web324 Web328 Web329 Web330 Web335 Web336 Web337 Web338 Web34 Web341 Web342 Web343 Web344 Web345 Web346 Web347 Web348 Web349 Web35 Web37 Web39 Web4 Web40 Web42 Web48 Web49 Web5 Web55 Web57 Web65 Web69 Web70 Web72 Web74 Web75 Web80 Web83 Web91 Web95 Web99

We’re working to resolve this issue and will update this post when we have more information.

2012-10-15 06:07 UTC: The problem was caused by an upstream network carrier and has been resolved.


[Done] Web143 SSH problems

Posted in Problems by

Web143 is currently not accessible via SSH. We’re looking into the problem and will update this ticket as we have more information.

2012-06-05 00:00 UTC: Upon further inspection, a fault in the RAID controller and/or a hard drive has taken the server offline. We’re investigating further and will update this post as more information becomes available.

2012-06-05 01:33 UTC: We’ve replaced the failing hard drive and we’re now replacing the RAID controller and updating its firmware.

2012-06-05 02:14 UTC: The RAID controller and hard drives are now functioning correctly. We’re now running an FSCK on the machine to correct file system errors.

2012-06-05 03:45 UTC: The FSCK is still running. We’ll post more information as the FSCK progresses.

2012-06-05 04:56 UTC: The FSCK was unable to complete because it entered an infinite loop. We’ve now brought the server back online with a read-only file system so that we can make as complete a backup as possible and minimize the chance of data loss.

2012-06-06 07:24 UTC: We’re still retrieving files from the machine.

2012-06-06 08:25 UTC: We’ve requested a full chassis swap and an OS reload on the new equipment.

2012-06-06 11:21 UTC: The OS reload is currently running.

2012-06-06 12:27 UTC: The OS is still being reloaded onto the machine.

2012-06-06 14:25 UTC: The OS installation is on its last steps. Once it’s finished, we’ll begin installing our platform tools and transferring user data back to the machine.

2012-06-06 15:31 UTC: The OS installation is finished and we are installing our platform tools.

2012-06-06 16:50 UTC: Our setup has finished and we’re now transferring user data back to the machine.

2012-06-06 18:35 UTC: The user data is still transferring. The MySQL databases have been restored, and we’re working on the PostgreSQL databases while the other data transfers.

2012-06-06 18:57 UTC: The PostgreSQL databases have now been restored and we’re getting close to the end of the user files to be transferred.

2012-06-06 20:13 UTC: All user files have been transferred to the server. We’re now verifying that all files transferred off the machine have been transferred back to the machine.
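For readers curious what this kind of verification can look like, here is a minimal Python sketch that compares two directory trees by SHA-256 checksum. It is an illustration only and not the procedure actually used on Web143; the function names (`file_digest`, `build_manifest`, `compare_trees`) and the idea of comparing a backup directory against a restored directory are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(backup_root, restored_root):
    """Return (missing, extra, changed) lists of relative paths.

    missing: in the backup but not in the restored tree
    extra:   in the restored tree but not in the backup
    changed: present in both, but with different contents
    """
    backup = build_manifest(backup_root)
    restored = build_manifest(restored_root)
    missing = sorted(set(backup) - set(restored))
    extra = sorted(set(restored) - set(backup))
    changed = sorted(
        name for name in backup.keys() & restored.keys()
        if backup[name] != restored[name]
    )
    return missing, extra, changed
```

Calling `compare_trees("/path/to/backup", "/path/to/restored")` (both paths are placeholders) returns three lists; all three being empty means every file made the round trip with identical contents. A checksum comparison like this only covers file contents, so ownership and permissions would still need to be checked separately.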

2012-06-06 21:12 UTC: The files have been verified. We’ve noted some spots of corruption and fixed them. We’re now correcting file system permissions.

2012-06-06 21:42 UTC: The server is now back online and resuming normal function. Please log into the server and verify that your apps and files are working as expected.
