We are aware of an issue affecting Web90, Web141, Web162, and Web166 and are working to resolve it as soon as possible. We will post more information here as it becomes available.

Update [2011-08-20 09:25 UTC] – Here is the post-mortem and follow-up for this issue:

Over the last 36 hours Web90, Web141, Web162, and Web166 have repeatedly gone offline or become unreachable.

Since the first outage, we’ve been working with the data center to determine the exact cause of the issue. With high network usage cited as the cause, we watched the servers carefully for the rest of the day.

When the servers became unstable again, we began scrutinizing their outbound traffic, using both the data center’s monitoring and our own tools on the servers. At first, the cause appeared to be UDP packets flooding the connection.

Blocking specific UDP traffic made no difference, so we began to look deeper. After a few hours, both the data center and our system administrators found that the cause was fragmented IP packets flooding the servers’ outbound connection.

These fragmented IP packets were not being picked up through our normal monitoring channels because the monitoring software did not consider them valid packets. Once we had found the issue, we traced it back to its root cause: the WordPress exploit we tweeted about earlier.

Of all the WordPress sites we host, the only ones hit this hard by the thumb.php exploit were on Web166 and Web90. Because these machines share racks with our other servers, the excess bandwidth oversaturated the connection and caused outages across the entire network segment.

We immediately began identifying vulnerable WordPress themes and plugins that use the thumb.php and timthumb.php files, and we sent messages to the site owners explaining the issue and how to fix it.

Since we began that process, the servers have stayed online, and we are monitoring them very closely to ensure that no other vulnerable WordPress sites can be exploited. Server security was never compromised: because of the way our users and ACLs are set up, the exploits ran as the site’s own user, just as all PHP scripts do.

Over the next few days, we will be scanning all servers for this specific vulnerability and notifying the affected users.