Wednesday, October 7, 2009

DDoS attack rains down on Amazon cloud

A large-scale DDoS attack on Amazon Web Services left a service down for as long as 19 hours, and one AWS customer, Bitbucket, was badly affected.
 
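What's interesting is that while the outage was still going on, hosting providers competing with AWS were already pitching Bitbucket on moving its service over to them.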

Web-based code hosting service Bitbucket experienced more than 19 hours of downtime over the weekend after an apparent DDoS attack on the sky-high compute infrastructure it rents from Amazon.com.

This in turn left many developers without access to code projects hosted on Bitbucket, a GitHub-like service based on the Mercurial version control system.

"Looks like all (my and a large number of fellow nerds) bitbucket projects have evaporated in a temporary, cloudy way. This is a major pissoff," said one Reg reader and Bitbucket user, as others vented via Twitter.

But on another level, the news is sure to fuel fears over the security of Amazon's Elastic Compute Cloud (EC2) and similar "infrastructure clouds," online services that provide grid-like access to scalable processing, storage, and networking resources.

"The lesson here is: 'Don't bet the farm on a single cloud provider,'" says Craig Balding, founder of cloudsecurity.org and a security practitioner at a Fortune 500 company. "It's common sense really. But people get lulled into thinking their site is always going to be available [when they host with a single provider]."

According to a blog post from Jesper Nøhr, the Danish developer who runs Bitbucket.org, the site's Amazon-hosted network storage became "virtually unavailable" beginning Friday evening, and the outage persisted well into Saturday before Amazon pinpointed the problem.

Nøhr says Amazon advised him not to divulge the cause of the outage. But he divulged anyway. "We were attacked. Bigtime. We had a massive flood of UDP [User Datagram Protocol] packets coming in to our IP, basically eating away all bandwidth to the box," he wrote. "So, basically a massive-scale DDOS. That's nice."

Speaking with The Reg, Nøhr said that Amazon urged him not to reveal the attack because it might help attackers develop new ways of DDoSing the site.

After uncovering the problem - at least 16 hours after it was first reported - Amazon blocked the offending traffic, and service returned to normal. But by the next morning (Sunday), the problem returned, and another two hours passed before this second outage was reversed. According to Nøhr, Amazon told him the second attack used a flood of TCP SYN connection requests, rather than UDP packets.

Then, it seems, a third attack arrived. Nøhr tells The Reg that an attack on an Amazon edge router took out service for some but not all Bitbucket customers for close to one and a half hours earlier today.

For Nøhr and other Bitbucket devotees, it seems odd that traffic from the net at large could bring down what should be "internal" storage resources. Nøhr speculates that Bitbucket's storage sits on the same network interface that connects the site to the outside world. He asks why the storage isn't on a separate channel - and why Amazon doesn't have methods in place to rapidly detect and combat such DDoS attacks.

"I do think they could've taken precautions to at least be warned if one of their routers started pumping through millions of bogus UDP packets to one IP," he wrote, "and I also think that 16+ hours is too long to discover the root of the problem."
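Nøhr's complaint describes, in effect, a per-destination rate alarm. The sketch below is a hypothetical illustration of that idea, not anything Amazon is known to run. It assumes a stream of packet records (timestamp, protocol, destination IP) and flags a destination once the number of UDP packets it received within a sliding window exceeds an arbitrary threshold; the class name and threshold values are made up for the example.

```python
from collections import defaultdict, deque


class UdpFloodDetector:
    """Flags destination IPs whose UDP packet rate exceeds a threshold.

    Hypothetical sketch: window_seconds and max_packets are illustrative
    values, not figures from Amazon or Bitbucket.
    """

    def __init__(self, window_seconds: float = 1.0, max_packets: int = 10_000):
        self.window = window_seconds
        self.max_packets = max_packets
        # destination IP -> timestamps of recent UDP packets to that IP
        self.recent = defaultdict(deque)

    def observe(self, timestamp: float, protocol: str, dst_ip: str) -> bool:
        """Record one packet; return True if dst_ip is over the threshold."""
        if protocol != "udp":
            return False
        times = self.recent[dst_ip]
        times.append(timestamp)
        # Drop timestamps that have slid out of the window.
        while times and times[0] <= timestamp - self.window:
            times.popleft()
        return len(times) > self.max_packets


# Usage: with a tiny threshold, the fourth packet within one second trips the alarm.
detector = UdpFloodDetector(window_seconds=1.0, max_packets=3)
alerts = [detector.observe(t * 0.1, "udp", "10.0.0.5") for t in range(5)]
```

Keeping every timestamp is only practical for a toy; a real network would more likely work from sampled flow data (NetFlow or sFlow, for instance), but the alerting logic is the same.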

Cloudsecurity.org's Balding is equally surprised that an outside attack could somehow "get between" EC2 and EBS. But since Amazon treats its service as a black box, he says, it's difficult to tell what actually occurred. He says it's possible that the attack came from inside EC2 - i.e. from another EC2 customer - but this is unlikely. "You'd think that Amazon could have shut down that sort of thing pretty quickly," he tells The Reg.

Amazon did not immediately respond to a request for comment, but we made contact before Pacific Coast office hours. We will update this story when the company responds.

In a security white paper (pdf) dated September 2008, the company says it uses standard DDoS-fighting techniques such as syn cookies and connection limits. It also says: "To further mitigate the effect of potential DDoS attacks, Amazon maintains internal bandwidth which exceeds its provider-supplied Internet bandwidth."
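For readers unfamiliar with the term: a SYN cookie lets a server answer a TCP SYN without storing any state for the half-open connection. It encodes what it needs into its own initial sequence number and verifies that value when the client's ACK comes back, so a SYN flood cannot exhaust a connection table. The sketch below is a simplified illustration under stated assumptions (an HMAC-SHA256 hash, a 64-second time counter, a made-up MSS table, and no client sequence number in the hash); it is not Amazon's implementation and differs in detail from the Linux kernel's.

```python
import hashlib
import hmac
import os

# Hypothetical per-server secret and MSS table, for illustration only.
SECRET = os.urandom(16)
MSS_TABLE = [536, 1220, 1440, 1460]

# 32-bit cookie layout (loosely after the classic design):
# bits 27-31: 5-bit time counter, bits 24-26: MSS index, bits 0-23: keyed hash.


def _counter(now: float) -> int:
    """5-bit counter that ticks once every 64 seconds."""
    return int(now // 64) % 32


def _hash24(src, sport, dst, dport, counter, mss_idx) -> int:
    """Keyed 24-bit hash over the connection 4-tuple, counter, and MSS index."""
    msg = f"{src}:{sport}-{dst}:{dport}/{counter}/{mss_idx}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big")


def make_cookie(src, sport, dst, dport, client_mss: int, now: float) -> int:
    """Build the server's initial sequence number for a SYN-ACK."""
    counter = _counter(now)
    # Largest table MSS not above the client's; fall back to the smallest entry.
    mss_idx = max((i for i, m in enumerate(MSS_TABLE) if m <= client_mss), default=0)
    h = _hash24(src, sport, dst, dport, counter, mss_idx)
    return (counter << 27) | (mss_idx << 24) | h


def check_cookie(ack_number, src, sport, dst, dport, now: float, max_age_ticks: int = 1):
    """Validate the client's ACK; return the encoded MSS, or None if invalid.

    The client acknowledges the server's ISN + 1, so subtract 1 (mod 2**32).
    """
    cookie = (ack_number - 1) % 2**32
    counter = cookie >> 27
    mss_idx = (cookie >> 24) & 0x7
    age = (_counter(now) - counter) % 32
    if age > max_age_ticks or mss_idx >= len(MSS_TABLE):
        return None  # too old, or an MSS index the server never issues
    if (cookie & 0xFFFFFF) != _hash24(src, sport, dst, dport, counter, mss_idx):
        return None  # forged or mismatched connection
    return MSS_TABLE[mss_idx]
```

The trade-off is that the server can only recover the few parameters packed into those 32 bits (here, a coarse MSS), which is why SYN cookies are usually enabled only when the backlog of half-open connections is under pressure.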

Bitbucket runs its entire site on Amazon's Elastic Compute Cloud, using the company's Elastic Block Store (EBS) for storing its database, log files, user data, and more. EBS provides persistent storage for EC2 server instances.

On Friday evening, Jesper Nøhr and Bitbucket told Amazon that there appeared to be a serious slowdown in the transfer of data to and from its Elastic Block Store. Nøhr said the site was "getting less throughput than you can pull off of a 1.44MB floppy".

When Nøhr first reported the problem, however, an Amazon support rep blamed it on the fact that EBS is a shared resource, saying that performance could vary. Only after Nøhr called again did a second rep acknowledge that the problem went beyond the usual performance hiccups.

"We had been extremely frustrated up until this point, because 1) we couldn't actually *do* anything about it, and 2) we were being told that everything should be fine," Nøhr wrote. "It felt like there was an elephant right in front of us, and a person next to us was insisting that there wasn't."

Eventually, Nøhr said, Amazon gave the problem the attention it deserved. "After a bit of stalling with their first rep, our case received absolutely stellar attention. They were very professional in their correspondence, and in communicating things to us along the way."

Nøhr says that Amazon wants to work with him to ensure this sort of attack doesn't happen in the future, but he's now considering a switch to another hosting provider.

"We're seriously considering moving to a different setup now," he said. "Not because Amazon isn't providing us with decent service, which they are, most of the time. While we were down, several large hosting companies took direct contact with us, pitching their solutions. I won't mention names, but some of the offerings are quite tempting, and have several advantages over what we get with Amazon.

"One thing's for sure, we're investing a lot of man-hours into making sure this won't happen again. If this means moving to a different host, so be it. We haven't decided yet."

According to Nøhr, the DDoS traffic was spoofed, so it's unclear where it originated. Nøhr speculates that attackers were targeting a Bitbucket-hosted project "related to" World of Warcraft, but he declined to name the service.