Friday, January 15, 2010

Yet Another System Failure at Rackspace

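Rackspace Hosting has run into trouble again: not an outright system failure this time, but degraded performance. The problem is reported to have surfaced around the time the company shut down CRON, a command used on Unix and Linux to automate tasks.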

------------------------------------------------------------------------------------------------------

Performance Problems for Rackspace Cloud

January 14th, 2010 : Rich Miller
Rackspace reports that its cloud computing service is “degraded,” with many customers reporting their sites are unreachable. The company attributed the problem to an unusual load spike in the storage system supporting its cloud platform. The outage came several hours after the Rackspace Cloud disabled CRON, a command commonly used to automate tasks on Unix and Linux systems. By early evening, the company said performance had improved.
“Starting yesterday we began experiencing very high loads on our storage devices for cluster WC1 in DFW,” Rackspace said on its status page. “In order to reduce load we have shut down processes like CRON to ensure core site content continue to load. While load spikes are common in our cloud infrastructure, we have not been able to fully identify the root cause of these unusual issues.
“We are working with engineers from inside and outside the company with the best expertise on these issues to resolve them and develop a plan of action to ensure we do not repeat this state. We have a series of changes that are being implemented in real time. We are being careful to minimize issues as we proceed.”
UPDATE: At about 6 pm Central time Rackspace provided an update: “We have been seeing improved performance on our Cloud Sites WC1 storage cluster for the last few hours. Assuming stability continues we will resume CRON operations this evening. At this time, we cannot declare victory on this issue, but we have many plans in place to continue to increase headroom and ensure stability under all conditions.”