2009年5月29日金曜日

Opscode Gets $2.5M to Automate the Cloud

Opscode, Inc. is a vendor of open-source cloud computing infrastructure software. Founded by former executives from Amazon and Microsoft, the company recently received $2.5M in funding from the VC firm Draper Fisher Jurvetson.

Its product, called Chef, has already shipped; its main feature set provides automation of server environments. A key selling point is that Opscode's executives and engineers are veterans of large-scale cloud computing environments such as Amazon EC2 and Microsoft Azure.


Opscode, Inc., a cloud infrastructure company led by veterans of the data center teams from Amazon and Microsoft, has closed a Series A funding round of $2.5 million led by Draper Fisher Jurvetson, the company said today. Opscode is developing technology for building and managing automated infrastructures based on community best practices. The company plans to use the funds to hire additional staff as it prepares to launch its core service later this year.

Opscode has already released Chef, an open source systems integration framework for managing large-scale server and application deployments. Chef is already in production use at companies including Engine Yard, 37Signals, and Wikia.

"Opscode is defining essential automation solutions that companies need to take full advantage of cloud computing and virtualization technologies," said Bill Bryant, Venture Partner with Draper Fisher Jurvetson and Opscode board member. "The founding Opscode team is comprised of experts who have built and operated some of the Web's largest sites and supporting infrastructure services, such as Amazon EC2. Companies that embrace infrastructure automation will gain a significant advantage over those that manage these processes by hand."

2009年5月28日木曜日

Insert Catchy Cloud Computing Title Here

A collection of interesting recent articles on cloud computing.

For the most part I stay away from articles or links about cloud computing -- it's just too crowded right now and I have too much to read, let alone write about.  I have, however, run across several good links lately that I wanted to share:

Many sites and blogs have covered Michael Manos's move from Microsoft to Digital Realty Trust.  I thought the May 5th post on his blog - Forecast Cloudy with Continued Enterprise - was particularly good.  The beginning reminded me of reading Nicholas Carr's The Big Switch: an excellent book.  I think Digital Realty Trust will be an excellent fit for Michael, and I thank him for generously sharing his wisdom in his blog.

Also on Cinco de Mayo was a post from Sam Johnston asking "Is OCCI the HTTP of Cloud Computing?"  OCCI is the OGF's Open Cloud Computing Interface, which provides an interface to cloud Infrastructure as a Service.  Has anyone asked Tim Berners-Lee about this yet?

Apparently Akamai, formerly a Content Delivery Network company, is now a Cloud Provider.  Ummm... ok

Google had a couple of notes on cloud computing recently.  Co-founder Sergey Brin posted the 2008 Founders' Letter on the Google blog, and discussed cloud computing.  CEO Eric Schmidt, at a press event, dismissed Android questions in favor of talking about the cloud, models and strategies.

Forrester Research recognized 3Tera's Applogic as the leading "cloud infrastructure software offering available today".

The Private Cloud, Now In A Purple Box

IBM has productized and begun shipping a private-cloud solution in the form of an appliance.

The product is called the WebSphere CloudBurst Appliance.
It manages virtualized environments inside the enterprise data center and provides the functionality to build a private-cloud infrastructure.
The specific features are not yet clear, but presumably it pairs some form of application development environment, plus a tool for deploying those applications onto the cloud, with management of access rights to the deployed applications and charge-back billing for their users.

The IBM WebSphere CloudBurst offers a private cloud in an appliance.

The private cloud exists, and is living in a purple box. On Thursday IBM announced the WebSphere CloudBurst Appliance, a hardware appliance that manages and maintains virtual machines in a "private cloud."


IBM says WebSphere CloudBurst helps customers quickly develop and deploy applications. Once finished, resources are automatically returned to the shared resource pool and logged for internal charge-back purposes.

"The WebSphere CloudBurst Appliance combines the flexibility of SOA (service-oriented architecture) with the adaptability of the cloud," said Tom Rosamilia, general manager of WebSphere at IBM, in a press release. "Its new capabilities for rapid deployment deliver a fast return on investment to our clients."

2009年5月19日火曜日

IDC Survey: Recession Accelerating Cloud Computing

According to an IDC survey asking how the current economic climate is affecting organizations' approach to cloud computing, about a quarter of respondents said they are pursuing it more aggressively.

One of the most common questions I've been asked in the past several months has been "How will the global recession impact the pace of adoption of Cloud Computing?".

My gut reaction has been that the economic crunch would certainly amplify the economic benefits of the cloud services model, and therefore accelerate IT cloud services adoption. Some data from a user survey my colleague Stephen Minton published earlier this year substantiates that view.

The survey was of 332 IT and line-of-business executives, predominantly based in North America, and spread across large, medium and small enterprises. Stephen asked this group: "How will the economic situation affect your approach to cloud computing and SaaS?". Here are the responses:

[Survey chart: "How will the economic situation affect your approach to cloud computing and SaaS?" - click image in the original post to enlarge]

Almost half of the respondents claimed there would be no impact - a real testament to the power of inertia in many businesses! But over half of the executives stated that they are, indeed, adjusting their approach to market conditions.

Intriguingly, twenty-four percent are reacting to the recession by moving more aggressively in the cloud/SaaS direction: either doing more evaluation, beginning to adopt, or increasing their adoption of IT cloud services. Meanwhile, fourteen percent are reducing their pace of cloud/SaaS adoption; my guess is that this is not because of anything specific about cloud computing, but because they are reducing most of their IT investments in the down economy.

In a down economy as severe as the one we're experiencing, it's remarkable that one in four executives are thinking more aggressively about adopting ANY kind of IT. But the cloud model's economic benefits are compelling. To me, this survey strongly suggests that the cloud model - which we forecasted last October would account for about 9% of enterprise IT spending in 2012 - is on a pace to drive closer to 10-15%.

One other important takeaway: by far, the largest portion of customers leaning more aggressively toward the cloud model are in the "more evaluation" stage. This makes 2009 and 2010 a very important time for suppliers to be actively educating the marketplace about the cloud model and their cloud offerings - very appropriate, given our assessment that the cloud model is in the "crossing the chasm" stage of adoption.

IDC eXchange / Fri, 01 May 2009 18:44:02 GMT

Sent from FeedDemon

Mike Manos is Back, Blogs about his Microsoft Exit to Digital Realty

Comments from Michael Manos, who has left Microsoft for Digital Realty.
He argues that the data center industry will grow along two tracks: the very large data centers typified by Google and Microsoft (which he calls "Information Substations") and the data centers operated inside enterprises.
He predicts that the two will cultivate different technologies and coexist.


Mike Manos is back blogging on Loosebolts, and starts off his first post discussing his decision to leave Microsoft.

The first question of course is - Why did I leave Microsoft for Digital Realty Trust?

First we need to pull our heads out of the tactical world of data centers and look at the larger emerging landscape in which data centers sit. Microsoft, along with Google, Amazon and a few others are taking aim at Cloud Computing and are designing, building, and operating a different kind of infrastructure with different kinds of requirements. Specifically, building ubiquitous services around the globe. In my previous role, I was tasked with thinking about and building this unique infrastructure in concert with hundreds of development groups taking aim at building a core set of services for the cloud. A wonderful blend of application and infrastructure. It's a great thing. But as my personal thought processes matured and deepened on this topic, flavored with what I was seeing as emerging trends in business, technology and data center requirements, I had a personal epiphany. The concept of large monolithic clouds ruling the Information-sphere was not really complete. Don't get me wrong, they will play a large and significant role in how we compute tomorrow, but instead of an oligarchy of the few, I realized that enterprise data centers are here to stay and additionally an explosion of different cloud types is on the horizon.

The problem Mike is tackling is the role of data centers as information utilities.

In my opinion it is here in this new emerging space where the Information Utility will ultimately be born and defined, and where true innovation in our industry (data center-wise) will take place. This may seem rather unintuitive given the significant investments being made by the big cloud players, but it is really not. We have to remember that today, any technology must satisfy basic key requirements. First and foremost amongst these is that it must solve the particular business problems. Technology for technology's sake will never result in significant adoption, and the big players are working to perfect platforms that will work across a predominance of applications being specifically developed for their infrastructure. In effect they are solving for their own issues - issues that most of those looking to leverage cloud or shared compute will not necessarily match in either scale or standardization of server and IT environments. There will definitely be great advances in technology, process, and a host of other areas as a result of this work, but their leveragability is ultimately minimized as their environments, while they look like each other's, will not easily map into the enterprise, near-enterprise, or near-cloud space. The NASA space program has had thousands of great solutions, and some of them have been commercialized for the greater good. I see similar things happening in the data center space. Not everyone can get sub-1.3 average PUE numbers, but they can definitely use those learnings to better their own efficiency in some way. While these large platforms in conjunction with enterprise data centers will provide key and required services, the innovation and primary requirement drivers in the future will come from the channel.

Why Digital Realty?

In Digital Realty Trust I found the great qualities I was looking for in any company. First, they are positioned to provide either "information substation" or "enterprise" solutions and will need to solve for both. They are effectively right in the middle of solving these issues and they are big enough to have a dramatic impact on the industry. Secondly, and perhaps more importantly, they have a passionate, forward looking management team whom I have interacted with in the Industry for quite some time.

Another area where there is significant alignment between my own personal beliefs and those of Digital Realty Trust is around speed of execution and bringing capacity online just in time. It's no secret that I have been an active advocate of moving from a big build-and-construction model to a just-in-time production model.

I think we are going to see more of Mike Manos out there now that he is at Digital Realty Trust.

In the end, my belief is that it will be companies like Digital Realty Trust at the spearhead of driving the design, physical technology application and requirements for the global Information Utility infrastructure. They will clearly be situated the closest to those changing requirements for the largest amount of affected groups. It is going to be a huge challenge. A challenge, I for one am extremely excited about and can't wait to dig in and get started.

As much as there is curiosity about Google, Microsoft, and Amazon data centers, these companies will only share what passes internal approval processes. I challenge any internal group to limit what Mike can talk about. Mike knows that with being innovative comes risk, and he took a risk leaving a nice safe Microsoft position building data centers for its own needs to build data centers as information utility infrastructure.

We need more people like Mike out there talking about changes in the industry.

Add Loosebolts to your RSS reader, as I am sure Mike will be discussing all kinds of things. http://loosebolts.wordpress.com/feed/

Green Data Center Blog / Mon, 04 May 2009 14:40:08 GMT


Federal Web Portal Moves to Cloud Computing Platform

It has been announced that USA.gov, the portal site of the U.S. General Services Administration (GSA), will migrate to the Enterprise Cloud service operated by Terremark, moving the site onto a cloud computing platform.


The federal government's massive information portal -- USA.gov -- will shift to a cloud computing platform this weekend in a move that GSA officials say will slash infrastructure expenses and provide better flexibility.

"We are flipping the switch tomorrow to the cloud computing platform, so this is a nervous day," said Martha Dorris, acting associate administrator for the General Services Administration's (GSA) Office of Citizen Services and Communications, on Friday, May 1. The office operates USA.gov which receives more than 100 million visits per year. Dorris spoke at the 2009 mid-year conference of the National Association of State Chief Information Officers (NASCIO).

The GSA decided in February to move the federal portal to a cloud computing platform, announcing an agreement with Terremark Worldwide, a Miami-based infrastructure services provider. The USA.gov portal will run on the company's Enterprise Cloud service.

Dorris said the move will cut the portal's infrastructure costs by as much as 90 percent and improve its capabilities. "We're saving money and we'll have a flexible infrastructure," she said, adding that complete migration to the new platform would be done by September.

The move presented a difficult cultural shift for agency staff, Dorris told state technology executives attending the NASCIO conference in Baltimore. "This isn't a story about technology. It's a story of culture," she said. "Our technology team did not want to give up the servers. We spent a lot of time moving people along."

The cloud solution will let USA.gov quickly tap into cloud computing resources to handle spikes in online traffic, according to Terremark.

"The on-demand nature of The Enterprise Cloud allowed Terremark to provide the GSA complete access to secure cloud-based computing resources within minutes instead of weeks," the company said in a February statement announcing the deal. "Terremark's solution will also supply GSA with industry-leading physical and logical security and robust connectivity to some of the world's leading carrier networks."

FW: [ Cloud Computing ] Gartner says Worldwide SaaS Revenue to Grow 22 Percent in 2009

 
According to Gartner, SaaS will keep growing, reaching $9.6B this year at a growth rate of 21.9 percent, and will continue to grow through 2013. The segments covered include office applications as well as CRM- and ERP-class applications.

From: cloud-computing@googlegroups.com [mailto:cloud-computing@googlegroups.com] On Behalf Of Ashish Ranjan
Sent: Thursday, May 07, 2009 9:13 PM
To: cloud-computing@googlegroups.com
Subject: [ Cloud Computing ] Gartner says Worldwide SaaS Revenue to Grow 22 Percent in 2009

Gartner says Worldwide SaaS Revenue to Grow 22 Percent in 2009

STAMFORD, Conn., May 7, 2009 —

The market for software as a service (SaaS) is forecast to reach $9.6 billion in 2009, a 21.9 percent increase from 2008 revenue of $6.6 billion, according to Gartner, Inc. The market will show consistent growth through 2013 when worldwide SaaS revenue will total $16 billion for the enterprise application markets.

"The adoption of SaaS continues to grow and evolve within the enterprise application markets as tighter capital budgets in the current economic environment demand leaner alternatives, popularity increases, and interest for platform as a service and cloud computing grows," said Sharon Mertz, research director at Gartner.

"Adoption of the on-demand deployment model has grown for nearly a decade, but its popularity has increased significantly within the last five years," Ms. Mertz said. "Initial concerns about security, response time and service availability have diminished for many organizations. As SaaS business and computing models have matured, adoption has become more widespread."

SaaS adoption varies between and within markets. Although usage is expanding, growth remains most significant in areas characterized by horizontal applications with common processes, among distributed virtual workforce teams, and within Web 2.0 initiatives.

Office suites and digital content creation (DCC) remain the fastest-growing markets for SaaS. Office suites are projected to total $512 million in 2009, up from $136 million in 2008, while DCC is forecast to total $126 million in 2009, up from $70 million in 2008. The content, communications and collaboration (CCC) market continues to show the widest disparity of SaaS revenue across market segments, generating $2.5 billion in 2009, up from $2.16 billion in 2008 (see Table 1).

Table 1
Worldwide Software Revenue for SaaS Delivery Within the Enterprise Application Software Markets (Millions of Dollars)

Segment                                            2009    2008
-----------------------------------------------    -----   -----
Content, Communications and Collaboration (CCC)    2,507   2,155
Office Suites                                        512     136
Digital Content Creation (DCC)                       126      70
Customer Relationship Management (CRM)             2,169   1,838
Enterprise Resource Planning (ERP)                 1,376   1,256
Supply Chain Management (SCM)                        861     748
Other Application Software                           483     387
Total Enterprise Software                          8,035   6,591

Source: Gartner (May 2009)
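The year-over-year growth rates implied by Table 1 are easy to compute; here is a short Python sketch using the figures above:

```python
# SaaS revenue by segment from Table 1, in millions of US dollars
revenue = {
    "Content, Communications and Collaboration (CCC)": (2155, 2507),
    "Office Suites": (136, 512),
    "Digital Content Creation (DCC)": (70, 126),
    "Customer Relationship Management (CRM)": (1838, 2169),
    "Enterprise Resource Planning (ERP)": (1256, 1376),
    "Supply Chain Management (SCM)": (748, 861),
    "Other Application Software": (387, 483),
}

for segment, (y2008, y2009) in revenue.items():
    growth = (y2009 - y2008) / y2008 * 100
    print(f"{segment}: {growth:+.1f}%")

# The segment sums come to 6,590 and 8,034 - one million dollars shy of
# the printed totals (6,591 and 8,035), presumably rounding in the source.
total_2008 = sum(v[0] for v in revenue.values())
total_2009 = sum(v[1] for v in revenue.values())
print(total_2008, total_2009)
```

Office Suites stands out at nearly 4x growth, which matches the "fastest-growing markets" remark above.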

The adoption of SaaS within enterprise resource planning (ERP) and supply chain management (SCM) varies based on process complexity. SaaS is expected to represent only about 1 percent of ERP manufacturing and operations revenue, but more than 18 percent of human capital management (HCM) and 30 percent of the procurement segment by 2013. The CRM market exhibits more general market adoption, ranging between 9 percent and more than 33 percent of total software revenue, depending on the CRM subsegment. Overall, SaaS accounted for more than 18 percent of the CRM market total revenue in 2008.

"Many factors are driving adoption of SaaS, including the benefits of rapid deployment and rapid ROI, less upfront capital investment, and a decreased reliance on limited implementation resources," said Ms. Mertz. "Greater market competition and increased focus by the 'megavendors' is reinforcing the legitimacy of on-demand solutions. Many enterprises are further encouraged by the fact that with SaaS, responsibility for continuous operation, backups, updates and infrastructure maintenance shifts risk and resource requirements from internal IT to vendors or service providers."

Certain factors can, however, impede adoption of SaaS, including concerns about data security, a perceived lack of competitive differentiation, increasing concerns about scalability, questions about vendor longevity, and the fact that existing investments in application capital and organizational expertise limit SaaS growth.

Ms. Mertz advised enterprises to determine where SaaS is most appropriate and advantageous within an overall sourcing and applications strategy before selecting a service provider, and to anticipate the broader costs incurred with a SaaS solution and when these are likely to occur within the life cycle. She said that identifying the costs associated with subscription, training, customization, integration or feature upgrades, and reviewing contractual terms carefully, will enable organizations to determine whether SaaS is the better choice.

Additional information is available in the Gartner report "Market Trends: Software as a Service, Worldwide, 2009-2013." The report is available on Gartner's Web site at http://www.gartner.com/DisplayDocument?ref=g_search&id=965313&subref=simplesearch.


-------------
Regards,
Ashish Ranjan
talkcrm.blogspot.com




2009年5月6日水曜日

Are Cloud Based Memory Architectures the Next Big Thing?

New technologies and business models are rapidly emerging around system architectures that keep data in memory and eliminate disk-based storage entirely. This article explains them in considerable detail and is well worth reading.

Are Cloud Based Memory Architectures the Next Big Thing?


We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point is soon.

Let's take a short trip down web architecture lane:

  • It's 1993: Yahoo runs on FreeBSD, Apache, Perl scripts and a SQL database
  • It's 1995: Scale-up the database.
  • It's 1998: LAMP
  • It's 1999: Stateless + Load Balanced + Database + SAN
  • It's 2001: In-memory data-grid.
  • It's 2003: Add a caching layer.
  • It's 2004: Add scale-out and partitioning.
  • It's 2005: Add asynchronous job scheduling and maybe a distributed file system.
  • It's 2007: Move it all into the cloud.
  • It's 2008: Cloud + web scalable database.
  • It's 20??: Cloud + Memory Based Architectures

    You may disagree with the timing of various innovations and you would be correct. I couldn't find a history of the evolution of website architectures so I just made stuff up. If you have any better information please let me know.

    Why might cloud based memory architectures be the next big thing? For now we'll just address the memory based architecture part of the question, the cloud component is covered a little later.

    Behold the power of keeping data in memory:


    Google query results are now served in under an astonishingly fast 200ms, down from 1000ms in the olden days. The vast majority of this great performance improvement is due to holding indexes completely in memory. Thousands of machines process each query in order to make search results appear nearly instantaneously.

    This text was adapted from notes on Google Fellow Jeff Dean's keynote speech at WSDM 2009.

    Google isn't the only one getting a performance bang from moving data into memory. Both LinkedIn and Digg keep the graph of their social network in memory. Facebook has north of 800 memcached servers creating a reservoir of 28 terabytes of memory, enabling a 99% cache hit rate. Even little guys can handle hundreds of millions of events per day by using memory instead of disk.
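A quick back-of-the-envelope on the Facebook figures above. The two latency numbers in the second half are illustrative assumptions, not figures from the article:

```python
# Facebook: ~800 memcached servers holding a 28 TB reservoir of memory
servers = 800
cache_tb = 28
print(cache_tb * 1024 / servers)  # ~35.8 GB of cache per server

# What a 99% hit rate buys. The latencies are assumed round numbers in
# microseconds chosen for illustration only.
hit_rate = 0.99
cache_us = 200     # assumed memcached round trip
db_us = 10_000     # assumed database read that touches disk
avg_us = hit_rate * cache_us + (1 - hit_rate) * db_us
print(avg_us)      # 298.0 - close to memory speed despite the database
```

Even one miss in a hundred dominates the average, which is why pushing the hit rate toward 100% (or making memory the system of record outright) pays off so well.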

    With their new Unified Computing strategy Cisco is also entering the memory game. Their new machines "will be focusing on networking and memory" with servers crammed with 384 GB of RAM, fast processors, and blazingly fast processor interconnects. Just what you need when creating memory based systems.

    Memory is the System of Record

    What makes Memory Based Architectures different from traditional architectures is that memory is the system of record. Typically, disk-based databases have been the system of record. Disk has been king, safely storing data away within its castle walls. Disk being slow, we've ended up wrapping disks in complicated caching and distributed file systems to make them perform.

    Sure, memory is used all over the place as cache, but we're always supposed to pretend that the cache can be invalidated at any time and old Mr. Reliable, the database, will step in and provide the correct values. In Memory Based Architectures, memory is where the "official" data values are stored.

    Caching also serves a different purpose. The purpose behind cache based architectures is to minimize the data bottleneck through to disk. Memory based architectures can address the entire end-to-end application stack. Data in memory can be of higher reliability and availability than traditional architectures.

    Memory Based Architectures initially developed out of the need in some application spaces for very low latencies. The dramatic drop in RAM prices, along with the ability of servers to handle larger and larger amounts of RAM, has caused memory architectures to verge on going mainstream. For example, someone recently calculated that 1TB of RAM across 40 servers at 24 GB per server would cost an additional $40,000, which is really quite affordable given the cost of the servers. Projecting out, 1U and 2U rack-mounted servers will soon support a terabyte or more of memory.
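That back-of-the-envelope is easy to reproduce, using only the figures quoted in the paragraph above:

```python
# "1 TB" of RAM spread across 40 servers at 24 GB each, for $40,000 extra
servers = 40
gb_per_server = 24
extra_cost_usd = 40_000

total_gb = servers * gb_per_server
print(total_gb)                   # 960 GB, i.e. roughly 1 TB
print(extra_cost_usd / total_gb)  # about $41.67 per GB at 2009 prices
```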

    RAM = High Bandwidth and Low Latency

    Why are Memory Based Architectures so attractive? Compared to disk, RAM is a high-bandwidth, low-latency storage medium. Depending on who you ask, the bandwidth of RAM is around 5 GB/s, while the bandwidth of disk is about 100 MB/s - roughly fifty times slower. RAM wins. Modern hard drives have latencies around 13 milliseconds, and when many applications are queued for disk reads, latencies can easily stretch into seconds. Memory latency is in the 5 nanosecond range, millions of times lower. RAM wins again.
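Running the numbers quoted above makes the gap concrete (a rough sketch; real figures vary widely by hardware and workload):

```python
# Bandwidth: ~5 GB/s for RAM versus ~100 MB/s for disk
ram_bw = 5e9
disk_bw = 100e6
print(ram_bw / disk_bw)                 # 50.0x

# Latency: ~5 ns for RAM versus ~13 ms for a disk seek
ram_latency_s = 5e-9
disk_latency_s = 13e-3
print(disk_latency_s / ram_latency_s)   # ~2,600,000x
```

Note the asymmetry: the bandwidth gap is large, but the latency gap is astronomical, which is why random-access workloads benefit most from moving to memory.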

    RAM is the New Disk

    The superiority of RAM is at the heart of the RAM is the New Disk paradigm. As an architecture it combines the holy quadrinity of computing:

  • Performance is better because data is accessed from memory instead of through a database to a disk.
  • Scalability is linear because as more servers are added data is transparently load balanced across the servers, so there is automated in-memory sharding.
  • Availability is higher because multiple copies of data are kept in memory and the entire system reroutes on failure.
  • Application development is faster because there's only one layer of software to deal with, the cache, and its API is simple. All the complexity is hidden from the programmer which means all a developer has to do is get and put data.

    Accessing disk on the critical path of any transaction limits both throughput and latency. Committing a transaction over the network in-memory is faster than writing through to disk. Reading data from memory is also faster than reading data from disk. So the idea is to skip disk, except perhaps as an asynchronous write-behind option, archival storage, and for large files.
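A minimal sketch of that write-behind idea in Python. Everything here (the class name, the `persist` callback) is illustrative, not any particular product's API: reads and writes complete against the in-memory copy, and a background thread flushes changes to the slower disk tier off the critical path.

```python
import queue
import threading

class WriteBehindStore:
    """Illustrative sketch: memory is the system of record, and disk
    writes happen asynchronously, off the transaction's critical path."""

    def __init__(self, persist):
        self._data = {}                  # authoritative in-memory copy
        self._pending = queue.Queue()    # changes waiting for the disk tier
        self._persist = persist          # e.g. a database UPDATE
        threading.Thread(target=self._drain, daemon=True).start()

    def put(self, key, value):
        self._data[key] = value          # the "commit" completes in memory
        self._pending.put((key, value))  # disk write is deferred

    def get(self, key):
        return self._data.get(key)       # reads never touch disk

    def _drain(self):
        while True:                      # background write-behind loop
            key, value = self._pending.get()
            self._persist(key, value)

# Usage: the persist callback stands in for the archival disk tier.
journal = []
store = WriteBehindStore(lambda k, v: journal.append((k, v)))
store.put("user:1", {"name": "Ada"})
print(store.get("user:1"))               # served from memory immediately
```

The trade-off, of course, is the window between the in-memory commit and the flush: a crash in that window loses the write unless memory itself is replicated, which is why the availability bullet above matters.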

    Or is Disk the new RAM?

    To be fair, there is also a "Disk is the new RAM, RAM is the new Cache" paradigm. This somewhat counterintuitive notion is that a cluster of about 50 disks has the same bandwidth as RAM, so the bandwidth problem is taken care of by adding more disks.

    The latency problem is handled by reorganizing data structures and low-level algorithms. It's as simple as avoiding piecemeal reads, organizing algorithms around moving data to and from memory in very large batches, and writing highly parallelized programs. While I have no doubt this approach can be made to work by very clever people in many domains, a large chunk of applications live in the random-access domain, for which RAM-based architectures are a better fit.

    Grids and a Few Other Definitions

    There's a constellation of different concepts centered around Memory Based Architectures that we'll need to understand before we can understand the different products in this space. They include:

  • Compute Grid - parallel execution. A Compute Grid is a set of CPUs on which calculations/jobs/work is run. Problems are broken up into smaller tasks and spread across nodes in the grid. The result is calculated faster because it is happening in parallel.
  • Data Grid - a system that deals with data — the controlled sharing and management of large amounts of distributed data.
  • In-Memory Data Grid (IMDG) - parallel in-memory data storage. Data Grids are scaled horizontally, that is, by adding more nodes. Data contention is removed by partitioning data across nodes.
  • Colocation - Business logic and object state are colocated within the same process. Methods are invoked by routing to the object and having the object execute the method on the node it was mapped to. Latency is low because object state is not sent across the wire.
  • Grid Computing - Compute Grids + Data Grids
  • Cloud Computing - datacenter + API. The API allows the set of CPUs in the grid to be dynamically allocated and deallocated.
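The Compute Grid idea above (break a problem into smaller tasks, run them across nodes, combine the partial results) can be sketched in a few lines of Python; the thread pool here is just a stand-in for real grid nodes:

```python
from concurrent.futures import ThreadPoolExecutor

def work(chunk):
    # each "node" in the grid computes a partial result
    return sum(chunk)

data = list(range(1000))
# break the problem up into smaller tasks...
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]

# ...spread them across the grid, then combine the partial results
with ThreadPoolExecutor(max_workers=4) as grid:
    partials = list(grid.map(work, chunks))

print(sum(partials))   # 499500, same as sum(data), computed in parallel
```

A Data Grid is the same move applied to storage rather than computation, and an IMDG keeps those partitions in RAM.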

    Who are the Major Players in this Space?

    With that bit of background behind us, there are several major players in this space (in alphabetical order):

  • Coherence - a peer-to-peer, clustered, in-memory data management system. Coherence is a good match for applications that need write-behind functionality when working with a database and that require multiple applications to have ACID transactions on the database. Java, JavaEE, C++, and .NET.
  • GemFire - an in-memory data caching solution that provides low-latency and near-zero downtime along with horizontal & global scalability. C++, Java and .NET.
  • GigaSpaces - GigaSpaces attacks the whole stack: Compute Grid, Data Grid, Message, Colocation, and Application Server capabilities. This makes for greater complexity, but it means there's less plumbing that needs to be written and developers can concentrate on writing business logic. Java, C, or .Net.
  • GridGain - A compute grid that can operate over many data grids. It specializes in the transparent and low configuration implementation of features. Java only.
  • Terracotta - Terracotta is network-attached memory that allows you to share memory and do anything across a cluster. Terracotta works its magic at the JVM level and provides: high availability, an end to messaging, distributed caching, and a single JVM image. Java only.
  • WebSphere eXtreme Scale. Operates as an in-memory data grid that dynamically caches, partitions, replicates, and manages application data and business logic across multiple servers.

    This class of products has generally been called In-Memory Data Grids (IMDG), though not all the products fit snugly in this category. There's quite a range of features across the different products.

    I tossed the IMDG acronym in favor of Memory Based Architectures because the "in-memory" part seems redundant, the grid part has given way to the cloud, and the "data" part really can include both data and code. And there are other architectures that will exploit memory yet won't be classic IMDGs. So I just used Memory Based Architecture, as that's the part that counts.

    Given the wide differences between the products there's no canonical architecture. As an example here's a diagram of how GigaSpaces In-Memory-Data-Grid on the Cloud works.

    Some key points to note are:

  • A POJO (Plain Old Java Object) is written through a proxy using a hash-based data routing mechanism to be stored in a partition on a Processing Unit. Attributes of the object are used as a key. This is straightforward hash based partitioning like you would use with memcached.
  • You are operating through GigaSpace's framework/container so they can automatically handle things like messaging, sending change events, replication, failover, master-worker pattern, map-reduce, transactions, parallel processing, parallel query processing, and write-behind to databases.
  • Scaling is accomplished by dividing your objects into more partitions and assigning the partitions to Processing Unit instances which run on nodes-- a scale-out strategy. Objects are kept in RAM and the objects contain both state and behavior. A Service Grid component supports the dynamic creation and termination of Processing Units.

    Not conceptually difficult and familiar to anyone who has used caching systems like memcached. Only in this case memory is not just a cache, it's the system of record.

    Obviously there are a million more juicy details at play, but that's the gist of it. Admittedly GigaSpaces is on the full-featured side of the product equation, but from a memory based architecture perspective the ideas should generalize. When you shard a database, for example, you generally lose the ability to execute queries; you have to do all the assembly yourself. By using the GigaSpaces framework you get a lot of very high-end features like parallel query processing for free.

    The power of this approach certainly comes in part from familiar concepts like partitioning. But the speed of memory versus disk also allows entire new levels of performance and reliability in a relatively simple and easy to understand and deploy package.

    NimbusDB - the Database in the Cloud

    Jim Starkey, President of NimbusDB, is not following the IMDG gang's lead. He's taking a completely fresh approach based on thinking of the cloud as a new platform unto itself. Starting from scratch, what would a database for the cloud look like?

    Jim is in position to answer this question as he has created a transactional database engine for MySQL named Falcon and added multi-versioning support to InterBase, the first relational database to feature MVCC (Multiversion Concurrency Control).

    What defines the cloud as a platform? Here are some thoughts from Jim I copied out of the Cloud Computing group. You'll notice I've quoted Jim way, way too much. I did that because Jim is an insightful guy, he has a lot of interesting things to say, and I think he has a different spin on the future of databases in the cloud than anyone else I've read. He also has the advantage, of course, of not having a shipping product, but we shall see.

  • I've probably said this before, but the cloud is a new computing platform that some have learned to exploit, others are scrambling to master, but most people will see as nothing but a minor variation on what they're already doing. This is not new. When time sharing was invented, the batch guys considered it remote job entry, just a variation on batch. When departmental computing came along (VAXes, et al), the timesharing guys considered it nothing but timesharing on a smaller scale. When PCs and client/server computing came along, the departmental computing guys (i.e. DEC) considered PCs to be a special case of smart terminals. And when the Internet blew into town, the client server guys considered it nothing more than a global scale LAN. So the batch guys are dead, the timesharing guys are dead, the departmental computing guys are dead, and the client server guys are dead. Notice a pattern?
  • The reason that databases are important to cloud computing is that virtually all applications involve the interaction of client data with a shared, persistent data store. And while application processing can be easily scaled, the limiting factor is the database system. So if you plan to do anything more than play Tetris in the cloud, the issue of database management should be foremost in your mind.
  • Disks are the limiting factor in contemporary database systems. Horrible things, disks. But conventional wisdom is that you build a clustered database system by starting with a distributed file system. Wrong. Evolution is faster processors, bigger memory, better tools. Revolution is a different way of thinking, a different topology, a different way of putting the parts together.
  • What I'm arguing is that a cloud is a different platform, and what works well for a single computer doesn't work at all well in a cloud, and things that work well in a cloud don't work at all on a single computer system. So it behooves us to re-examine a lot of ancient and honorable assumptions to see if they make any sense at all in this brave new world.
  • Sharing a high performance disk system is fine on a single computer, troublesome in a cluster, and miserable on a cloud.
  • I'm a database guy who's had it with disks. Didn't much like the IBM 1301, and disks haven't gotten much better since. Ugly, warty, slow things that require complex subsystems to hide their miserable characteristics. The alternative is to use the memory in a cloud as a distributed L2 cache. Yes, disks are still there, but they're out of the performance loop except for data so stale that nobody has it in memory.
  • Another machine or set of machines is just as good as a disk. You can quibble about reliable power, etc., but write-queuing disks have the same problem.
  • Once you give up the idea of logs and page caches in favor of asynchronous replication, life gets a great deal brighter. It really does make sense to design to the strengths of clouds (redundancy) rather than their weaknesses (shared anything).
  • And while one guy is fetching his 100 MB per second, the disk is busy and everyone else is waiting in line contemplating existence. Even the cheapest of servers have two gigabit ethernet channels and a switch. The network serves everyone in parallel while the disk is single threaded.
  • I favor data sharing through a formal abstraction like a relational database. Shared objects are things most programmers are good at handling. The fewer the things that application developers need to manage the more likely it is that the application will work.
  • I buy the model of object level replication, but only as a substrate for something with a more civilized API. Or in other words, it's a foundation, not a house.
  • I'd much rather have a pair of quad-core processors running as independent servers than contending for memory on a dual socket server. I don't object to more cores per processor chip, but I don't want to pay for die size for cores perpetually stalled for memory.
  • The object substrate worries about data distribution and who should see what. It doesn't even know it's a database. SQL semantics are applied by an engine layered on the object substrate. The SQL engine doesn't worry or even know that it's part of a distributed database -- it just executes SQL statements. The black magic is MVCC.
  • I'm a database developer building a database system for clouds. Tell me what you need. Here is my first approximation: a database that scales by adding more computers and degrades gracefully when machines are yanked out; a database system that never needs to be shut down; hardware and software fault tolerance; multi-site archiving for disaster survival; a facility to reach into the past to recover from human errors (drop table customers; oops;); automatic load balancing.
  • MySQL scales with read replication which requires a full database copy to start up. For any cloud relevant application, that's probably hundreds of gigabytes. That makes it a mighty poor candidate for on-demand virtual servers.
  • Do remember that the primary function of a database system is to maintain consistency. You don't want a dozen people each draining the last thousand buckets from a bank account or a debit to happen without the corresponding credit.
  • Whether the data moves to the work or the work moves to the data isn't that important as long as they both end up at the same place with as few intermediate round trips as possible.
  • In my area, for example, databases are either limited by the biggest, ugliest machine you can afford *or* you have to learn to operate without consistent, atomic transactions. A bad rock / hard place choice that sends the cost of scalable application development through the ceiling. Once we solve that, applications that serve 20,000,000 users will be simple and cheap to write. Who knows where that will go?
  • To paraphrase our new president, we must reject the false choice between data consistency and scalability.
  • Cloud computing is about using many computers to scale problems that were once limited by the capabilities of a single computer. That's what makes clouds exciting, at least to me. But most will argue that cloud computing is a better economic model for running many instances of a single computer. Bah, I say, bah!
  • Cloud computing is a wonderful new platform. Let's not let the dinosaurs waiting for extinction define it as a minor variation of what they've been doing for years. They will, of course, but this (and the dinosaurs) will pass.
  • The revolutionary idea is that applications don't run on a single computer but an elastic cloud of computers that grows and contracts by demand. This, in turn, requires an application infrastructure that can a) run a single application across as many machines as necessary, and b) run many applications on the same machines without any of the cross talk and software maintenance problems of years past. No, the software infrastructure required to enable this is not mature and certainly not off the shelf, but many smart folks are working on it.
  • There's nothing limiting in relational except the companies that build them. A relational database can scale as well as BigTable and SimpleDB but still be transactional. And, unlike BigTable and SimpleDB, a relational database can model relationships and do exotic things like transferring money from one account to another without "breaking the bank." It is true that existing relational database systems are largely constrained to a single CPU or a cluster with a shared file system, but we'll get over that.
  • Personally, I don't like masters any more than I like slaves. I strongly favor peer to peer architectures with no single point of failure. I also believe that database federation is a work-around rather than a feature. If a database system had sufficient capacity, reliability, and availability, nobody would ever partition or shard data. (If one database instance is a headache, a million tiny ones is a horrible, horrible migraine.)
  • Logic does need to be pushed to the data, which is why relational database systems destroyed hierarchical (IMS), network (CODASYL), and OODBMS. But there is a constant need to push semantics higher to further reduce the number of round trips between application semantics and the database systems. As for I/O, a database system that can use the cloud as an L2 cache breaks free from dependencies on file systems. This means that bandwidth and cycles are the limiting factors, not I/O capacity.
  • What we should be talking about is trans-server application architecture, trans-server application platforms, both, or whether one will make the other unnecessary.
  • If you scale, you don't/can't worry about server reliability. Money spent on (alleged) server reliability is money wasted.
  • If you view the cloud as a new model for scalable applications, it is a radical change in computing platform. But most people see the cloud through the lens of EC2, which is just another way to run a server that you have to manage and control; seen that way, the cloud is little more than a rather boring business model. When clouds evolve to the point that applications and databases can utilize whatever resources they need to meet demand without the constraint of single machine limitations, we'll have something really neat.
  • On MVCC: Forget about the concept of master. Synchronizing slaves to a master is hopeless. Instead, think of a transaction as a temporal view of database state; different transactions will have different views. Certain critical operations must be serialized, but that still doesn't require that all nodes have identical views of database state.
  • Low latency is definitely good, but I'm designing the system to support geographically separated sub-clouds. How well that works under heavy load is probably application specific. If the amount of volatile data common to the sub-clouds is relatively low, it should work just fine provided there is enough bandwidth to handle the replication messages.
  • MVCC tracks multiple versions to provide a transaction with a view of the database consistent with the instant it started while preventing a transaction from updating a piece of data that it could not see. MVCC is consistent, but it is not serializable. Opinions vary between academia and the real world, but most database practitioners recognize that the consistency provided by MVCC is sufficient for programmers of modest skills to produce robust applications.
  • MVCC, heretofore, has been limited to single node databases. Applied to the cloud with suitable bookkeeping to control visibility of updates on individual nodes, MVCC is as close to black magic as you are likely to see in your lifetime, enabling concurrency and consistency with mostly non-blocking, asynchronous messaging. It does, however, dispense with the idea that a cloud has at any given point of time a single definitive state. Serializability implemented with record locking is an attempt to make distributed systems march in lock-step so that the result is as if there were no parallelism between nodes. MVCC recognizes that parallelism is the key to scalability. Data that is a few microseconds old is not a problem as long as updates don't collide.
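As a rough illustration of the behavior Jim describes — readers see a snapshot as of their start time, and a writer fails only when someone committed a version it could not see — here is a toy single-node MVCC sketch. The names and the first-updater-wins rule are illustrative assumptions, not NimbusDB code.

```python
import itertools

class MVCCStore:
    """Toy MVCC store: each key keeps a list of (commit_ts, value) versions."""
    def __init__(self):
        self.versions = {}           # key -> [(commit_ts, value), ...]
        self.clock = itertools.count(1)

    def begin(self):
        # a transaction's snapshot is just a timestamp
        return next(self.clock)

    def read(self, key, snapshot_ts):
        # the newest version committed at or before the snapshot is visible
        visible = [(ts, v) for ts, v in self.versions.get(key, [])
                   if ts <= snapshot_ts]
        return max(visible)[1] if visible else None

    def write(self, key, value, snapshot_ts):
        history = self.versions.setdefault(key, [])
        # reject updates to data the transaction could not see
        if history and max(ts for ts, _ in history) > snapshot_ts:
            raise RuntimeError("write conflict")
        history.append((next(self.clock), value))

db = MVCCStore()
t1 = db.begin()
db.write("x", "v1", t1)
t2 = db.begin()                    # t2 starts after t1's commit
assert db.read("x", t2) == "v1"
assert db.read("x", t1) is None    # t1's snapshot predates the commit
```

Readers never block writers and vice versa; the distributed version Jim proposes would add the bookkeeping needed to make these visibility decisions consistently across nodes.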

    Jim certainly isn't shy with his opinions :-)

    My summary of what he wants to do with NimbusDB is:

  • Make a scalable relational database in the cloud where you can use normal everyday SQL to perform summary functions, define referential integrity, and all that other good stuff.
  • Transactions scale using a distributed version of MVCC, which I do not believe has been done before. This is the key part of the plan and a lot depends on it working.
  • The database is stored primarily in RAM which makes cloud level scaling of an RDBMS possible.
  • The database will handle all the details of scaling in the cloud. To the developer it will look like just a very large highly available database.

    I'm not sure if NimbusDB will support a compute grid and map-reduce type functionality. The low latency argument for data and code collocation is a good one, so I hope it integrates some sort of extension mechanism.

    Why might NimbusDB be a good idea?

  • Keeps simple things simple. Web scale databases like BigTable and SimpleDB make simple things difficult. They are full of quotas, limits, and restrictions because by their very nature they are just a key-value layer on top of a distributed file system. The database knows as little about the data as possible. If you want to build a sequence number for a comment system, for example, it takes complicated sharding logic to remove write contention. Developers are used to SQL and are comfortable working within the transaction model, so the transition to cloud computing would be that much easier. Now, to be fair, who knows if NimbusDB will be able to scale under high load either, but we need to make simple things simple again.
  • Language independence. Notice that the IMDG products are all language specific. They support some combination of .Net/Java/C/C++. This is because they need low level object knowledge to transparently implement their magic. This isn't bad, but it does mean that if you use Python, Erlang, Ruby, or any other unsupported language then you are out of luck. As many problems as SQL has, one of its great gifts is programmatic universal access.
  • Separates data from code. Data is forever, code changes all the time. That's one of the common reasons for preferring a database instead of an objectbase. This also dovetails with the language independence issue. Any application can access data from any language and any platform from now and into the future. That's a good quality to have.
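The "complicated sharding logic" for a simple counter mentioned in the first bullet usually looks like a sharded counter: spread increments across N shards to remove write contention, then sum the shards on read. A minimal sketch (illustrative names; in a real key-value store each shard would be a separate row):

```python
import random

class ShardedCounter:
    """Sharded counter: trades read cost for contention-free writes."""
    def __init__(self, num_shards=8):
        self.shards = [0] * num_shards

    def increment(self):
        # writers pick a random shard, so concurrent increments
        # rarely collide on the same row
        i = random.randrange(len(self.shards))
        self.shards[i] += 1

    def value(self):
        # reads pay the price: fetch and sum every shard
        return sum(self.shards)

c = ShardedCounter()
for _ in range(100):
    c.increment()
assert c.value() == 100
```

This is exactly the kind of application-level machinery a transactional SQL database makes unnecessary — which is the point of the "keeps simple things simple" argument.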

    The smart money has been that cloud level scaling requires abandoning relational databases and distributed transactions. That's why we've seen an epidemic of key-value databases and eventually consistent semantics. It will be fascinating to see if Jim's combination of Cloud + Memory + MVCC can prove the insiders wrong.

    Are Cloud Based Memory Architectures the Next Big Thing?

    We've gone through a couple of different approaches to deploying Memory Based Architectures. So are they the next big thing?

    Adoption has been slow because it's new and different and that inertia takes a while to overcome. Historically tools haven't made it easy for early adopters to make the big switch, but that is changing with easier to deploy cloud based systems. And current architectures, with a lot of elbow grease, have generally been good enough.

    But we are seeing a wide convergence on caching as a way to make slow disks perform. Truly enormous amounts of effort are going into adding cache and then trying to keep the database and applications all in sync with the cache as bottom-up and top-down driven changes flow through the system.

    After all that work it's a simple step to wonder why that extra layer is needed when the data could just as well have been kept in memory from the start. Now add the ease of cloud deployments and the ease of creating scalable, low latency applications that are still easy to program, manage, and deploy. Building multiple complicated layers of application code just to make the disk happy will make less and less sense over time.

    We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point is soon.

  • Amazon EC2 Running IBM

    Information about the IBM software services offered on Amazon Web Services.

    Amazon EC2 Running IBM

    Earlier this year I talked about our partnership with IBM and their commitment to the creation of licensing models that are a good match for dynamic cloud-computing environments. At that time we released a set of development AMIs (Amazon Machine Images), giving you the ability to create applications using IBM products such as DB2, WebSphere sMash, WebSphere Portal, Lotus Web Content Management, and Informix.

    The response to our announcement has been good; developers, integrators, and IT shops have all been asking us for information on pricing and for access to the actual AMIs. We've been working with IBM to iron out all of the details and I'm happy to be able to share them with you now!

    Starting today, you have development and production access to a number of IBM environments including:

    • Amazon EC2 running IBM DB2 Express - starting at $0.38 per hour.
    • Amazon EC2 running IBM DB2 Workgroup - starting at $1.31 per hour.
    • Amazon EC2 running IBM Informix Dynamic Server Express - starting at $0.38 per hour.
    • Amazon EC2 running IBM Informix Dynamic Server Workgroup - starting at $1.31 per hour.
    • Amazon EC2 running IBM WebSphere sMash - starting at $0.50 per hour.
    • Amazon EC2 running IBM Lotus Web Content Management - starting at $2.48 per hour.
    • Amazon EC2 running IBM WebSphere Portal Server and IBM Lotus Web Content Management Server - starting at $6.39 per hour.

    These prices include on-demand licenses for each product. The AMIs are available in the US and EU regions, but you currently cannot use Amazon EC2 running IBM with Reserved Instances. However, if you already have licenses from IBM you can install and run the software yourself and pay the usual EC2 rate for On-Demand or Reserved Instances. You can, of course, use other EC2 features such as Elastic IP Addresses and Elastic Block Storage.

    You can find the IBM AMIs in the AWS Management Console's Community AMI List (search for "paid-ibm"), or you can search for "paid-ibm" in ElasticFox.

    Because products like the WebSphere Portal Server and IBM Lotus Web Content Management Server can now be accessed on an hourly basis, you can now think about deploying them in new ways. If you are running a big conference or other event, you can spin up an instance for the duration of the event and only pay a couple of hundred dollars. If you need to do more than one event at the same time, just spin up a second instance. This is all old-hat to true devotees of cloud computing, but I never tire of pointing it out!
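The back-of-envelope math behind "a couple of hundred dollars" checks out against the price list above — for example, the $6.39/hour WebSphere Portal + Lotus Web Content Management AMI running continuously for a two-day event:

```python
# Cost of the event scenario above at the $6.39/hour Portal + WCM rate.
hourly_rate = 6.39
event_hours = 48            # a two-day conference, instance up the whole time

total = hourly_rate * event_hours
print(f"${total:.2f}")      # $306.72 -- a couple of hundred dollars
```

A second simultaneous event simply doubles the figure: spin up a second instance, pay for a second set of hours.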

    Each AMI includes a detailed Getting Started guide. For example, the guide for the WebSphere Portal Server and IBM Lotus Web Content Management Server is 30 pages long. The guide provides recommendations on instance sizes (Small and Large are fine for development; a 64-bit Large or Extra Large is required for production), security groups, and access via SSH and remote desktop (VNC). There's information about entering license credentials (needed if you bring your own), EBS configuration, and application configuration. The guide also details the entire process of bundling a customized version of the product for eventual reuse.

    Additional information on products and pricing is available on the IBM partner page.

    And there you have it. With this release, all of the major database products — Oracle, MySQL, DB2, Informix, and SQL Server — are available in production form on EC2.


    VMware unveils its cloud OS; Wants to be a bridge for the enterprise

    Information about vSphere, VMware's private cloud strategy.

    VMware on Tuesday will announce its cloud operating system—dubbed vSphere 4—with plans for general availability in the second quarter. 

    With the effort, VMware is attempting to bridge virtualized data centers—now known as "private clouds"—and growing cloud computing services from the likes of Amazon and others. However, this bridging process is a work in progress due to the lack of standards. VMware's big pitch is that vSphere can run your data center and allow you to bridge out when external resources are needed. 

    John Gilmartin, VMware's director of product marketing, says the company is hoping to ease enterprises into cloud computing without redoing architecture. "There's a big gap between what most people talk about as cloud and what people are doing today in the enterprise," said Gilmartin. VMware's plan is to get cloud providers to use its operating system and then seamlessly hook up to enterprises using vSphere 4. 

    It's unclear what happens if a vSphere shop isn't hooking up to another VMware powered cloud. Gilmartin said the company is working behind the scenes on application swapping among clouds, but didn't have details or timelines for these standards. It is clear that VMware sees vSphere 4 as a way to thwart both Microsoft's cloud OS, Azure, and its virtualization effort, Hyper-V. 

    Gilmartin argued that Microsoft's approach with Azure requires too many architecture changes for enterprises. He also noted that vSphere will support more operating systems. 

    In the meantime, VMware has packed enough features in vSphere 4 to keep enterprises interested for their own IT as a service plans. 

    Among the key features:

    • A 30 percent increase in application consolidation ratios;
    • Up to 50 percent in storage savings by allowing virtual machines to only use storage as needed;
    • Up to 20 percent additional power and cooling savings;
    • vSphere 4 also scales better with the ability to pool 32 physical servers with up to 2048 processor cores, 1,280 virtual machines, 32 TB of RAM, 16 petabytes of storage and 8,000 network ports.  

    Here's the chart detailing vSphere 4 vs. VMware Infrastructure 3 (in the "current" column):

    One of the more interesting features of vSphere is a fault tolerance option. Data center managers can keep their most valuable apps running even if the underlying hardware fails. By clicking a box to protect a virtual machine, vSphere 4 creates a shadow copy of the application to take over in the event of a failure. There is a performance hit since you're allocating computing resources to the shadow application, but Gilmartin notes that only 20 percent to 30 percent of enterprise software would have to be fault tolerant. 

    VMware's price list for vSphere 4 looks a bit complicated to untrained eyes—notably mine—but here's the summary.