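Saturday, December 12, 2009

Looking back on 2009: the cloud grew rapidly, but was data left behind?

2009 could be called the first year of the cloud, and the market saw a wide range of activity.

The article below summarizes that activity well, and also addresses a problem that is gradually coming to the surface: the difficulty of moving enterprise data into the cloud. It anticipates a shift from the era of SQL, a structured database technology, toward technologies for managing unstructured data, typified by Hadoop, and points out that how enterprises make that transition will be a major challenge.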


Year in Review, 2009: Cloud computing struggles with data schemes

By Rob Barry
11 Dec 2009 | SearchSOA.com

Software movers and shakers have explored the underlying concepts behind cloud computing for some time, but the technology as a whole stole the spotlight in 2009. Massive scaling of distributed computers with light-weight data architectures at Amazon.com, Google.com, SalesForce.com, Yahoo.com and other 'dot-coms' propelled the notion of 'cloud' to the fore.

It has been a particularly busy year in the cloud computing space. In February, Microsoft announced SQL Azure Database for its upcoming Azure cloud computing platform. The offering is an alternative to Windows Azure Table, a non-relational storage system. In April, Google added support for Java to its App Engine application-hosting service. Previously, the cloud-based service supported only Python. In June, SalesForce released the Force.com Free Edition, expanding its cloud applications hosting offerings to include an entry-level option. In October, Amazon released its Relational Database Service (RDS), which brought the MySQL database to its Elastic Compute Cloud (EC2).

At the start of the year, attention was on SimpleDB, Hadoop and MapReduce, which promised to bring a new Representational State Transfer-oriented (REST-oriented) architecture to the cloud. Before the year was over, attention had shifted to Microsoft and Amazon and their efforts to make their cloud offerings more approachable for traditional developers versed in standard SQL database architectures.

"I think the rise of the cloud is the big story in 2009," said David Linthicum, expert and author on SOA and cloud computing. "That is, the ability to leverage the internet is a core strategic value within our enterprises, beyond simple content distribution." Linthicum went as far as to suggest that cloud was the "Man of the Year," after Time magazine's familiar yearly award.

Linthicum said a major issue in cloud computing this year has been getting data out of the data center and onto the cloud in a reliable way. Security has also been a major concern.

Standard cloud forming slowly
As cloud computing is young, standards are slow in coming. The primary focus with cloud computing at The Open Group (TOG), said Chris Harding, forum director for SOA and semantic interoperability at TOG, is exploring possible models, frameworks and standards.

At present, however, Harding said it may be a bit too early to legitimately consider setting standards for cloud computing. On one hand, it is an immature space and nobody quite knows what it will look like in the next few years. On the other hand, the large vendors that can afford to provide cloud platforms would probably only play along if it was in their interest.

Cloud computing's present technical diversity is especially borne out in cloud data architecture. It may appear evolutionary rather than revolutionary right now, as some big players have fielded tamer versions of their original offerings.

"One strong candidate for the most significant advance in cloud computing this year was the recognition by major players—first Microsoft, then Amazon—of how important relational data as a service is to customers," said David Chappell, principal of Chappell and Associates. "Microsoft's original announcement of its Azure technologies didn't support relational data at all, an oversight they corrected with the announcement of SQL Azure Database."

Philip Walston, vice president of development and product management at Layer 7 Technologies, wagered that Amazon has proven to be the cloud computing player of 2009.

"I think Amazon - if you look at where they started the year and where they ended - has added a lot of services," said Walston. "And when you can spin up instances of other people's technologies and spin them up, that's a really great way of infiltrating your competitors' markets."

One of the things Amazon has done to broaden its appeal as a platform is allow users to spin up Amazon Machine Images (AMIs) of relational databases from IBM, Oracle, Sybase and others inside EC2. Walston said this gave Amazon the edge in terms of providing database options to customers. And when the company opened up VPN access to its cloud database offerings, it more or less sealed the deal in Walston's eyes.

Scaling the cloud computing horizons
Cloud computing's seemingly unlimited on-demand resources offer architects unprecedented amounts of spare cache to play with. Some REST proponents have even indicated that this will lead large distributed, non-relational data caching systems to gradually replace relational SQL databases. But for now, it seems SQL is what most enterprises are used to, and cloud providers who started with a non-relational approach have adjusted course to include relational offerings.

While IT architects throughout the industry wonder how they may eventually have to approach databases differently in the cloud, some experts say the structures may stay more or less as they are.

"I don't think there will be much difference because of the work that's already been put into parallelizing relational database use," said Curt Monash, founder of Monash Research. "Most large organizations around the world are using some form of distributed relational database system."

While non-SQL systems can be scaled to tackle sizeable problems, Monash says there is and will likely continue to be a better market for SQL databases as most enterprises begin to look at the cloud.

In October, Amazon added support for MySQL to Amazon Web Services (AWS). This will give developers easing into Amazon Elastic Compute Cloud (EC2) a more familiar data system than the non-relational SimpleDB, which had been released first. Since most enterprises keep their data in some sort of SQL database, having this option could make the transition to cloud an easier one.

In February, Microsoft had taken the first similar step when it announced the addition of SQL Azure Database (then SQL Data Services) to its upcoming Azure cloud platform. Originally, the company had planned to center the data architecture for Azure around REST architecture. This non-SQL approach would provide a data infrastructure more reminiscent of SimpleDB or Apache Hadoop.

But there was so much unrest among .NET developers over this that Microsoft eventually came around and built out SQL for Azure as well.

Highlights from the year in cloud
In February ... Microsoft announced SQL Azure Database for its upcoming Azure cloud computing platform. The offering is an alternative to Windows Azure Table, a non-relational storage system.
In April ... Google added support for Java to its App Engine application-hosting service. Previously, the cloud-based service supported only Python.
In May ... IBM announced its CloudBurst cloud appliance for 'private cloud' deployment.
In June ... SalesForce released the Force.com Free Edition, expanding its cloud applications hosting offerings to include an entry-level option.
In September ... Oracle's Larry Ellison derided the cloud as "water vapor."
In October ... Amazon released its Relational Database Service (RDS), which brought the MySQL database to its Elastic Compute Cloud (EC2).

Microsoft's support for a relational database in the cloud is one of the most important cloud data moves in the past year, said Marcus Collins, senior analyst at the Burton Group and author of "Cloud Databases: Structure in a Nebulous World," a new Burton Group report.

"Now Microsoft has opened up the notion of fully relational databases in the cloud," said Collins. "If it's successful we will almost inevitably see other vendors doing it."

A new kind of database
While the support of cloud-based RDBs from Amazon and Microsoft has met many programmer needs, there is clearly still significant interest in non-SQL approaches for handling what Collins calls "Internet-scale" processing. Collins describes Internet-scale attributes as comprising very large data volumes, complex processing, and semi-consistent data. SQL, while good for dependably storing and retrieving corporate transactional data, would struggle to analyze the large amounts of unstructured data in the cloud that companies like Google and Facebook use.

To meet this need, a different kind of database is emerging. "There's a new class of database that's coming into play that's called a 'non-schematic database,'" said Carl Olofson, a research vice president at IDC. "You're providing it with data or tags or name-value pairs, whatever you want to call it. You load all this stuff into the database first and then you find out the order of what's in it."

This "populate first, ask questions later" approach is different from the SQL approach, where data is structured and defined before entering the database. But it is meant to attack a different class of problem.
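The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration, not how any particular cloud database works: the `users` table, the sample records and the helper names `insert_without_name` and `discover_fields` are all made up for the example, and an in-memory SQLite database stands in for a relational system.

```python
import sqlite3
from collections import Counter

# Schema first (SQL): the structure is declared before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))


def insert_without_name(connection):
    """Try to store a row that does not fit the declared structure."""
    try:
        connection.execute("INSERT INTO users (id) VALUES (2)")
        return True
    except sqlite3.IntegrityError:
        # The database refuses data that violates the schema.
        return False


# Populate first: records are loose name-value pairs with no shared schema.
# (Hypothetical sample data.)
records = [
    {"id": 1, "name": "alice"},
    {"id": 2, "tags": ["cloud", "sql"]},
    {"user": "carol", "last_login": "2009-12-11"},
]


def discover_fields(rows):
    """Ask questions later: count which field names actually occur."""
    return Counter(key for row in rows for key in row)


rejected = not insert_without_name(conn)
fields = discover_fields(records)
```

In the first half the malformed row is refused at write time. In the second half anything can be loaded, and the structure is only worked out afterwards by inspecting the data.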

An example of where this is done is Hadoop, Olofson said.

Hadoop is a large-scale Java application framework that combines a non-SQL distributed file system with an implementation of the MapReduce programming model popularized by Google. It is better suited than SQL for calculating compute-intensive analytics from very large sets of data because it offers more scalability and loose coupling of data. An architect might use Hadoop to analyze complex behavior patterns for 300 million users in a social network. Such a problem is broken apart into many pieces and sent to any number of computers (nodes) in a cluster.
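The idea of breaking a problem into pieces and farming them out can be sketched as a toy map/shuffle/reduce pipeline. This is plain Python, not Hadoop's API: the event data and all function names are assumptions made for illustration, and a thread pool stands in for the nodes of a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical event log: (user, action) pairs from a social network.
EVENTS = [
    ("alice", "post"), ("bob", "like"), ("alice", "like"),
    ("carol", "post"), ("bob", "like"), ("alice", "post"),
]


def split(data, pieces):
    """Break the input into roughly equal chunks, one per worker."""
    return [data[i::pieces] for i in range(pieces)]


def map_chunk(chunk):
    """Map step: emit a (key, 1) pair for every event in this chunk."""
    return [((user, action), 1) for user, action in chunk]


def shuffle(mapped_chunks):
    """Group the emitted values by key across all chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_groups(grouped):
    """Reduce step: combine each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_actions(events, workers=3):
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, split(events, workers)))
    return reduce_groups(shuffle(mapped))


counts = count_actions(EVENTS)
```

Each chunk is mapped independently, so adding workers does not change the result. That independence is what lets a real cluster spread the work over many machines.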

The major advantage of SQL databases is data quality and consistency. Application architects can rest easy knowing that when an update is made to a piece of data in a relational database, there is little chance it will be inappropriately accessed before the change is complete. Non-SQL data systems trade data consistency and speed for scalability and broader compute resource allocation.
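A small sketch of that guarantee, using Python's built-in `sqlite3` module as a stand-in for a relational database. The `accounts` table, its balances and the `transfer` helper are hypothetical. The point is only that a multi-step update either completes fully or leaves no trace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(connection, src, dst, amount):
    """Move funds in one transaction; return False if it was rolled back."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False


def balances(connection):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(connection.execute("SELECT name, balance FROM accounts"))


ok_small = transfer(conn, "a", "b", 30)
ok_large = transfer(conn, "a", "b", 500)
final = balances(conn)
```

The second transfer credits `b` before the debit on `a` fails, yet the rollback removes the credit as well, so no half-finished change is ever left behind. Systems in the SimpleDB style generally relax this kind of guarantee in exchange for scale.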

"Data quality is what the relational model offers," said Collins. "The quality of the data is really what governs how successful an application will be and so you do need solid business rules around the data." These qualities are missing from something like the SimpleDB model, Collins said.

The drive to tame original cloud traits goes beyond data. Google expanded the APIs for its cloud services in April of this year to include Java support. The company's previous efforts had required developers to work with the Python language. Expect more such adaptations across the cloud space in 2010.

http://searchsoa.techtarget.com/news/article/0,289142,sid26_gci1376713,00.html