2009年6月19日金曜日

Oracle Coherence vs Gigaspaces XAP

Oracle社のCoherenceとGigaSpaces社のXAP、という2つのGrid Computingソリューションの比較を行った記事。  Coherenceの方は比較的Read OnlyのWebアプリケーションに向いている一方、GigaSpace XAPはOLTPアプリケーションなど、フレキシビリティが要求されるアプリケーションに向いている、という評価

I've been fortunate enough to work (read: get to play) with two leading data/computing grid solutions in commercial projects over the last year — so here's a short summary of differences between Oracle Coherence and GigaSpaces XAP.

If this is a topic that interests you, you might also be interested in attending a free one-day conference in London on cloud and grid technologies on 9th of July (see gamingscalability.org for more information). At the event I'll present an experience report from one of the projects I've mentioned here and go into much more detail on what we got out of it.

Data Cache

Both systems support deploying a data grid on multiple machines and automatically manage routing, fault-tolerance and fail-over. Both grids support passive data caches and sending code to be executed on the node where a particular object resides rather than pulling the object from the cache and then running a command. Both grids support querying objects by their properties.

Both grids support event-driven notifications when objects are added or removed from the space. Coherence has notifications that go out to external clients (UI apps, for example) and supports continuous queries that will send updates without polling from the client. Gigaspaces only has notifications internally in the grid, meaning that you can set up a processor to receive events about new, updated and removed objects matched by a particular template.

Both systems have concepts of local caches for remote data partitions which automatically update when the remote data changes (Coherence calls this "near cache"). Coherence supports lots of caching topologies which can be flexibly configured, but Gigaspaces only supports a local partition, global space and local cache of the global space. Local caches in Gigaspaces are really for read-only access (reference data).

Both grids support .NET/Java interop to some level. Coherence does this by requiring you to specify a serializer implementation for your class on both ends, in which you basically just need to specify the order in which fields are serialized and deserialized. I haven't tried out the Gigaspaces solution for interop. According to the documentation, if you follow a naming convention in both places (or override it with attributes and annotations), the grid will transform POJOs to POCOs and back fine. Again, without trying this myself I cannot actually tell you if it works or not.

Processing

Gigaspaces doesn't just allow you to send code to the objects, it is actually designed around an event-driven processing model where objects are sent to processing code and the same processing code runs in each data partition. Events for processing are specified by example, matching templates on classes and non-null properties, and Gigaspaces manages thread pools and other execution aspects for you automatically. Events can be triggered by the state of entities in the grid, or by commands coming to be executed on the grid. It also has a fully transactional processing model, so if an exception gets thrown everything rolls back and another processor will pick up the command from the space again. It integrates with Spring transactions so transactional processing development is really easy.

Coherence has a reference command pattern implementation, not part of the basic deployment but as a library in the Coherence Incubator project. Its allows you to send commands as data grid objects to entities in the data grid but cannot directly invoke events based on entity properties in the grid. Coherence also has a very limited support for transactions – the JCA container gives you a last-logging-resource simulation but no real transactional guarantees.

Deployment

Gigaspaces is designed to replace application servers, so it has a nice deployment system that will automatically ship your application code across the network to relevant nodes, and cloud deployment scripts that will start up machines on EC2 as well. Until recently the scripts were a bit unreliable but version 6.6.4 fixed it. Coherence does clustering itself, but it was not intended to replace application server functionality. When it comes to deployment, you have to do it yourself. There is a JCA connector for application servers which I've tried with WebLogic (and version 3.3 of Coherence finally works out of the box with this), but there are lots of reasons why you do not want to run the whole grid inside WebLogic clusters or something similar, but have it as a separate cluster.

On the other hand, Coherence has a pluggable serialization mechanism (POF) which would theoretically allow us to run multiple versions of the same class in the grid and hotdeploy a new version of the application on nodes incrementally and without downtime (I haven't tried this myself yet, though, so I don't know whether it really works like that). Gigaspace applications (processing units) are split into two parts – a shared library distributed to all the applications and the processing unit specific code. Shared libraries cannot be redeployed after the grid starts, so a hot-deployment of processing unit specific code is fine, but not for any data that is actually stored in the grid. This is apparently going to be changed in version 7. Until then, the best bet for hot deployment on Gigaspaces is to split out the data format and business logic into separate classes and JARs. I'm not too happy about this, but once the class loading changes we might go back to nice object design.

Scaling

Both grids seem to scale enough to deal with problems which I am fighting with (order of magnitude 10 computers in a grid, haven't tried them on deployments of hundreds). However, Coherence scales dynamically — you can add more nodes to the cluster on the fly, without stopping the application. This allows you to scale up and down on demand. Gigaspaces deploys data to a fixed number partitions and fixes it for the lifetime of the data space. If a machine goes down, a backup partition will take over and on clouds you can even have a new machine instance started up for you automatically, but you cannot increase or decrease the number of partitions after the grid has started.

Persistency

Both grids have read-through and write-through support. Gigaspaces comes with a Hibernate-based asynchronous persistency system with mirroring (allowing you to move database writes to a separate node) out of the box. Although the idea is nice, in the current incarnation it has quite a few rough edges so we ended up rolling out our own. For real read-through and write-through to work on Coherence you need to ensure that you configured and deployed persistency code to all the nodes, which might be a bit of a challenge if a part of the grid is running in an application server and a part of the grid is running outside of application servers (especially with non-coherence clients). Since Gigaspaces handles the deployment for you, it makes it a bit easier to run configurations such as these. Gigaspaces also has the concept of an initial load that will pre-populate the memory space with objects from the database and supports on-demand cleanup from the grid without deleting objects in the persistent store.

So when should you use what and why?

There is no clear winner in this comparison because these two products seem to be suited to different problems. Gigaspace is a heavyweight replacement for application servers and in my view best suitable for distributed transactional processing. Its processing model is much more flexible than the one in Coherence and has more features, not least proper transaction support. Coherence seems to be a much better solution for passive read-mostly data grids. It can grow and shrink dynamically. It supports much more flexible topologies and has more powerful libraries for client applications.