« Finally: transparently clustered JRuby | Main | Coherent clustering revisited »
November 19, 2008
Coherent clustering and talking past each other
posted by ari
Was reading a back-and-forth between our product manager and some folks from Oracle.
Its interesting because we are 100% sure that Oracle Coherence is not coherent and they are sure they are. Engineers definitely don't like be wrong so I am sure neither will budge here, but it is clear to me that we are talking past each other.
Does Coherence run in an async mode in order to scale? Absolutely not. It runs in a partitioned mode in order to scale. Async != partitioned and so I agree they are not incoherent.
Does Terracotta do something different though? Yes.
Let's go back to the CAP theorem from those wicked-smart Amazon.com guys.
Consistency
Availability
Partitioning
Can't get all 3 at the same time. (I think a few people would debate this and maybe some day someone will find an error in the mathematical proof but for now let's treat this as our foundation.)
According to CAP consistency is orthogonal to partitioning. Does this make sense? Yes. Consistency to me is "keep all nodes who have the data in sync at all times" and partitioning is "keep running even if you can't talk to any other node." Can I have consistency and partitioning? Sure. This is Oracle Coherence.
They split the data amongst JVMs (10 JVMs would each have 1/10th of the data) and then they make sure all data is spread around the JVMs so if I lose one JVM there is a backup of his data. Consistency is then assured by synchronously updating the copy of the data whenever a change comes into one of my JVMs. Kewl.
Terracotta does not split the data. Terracotta is all about Availability and Consistency. Any node can get at any and all data at all times. The ideal situation in Terracotta is to keep work balanced in some way. If you do (say HTTP requests sticky by session, for example) then Terracotta will scale linearly via a weak partitioning scheme (nothing is rigid...anything can move from 1 JVM to another but in ideal circumstances nothing is moving around the cluster). Also, with Terracotta all my writes are sent to the Terracotta array in case the JVM working with my objects dies. That's availability. And consistency for Terracotta means that--going back to the 10 JVM scenario--if all my JVMs happen to have my object resident in memory than whenever I attempt to mutate the object in a particular JVM, I am first guaranteed that anything that happened before I started reading the object in any and all other JVMs will be applied locally. My JVMs won't see stale data.
So, Terracotta is Available + Consistent meaning all shared data is backed up to the Terracotta array and JVMs will stay in synch if they end up sharing objects. Partitioning is weakly guaranteed where, if I partition my workload to my Java cluster, objects will only be resident in JVMs that are actively using those objects. partitioning is done implicitly. Kinda kewl.
In Coherence, developers partition the data by using an Oracle interface. Then the partitions are automatically backed up. Consistency and partitioning with a weaker availability guarantee in that if too many nodes go down, subsets of my data get lost.
I think this is a more accurate, albeit potentially more confusing discussion.
At the end of the day, Terracotta is the only technology capable of keeping an arbitrary number of JVMs consistent across updates. It is built ground up to leverage transparency and field-level updates to be able to provide reasonable scale w/ consistency guarantees. Nothing else I know of can do this.
That being said, if you want to partition manually, then you can get manual control over scalability and that's not necessarily bad. Its just not always necessary to partition deep inside your code in order to get scale.
Which you will prefer will depend on the size of your use case. If you are Amazon and have 10,000 JVMs, you plan to leverage partitioning at any and all levels so you don't need a system with weak / implicit partitioning. If you have 10 or 20 JVMs, I think you might prefer a system that does not require you to partition and instead lets you start with consistency and availability. After all, you can almost always add partitioning as long as your use case makes sense, right?
Think about that one...
update: Now that I think about it, Coherence is not quite focused on partitioning. in CAP, the P means an arbitrary runtime partition comes into existence...not that a developer divides up the data using an interface. If the network were to partition the cluster at runtime, Coherence would take what data surviving nodes can still reach and run with that subset as the working set. Once the network came back, you would have a split brain for some subset of data.
While Terracotta could split the brain, the manifestation would be different. The Terracotta server array would have to partition internally in addition to your application JVMs. Example: if you lose a network switch and 5 of 10 JVMs can reach 1 half of the Terracotta server array and 5 the other half, and the server array internally splits in half the exact same way, then we would keep going. If, however, the server array stays internally intact, no split brain even if clients cannot reach each other. If the server array splits and the clients do not, the clients will inform the array that this is occurring. Only in the scenario when both your cluster and our array split evenly down the middle (exactly in half) will we split.
--Ari
Trackback Pings
TrackBack URL for this entry:
http://blog.terracottatech.com/cgi-bin/mt/mt-tb.cgi/76