September 5, 2008
Some APIs we have. Some we don't (yet).
posted by ari
I thought it might be worthwhile to write down all the APIs and frameworks we are bringing down the pike. I have the following in implementation through various OSS contributors right now:
1. Write-behind API to the database. This framework, to be hosted in our Forge, will provide a simple Map-based collection plus an interface for you to implement. The interface will be called to flush your objects to a DB at some point after you call map.put() on that object. The API will encapsulate all the models of write-behind including idempotent updates, non-idempotent updates, all with and without automatic retry.
2. Write-thru API to the database. This framework, again in the Forge, will provide a simple Map-based collection and the appropriate TC + JDBC transaction logic to get an object into cache AND the database, or at least just the database, safely and while maintaining the database's role as system of record.
3. WAN API that will allow 2 datacenters running 2 uncoupled Terracotta-based clusters to share data. Again, this will be behind a pluggable collection for storage. Objects will be configurable such that some business updates can be flagged for synchronous replication across datacenters, while others are async, and still others are mostly-sync, meaning they wait up to a runtime-adjustable timeout for an ACK from the other datacenter.
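To make item 1 a bit more concrete, here is a minimal sketch of the write-behind shape described above: a Map-like collection plus an interface you implement, which gets called some time after put(). Everything named here (Flusher, WriteBehindMap, flushPending, maxAttempts) is hypothetical and only for illustration; it is not the Forge API, it uses current Java syntax, and it leaves out Terracotta clustering and background scheduling entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback the application implements (not the real Forge interface).
interface Flusher<K, V> {
    void flush(K key, V value) throws Exception;
}

// Hypothetical write-behind map: put() returns at once, the database write happens later.
class WriteBehindMap<K, V> {
    private final Map<K, V> cache = new LinkedHashMap<>(); // guarded by "this"
    private final Map<K, V> dirty = new LinkedHashMap<>(); // guarded by "this"
    private final Flusher<K, V> flusher;
    private final int maxAttempts; // 1 = no automatic retry

    WriteBehindMap(Flusher<K, V> flusher, int maxAttempts) {
        this.flusher = flusher;
        this.maxAttempts = maxAttempts;
    }

    public synchronized V get(K key) {
        return cache.get(key);
    }

    // Update the in-memory copy and remember that the key still needs flushing.
    public synchronized void put(K key, V value) {
        cache.put(key, value);
        dirty.put(key, value);
    }

    // Flush every pending entry; returns the keys that failed on all attempts.
    // A real implementation would call this from a background thread.
    public List<K> flushPending() {
        Map<K, V> batch;
        synchronized (this) {
            batch = new LinkedHashMap<>(dirty);
            dirty.clear();
        }
        List<K> failed = new ArrayList<>();
        for (Map.Entry<K, V> e : batch.entrySet()) {
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                try {
                    flusher.flush(e.getKey(), e.getValue());
                    ok = true;
                } catch (Exception ex) {
                    // Retrying is only safe when the flusher's update is idempotent.
                }
            }
            if (!ok) {
                failed.add(e.getKey());
                synchronized (this) {
                    // Re-queue, but keep any newer value written in the meantime.
                    dirty.putIfAbsent(e.getKey(), e.getValue());
                }
            }
        }
        return failed;
    }
}
```

With maxAttempts set to 1 this models the "without automatic retry" case; a non-idempotent update would normally want that setting, since a retried flush could apply the same change twice.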
What do you think? Do you need these APIs? Have you been wondering how to do these things with TC? It's all going to be pure POJO and open source, housed in our forge. So you can change the implementation as well as learn from it and apply it to your use case. I think that's exciting.
An important side note. A few people exclaimed to me that while this list is nice, they need a distributed cache API too. This surprises me because we have 2 already:
1. If you have a true caching requirement where data expires or ages in the cache and eventually needs to be evicted, use EHCache on top of Terracotta. Don't write your own evictor, please. Ours is highly tuned to keep objects off of JVMs that don't need them.
2. If you have a pseudo-caching requirement where data expires _only_ on business event, like a user or application conversational state object that expires at the end of a process flow, then just use a ConcurrentHashMap. Don't worry about EHCache, or what have you in this case.
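For case 2, a minimal sketch of what "just use a ConcurrentHashMap" can look like for conversational state that goes away only on a business event. The ConversationStore class and its method names are made up for illustration, the code uses current Java syntax, and the Terracotta configuration needed to share the map across JVMs is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical holder for per-conversation state; names are illustrative only.
class ConversationStore {
    private final ConcurrentMap<String, ConcurrentMap<String, String>> conversations =
            new ConcurrentHashMap<>();

    // Record one piece of state for a conversation, creating its entry on first use.
    public void record(String conversationId, String key, String value) {
        conversations
                .computeIfAbsent(conversationId, id -> new ConcurrentHashMap<>())
                .put(key, value);
    }

    // Read state back; returns null if the conversation or key is unknown.
    public String lookup(String conversationId, String key) {
        ConcurrentMap<String, String> state = conversations.get(conversationId);
        return state == null ? null : state.get(key);
    }

    // The business event: the process flow ended, so drop the state explicitly.
    // There is no timer and no evictor; nothing expires on its own.
    public boolean complete(String conversationId) {
        return conversations.remove(conversationId) != null;
    }

    public int activeCount() {
        return conversations.size();
    }
}
```

The point of the design is that removal happens in exactly one place, complete(), which the application calls when the flow finishes.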
EHCache and ConcurrentHashMap are like our Master / Worker framework or HTTPSession interface. These are officially supported APIs and usages of Terracotta. The stuff I am working on now will join these existing solutions ASAP (read: next 60 days).
Cheers,
--Ari
Comments
+1 on the write-behind API. Definitely an interesting use-case for Terracotta to keep it as _the_ Service of Record instead of the DB. Very unorthodox, very scalable :)
Posted by: Carl Byström at September 5, 2008 4:55 PM
Very interesting news: I think that Terracotta should definitely provide a set of POJO-based APIs by itself, implementing most interesting/tricky use cases.
Can't wait to see them on forge ;)
Sergio B.
Posted by: Sergio Bossa at September 7, 2008 8:43 AM
Do we need these APIs? Absolutely! Especially 1 & 2 which basically give you an in memory data grid like Gigaspaces XAP and Coherence do (I know this is where you got the idea for the design from ;) )
It's nice to have a free and open source solution for this (gigaspace is partially open source with openspaces but not the datagrid itself, and Coherence is now property of Oracle, so that says it all about the price tag...). Hmmm, it would be nice to see how the TC solution stacks up performance-wise against these two vendors. A benchmark anyone?
BTW: I have a little question regarding TC: from the documentation I read on TC, I know it gives good performance for a small number of nodes thanks to all the clever bytecode magic behind the scenes. But I wonder, in this shared distributed memory model, when you try to scale to hundreds or more nodes (for large grids), would you not run into lock contention problems that would kill performance, because of the huge number of threads on all the nodes competing for the same shared locks in synchronized methods/blocks? Distributed shared-nothing memory models like Coherence/gigaspaces that do replication by value instead of reference would not have this problem (if it does occur for TC).
Tx.
Posted by: Neo7471 at September 19, 2008 10:01 AM
+1 on all three items
Posted by: Anonymous at September 22, 2008 3:55 AM
Neo7471,
1. I just wrote a blog on performance. Short answer: we are way faster per TC Server instance than the competition.
2. Terracotta has introduced the notion of a Terracotta Server Array and thus I am not worried about large scale problems (thousands of nodes). That said, we really don't worry today because, in the shared-nothing world that the grid vendors like to put forward, there is no lock contention in Terracotta either--check out "greedy locks" in our documentation.
Cheers,
--Ari
Posted by: ARI ZILKA at September 22, 2008 2:35 PM