December 3, 2007
FUD OF THE WEEK: JBossCache to Terracotta migration...a response to JBoss from Terracotta
posted by ari
I am happy to discuss competing architectures with anyone, at any time. I also give most people the benefit of the doubt: they are usually too busy with their own scope of work to master my product. For my part, I try to get my hands on as much technology as I can, but I don't assume that everyone does this. But this morning I awoke to find a disturbing amount of FUD forwarded to my inbox (Bela Ban of JBoss wrote an email, which was forwarded to me and is included here).
Bela doesn't have to like Terracotta. That is his choice. But in this case I had to speak up, because Bela presents an obviously biased opinion as if it were fact. Not good.
Anyways, here is the email of which I speak (note that I deleted the parts where he attacks Terracotta the company--they are documented as "snipped") :
From: Bela Ban
Date: Dec 2, 2007 11:28 PM
Subject: Re: Clustering local JVM's using JGroups
[snipped the snipes]
TC has major *design* flaws, which can not be optimized away:
* Single centralized backup server: doesn't scale, as all client VMs
back up their data to the *same* server. So one backup server has
to keep replicas of all nodes in a cluster... Single point of
failure too, although it can replicate its state to a 2nd server
* Global lock acquisition: a synchronized statement will acquire the
lock on the client and server JVM, so enter-monitor and
leave-monitor are synchronously replicated, adding the round trip
latency to each such sequence. Imagine how fast this is... :-)
* No support for transactions. Start a TX, perform some updates, and
then the TX is rolled back, or - even worse - the primary crashes
before the TX commits. The backup server will now have
inconsistent state.

You can take the TX example one step further: if you start a TX, make a
few changes (those are replicated immediately) and then *roll back* the
TX, the backup server will *still* have the changes made in the rolled
back TX ! When Jonas Boner gave a talk last year at TSSJS and someone
asked him this questions and he replied that TXs are over valued, the
audience just nodded their heads... :-)

TC are very good at marketing, but that's it ... :-) But as
soon as you eval it closer, you'll see its design flaws. That's why we
have yet to see TC (as competitor) in a major bid, whereas we see 'good'
clustering solutions like Coherence in almost every bid. We will soon
call their scam ![snipped the snipes]
--
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat
I prefer to work my way backwards through this one and, with apologies, I will paraphrase each of Bela's FUD points for easier reading.
1. No transaction support; corrupt or lost state if the TC server fails: I think Bela just doesn't know our product. Our system is indeed ACID-compliant. It is not JTA-compliant, but those two are not the same thing. With Terracotta, if the primary server fails mid-transaction, your JVM does not receive an ACK, so it automatically re-establishes its link to the new primary server and resends the transaction. Fail-over from a Terracotta active to a passive is sub-second because passives are hot: they receive every transaction as it occurs.
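To make the ACK-and-resend fail-over concrete, here is a toy model. It is a sketch under stated assumptions, not Terracotta's actual protocol: the `Server` and `Client` classes are hypothetical, the network is just method calls, and each transaction is assumed to carry a client-assigned ID so that a passive which already received it through replication can ignore the resent copy.

```python
# Toy model of ACK-and-resend fail-over; NOT Terracotta's real protocol.
# Assumption: each transaction carries a client-assigned ID, so a server that
# already applied it (e.g. via replication) can safely ignore a resent copy.


class Server:
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.applied = set()  # IDs of transactions already applied
        self.alive = True
        self.passive = None   # hot standby that receives every transaction

    def apply(self, txn_id, changes):
        if txn_id in self.applied:  # duplicate after fail-over: ignore it
            return
        self.state.update(changes)
        self.applied.add(txn_id)

    def commit(self, txn_id, changes, crash_before_ack=False):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.apply(txn_id, changes)
        if self.passive is not None:
            self.passive.apply(txn_id, changes)  # replicated as it occurs
        if crash_before_ack:  # simulate the primary dying mid-transaction
            self.alive = False
            raise ConnectionError(f"{self.name} crashed before sending ACK")
        return "ACK"


class Client:
    def __init__(self, *servers):
        self.servers = list(servers)  # current primary first
        self.next_id = 0

    def commit(self, changes, crash_before_ack=False):
        self.next_id += 1
        txn_id = self.next_id
        for server in list(self.servers):
            try:
                ack = server.commit(txn_id, changes, crash_before_ack)
            except ConnectionError:
                crash_before_ack = False  # only the first attempt crashes
                continue  # no ACK: move on to the next server and resend
            self.servers.remove(server)
            self.servers.insert(0, server)  # remember the new primary
            return ack
        raise RuntimeError("no server available")
```

In this model, if the active dies after replicating a transaction but before acknowledging it, the client resends to the passive and the duplicate is absorbed, so the promoted passive applies each change exactly once.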
Interestingly enough, JBossCache is based on peer-to-peer communication, which means you need a quorum in order to commit a transaction; as a result, transactional applications will see massive performance issues. For what it's worth, most architects of large-scale systems will tell you that transactions and scale-out do not mix, so bringing up JTA / transactions is nothing more than FUD.
2. Centralized pessimistic locking is slow: Many customers have benchmarked our product at 10X faster than PojoCache and 100X faster than TreeCache. This happens in part because our locking is nothing like what Bela thinks. In fact, our locking is guaranteed in-order across the cluster, with all sorts of performance optimizations (including optimistic ones). My favorite optimization is greedy locking. Greedy locks are only efficient in a centralized cluster server model such as ours. Our server tracks which threads and which JVMs use locks. It lets one node keep a lock without going back to the network or to our server for that lock, no matter how many times the lock is acquired and released. Our server can then ask a greedy thread to give back its lock on the next lock release. So all JVMs can indeed come to the server, but under greedy locks our server checks out locks to JVMs as if it were a librarian. JVMs can take the lock home, just like you can take a library book home. When you return it, the librarian consults the list of waiters to hand out the lock again. The librarian might call your house at any time, however, and say, "I need that book back."
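The librarian analogy can be sketched as a small model. This is a simplification, not Terracotta's implementation: `LockServer` and `Node` are hypothetical, locks are tracked per node (JVM) rather than per thread, and a "network call" is simply any request or give-back that reaches the central server. A node that already has a lock checked out re-acquires it locally; the server recalls it only when another node asks.

```python
from collections import deque

# Toy model of greedy locks; a simplification, not Terracotta's implementation.
# Locks are tracked per node (JVM) only, and a "network call" is any request
# or give-back that reaches the central server.


class LockServer:
    def __init__(self):
        self.holder = {}    # lock name -> node that has it checked out
        self.waiters = {}   # lock name -> queue of nodes waiting for it
        self.network_calls = 0

    def request(self, lock, node):
        self.network_calls += 1
        owner = self.holder.get(lock)
        if owner is None:
            self.holder[lock] = node
            node.granted(lock)
        else:
            self.waiters.setdefault(lock, deque()).append(node)
            owner.recall(lock)  # "I need that book back"

    def give_back(self, lock, node):
        self.network_calls += 1
        queue = self.waiters.get(lock)
        if queue:
            nxt = queue.popleft()  # next borrower on the waiting list
            self.holder[lock] = nxt
            nxt.granted(lock)
        else:
            self.holder.pop(lock, None)


class Node:
    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.checked_out = set()  # locks this node keeps greedily
        self.in_use = set()       # locks currently inside a critical section
        self.recalled = set()     # locks to hand back on next release

    def acquire(self, lock):
        if lock not in self.checked_out:
            self.server.request(lock, self)  # only now do we hit the network
        if lock in self.checked_out:
            self.in_use.add(lock)
            return True
        return False  # queued; a real client would block here

    def release(self, lock):
        self.in_use.discard(lock)
        if lock in self.recalled:  # server asked for it back
            self.recalled.discard(lock)
            self.checked_out.discard(lock)
            self.server.give_back(lock, self)

    def recall(self, lock):
        if lock in self.in_use:
            self.recalled.add(lock)  # return it on the next release
        else:
            self.checked_out.discard(lock)
            self.server.give_back(lock, self)  # idle, so return it now

    def granted(self, lock):
        self.checked_out.add(lock)
```

With one node acquiring and releasing the same lock a thousand times, only the first acquire reaches the server; contention from a second node triggers the recall and hand-off.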
3. One server is a single point of bottleneck (SPoB) and a single point of failure (SPoF): Strangely, Bela contradicts himself here and admits that he knows we have backups, so why call us a SPoF in the first place? (This is the one that makes me smell FUD, because it shows directly that he knows better.) Anyway, Terracotta has seen users push the 1 Gbit/sec mark on a single TC instance this year. Note that we push only deltas, not whole objects, so 1 Gbit/sec usually represents 10K - 50K transactions per second. A couple of our competitors (not including JBoss) also seem able to reach the 1 Gbit/sec mark, but they are pushing more like 1K - 5K transactions per second at the same bandwidth because they are serializing object graphs!
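Those numbers imply an average payload per transaction. A back-of-the-envelope check, assuming a fully saturated 1 Gbit/s link (125,000,000 bytes/s) and ignoring protocol overhead: 10K - 50K transactions/sec works out to about 2,500 - 12,500 bytes per transaction, while 1K - 5K transactions/sec works out to about 25,000 - 125,000 bytes, roughly ten times more data per transaction.

```python
# Back-of-the-envelope: average bytes per transaction on a saturated link.
# Assumes 1 Gbit/s = 10**9 bits/s and ignores protocol/framing overhead.
GBIT_BYTES_PER_SEC = 1_000_000_000 / 8  # 125,000,000 bytes/s


def avg_bytes_per_txn(txn_per_sec, link_bytes_per_sec=GBIT_BYTES_PER_SEC):
    """Average payload per transaction if the whole link is in use."""
    return link_bytes_per_sec / txn_per_sec


for label, rate in [("deltas, 10K txn/s", 10_000),
                    ("deltas, 50K txn/s", 50_000),
                    ("object graphs, 1K txn/s", 1_000),
                    ("object graphs, 5K txn/s", 5_000)]:
    print(f"{label:>24}: {avg_bytes_per_txn(rate):>9,.0f} bytes/txn")
```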
Something Bela doesn't know, because we have been somewhat quiet about it, is that if TC becomes the bottleneck, our users can go to multiple active TC instances. So the single-point-of-bottleneck claim is just wrong. In fact, the chief architect of a telco customer we worked with this year looked at the JBossCache versus Terracotta performance comparison and said to me, "I expect no less from your central server. A peer-to-peer system has no chance of keeping up with a well-tuned server. It's like a super-fast, easy-to-control single peer to everyone...no lookups and votes and quorums. Far less overhead." In that comparison, we delivered on the order of 1,000 HTTP requests/sec per JVM, compared to 10 requests/sec for TreeCache and 100 requests/sec for PojoCache. Both JBoss and Terracotta helped tune the application. We also scaled linearly as far as the customer tested, which was not true of JBoss.
That leaves us with SPoF, which Bela himself acknowledges is wrong. I also explained in #1 above that we are ACID and transactional, and that the passive becomes the active and all JVMs transparently fail over in under a second. Most of our users eventually test "pull the plug" behaviors, both for our TC server instance and for their own application instances. The customer I just mentioned tested this as well. The JBoss cluster never served another request after just one random node was yanked from the cluster. TC kept running with no errors reported by the load-testing system and only a small slow-down for a couple of seconds while TCP connections were re-established.
4. JBoss doesn't see TC in accounts: We have done head-to-head bake-offs in accounts, and JBoss lost every one that I am aware of. In fact, in that telco deal they lost by 100X with TreeCache and by almost 10X even with PojoCache. Not to mention that JBoss told the customer that, to go as fast as the cache had in testing, they had to hack it up, and that the version used in the bake-off should _not_ be taken to production under any circumstances.
While I have been explicitly avoiding the battles and back-and-forth debates that seem to rage among certain Java vendors, I felt I had to respond to this one, because it does not just say "we are better than you"; it asserts things, inaccurate things, about our products.
--Ari
Comments
"If TC becomes the bottleneck our users can go to multiple active TC instances."
TC is very hard to sell internally because of the perception of a (far away but existing) hard performance barrier.
How does this mechanism work? Is the data partitioned between the servers? Is this partitioning automatic? (If it is manual, that is my proposed solution to the "wall", but automagical partitioning "supported by the vendor" would be better.)
Best Regards,
Posted by: Josselin Pujo at December 4, 2007 2:08 PM
That one is really funny. I have been working with Terracotta and JBoss PojoCache for about six months (a long study on the replication of components). Integrating Terracotta into our framework has been fun and very easy from the beginning.
On the contrary, integrating JBossCache has been (and still is) more like a nightmare. I constantly have to dig through numerous undocumented bugs and limitations. I had to add traces to the JBossCache code to find answers the poor documentation did not give (JBoss has a tendency to confuse documentation with samples...), and I have had many headaches with this product.
I must admit that in pure performance, PojoCache 1.4 gave us the best results, but with so many constraints on our business model that it was not practically usable. Currently, version 2.1 still suffers from bugs and is more than 10 times slower than our solution with Terracotta!
So, Bela, I think you should think twice before writing such things and putting those ironic smileys in your post... :-)
Terracotta DSO is by far a better product than JBoss PojoCache.
Posted by: Mathias Bollaert at December 5, 2007 2:24 AM
Well done, Ari!!
Posted by: AL at December 5, 2007 3:10 AM
Josselin,
Terracotta has multiple transparent active servers in the works. For now, however, multiple actives are achieved underneath our caching modules, such as EHCache. Think of a ConcurrentHashMap where some buckets map to one TC server and other buckets map to a second TC server. You can get 2, 3, or n TC servers underneath an app in this manner.
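The ConcurrentHashMap analogy amounts to key striping: hash each key to a bucket, then assign buckets to servers. This is a hypothetical illustration (`StripedMap` is not a Terracotta or EHCache class, and plain dicts stand in for TC servers). It uses `zlib.crc32` rather than Python's built-in `hash()`, because string hashing is randomized per process, and every client in a cluster must agree on which server owns a key.

```python
import zlib

# Hypothetical sketch of "some buckets on one server, others on another".
# Plain dicts stand in for TC servers; StripedMap is not a real TC/EHCache API.


class StripedMap:
    def __init__(self, servers, buckets=16):
        self.servers = servers
        self.buckets = buckets

    def bucket_of(self, key):
        # crc32 is stable across processes, unlike str hash() in Python,
        # so every client maps a given key to the same bucket.
        return zlib.crc32(repr(key).encode("utf-8")) % self.buckets

    def server_for(self, key):
        # Fixed bucket -> server assignment (round-robin over servers)
        return self.servers[self.bucket_of(key) % len(self.servers)]

    def put(self, key, value):
        self.server_for(key)[key] = value

    def get(self, key, default=None):
        return self.server_for(key).get(key, default)
```

Adding a third or fourth server in this sketch only changes the bucket-to-server assignment; application code still calls `put` and `get` on one map.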
We tend to steer users toward squeezing lots of performance out of their app and ours before worrying about this active / active config. Active / Passive TC has been successful thus far.
Posted by: Ari Zilka at December 5, 2007 10:03 PM
I looked at various cache solutions about 3 years ago. JBossCache is the absolute worst of the breed. It makes simple things hard and ignores the hard problems. Things may have changed since then, but back then it was not mission-ready in any sense of the word. It was cr*p.
Posted by: ACE at December 13, 2007 3:55 AM
I don't feel comfortable asserting that JBossCache sux. My point was simply that Terracotta does not. Your opinion is, of course, your own.
Have you tried Terracotta? We welcome the feedback on our stuff as well...
Cheers,
--Ari
Posted by: ARI ZILKA at December 14, 2007 2:26 PM
1. No transaction support. Corrupt or lost state if the TC server fails: I think Bela just doesn't know our product...
Yes, but I think the question is whether user-initiated transactions are supported. It is clear that Terracotta's internal protocol is transactional, but maybe this is not what Mr. Bela means, no?
So I am still confused. If I use Terracotta to cluster my program, when I call transaction.begin(), will my own work be isolated and either commit correctly or roll back, even if the Terracotta server fails over?
Posted by: frederico r at December 14, 2007 2:49 PM
memcache & Coherence are popular because they partition transparently. Essentially, you stripe the keys across nodes. If you saturate the nodes' CPUs and more nodes don't yield the headroom, load-balance parallel caches.
Now that I have learned that TC is working on active/active (aka partitioning), I'm interested.
How is the client heap footprint controlled? Are there config properties like .maxBytes?
How robust is the disk cache?
Posted by: ken at December 15, 2007 2:23 PM
Ari, I was wondering whether it is appropriate to compare GigaSpaces to Terracotta. How would you differentiate the two?
I'm definitely going to get my hands dirty with Terracotta in the coming days :)
Posted by: Rishi at December 18, 2007 5:24 PM
Frederico: Terracotta does not participate in a TransactionManager context. We are adding a configuration option shortly where you can identify the transaction implementation, and then we will inherit your transaction boundaries transparently, as opposed to synchronized{} blocks, which currently demarcate our transaction boundaries.
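To illustrate what it means for synchronized{} to demarcate transaction boundaries, here is a simplified model in which changes made while a lock is held are buffered and shipped as one transaction on release. `ClusteredLock` and its `write()` method are hypothetical; Terracotta intercepts field writes transparently instead. The sketch also assumes that, as with plain Java monitors, leaving the block through an exception does not undo writes, which is why this boundary is not the same thing as a JTA transaction with rollback.

```python
import threading

# Simplified model: the lock's release is the transaction boundary.
# ClusteredLock and write() are hypothetical; Terracotta instruments field
# writes transparently instead of requiring an explicit call.


class ClusteredLock:
    def __init__(self, server_log):
        self._lock = threading.Lock()
        self._pending = {}
        self.server_log = server_log  # stands in for the TC server

    def __enter__(self):
        self._lock.acquire()  # like entering a synchronized block
        self._pending = {}
        return self

    def write(self, field, value):
        self._pending[field] = value  # buffered locally until release

    def __exit__(self, exc_type, exc, tb):
        try:
            # Like monitor exit in Java, this runs even when an exception
            # escapes the block, and nothing is rolled back.
            if self._pending:
                self.server_log.append(dict(self._pending))  # one transaction
        finally:
            self._pending = {}
            self._lock.release()
        return False  # let any exception propagate
```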
Ken: There are several tuning knobs and switches on the client. The easiest one is how much heap to give to TC caches. You can have a 100GB collection in TC, window 10MB of it into local heap, and later decide to raise that to 100MB without changing anything in your app. The disk cache is journaled and transactional (see Sleepycat @ oracle.com for more details).
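Windowing a large clustered collection into a fixed local heap budget can be pictured as a least-recently-used cache in front of the server. This is a sketch of the idea only: `LocalWindow` is hypothetical, a dict stands in for the TC server, values are assumed to be `bytes` so their size is just `len()`, and least-recently-used eviction is an assumed policy, not a statement about how Terracotta chooses what to evict.

```python
from collections import OrderedDict

# Hypothetical client-side window over a much larger clustered collection.
# Values are assumed to be bytes; LRU eviction is an illustrative choice.


class LocalWindow:
    def __init__(self, backing_store, max_bytes):
        self.backing = backing_store  # stands in for the TC server
        self.max_bytes = max_bytes    # local heap budget; can change later
        self.local = OrderedDict()    # key -> (value, size), oldest first
        self.used = 0
        self.faults = 0               # fetches that had to go to the server

    def get(self, key):
        if key in self.local:
            self.local.move_to_end(key)  # mark as most recently used
            return self.local[key][0]
        self.faults += 1
        value = self.backing[key]        # fault the value in from the server
        size = len(value)
        self.local[key] = (value, size)
        self.used += size
        # Evict least recently used entries until we fit the budget again
        while self.used > self.max_bytes and len(self.local) > 1:
            _, (_, evicted_size) = self.local.popitem(last=False)
            self.used -= evicted_size
        return value
```

Raising `max_bytes` on a live window simply lets more entries stay resident; the calling code does not change.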
Rishi: Terracotta vs anything is an appropriate question as long as you are thinking about it :) .
I personally see GigaSpaces as a programming model plus the associated runtime. If J2EE is Java + EJB, with scale / availability via a DB or messaging, GigaSpaces is an alternative (based on JINI). In other words, GigaSpaces is a container. Terracotta, on the other hand, is a plug-in to the Java runtime. It works under many Java programming / architecture models because Terracotta transparently hooks into the Java heap and works at the memory level, not the API level. From containerless to J2SE containers to J2EE containers, Terracotta is agnostic. In fact, Terracotta can implement JINI / Spaces, but the opposite is not true. Make sense?
Posted by: Ari Zilka at December 20, 2007 10:06 AM