« November 2007 | Main | January 2008 »

December 2007 Archives

December 2, 2007

What's the Easiest, Most Scalable, Enterprise Architecture You Know?

I just finished an offsite where all our execs got together with our sales reps and walked through the accounts we won this quarter. What were the patterns? What resonated with the users about our product and our value proposition?

Here's the pattern. Everyone chose Terracotta because it is transparent. Several also chose because it is open source. But the pattern was that people wanted simple scalability and _then_ tested and did proofs of concept around performance and availability. In deployment, they all had roughly the application they had started with, but it was running on Terracotta and could be spread across application nodes with no recurring changes. In short, the business's problem was scalability and availability, but the developer's challenge was simplicity and time to market.

The punchline is that scalability and availability mean everything to application teams. It is what keeps their paycheck showing up in their bank account each month. But, if teams can like the application they build, they will pick that path of least resistance to getting scalability and availability. Thus, I conclude that simplicity is key.

A few examples:


  • One customer wanted to drop-in a cache of their system of record. It had to do tens of thousands of queries per second, but it had to work within their existing architecture.

  • One customer wanted to drop-in clustering of reference data for what they called a map of maps. They started with clustered caching and quickly realized that the nested maps would lead to HUGE serialization payloads and was a non-starter. Before rewriting and flattening their object graph, they tried Terracotta. And it worked

  • One customer wanted to distribute a query engine they had built using ExecutorService and multi-threaded code. They found that Terracotta could spread that query engine across 10 servers without changes because Terracotta supports many util.concurrent constructs out of the box.

So I noticed this simplicity pattern, especially when juxtaposed with my QCon, San Francisco experience where every "real world" architecture is predicated on partitioning and "eventually correct" algorithms.

Partitioning asserts that scalability comes from "add a brick" scale-out approaches. Add a node and get a node's worth of capacity.

Partitioning asserts that availability comes from stateless database-backed storage approaches. This does not conflict with scalability because a database instance can be isolated down to a function (order management and customer management could be in 2 separate instances--vertical partitioning), or to key range (users starting in "A" can be stored in a separate instance from those starting in "B"--horizontal partitioning).

And what of simplicity? Well, anyone who builds such an architecture has told me that this is no harder than building the simple form of the application. Why then do all the people I speak to at conferences, and all the customers we win say that "the less I have to partition, the happier I am."

Given that I helped build and run one of these for 4 years I think I know the answer. Transparency. When the partition is on the wrong boundary, what do you do? Change the code. When the partition needs to be chopped to an even finer grain, what do you do? Hash to more buckets, drop the data, reload it under the new number of buckets, and you are up and running again. When the partition fails to partition further, what do you do? Start tuning and caching.

In almost all cases, IT engineers are buying and deploying more hardware, architects are tuning infrastructure services like MQ, ESB servers, and database servers, and developers are changing code to compensate for the presence of more servers and services.

The most scalable architectures I know are partitioned. The most available architectures I know are stateless. And, the most simple architectures I know are stateful. It all conflicts.

I think the more we can deliver stateful development models that partition for scale and persist for durability at runtime, the more we lessen the trade-offs.

I am going to think about this some more, but for now I will end with a use case where a user had a caching service partitioned (vertically) outside the application. That caching service could partition the data, and spread it out any way it saw fit because that cache was abstracted from the core application code onto both its own interfaces and its own servers!

Well, if Terracotta is already a separate piece of software from your application, why can't you just write an application, configure which objects need to be shared and cached, and then we partition the cache transparently inside our infrastructure software? We can in fact. I need to think more about this!

I want the simplest, most scalable enterprise architecture we all know to be stateful in development, stateless at runtime, and all transparent to the Java developer.


December 3, 2007

FUD OF THE WEEK: JBossCache to Terracotta migration...a response to JBoss from Terracotta

I am happy to discuss competing architectures with anyone who wants to at any time. I also give most people the benefit of the doubt that they are too busy with their scope of work to master my product--for my part I try to get my hands on as much technology as I can but I don't that assume everyone does this. But this morning I awoke to find a disturbing amount of FUD forwarded to me in my inbox (Bela Ban of JBoss wrote an email which was forwarded to me and is included here).

Bela doesn't have to like Terracotta. That is his choice. But, in this case I had to speak up because Bela presents what is an obviously biased opinion as if it were fact. Not good.

Anyways, here is the email of which I speak (note that I deleted the parts where he attacks Terracotta the company--they are documented as "snipped") :

From: Bela Ban Date: Dec 2, 2007 11:28 PM Subject: Re: Clustering local JVM's using JGroups

[snipped the snipes]

TC has major *design* flaws, which can not be optimized away:

* Single centralized backup server: doesn't scale, as all client VMs
back up their data to the *same* server. So one backup server has
to keep replicas of all nodes in a cluster... Single point of
failure too, although it can replicate its state to a 2nd server
* Global lock acquisition: a synchronized statement will acquire the
lock on the client and server JVM, so enter-monitor and
leave-monitor are synchronously replicated, adding the round trip
latency to each such sequence. Imagine how fast this is... :-)
* No support for transactions. Start a TX, perform some updates, and
then the TX is rolled back, or - even worse - the primary crashes
before the TX commits. The backup server will now have
inconsistent state

You can take the TX example one step further: if you start a TX, make a
few changes (those are replicated immediately) and then *roll back* the
TX, the backup server will *still* have the changes made in the rolled
back TX ! When Jonas Boner gave a talk last year at TSSJS and someone
asked him this questions and he replied that TXs are over valued, the
audience just nodded their heads... :-)

TC are very good at marketing, but that's it ... :-) But as
soon as you eval it closer, you'll see its design flaws. That's why we
have yet to see TC (as competitor) in a major bid, whereas we see 'good'
clustering solutions like Coherence in almost every bid. We will soon
call their scam !

[snipped the snipes]

--
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat

I prefer to work my way backwards on this one and I apologize but I will paraphrase each of Bela's FUDs for easier reading.

1. No transaction support. Corrupt or lost state if the TC server fails: I think Bela just doesn't know our product. Our system is indeed ACID-compliant. It is not JTA-compliant but those 2 are not the same thing. With Terracotta, if the primary server fails mid-transaction, your JVM would not get an ACK and thus auto-reestablishes link to the new primary server and flies the transaction again. The entire fail-over between Terracotta actives and passives is sub-second since passives are hot and are getting every transaction as those transactions occur.

Interestingly enough, JBossCache is based on peer-to-peer comms which means you need a quorum in order to commit a transaction and, as such, transactional applications will see massive performance issues. For what its worth, transactions and scale out do not mix if you ask most architects of large-scale systems so bringing up JTA / transactions is nothing more than FUD.

2. Centralized pessimistic locking is slow: Our product is benchmarked by many customers at 10X faster than PojoCache and 100X faster than Treecache. This happens in-part because our locking is nothing like what Bela thinks. In fact, our locking is guaranteed in-order across the cluster with all sorts of performance optimizations (including optimistic ones as well). My favorite optimization is greedy locking. Greedy locks are only efficient in a centralized cluster server model such as ours. Our server tracks which threads and which JVMs use locks. Our server lets one node keep a lock without coming back to the network or our server for that lock (no matter how much the lock is acquired and released). Our server can then ask a greedy thread to give back its lock on next lock release. So all JVMs can indeed come to the server, but under greedy locks our server checks out locks to JVMs as if it were a librarian. JVMs can take the lock home, just like you can take a library book home. And when you return it, the librarian consults the list of waiters to hand out the lock again. The librarian might call your house and say, "I need that book back" at any time, however.

3. One server is a SPoB and SPoF: strangely Bela contradicts himself here and admits that he knows we have backups so why call us SPoF in the first place? (This is the one that makes me smell FUD because he shows directly that he knows better.) Anyways, Terracotta has seen users push the 1GBit / sec mark on a single TC instance this year. Note that we push only deltas to memory so 1GBit usually represents 10K - 50K transactions per second. A couple of our competitors (not including JBoss) seem to also be able to get to the 1GBit mark but they are pushing more like 1K - 5K transactions per second at the same bandwidth mark because they are serializing object graphs!

Something Bela doesn't know because we have been somewhat quiet about it is that if TC becomes the bottleneck our users can go to multiple active TC instances. So the single point of bottleneck claim is just wrong. In fact, the chief architect of a telco customer we worked with this year looked at the JBossCache versus Terracotta performance comparison and turned to me saying, "I expect no less from your central server. A peer-to-peer system has no chance of keeping up with a well-tuned server. It's like a super-fast easy to control single peer to everyone...no lookups and votes and quorums. Far less overhead." In that comparison, we delivered order-of-magnitude 1000 HTTP requests / sec per JVM compared to TreeCache at 10 requests / second and PojoCache at 100 requests / second. Both JBoss and Terracotta helped tune the application. We also scaled linearly as far as the customer tested which was not true of JBoss.

That leaves us with SPoF which Bela himself acknowledges is wrong. I have also mentioned in #1 above how we are ACID and transactional and, how the passive becomes the active and all JVMs transparently fail over sub-second. Most of our users eventually test "pull the plug" behaviors both for our TC server instance as well as for their own application instances. The customer I just mentioned tested this as well. The JBoss cluster never served another request after just one random node was yanked from the cluster. TC kept running w/ no errors reported by the load testing system and only a small slow-down for a couple of seconds while TCP connections were reestablished.

4. JBoss doesn't see TC in accounts: we have done head-to-head bake-offs in accounts and JBoss lost every one that I am aware of. In fact, in that Telco deal they lost by 100X with TreeCache and by almost 10X even with PojoCache. Not to mention, JBoss told the customer that in order to go as fast as the cache had in testing, they had to hack it up and the version that was used in the bake off should _not_ be taken to production under any circumstances.

While I have been explicitly avoiding the battles and back-and-forth debates that seem to rage amongst certain Java vendors I feel this one attacks not just by saying "we are better than you" but by asserting things--inaccurate things--about our products.

--Ari

December 5, 2007

Have you heard? JIRA was clustered using Terracotta

SourceSense did this without our help. And, JIRA is closed-source software from Atlassian...pretty amazing actually. (I know one can get the source quite readily. I don't know if Sourcesense had it at hand to do the job.)

http://sbtourist.blogspot.com/2007/12/news-about-scarlet.html


My favorite part of this is that Atlassian clustered Confluence over a year ago on their own and their CEO presented the results at a few conferences. I would paraphrase his presentation as:

clustering is _really hard_ and impacts the application in significant ways. Don't undertake it lightly! But it is possible.

Here's a 3rd party summary of Mike's presentation from Javapolis, 2006.

I discussed this with Mike via his blog in the past, but with Terracotta, Lucene and Quartz are both clustered out of the box.

Lessons learned in my opinion are that our ability to cluster transparently is very important. It might have made Confluence clustering easier. Not sure. It definitely made JIRA clustering possible for developers outside Atlassian. Enough said.

--Ari


December 10, 2007

Terracotta, the book

I like this:
http://www.apress.com/book/view/1590599861

you can add yourself to a mailing list where I will be keeping interested folks up to date regarding the book. The list is bookupdate AT terracottatech DOT com.

--Ari

December 13, 2007

Javapolis 2007 Run Down (from Terracotta's PoV)

Man,

what a week. I just got back from beers hosted by Google and Atlassian. I spent at least 2 hours talking to Mike Cannon-Brookes, Ross Mason, and Dion Almaer (I guess I am name dropping now...ick). Very smart folks. And they were just a sample of the high caliber users at the Javapolis conference this year in Antwerp.

I wanted to get to so many talks but ended up pinned down at Terracotta's booth for about 12 hours a day all week designing applications with developer after developer. I was surprised how many are coming to us planning to use Terracotta and just having a few questions. I would say the ratio was 1:1 people who had never heard of us and people who had.

At the end of the conference, I got quite good at explaining what we are:

"JVM-level clustering that helps you build apps that can spread out across many hardware nodes w/o custom APIs, serialization, etc. TC provides a simple pure-Java programming model, scalability _much_ faster that of replication-based clustering because it weaves itself into memory and pushes only deltas, and at the same time, TC provides database-like availability."

People got that. In 30 seconds flat, they got it. "So you mean I can just use threads and objects and not call APIs? No messaging?" "Are you telling me I don't have to keep scaling my database? I can just write objects in memory and they won't get lost?" Things like that. "Very cool. How much? Open source? Free? Wow. I will go try it this weekend."

Now, for the interesting tidbit. All the "hard problems" that TC users brought to me were solved by different means than they had expected. One developer had a mission critical system that needed to spread across 2 datacenters and could NEVER lose a transaction, otherwise the end customer could steal money from the business by reclaiming goods from 1 server in 1 datacenter and then, under a split-brain scenario, claiming the goods a 2nd time from a server in the other datacenter. We solved his WAN-based active-active need together in real time.

Another user needed to create a giant graph of all possible routes between thousands of locations world-wide. He then wanted to cache the routing table and serve queries against that table, but in a linearly scalable, scaled-out fashion, on 32 bit JVMs. We solved this one too.

The solution was always 1 of 2 things:
1. Master / Worker and some sort of divide and conquer
2. Partitioning with implicit locality and some sort of stateless load balancer sending traffic to the same node w.r.t. a particular data key

I guess the most popular use case was HTTP session replication, but that one I tend to forget because we answer the "Why TC instead of the app-server built-in option?" question with graphs, performance charts, and app server vendor partners who like to bundle us--like Geronimo.

I think I need to write a blog entry just about implicit locality. I still owe everyone a summary on performance and scalability in enterprise architectures. I will get to that next week (as the year slows down, I get more time to write here.)

Cheers,

--Ari

About December 2007

This page contains all entries posted to POJO Mojo in December 2007. They are listed from oldest to newest.

November 2007 is the previous archive.

January 2008 is the next archive.

Many more can be found on the main index page or by looking through the archives.

Powered by
Movable Type 3.34