September 22, 2008
Junk Throughput (how to get any tech to reach 1MM tps)
posted by ari
I just finished helping our sales team work through a POC with a big customer. As usual, the data structure to be shared was a TreeMap with a LinkedList at each leaf node. Terracotta clustered these structures fine, whereas the Large[st] Software vendor's distributed cache needed everything flattened into maps. As an example, suppose you want one LinkedList item per minute for up to one hour and, on the 61st minute, you push the oldest minute off the list and add the newest one to the other end. In a map you would write something like:
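// indexmap tracks the position of the most recent minute; listmap holds one entry per minute
int index = indexmap.get("lastminuteinlist");
indexmap.put("lastminuteinlist", index + 1);  // this get-then-put pair is not atomic on its own
String key = "minute" + index;
Object val = listmap.get(key);  // the value type is not given here; Object is a placeholder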
Of course, you need some sort of transaction on the indexmap to get() and put() atomically. But this is all the "usual" headache with data grids and distributed caches.
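To make the contrast concrete, here is a minimal single-JVM sketch of the rolling 60-minute window in both shapes: as a plain LinkedList, and flattened into the two maps from the snippet above. The class and method names (RollingWindowSketch, FlatWindow, addMinute) are made up for illustration, and the synchronized keyword only stands in for whatever lock or transaction a real distributed cache would require; it is not any vendor's API.

```java
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not taken from any product.
class RollingWindowSketch {
    static final int WINDOW = 60; // keep at most one hour of minutes

    // Natural form: push the newest minute on one end, drop the oldest from the other.
    static void addMinute(Deque<String> window, String minute) {
        if (window.size() == WINDOW) {
            window.removeFirst();
        }
        window.addLast(minute);
    }

    // Flattened form: an index map plus a map keyed by "minute" + index.
    static final class FlatWindow {
        private final Map<String, Integer> indexMap = new HashMap<>();
        private final Map<String, String> listMap = new HashMap<>();

        // synchronized makes the get-then-put atomic within one JVM only (assumption:
        // a distributed cache would need its own transaction or lock here).
        synchronized void addMinute(String minute) {
            int index = indexMap.getOrDefault("lastminuteinlist", 0);
            indexMap.put("lastminuteinlist", index + 1);
            listMap.put("minute" + index, minute);
            listMap.remove("minute" + (index - WINDOW)); // no-op until the window is full
        }

        synchronized int size() {
            return listMap.size();
        }

        synchronized String oldest() {
            int next = indexMap.getOrDefault("lastminuteinlist", 0);
            return listMap.get("minute" + Math.max(0, next - WINDOW));
        }
    }

    public static void main(String[] args) {
        Deque<String> window = new LinkedList<>();
        FlatWindow flat = new FlatWindow();
        for (int m = 0; m < 61; m++) { // the 61st add evicts the oldest minute
            addMinute(window, "m" + m);
            flat.addMinute("m" + m);
        }
        System.out.println(window.size() + " " + window.peekFirst()); // 60 m1
        System.out.println(flat.size() + " " + flat.oldest());        // 60 m1
    }
}
```

Both versions end up holding the same 60 entries, but the flattened one has to carry its own index bookkeeping and locking, which is the headache described above.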
What is far more interesting to me is something that was new to us and, I think, will be new to everyone who reads this blog. The distributed cache / data grid vendor produced what, at first blush, looked like a faster solution than Terracotta. Here's what the customer first observed:
1. Sun T1000 / 24-cores / 16GB RAM
2. Terracotta produced 3500 TPS
3. <OTHER> produced 7000 TPS
The customer needed 1400 TPS so both solutions were "good enough" but the customer wanted to understand where our claim of 10X had gone?!?
So, we started to break it down. Terracotta used 5% of the machine to produce 3500 TPS. We used a single TC Server instance and left it almost vanilla. The competitor, being a grid, chose to chop the T1000 into 20 JVMs, and they used 100% of the box. So, right there we have the 10X. What do I mean? Well, Terracotta used 5% of the machine to produce 50% of the competitor's transactions per unit of time. Assume linear scaling: to match the competitor's full 7000 TPS, Terracotta would use 10% of the machine, where the competitor used 100%. That makes Terracotta 10 times more efficient than the "in memory data grid."
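The arithmetic behind the 10X can be written out as throughput per percentage point of machine used. This is only a sketch of the back-of-the-envelope math from this POC; the class and method names are invented for illustration, and it assumes the linear scaling stated above.

```java
// Hypothetical helper: efficiency = TPS divided by the percent of the machine consumed.
class EfficiencySketch {
    static double tpsPerCpuPercent(double tps, double cpuPercent) {
        return tps / cpuPercent;
    }

    public static void main(String[] args) {
        double terracotta = tpsPerCpuPercent(3500, 5);   // 700 TPS per 1% of the box
        double grid = tpsPerCpuPercent(7000, 100);       // 70 TPS per 1% of the box
        System.out.println(terracotta / grid);           // 10.0
    }
}
```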
Kewl.
"But not so fast," said the customer. Can Terracotta scale linearly? We chose to leave Terracotta in vanilla format and spread the load across 10 instances of TC just to see what we could do. The answer: 35,000 tps (in our lab). This satisfied the customer.
The story doesn't stop there. Terracotta was configured to run in persistent mode, so every one of those 3500 transactions per second was written to disk. Terracotta was also configured to run w/ a backup TC Server on a 2nd T1000 (in our lab). This means there were 2 copies ON DISK of all data. The competitor? All copies were in RAM on the same machine (localhost), so the network overhead was zero and the HA was non-existent.
I made up a term for this: "junk throughput." If someone shows you 1MM TPS and says, "wow, look how fast I can go!" you should ask what would happen if the server died. Or what would happen if the server GCed. And you should not get fooled by these grids claiming massive numbers of transactions per second (TPS). Think instead about transactions per server second (TPSS). In this case, 7000 TPS from the data grid software divided amongst 20 JVMs == 350 TPSS, where each of their grid instances should be thought of as a server. Terracotta, on a single server instance, was doing 3500 TPSS.
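TPSS as used here is just total TPS divided by the number of server instances it took to produce it. A small sketch with the numbers from this POC, using made-up class and method names:

```java
// Hypothetical helper for the TPSS metric: total throughput per server instance.
class TpssSketch {
    static double tpss(double totalTps, int serverInstances) {
        if (serverInstances <= 0) {
            throw new IllegalArgumentException("serverInstances must be positive");
        }
        return totalTps / serverInstances;
    }

    public static void main(String[] args) {
        System.out.println(tpss(7000, 20)); // 350.0  (data grid: 20 JVMs)
        System.out.println(tpss(3500, 1));  // 3500.0 (Terracotta: 1 TC Server)
    }
}
```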
I ask you to ask yourself this: if the transactions are not durable anywhere and are just hanging out in memory, and I have to flatten my domain model to use the thing, why pay $20K / cpu to run 20 copies of this thing at all? Didn't this "data grid" vendor just hand me big, expensive memcache but without the source?
And, since this has turned into a blog of suggested nomenclature and testing procedure: whenever you do a bake-off, make sure you take both options you are testing to 100% utilization of the machine. If you don't, you haven't done the test right.
FWIW,
--Ari
Comments
Oh yes - in many head-to-head performance bake-offs I have had to break down the numbers into what they "really" mean. It's easy for a vendor to run a test which hides the real throughput behind an impressive-looking number, and it's kinda fun to show a customer the truth :)
Posted by: David at September 22, 2008 6:30 PM
---
"But not so fast," said the customer. Can Terracotta scale linearly? We chose to leave Terracotta in vanilla format and spread the load across 10 instances of TC just to see what we could do. The answer: 35,000 tps (in our lab). This satisfied the customer.
---
When is the open source version going to let you spread load across more than one L2 server? With co-resident L1s or terracotta server array or whatever you call it.
Posted by: Anonymous at October 10, 2008 10:56 AM
Anonymous: good question. Right now we are working on supporting upwards of 50,000 concurrent users on our reference web app with just 8 (maybe 16) JVMs. If we can get to a previously unheard-of level of efficiency and throughput per TC Server instance (meaning a super-small app cluster for the throughput it is producing relative to competitive solutions, all without needing to tune anything), we might leave things the way they are. If not, we will revisit.
Feel free to contact me via email. I want to understand your requirements (even if you want to remain anonymous w.r.t what company you work for). It will help us in the decision making process.
Posted by: ARI ZILKA at October 16, 2008 8:14 PM