September 2008 Archives

September 5, 2008

Some APIs we have. Some we don't (yet).

I thought it might be worthwhile to write down all the APIs and frameworks we are bringing down the pike. I have the following in implementation through various OSS contributors right now:

1. Write-behind API to the database. This framework, to be hosted in our Forge, will provide a simple Map-based collection plus an interface for you to implement. The interface will be called to flush your objects to a DB at some point after you call map.put() on that object. The API will encapsulate all the models of write-behind including idempotent updates, non-idempotent updates, all with and without automatic retry.

2. Write-thru API to the database. This framework, again in the Forge, will provide a simple Map-based collection and the appropriate TC + JDBC transaction logic to get an object into cache AND the database, or at least just the database, safely and while maintaining the database's role as system of record.

3. WAN API that will allow 2 datacenters running 2 uncoupled Terracotta-based clusters to share data. Again, this will be behind a pluggable collection for storage. Objects will be configurable such that some business updates can be flagged as synchronous replication across datacenters, while others are async, and still others are mostly-sync meaning wait for a runtime adjustable timeout to get an ACK from the other datacenter.
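
To make item 1 a bit more concrete, here is a rough sketch of the shape a write-behind map could take: a Map-style collection plus an interface you implement. Every name in it (`WriteBehindMap`, `Flusher`, `flushPending`) is hypothetical and is not the Forge API. It also assumes a single JVM, no batching, and a retry that is only safe for idempotent updates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical names throughout; this is not the actual Forge API.
interface Flusher<K, V> {
    // Called some time after put() to write the entry to the database.
    void flush(K key, V value) throws Exception;
}

class WriteBehindMap<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<K> dirty = new ConcurrentLinkedQueue<>();
    private final Flusher<K, V> flusher;

    WriteBehindMap(Flusher<K, V> flusher) { this.flusher = flusher; }

    // Returns immediately; the database write happens later.
    void put(K key, V value) {
        cache.put(key, value);
        dirty.add(key);
    }

    V get(K key) { return cache.get(key); }

    // Drains the dirty queue. A failed key is re-queued for the next pass,
    // which is only safe when the flush is idempotent.
    // Returns the number of entries flushed successfully.
    int flushPending() {
        int flushed = 0;
        List<K> failed = new ArrayList<>();
        K key;
        while ((key = dirty.poll()) != null) {
            try {
                flusher.flush(key, cache.get(key));
                flushed++;
            } catch (Exception e) {
                failed.add(key);
            }
        }
        dirty.addAll(failed);
        return flushed;
    }
}
```

In a real deployment something like a background thread would call `flushPending()` on a schedule; the non-idempotent case would need more care than a blind re-queue.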

What do you think? Do you need these APIs? Have you been wondering how to do these things with TC? It's all going to be pure POJO and open source, housed in our Forge. So you can change the implementation as well as learn from it and apply it to your use case. I think that's exciting.

An important side note. A few people exclaimed to me that while this list is nice, they need a distributed cache API too. This surprises me because we have 2 already:

1. If you have a true caching requirement where data expires or ages in the cache and eventually needs to be evicted, use EHCache on top of Terracotta. Don't write your own evictor, please. Ours is highly tuned to keep objects off of JVMs that don't need them.

2. If you have a pseudo-caching requirement where data expires _only_ on business event, like a user or application conversational state object that expires at the end of a process flow, then just use a ConcurrentHashMap. Don't worry about EHCache, or what have you in this case.
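
As a small illustration of case 2, here is a sketch of conversational state that lives in a ConcurrentHashMap until a business event removes it. The class and method names (`CheckoutFlows`, `begin`, `advance`, `complete`) are made up for this example, and the Terracotta configuration that would cluster the map is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical example: state that expires only when the process flow ends.
class CheckoutFlows {
    private final ConcurrentMap<String, String> stateByUser = new ConcurrentHashMap<>();

    void begin(String userId) { stateByUser.put(userId, "CART"); }

    // Moves an existing flow to its next step; does nothing if the flow is gone.
    void advance(String userId, String step) { stateByUser.replace(userId, step); }

    // The business event (end of the flow) is the only thing that removes the entry.
    // Returns the last step the flow was in, or null if there was no flow.
    String complete(String userId) { return stateByUser.remove(userId); }

    boolean inProgress(String userId) { return stateByUser.containsKey(userId); }
}
```

There is no timer and no evictor anywhere in that class, which is the point: nothing ages out, so there is nothing for a cache library to do.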

EHCache and ConcurrentHashMap are like our Master / Worker framework or HTTPSession interface. These are officially supported APIs and usages of Terracotta. The stuff I am working on now will join these existing solutions ASAP (read: next 60 days).

Cheers,

--Ari

September 9, 2008

Sun getting too fancy for their own good

Spent some time today staring at the Terracotta Server's performance on an 8-core Intel running Linux and EXT3 filesystem vs. Sun T1000 24-core running UFS.

Interestingly enough, the OS scheduler and the HW seem to cause a perfect hell for multi-threaded apps like our server. We run much better on T1000 running multiple server instances in active / active mode where each server instance is small. But on Linux / Intel we can run one big honkin' server process and it just works out of the box.

That said, I am anxious to see what it takes to tune this 24-core silliness into rough performance equivalence to a simple 4 dual-core machine.

Stay tuned.

--Ari

September 22, 2008

Junk Throughput (how to get any tech to reach 1MM tps)

I just finished helping our sales team work through a POC with a big customer. The usual occurred in that the data structure to be shared was a TreeMap with a LinkedList at each leaf node; Terracotta clustered these structures fine whereas the Large[st] Software vendor's distributed cache needed everything to be flattened into maps. As an example, suppose you wanted 1 LinkedList item per minute up to 1 hour and, on the 61st minute, to push the oldest minute off the list and add the newest one to the other side of the list. In a map you would create entries with String keys such as "minute1", "minute2", "minute3" etc. You might keep an index "last minute in list" and then do a simple string operation like:


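// indexmap is a Map<String, Integer>; listmap is a Map<String, Object>
int index = indexmap.get("lastminuteinlist");   // read the current index
indexmap.put("lastminuteinlist", index + 1);    // advance it (not atomic with the get above)
String key = "minute" + index;                  // build the flattened key
Object val = listmap.get(key);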

Of course, you need some sort of transaction on the indexmap to get() and put() atomically. But this is all the "usual" headache with data grids and distributed caches.
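
To show the difference in shape, here is a sketch of both approaches side by side. The names (`RollingWindow`, `FlatWindow`, `addMinute`, `oldest`) are invented for illustration, and plain JDK collections stand in for whatever either product would actually cluster. In the flattened version `ConcurrentHashMap.merge` makes the index read-and-increment atomic, but the index and the data still live in two maps, so the pair of updates is not one transaction.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;

// Natural shape: a list of minutes, oldest at the head.
class RollingWindow {
    static final int MAX_MINUTES = 60;
    private final Deque<String> minutes = new ArrayDeque<>();

    synchronized void addMinute(String data) {
        if (minutes.size() == MAX_MINUTES) {
            minutes.removeFirst();   // push the oldest minute off the list
        }
        minutes.addLast(data);       // add the newest one on the other side
    }

    synchronized String oldest() { return minutes.peekFirst(); }
    synchronized int size() { return minutes.size(); }
}

// Flattened shape: an index map plus a data map keyed by "minuteN".
class FlatWindow {
    static final int MAX_MINUTES = 60;
    private final ConcurrentHashMap<String, Integer> indexmap = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, String> listmap = new ConcurrentHashMap<>();

    void addMinute(String data) {
        // merge() performs the get-and-increment of the index as one atomic step.
        int index = indexmap.merge("lastminuteinlist", 1, Integer::sum);
        listmap.put("minute" + index, data);
        if (index > MAX_MINUTES) {
            listmap.remove("minute" + (index - MAX_MINUTES));   // drop the oldest
        }
    }

    String oldest() {
        int index = indexmap.getOrDefault("lastminuteinlist", 0);
        int first = Math.max(1, index - MAX_MINUTES + 1);
        return listmap.get("minute" + first);
    }

    int size() { return listmap.size(); }
}
```

Both classes keep the same 60 entries; the second one just makes you do the list's bookkeeping by hand.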

What is far more interesting to me, because it is a new learning for us and I think for everyone who reads this blog, is what happened next. The distributed cache / data grid vendor produced what, at first blush, looked like a faster solution than Terracotta. Here's what the customer first observed:

1. Sun T1000 / 24-cores / 16GB RAM
2. Terracotta produced 3500 TPS
3. <OTHER> produced 7000 TPS

The customer needed 1400 TPS so both solutions were "good enough" but the customer wanted to understand where our claim of 10X had gone?!?

So, we started to break it down. Terracotta used 5% of the machine to produce 3500 TPS. We used a single TC Server instance and left it almost vanilla. The competitor, being a grid, chose to chop the T1000 into 20 JVMs. They used 100% of the box. So, right there we have the 10X. What do I mean? Well, Terracotta used 5% of the machine to produce 50% of the transactions per unit of time. Assume that if Terracotta produced 100% of the transactions, it would use 10% (linear scale)...this makes Terracotta 10 times more efficient than the "in memory data grid."
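
The same arithmetic, written out as throughput per percentage point of machine used. The figures are the ones quoted above; the `Efficiency` class and `tpsPerPercent` method are just illustrative names, and the comparison leans on the same linear-scale assumption.

```java
// Figures come from the POC described above; linear scaling is an assumption.
class Efficiency {
    // Throughput delivered per percentage point of the machine consumed.
    static double tpsPerPercent(double tps, double percentOfMachine) {
        return tps / percentOfMachine;
    }

    public static void main(String[] args) {
        double terracotta = tpsPerPercent(3500, 5);   // 700 TPS per 1% of the box
        double grid = tpsPerPercent(7000, 100);       // 70 TPS per 1% of the box
        System.out.println(terracotta / grid);        // prints 10.0
    }
}
```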

Kewl.

"But not so fast," said the customer. Can Terracotta scale linearly? We chose to leave Terracotta in vanilla format and spread the load across 10 instances of TC just to see what we could do. The answer: 35,000 tps (in our lab). This satisfied the customer.

The story doesn't stop there. Terracotta was configured to run in persistent mode so all 3500 transactions were on disk. Terracotta was configured to run w/ a backup TC Server on a 2nd T1000 (in our lab). This means there were 2 copies ON DISK of all data. The competitor? All copies were in RAM on the same machine--localhost--so the network overhead was zero, and the HA was non-existent.

I made up this term I now call "junk throughput." If someone shows you 1MM TPS and says, "wow, look how fast I can go!" you should ask what would happen if the server died. Or what would happen if the server GCed. And you should also not get fooled by these grids claiming massive amounts of transactions per second (TPS). Think about the transactions per server second--TPSS. In this case 7000 TPS from the data grid software divided amongst 20 JVMs == 350 TPSS, where each of their grid instances should be thought of as a server. Terracotta was doing 3500 TPSS.
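
TPSS is simple enough to compute that it is worth writing down. This sketch uses only the numbers from this post; `Tpss` and `perServer` are names made up for the example.

```java
// Transactions per server second: total TPS divided by the number of server instances.
class Tpss {
    static double perServer(double totalTps, int serverInstances) {
        return totalTps / serverInstances;
    }

    public static void main(String[] args) {
        System.out.println(perServer(7000, 20));    // data grid, 20 JVMs: 350.0
        System.out.println(perServer(3500, 1));     // one TC Server: 3500.0
        System.out.println(perServer(35000, 10));   // ten TC instances in the lab: 3500.0
    }
}
```

Note that the ten-instance run lands on the same per-server figure as the single instance, which is what linear scaling looks like in this metric.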

I ask you to ask yourself this: if the transactions are not durable anywhere and are just hanging out in memory, and I have to flatten my domain model to use the thing, why pay $20K / cpu to run 20 copies of this thing at all? Didn't this "data grid" vendor just hand me big, expensive memcache but without the source?

And, since this has turned into a blog of suggested nomenclature and testing procedure, also make sure that whenever you do a bake-off you take both options you are testing to 100% utilization of the machine. If you don't, you haven't done the test right.

FWIW,

--Ari

September 23, 2008

Comcast beats Microsoft...Yay!

ThePlatform -- a Comcast company -- uses Terracotta inside its multimedia management and content delivery services.

I am very happy to see them winning deals in the market. Those guys are great and they deserve all the success. Go ThePlatform!

http://www.nytimes.com/aponline/business/AP-AP-Video-Network.html

--Ari

September 29, 2008

Cloud Computing: a little jocularity

So, I have been trying to follow everything and everyone talking about cloud computing. The good news IMO is that, unlike "grid computing," the meaning of "cloud" seems more generally agreed upon.

Anyways, I started thinking about clouds just now and realized that George Lucas already covered everything there is to cover about clouds and the associated risks.

If I run my app in the cloud, I accept the following:

1. If Lando Calrissian is managing my cloud, he could totally deceive me as to what's going on.
2. My friends and I could land in the cloud only to find out security has been breached and Darth Vader has been hanging out in our app and hacking the computers to hide his presence.
3. I could end up hanging upside down waiting for my sister to "feel the force" and help me out.

In general, big cities in the cloud seem like they have been a risky idea ever since the 1970's, even in a galaxy far, far away.

Ah well. Might be fun to take all those risks. Long live the Cloud City! Long live Lando Calrissian! Long live Billy Dee!

About September 2008

This page contains all entries posted to POJO Mojo in September 2008. They are listed from oldest to newest.
