June 26, 2007
Measuring Terracotta Latency can be tricky...
posted by ari
So, a HUGE customer asked today, "What is Terracotta's latency for a simple update at a fine-grained level?" Of course, I wish he had asked the question in such simple terms, but this is what it boiled down to, at least. It is a very interesting question. I have told folks for a while now that latency with Terracotta is quite different from latency with clustering tools because Terracotta is not clustering, but is in fact HA / scalability infrastructure for Java apps. But let's avoid marketing hype. Here's the basic logic behind my assertion (the results of the actual test follow directly after I explain the concepts).
Terracotta is, first and foremost, TCP-based Network Attached Memory (NAM), which means there is no "n-1 ACK" problem where every server in a cluster must ACK the transaction. The Terracotta server is the only node that needs to know about the change. Secondly, as the name implies, network attached memory works with the Java heap at the byte level, so it only moves the data that changes between the application's JVM and the Terracotta server. Latency on a given network technology should therefore be lower for Terracotta than for any other HA / scalability solution, as long as Terracotta's internal overhead is kept in check. This is not to say Terracotta must be faster than anything else, only that it might be. So let's measure it now.
I started by making a very fine-grained change in a very tight loop. Right away, this raises a question: given that a TCP segment on standard Ethernet carries at most 1460 bytes of payload, would the smallest memory delta of 1-4 bytes, repeated many times in quick succession, test latency better than more changes in a less tight loop? Put another way, since each change should logically end up in its own TCP packet, does it really matter whether I change a Boolean, an int, or a 20-character string? And should I leave some rest / recovery time between updates, or should I run a compact loop? After all, if I go as fast as Java can, am I testing the network, the CPU, or Terracotta? This matters for understanding Terracotta's performance. I decided to keep it simple for now and go with a double, a tight loop, and Terracotta running on the same machine as my application. We will just have to run the other options later to better isolate our goal: computing Terracotta's latency.
Let's look at the code I wrote (it is from the inventory demo available in the standard Terracotta kit):
    private void testLatencyWithPrice() {
        // Look up a known product by its SKU; bail out if it is missing.
        Product p = null;
        {
            String s = "1GFR";
            p = (Product) store.inventory.get(s);
            if (p == null) {
                out.print("[ERR] No such product with SKU '" + s + "'\n");
                return;
            }
        }
        double d = 0.99;
        long latency;
        for (int i = 0; i < 1000000; i++) {
            // Time a single price update made under the product's lock.
            latency = System.currentTimeMillis();
            synchronized (p) {
                p.setPrice(d);
            }
            latency = System.currentTimeMillis() - latency;
            out.print("Latency for price change: " + latency + "\n");
        }
    }
This lives in $TC_HOME/samples/pojo/inventory/src/demo/inventory/Main.java. I then wired it up into the program so that I could call it directly from menu_main():
    switch (input.charAt(0)) {
        case 'I':
            printInventory();
            continue;
        case 'Q':
            return;
        case 'D':
            printDepartments();
            continue;
        case 'U':
            updatePrice();
            continue;
        case 'H':
            printHelp();
            continue;
        case 'L':
            testLatencyWithPrice();
            continue;
    }
Here's what I found:
Now, we are pushing the price update through to Terracotta about 3000 times per second. I also confirmed that locality of reference is honored (I will explain this in another blog entry later, just in case people are unsure what it is) by running a 2nd and a 3rd inventory client and observing that the update throughput was unaffected. Anyways, back to the observation on latency.
I put a call to System.currentTimeMillis() in there, which, somewhat obviously, proved too coarse-grained to measure the latency of sending a single double to the Terracotta Server. The latency was always measured as zero with both the inventory app and the Terracotta Server running on my laptop (MacBook Pro Core Duo @ 2GHz). I was CPU-bound, BTW (maxed out both cores), so I am not sure this is an accurate test. Nonetheless, I did 3000 updates per second, which means that in any one millisecond I did 3 Terracotta transactions, for an average of about 333 microseconds per update. How did this super low latency occur? The answer is batching and windowing. Rest assured Terracotta did not send 3000 TCP packets per second as I naively assumed (that would have amounted to 34Mbits per second). Terracotta's implementation batches up many changes in a single network call to the Terracotta server.
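The arithmetic in the paragraph above can be sketched in a few lines of Java. This is only a back-of-envelope helper: the class and method names are made up for illustration, and the bandwidth figure assumes every update travels in its own full 1460-byte segment with headers ignored, which is exactly the naive assumption that batching makes unnecessary.

```java
public class ThroughputMath {

    /** Average time per update, in microseconds, implied by a throughput figure. */
    static double microsPerUpdate(double updatesPerSecond) {
        return 1_000_000.0 / updatesPerSecond;
    }

    /**
     * Bandwidth in megabits per second (1 Mbit = 10^6 bits) if every update
     * were sent as its own packet of the given size. Headers are ignored.
     */
    static double naiveMegabitsPerSecond(double packetsPerSecond, int bytesPerPacket) {
        return packetsPerSecond * bytesPerPacket * 8 / 1_000_000.0;
    }

    public static void main(String[] args) {
        double updatesPerSecond = 3000;   // throughput observed in the test
        int assumedPacketBytes = 1460;    // assumption: one full TCP segment per update

        System.out.printf("Average time per update: %.1f microseconds%n",
                microsPerUpdate(updatesPerSecond));
        System.out.printf("Naive one-packet-per-update bandwidth: %.2f Mbit/s%n",
                naiveMegabitsPerSecond(updatesPerSecond, assumedPacketBytes));
    }
}
```

The bandwidth result lands in the same ballpark as the 34Mbits quoted above; the exact figure depends on whether a megabit is counted as 10^6 or 2^20 bits and on whether headers are included.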
Well, the next step will be to take it to 2 computers and see if I get more or less throughput. If I get less throughput when Terracotta and my app are each running on different machines (on gigabit ethernet), then the loopback interface is the answer to the low latency.
If I get more throughput with Terracotta on its own Linux / commodity server, then the fact that I was CPU-bound suggests I overestimated the Terracotta Server's latency and that it is in fact lower. I would then need to go back to 1 machine, but with a test that does not peg the CPU prematurely. Granted, I am backing into latency via throughput testing, but it illustrates how tricky performance testing is, and it also tests latency under load, which is far more important than one-off latency.
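One way to get past the zero readings from System.currentTimeMillis() is to time the whole loop with System.nanoTime() and divide by the number of updates. Here is a minimal sketch of that idea. The nested Product class is a hypothetical stand-in for the demo's Product, and the class and method names are mine, not part of the Terracotta kit. Run as plain Java, with no Terracotta involved, it only measures local lock and write cost, so treat it as a harness shape rather than a Terracotta measurement.

```java
public class NanoLatencySketch {

    /** Hypothetical stand-in for the inventory demo's Product class. */
    static final class Product {
        private double price;

        void setPrice(double newPrice) {
            price = newPrice;
        }

        double getPrice() {
            return price;
        }
    }

    /**
     * Runs the given number of synchronized price updates and returns the
     * mean elapsed time per update in nanoseconds. Timing the whole loop
     * avoids paying the clock-read cost on every iteration.
     */
    static double meanNanosPerUpdate(Product p, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            synchronized (p) {
                p.setPrice(0.99 + i); // vary the value so each update is a real change
            }
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        Product p = new Product();

        // Warm-up pass so the JIT has compiled the loop before we measure.
        meanNanosPerUpdate(p, 100_000);

        double meanNanos = meanNanosPerUpdate(p, 1_000_000);
        System.out.printf("Mean time per update: %.1f ns (%.3f microseconds)%n",
                meanNanos, meanNanos / 1000.0);
    }
}
```

This still reports an average under load rather than the latency of any single update, so it is the same "backing into latency via throughput" approach, just with a clock fine enough to give a non-zero answer.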
Stay tuned.