
October 20, 2005

Developer vs. Runtime Responsibilities

posted by ari

Developers of enterprise software have both the best and worst job in the world. While they are prototyping or developing core business functionality, they live in a world of logic – a world of their own creation where they are completely in charge of the rules. This dream is shattered, however, when they have to deal with the underlying infrastructure services that their applications need in order to perform well or scale beyond a single workstation – services like caching, or clustering the app to run on more than one server. The promise of managed runtimes like Smalltalk, Java, or C# – that the runtime will take care of the underlying plumbing – has not yet been fulfilled. Ask any developer how much of his or her code is taken up with infrastructure code (typically 20-40%!) and it’s obvious that the “managed runtimes” have a way to go before delivering their utopian ideal.

When Java was first introduced, many developers were reluctant to completely trust the JVM to manage memory – believing it impossible for a system to understand the memory requirements better than the developer. Even today, it is common to see people calling System.gc() in their code, even though modern JVMs are far more efficient at scheduling garbage collection cycles than any human can possibly be.

This is not at all to say that the developers of modern JVMs are better developers than any others – the main point here is that the runtime has far more information about how an application is behaving when it counts – during production – than the original developer can hope to predict. This is the main reason why the JIT compilers in modern JVMs (like JRockit, for example) can end up producing more efficient and faster code than static compilers. Check out what Martin Fowler says on his blog: "While I'm on the topic of concurrency I should mention my far too brief chat with Doug Lea. He commented that multi-threaded Java these days far outperforms C, due to the memory management and a garbage collector. If I recall correctly he said 'only 12 times faster than C means you haven't started optimizing'."

Beyond memory management, Java also provides services that make developers’ lives easier, like thread management, synchronization, and of course automatic compilation and optimization for various underlying CPU/OS combinations.

The idea here was to focus developers on the things they really need to focus on – namely development of business logic. The Java runtime was supposed to handle everything else automatically and extremely efficiently. Separation of business logic from infrastructure services was and is the ideal.

All of this is good as long as you stay in the confines of a single JVM. Once you venture beyond those walls, however, things start falling apart. Here are some of the common things that happen:

  • Communicating with a database: Nearly all Java applications need to interact with a relational database. While Java includes APIs to make this easier (JDBC), the impedance mismatch (differences in response times and storage paradigm) between the database and the app tier frequently makes the database the chief bottleneck in the application.
  • Sharing state between JVMs for High Availability: A common example of this is HTTP session replication. Most modern J2EE application servers offer the option to cluster session state between servers, but this is far from transparent or cheap – users typically pay a 40-60% performance penalty for session replication (in addition to paying thousands of dollars more per CPU for the software licenses), and have to perform unnatural acts that they wouldn’t necessarily have to perform on a single node, like calling HttpSession.setAttribute() after every change in the session state.
  • Sharing data between JVMs for clustering: If a Java application needs to scale beyond one node, object data needs to be shared between nodes. There are several common methods people use to share live object data between nodes, such as using JMS, a clustered hashmap (like a JCache implementation), or the database itself to shepherd data between nodes (exposing the app to the database bottleneck described above). All of these methods are far from transparent – they force the user to code explicitly for the clustered case, and in some cases force unnatural acts, like having to remember to put objects back into a clustered hashmap if their fields have changed, and signaling between nodes when data needs to be shared. None of these things are necessary when you are working in the confines of a single JVM.
  • Coordinating between JVMs: In order for an app to scale seamlessly from one node to many, there must be a way for threads on one JVM to signal to threads in another JVM that it’s time to do something, like fire off a method, or that a lock has been released, or that data has changed, etc. In the confines of a single JVM, all of this happens naturally, using nothing more than the language’s built-in facilities. The multi-node case is not that simple today.
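
To make the "put it back" problem concrete, here is a minimal sketch in plain Java. It is not any real clustering product or the JCache API: ToyClusteredMap is a hypothetical stand-in, assumed to behave like many replicated maps in that it ships a serialized copy to the other node only when put() is called. The second HashMap merely simulates a remote node inside one process.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PutBackDemo {

    /** Hypothetical toy map: put() copies the value to a simulated second node. */
    static class ToyClusteredMap {
        private final Map<String, Object> local = new HashMap<>();
        private final Map<String, byte[]> remote = new HashMap<>(); // simulated remote node

        void put(String key, Serializable value) {
            local.put(key, value);
            remote.put(key, serialize(value)); // replication happens only here
        }

        Object getRemote(String key) {
            return deserialize(remote.get(key));
        }
    }

    static byte[] serialize(Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Mutates a shared list after the first put(); optionally puts it back. */
    @SuppressWarnings("unchecked")
    static List<String> remoteCartAfterChange(boolean putBack) {
        ToyClusteredMap map = new ToyClusteredMap();
        ArrayList<String> cart = new ArrayList<>();
        map.put("cart", cart);
        cart.add("book");           // a change the toy map never sees on its own
        if (putBack) {
            map.put("cart", cart);  // the extra call the developer must remember
        }
        return (List<String>) map.getRemote("cart");
    }

    public static void main(String[] args) {
        System.out.println("without put-back: " + remoteCartAfterChange(false)); // []
        System.out.println("with put-back:    " + remoteCartAfterChange(true));  // [book]
    }
}
```

On a single JVM the call to cart.add() would be all that is needed; in this sketch the simulated second node sees the change only after the explicit second put(). The same pattern is what the setAttribute() requirement for session replication amounts to.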

What Terracotta offers is scalability using natural Java. Our HA-JDBC component provides an event-based database cache using natural JDBC APIs only – eliminating the impedance mismatch with the database while being completely transparent to the application. Our DSO component virtualizes multiple JVMs and makes them look like one JVM to the application – completely transparently. All of the facilities and conveniences of a single JVM – singletons, preservation of object identity, wait/notify, synchronization, etc. – become distributed. With DSO you can have a clustered singleton, objects that exist in a cluster without breaking object identity, distributed wait/notify, and synchronization that works naturally across a cluster. Our drop-in session replication modules (beta available today for WLS; releases for WAS, Tomcat & JBoss to follow in January) enable high-performance HTTP session clustering with persistence without ever having to call setAttribute(). The possibilities are mind-boggling – this truly will change the way Java developers write distributed applications. Give it a try, and tell us what you think.
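
For readers who want a reminder of what those single-JVM conveniences look like, here is a small sketch in ordinary Java. It runs in one JVM only and shows no Terracotta API or configuration; the class names SingleJvmIdioms and Registry are made up for illustration. It shows the kind of code (a singleton, object identity, wait/notify under synchronization) that the paragraph above says DSO is meant to extend across a cluster.

```java
public class SingleJvmIdioms {

    /** Classic singleton: one instance per JVM (strictly, per class loader). */
    static final class Registry {
        private static final Registry INSTANCE = new Registry();
        private boolean ready = false;

        private Registry() {}

        static Registry getInstance() {
            return INSTANCE;
        }

        /** Flip the flag and wake every thread blocked in awaitReady(). */
        synchronized void markReady() {
            ready = true;
            notifyAll();
        }

        /** Block until another thread calls markReady(). */
        synchronized void awaitReady() throws InterruptedException {
            while (!ready) {   // loop guards against spurious wakeups
                wait();
            }
        }
    }

    /** Object identity: every caller gets the very same instance. */
    static boolean sameInstance() {
        return Registry.getInstance() == Registry.getInstance();
    }

    /** One thread waits on the singleton, another signals it. */
    static boolean runHandoff() {
        Registry registry = Registry.getInstance();
        Thread waiter = new Thread(() -> {
            try {
                registry.awaitReady();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        registry.markReady();
        try {
            waiter.join(5000); // give up after 5 seconds rather than hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !waiter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("same instance:     " + sameInstance());
        System.out.println("waiter was woken:  " + runHandoff());
    }
}
```

Within one JVM none of this needs extra plumbing. Across two ordinary JVMs, each process would hold its own Registry, and a notifyAll() in one would never reach a waiter in the other, which is the gap described in the list above.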

Comments

Is there a case study which describes the performance improvements your solution enables compared to a traditional JDBC approach or the use of an in-memory database with cluster and back-end replication like TimesTen? Thank you. -Edwin

Posted by: Edwin Khodabakchian at November 2, 2005 07:58 PM

We are finishing up some case studies that illustrate our performance advantages as you suggest. We will be posting them on our site as soon as they are finished and approved. Feel free to contact me directly if you'd like a preview.

Thanks!

Bob

Posted by: Bob at November 2, 2005 08:03 PM
