« Atlassian Crowd Clustered with Terracotta | Main | 2-tier coherent clustering »

February 16, 2009

Offloading a DB even when updates need to be transactional

posted by ari

We have a lot of active deployments where people need to integrate Terracotta into a use case where the goal is to offload the database. Transactional updates between Terracotta, working as a cache, and the database would seemingly slow the cache updates down to the speed of the database. This line of reasoning is not only common but quite logical. However, it's not true at all. There are two ways that using Terracotta as a cache in front of the database can offload that database, and offload it significantly. The first mechanism for offloading the database and the most common is using Terracotta in write-behind mode. The second is to use Terracotta as a coherent read cache writing through to the database. In both cases, the database can either be made smaller or can handle many more transactions per unit time.

When writing behind to the database, Terracotta's integration module for asynchronous updates, (Async TIM) provides a simple, scalable pure Java way to offloading I/O. The Async TIM provides an API for flagging objects in any collection as "dirty" and needing update to any backing store. The module guarantees that objects dirtied by a particular JVM do not migrate across JVMs in the process of getting flushed to the DB. This means that each JVM is working with only the changes it makes--all in a highly localized fashion. The Async TIM thus provides a mechanism for flushing data that changes from application JVMs back to the database; a way to flush those changes at a rate the database can support, keeping data in the Terracotta array until it reaches the db; and a way to make sure the entire system is reentrant and that no data or events will be lost or duplicated between the cache and the database, even in the case of an entire cluster restart. Async TIM's work queues are all Java queues running on top of Terracotta so the queues are durable, transactional data backed transparently by the Terracotta server array. A large online gaming site and a social networking customer both use Async TIM to write fast changing information directly to the cache. The online gaming site writes odds and game data, as well as all bets being placed by end users into the DB for eventual settlement / payment. The social networking site writes highly contented inventory information to a central cache of product availability. Both have decreased their database from an Oracle instance running on Sun hardware from >12 CPUs down to 4, which means a really nice savings in Oracle licenses.

When writing through to the database, Terracotta's configuration-based locking semantics layered on top of a pure Java collections prove sufficient for our customers in offloading excessive I/O. Start with a simple HashMap. Execute all reads from the cache and all writes straight to the database. In order to execute a transactional update with the DB, use a Terracotta synchronous lock to flag the cache as stale, cluster-wide. Then the database write is done in a normal, fast Terracotta asynchronous lock (using just Java synchronization but flagging the object locks as asynchronous at runtime), finishing with reflagging the cache as clean. We at Terracotta call this a 1.5 phase commit. Whenever reading from cache, we are guaranteed that the database is the master of all data and that the cache is subservient because the cache will not be valid until set clean, and it cannot be clean unless the database and cache got updated without error. A major telco uses Terracotta collections with a 1.5 phase commit locking scheme in order to offload their database, increasing the capacity of their existing cluster by 7.5X. At time when you want to do more with less, that's a good return.

As you can see, Terracotta helps both asynchronous write-behind users and synchronous write-through users offload the DB quite significantly. In one case, customers use durable, transactional POJO queues to write to the DB in a throttled manner. In the other, customers use coherent cluster-wide read caching and a 1.5 phase locking scheme to all but eliminate read traffic to the database.

Trackback Pings

TrackBack URL for this entry:
http://blog.terracottatech.com/cgi-bin/mt/mt-tb.cgi/82

Comments

another great article!

since a picture is often worth a thousand words, i would love to see Sequence Diagrams for these kinds of multi-system interactions

Posted by: bob Pasker at February 17, 2009 1:14 PM

"since a picture is often worth a thousand words, i would love to see Sequence Diagrams for these kinds of multi-system interactions"

+1

Posted by: anjan bacchu at February 18, 2009 12:40 AM

Ari,

This is great stuff - a good high level overview of what we're looking at doing with Terracotta. Do you have any code available that provides an example of this? I don't remember the examinator reference app doing anything quite like this.

Another dilemma we face with Terracotta is when we change the structure of our shared objects. To my knowledge Terracotta doesn't offer us the ability to "clean" just one shared root, or objects of a certain type. Generally we have resorted to simply deleting the terracotta data directory and starting over. But as we move more mission critical data into Terracotta (especially storing write-behinds using tim-async), this option is not on the table any more.

Posted by: Zach Bailey at February 21, 2009 11:38 AM

Bob: Will do...

Zach: (a) the 1.5PC with synch-write locks is not in Examinator. It uses TIM-Async to get the job done. I will likely write up the pattern. (b) the roots, in certain cases could be cleared by writing a Java app that zeros out the root. For example, clearing a hashmap.

--Ari

Posted by: Ari Zilka at February 23, 2009 8:02 PM

Write-behind would be wonderful for performance, but how do you handle db transaction errors? What if someone places a bet (in your real-life example), and the bet fails to commit? How do we recover from that? Was the user originally told (when the bet was marked dirty) that the bet was successfully placed (and committed)? Or is the user told that the bet is “posted” or something else that implies that the bet is not truly committed yet?

If the application considers the objected committed once it is marked dirty (and/or placed on the write-behind queue), then I assume that there could be potential consistency issues with write-behind, where one box has the bet placed (on the queue, but not in the db), but the other boxes do not see the bet placed?

And even on the box that placed the bet, if the db is queried with a projection that includes the bet (rather than the Bet domain object; eg. Some report of all bets, joined in with the user and their account status), then I assume the query will return inconsistent results (ie. The report of bets will not include the bet on the write-behind queue)?

Not that any of this is an argument against write-behind; I just want to understand the use cases where this can be applied safely.

Thank you.

Posted by: Marko at March 16, 2009 12:59 PM

Post a comment




Remember Me?

(you may use HTML tags for style)