« SpringOne + Chicago Jug | Main | Terracotta Joins Software AG »

May 17, 2011

Terracotta vs Memcache

posted by ari

Here's a table of capabilities / features as I see them delivered by Ehcache + Terracotta Server Arrays vs. Memcache:

(sorry for the bad formatting. I will fix it later.)





feature Terracotta Memcache
look-aside read-mostly caching Terracotta does this just fine. Then again, so do most caches. the pattern looks like if(cache.get() == null ) then DAO.load() and cache.put(). A differentiator here is that Terracotta's cache will be 2-tier (L1+L2). It will automatically cache locally what you ask for and manage its own memory outside GC's view so you get the benefits of microsecond latency in-process caching without the GC pauses. Memcache is famous for this. It outdoes Terracotta at the moment on one dimension here: Memcache has a good getall() API which batches up lots of gets in 1 network call. Works well. Memcache is also famous for being tuning-free. This is an overstatement because different use cases with vastly different object sizes should be put in different Memcache instances on the same machine so there is lots of science to running Memcache over and above simply starting processes up. Lack of L1 is a real hindrance to low-latency caching but simultaneously means cache is free to grow w/o tuning implications.
read-thru / write-thru caching This one is more challenging for caches. The difference between look-aside and write-thru is that the cache will call back to user code if it is about to return null or if it is about to mutate data. Example: Ehcache self-populating cache will fire your callback before returning null to the calling thread. As for write-thru, Ehcache's cache writer allows you to take control of the writer thread and have it write to a DB or web service or what have you as part of the put() operation. Also, with JTA transactions--not that you should use JTA/XA when doing distributed caching--its possible to have auto rollback / cleanup of caches if a put() detects a failure in the SoR. To my knowledge this is basically impossible with Memcache. Everything you write to the cache is what I call "best efforts." If you want to keep Memcache in synch with a database, I have seen architecture at places like Facebook that seem like giant hacks to make this happen. I just spoke with an EC2 gaming company who said they do the following: read from cache, trust the cache value, and then when they go to write the DB, they quickly re-read the data in DB and throw a caching error back to the user. So, if Memcache thinks you have $10 in your wallet because it missed an $8 debit, you just check with the DB before debiting the wallet in Memcache. In short, I would say, this is not the point of Memcache but it can be worked around to some extent with classic code smells.
write-behind caching Write-behind caching in Terracotta is achieved again through the cache writer interface in Ehcache. The cache writer that allows you to hook in and write-thru to a system of record can be set into asynchronous mode. In this case, you register a callback that we will fire on the other side of a LinkedBlockingQueue at some time later. The asynch queue we write to is also capable of collation (remove old updates to the same key if they haven't flushed before a new update to that same key arrives) and several other kewl things. Note that in local-only (unclustered) Ehcache, the LBQ is not durable anywhere and can be lost in the case of a crash. With Terracotta clustering on, the LBQ is in the Terracotta Server Array and is guaranteed to not be lost. You also have to deal with idempotency since we guarantee not to remove items from the LBQ till they successfully complete w/o throwing an exception (at least once semantics as opposed to the slower "once and only once" guarantee we could try to provide). Again, not sure this is worth trying with Memcache. You could wire up a similar architecture to Ehcache / Terracotta by using a JMS queue and coding up your own writer-node listening on a topic coming from all the other JVMs in your cluster. Could work pretty well, I guess.
Cache timeouts In Ehcache, there is a mode named Nonstop you can turn on for particular caches. Nonstop augments the normal get( element ), put( element ) API with a timeout. The timeout is actually a setting in the cache manger / cache definition. This means you can turn on and off Nonstop in config w/o actually going through your code looking for all cache call-sites and updating the timeouts manually. However, it means for a particular cache, you cannot have different operations or calls to the cache run with different timeouts. For that, you will have to instantiate different caches, each with its own cache-wide timeout. Note that you can have 1 cache with Nonstop mode and another without it--not sure you would want to do that though. Having one part of your app capable of timing out and another part block forever on data store updates will still lead to an app that hangs when the datastore is down. BTW, there are several automated timeout handlers built in to the cache. Your options are to fail silently or to throw exception on timeout. When failing silently, one option is to return NULL on get() which things like Hibernate and other ORMs will like--they will simply go to the DB. You can also choose to fallback to local-only caching when timeouts occur (trust my local cache until you can get back into the cluster). This is a core Memcache strength -- and a bit of an implementation artifact (something forced on you, the developer, because of the implementation of Memcache) in my opinion. Unless you get the Facebook multi-threaded Memcache fork, you are working with a single threaded Memcache server and client. BTW, ever wonder why Memcache never uses much CPU? Because it is single-threaded. Since Memcache is sequential and blocking in everything it does, all operations must include a timeout or the server / client will risk freezing up for some unknown reason. This means every call you make, be it a CASOperation, or a simple get or put or getall or putall or what have you, you must pass in a timeout. Works well, but doesn't have any builtin timeout handlers like Terracotta that auto-return what you want in case of timeout. All your code that calls the cache has to also deal with potential timeouts.
Composing atomic operations With Terracotta and Ehcache, you have choices. I guess that would be a theme across this entire analysis. Lots of choices (sometimes too many). In this case, you can define atomic operations as with or without the inclusion of rollback capabilities. If you want atomicity but don't need the added overhead of rollback, use explicit locking API in Ehcache / Terracotta toolkit (btw, you can get a simple cluster lock w/ Ehcache or Quartz in Terracotta...using the toolkit directly). The basic pattern is lock(); get(); get(); get(); put(); put(); put(); unlock(). All changes will be visible together after the lock gets released. That said, the cache will not rollback if an exception is thrown. You may get part of the data pushed because in Terracotta an exception doesn't tell us to cancel the already-completed put() calls. If you want rollback, use transactional Ehcache. There are also some powerful CAS operations in Ehcache such as putifabsent() and multi-arg replace( old-element, new-element). These will give you atomic operations where you really are mutating only 1 element in the cache but it would otherwise take 2 calls to the "regular" Ehcache API to get that task done. With Memcache, you have CASOperations and regular get/put. There are also locking APIs but since locking is not a first class citizen in Memcache but instead just an empty cache entry that your code spins trying to write to, I would avise against trusting them. My question to Memcache folks who insist on locking and composing operations across multiple cache calls: what if you are granted a lock, in a critical memory section, and your JVM garbage collects (assuming a Java app, of course)? Your JVM will pick up right where it left off before the GC, thinking it still has the lock only to find out later another JVM was granted the lock when it timed out. Is this the end of the line for Memcache? No, not really. You go back to double writing everything you put() back to a DB and doing a sanity check before commit (in some designs, you can do what I have described using a read-repair algorithm). Problem is you can never trust Memcache and that lack of trust permeates not just your code but in mission-critical apps, will actually end up getting exposed to your end users too. Users will see errors like those on Facebook where one minute a person you are searching for is there and the next minute they are gone. For facebook, that's fine but for banking apps, not so much. BTW, I am not suggesting Facebook's use of Memcache leads to the aforementioned behavior...I don't know their architecture well enough to assert why stuff appears and disappears--I just know that it does.
Strong consistency With Ehcache, Consistency settings are built into the core and exposed programatically or via configuration. Strong consistency in Ehcache uses the Werner Voegels definition . I won't repeat it here. All I will add to his excellent papers and analysis on the topic is that with Ehcache running on top of a Terracotta Server Array, consistency is guaranteed across JVMs, across machines, across the network in a grid or cluster. with memcache, consistency is achieved in a simple fashion. Simple is always good but not always flexible. In this case, Spy client for memcache as well as the memcache server itself are both single-threaded and blocking. Operations are atomic and sequential. Flying the same set of operations at the cache twice in a row will always give the same result. This is good and simple. But it does not deal with locking at all. Given my previous discussion on locking and atomic operations, there really is no good way to guarantee that 2 nodes are not racing on data except for the CAS library. There again, the CAS library falls short because you are supposed to spin on a cache mutation till it succeeds. How many times should you loop, trying to update a cache entry while racing with another JVM? 10 times? 100 times? 10000? And what do you do when you reach the maximum iterations of your spin-logic? What do you tell the end user? And how long will you wait, spinning? In short, consistency is really not something you should worry about with Memcache. It can do atomicity in the central store but it cannot do much to protect against races up in the spy client across JVMs across the network. I see many people design around this by using a sticky load balancer and binding user traffic to an individual JVM. Then they can guarantee that the user will not be mutating cache entries across the cluster. This is good but it is not perfect in terms of user experience. What if that node goes down? And what if that node burps without going down (example: a garbage collection). The load balancer will inevitably send the same user to 2 nodes; it is just a matter of time.
Eventual consistency Eventual consistency is again stealing terminology from Werner Voegels. With Ehcache + Terracotta, you get monotonic reads, read your writes, and all that proper eventual stuff. It is way more than just being lock free or last-one-in-wins semantic. Eventual consistency is a valid mode of coordinating access and guaranteeing the state of data in a clustered store. Ehcache offers this setting which performs about an order of magnitude faster than strong consistency. It is set on a cache-by-cache basis in config or via API if constructing caches programatically not offered or possible in memcache
JTA / XA participation There are 3 modes but you can simplify it to 2. XA / JTA vs. Terracotta-only transactions. Updating multiple caches or elements with rollback? Use Terracotta-only transactions. They are faster. Updating the cache and another data store (queue or DB or what have you)? Use XA if you want. I personally prefer a 1.5 phased-commit which I have discussed in the past and will not cover here in detail. Anyways, you have all the capabilities you need to get the level of isolation you want and the level of simplicity you want without sacrificing ALL your performance. Not possible in Memcache
Durable-to-disk Ehcache can flush to disks on your local machine or it can persist the entire cache to disk. When Terracotta is present, the job of persistence pushes down to the Terracotta Server Array. Again, you can turn on disk if you choose. You can never fully turn off disk in Terracotta Servers because they are a 1st class data store. They do not want to go down for any reason other than operational maintenance. So, they will find a place on disk to spill over before they would ever OOME and exit. Memcache does not have disk persistence. For this, you have to go to membase, but that's an entirely different animal than Memcache--pretty much the polar opposite with stringent data guarantees and very little flexibility; no flexibility to enable you to avoid replication and persistence where you want to minimize overhead
Write 2 places If you want Ehcache to write to 2 places, use a Terracotta Server Array to back up Ehcache across the network, or use disk persistence. With Terracotta Server Arrays behind Ehcache, you get an Active and a Mirror and striping / partitioning / sharding as well. You should read up on Terracotta Arrays if you have never heard of them before, but I will explain in words here. Terracotta Arrays are pure Java software, running anywhere a JRE will run (pretty much). If you start one up, it will contain 100% of your data coming from your Ehcache nodes / Apps / JVMs across the network; in this base case it is like an NFS server. Start up 2 and they clone all your Ehcache data between them. They also heartbeat each other and automatically (sub-second timing) take over for each other on slow down or failure. Start up 4 and they will form 2 groups of 2 Terracotta nodes. The groups are called "stripes" (a.k.a. partitions) where each stripe holds 1/nth of the data and "mirror groups" where each mirror contains duplicate data. So 4 nodes would form 2 stripes each with 2 mirrors in it. 50% of the data is in each stripe and the stripes are each internally redundant. In essence, the mirror groups work to write all your Ehcache data to 2 places.

Writing 2 places can be achieved with unclusted / local-only Ehcache by turning on disk persistence. The two places you will be writing in this case are memory and disk so it is not off-process storage and is not really in the spirit of what you might be looking for w.r.t. this feature.

I have seen this delivered in memcache simply by instantiating 2 spymemcache clients. The basic construct is to write to a memcache you designate as primary and write to a secondary as well. Both put() calls will include a timeout and either one, or both could fail. Since you write 2 places, the chances are one will succeed. Always read from the first one though and consider it your memcache-of-record. Never check in both Memcaches and diff because they will often have different data. Only use the secondary memcache if you decide the 1st is down over an extended period, or if the 1st times out on a single call. Since you cannot write to 2 Memcache servers under an atomic operation safely, you will still want to guard yourself from a dirty cache using the DB re-read trick or hacking memcache to block certain operations.
Lock mgmt Maybe this one should have come sooner. We have discussed atomicity, transactionality (atomicity with rollback), Compare and Swap, and even timeout-based operations. All those topics are best understood when we understand the notion of first class locking. First class locking exists in many distributed caching platforms, including ours. "First class" means that you have an API called lock() and unlock() and not just get() and put(). But it means more than this. lock() and unlock() need to not be implemented as a cache.put() call. This is cheating. Writing a cache entry and then blocking others from writing that cache entry till you twiddle a bit in that cache entry marking yourself as releasing the logical lock doesn't work. This is just like reference counting garbage collectors. In reference counting, common expectation amongst developers is you will lose references in weird edge cases and races (such as a crashed JVM that will never release its locks). In pseudo-locking you will end up with similar races and lock leaks where caches corrupt or fail to mutate and then never unlock. In Ehcache + Terracotta Arrays, locks can be smart. They keep statistics about how often they are hopping around your Ehcache nodes and then the system starts to grant locks in a less and less fair mode while simultaneously guaranteeing to avoid lock starvation. You cannot implement this with simple / naive "cache entries as locks" model.

Further, locks need to be released when a JVM dies. Cache entries do not have the notion of being acquired or released. Locking in Terracotta is separate of caching and is tied straight to the communication layer. If locks are held without timeout, that's fine. You will not get stuck when a JVM crashes. the communication layer in the Terracotta Array automatically releases all locks on behalf of a JVM that dies or hangs (or GCs for too long).

We have explained earlier that Memcache uses spinning on cache entries to approximate locking. Locks in memcache cannot get stuck because they always timeout. Locks in memcache have the opposite problem of getting stuck...you never know if you actually have them exclusively or if another thread elsewhere is in a protected code segment alongside you, racing. In short, locking and Memcache are orthogonal and you should avoid mixing the two. I think almost every memcache-based app uses CAS-based locking though. I guess that is pretty scary."
Multi-tenancy Ehcache's Automated Resource Control is being designed into Terracotta 3.6 / Ehcache 2.5 and beyond to make multi-tenancy safe. Safe multi-tenancy centers around control of storage and network. Storage resources need to be protected on a usecase-by-usecase basis. I mean that if you have a memory pool of 10 GBs across a few machines, and you have 2 apps, each of which needs 5GBs of cache, it is important that at runtime one does not end up with 9.9GB and the other with .1GB. Resource Controls in Ehcache and Terracotta ensure that one cache can steal from another inside the same app to keep things running whereas apps cannot steal from each other so as to be safe. Automated Resource Control has a ways to go before it has _all_ the features we are planning and we plan to add to it over the next 3 or 4 releases after 3.6 but it is the only fencing (like Solaris Zones or mainframe lpars, etc.) in the caching space about which I am aware.

Separate of ARC, Terracotta Arrays can be dedicated per use case. This is the opposite of multi-tenancy in that each use case gets its own storage resources, dedicated to it, but such an approach will work just fine for most folks. Such an approach will also work for most distributed caches, including Memcache. The problem with this approach, separate of the inability to fully utilize your data store instances, is that this not cloud-friendly. If you could just start up a 25-node Terracotta array and point all your apps at it and grow the array as you go, adding use cases pointing to one static array, you would get the effect of Amazon EC2-style resources like EBS, RDS, or S3 which just seem to scale indefinitely. If you have to partition use cases to separate data stores, you add complexity and get less "blackbox" nature to your services in your datacenter.

Memcache will deliver multi-tenancy via manual partitioning. Putting multiple use cases into the same Memcache farm will yield all the challenges and problems that doing so with any non-fencing data store would yield. It is not impossible to do. It just is.
Multi-datacenter Ehcache has built-in WAN replication. There are several models of replication and replication is configured at a cache-level, not at a system-level. You can WAN replication cache A and not cache B. You can replicate cache A synchronously, meaning all writes to the cache block till all datacenters apply the update. You can replicate cache A asynchronously meaning you write locally and queue the change to be sent to other datacenters as soon as possible. When asynchronous, cache replication can automatically resolve conflicts to cache entries across datacenters or you can plug in your own conflict resolver that chooses which update to apply when 2 datacenters change the same entry at the same time.

Ehcache also has multi-datacenter replication technology so you can have 3, 4 or 10 datacenters replicating changes to each other, synchronously or asynchronously, on a cache-by-cache basis.

Memcache has no such facility. Some people write to a local and a remote memcache server sequentially to approximate WAN replication. Other people write cache changes to Memcache and then JMS to replicate asynchronously to another datacenter. WAN replication is possible but requires manual implementation (at least until someone implements JMS replication wrappers for spy client out on github or something like that).


Trackback Pings

TrackBack URL for this entry:
http://blog.terracottatech.com/cgi-bin/mt/mt-tb.cgi/111