May 22, 2011

Terracotta Joins Software AG

posted by ari

Hello all,

I wanted to send a personal note thanking everyone in and around the Terracotta community for their years of hard work and dedication to our technology. Without you, there would be no Terracotta. I also wanted to give you a reasonably detailed preview of where we are headed as part of Software AG going forward.

As you have likely heard, Software AG has entered into a definitive agreement to acquire Terracotta. You are probably asking what this means for you and the open source community, what this means for customers, and what this means for the Java industry as a whole. While I do not have a crystal ball, I can speak to why Software AG and Terracotta joined forces and explain our vision for a bright future for both our users and Java.

First, if you are a community member, and have faithfully been using or just contemplating using any of our technology – from Ehcache, to Quartz, to Terracotta Server Arrays – rest assured, it gets better from here. We intend to continue our strategic investments in open source technology focused on improving performance and scale for your applications. We will continue to host the same web sites, services, and domains you are used to, and in fact will soon be increasing our efforts to make the community sites easier to use. Everything will continue as planned – or faster – on the open source front.

Second, if you are a customer, you will soon enjoy a much higher level of coverage from and access to the Terracotta team. With Software AG’s worldwide scale brought to bear, we can offer more onsite services, more support, and more guidance than ever before. Software AG’s revenues are above €1.1 billion per year. The company has over 3,000 services engineers worldwide, offices in 70 countries, and generally more resources for customers than Terracotta could provide in the near future at our current scale. That said, those resources are as high quality as you have come to expect from Terracotta. As a customer you will see all the power of Terracotta that you are used to on the product front, all the power of Software AG’s organization on the services and support front, and more rapid innovation to boot.

Third, and most critically, is what this means for the Java community at-large. Where are Software AG and Terracotta taking our technologies, products, and solutions in the future? One of our first stops on the product roadmap will be to integrate Terracotta’s family of solutions, including Ehcache, BigMemory, and Quartz Scheduler, with Software AG’s existing solutions. The good news for Software AG’s customers is that Terracotta’s snap-in performance and scale works equally well with packaged software such as WebMethods—enabling WebMethods customers to scale up with BigMemory and scale out with Ehcache—as it does for the direct integrations that many of our customers have done to-date. Software AG and Terracotta intend to bring a whole new level of cost-effective, massive scale to existing products in the portfolio.

We also intend to deliver more features faster for the Terracotta user community. Working to solve customer problems at Software AG’s scale will help us achieve another level of simplicity and performance. Our goal is to deliver petabyte-scale solutions.

And there’s another important benefit for customers from this acquisition: a better cloud. In addition to enhancing Software AG’s already-great products and accelerating Terracotta’s roadmap that much faster with more resources and more concentrated feedback, we also intend to embark on a new vision and strategy for cloud-enabled applications. Call it a stack, a platform, a service, or whatever you like. But rest assured, the enterprise application development community – whether you work in Java, .Net, or C++ – will soon have a powerful and exciting new option for building applications in the cloud. This new option will include the most scalable, widely used in-memory data store on the market today, Ehcache with BigMemory coupled to Terracotta Server Arrays. And it will be standards-based and open, to work with most application servers, development frameworks, and data stores. In fact, we plan to make it the most open platform on the market, with interoperability as a core value proposition.

I think this is a great time for Java. Lots of innovation is happening in data management technology as well as in operational runtime environments. Together, Software AG and Terracotta intend to take your app from local-only simple caching, to distributed in-memory solutions, and all the way to datacenter-wide or cloud-scale deployments on the order of petabytes. This cloud vision encompasses all Software AG’s products and Terracotta’s products far into the future.

Thank you so much for your support, your contributions, and your loyalty. I look forward to continuing to work with all of you, and I am especially excited to have your support for our newly united efforts as we go to that next level.
Onward and upward!

--Ari

Comments (4) | TrackBack (0) | Permalink

May 17, 2011

Terracotta vs Memcache

posted by ari

Here's a table of capabilities / features as I see them delivered by Ehcache + Terracotta Server Arrays vs. Memcache:

(sorry for the bad formatting. I will fix it later.)





feature Terracotta Memcache
look-aside read-mostly caching Terracotta does this just fine. Then again, so do most caches. the pattern looks like if(cache.get() == null ) then DAO.load() and cache.put(). A differentiator here is that Terracotta's cache will be 2-tier (L1+L2). It will automatically cache locally what you ask for and manage its own memory outside GC's view so you get the benefits of microsecond latency in-process caching without the GC pauses. Memcache is famous for this. It outdoes Terracotta at the moment on one dimension here: Memcache has a good getall() API which batches up lots of gets in 1 network call. Works well. Memcache is also famous for being tuning-free. This is an overstatement because different use cases with vastly different object sizes should be put in different Memcache instances on the same machine so there is lots of science to running Memcache over and above simply starting processes up. Lack of L1 is a real hindrance to low-latency caching but simultaneously means cache is free to grow w/o tuning implications.
read-thru / write-thru caching This one is more challenging for caches. The difference between look-aside and write-thru is that the cache will call back to user code if it is about to return null or if it is about to mutate data. Example: Ehcache self-populating cache will fire your callback before returning null to the calling thread. As for write-thru, Ehcache's cache writer allows you to take control of the writer thread and have it write to a DB or web service or what have you as part of the put() operation. Also, with JTA transactions--not that you should use JTA/XA when doing distributed caching--its possible to have auto rollback / cleanup of caches if a put() detects a failure in the SoR. To my knowledge this is basically impossible with Memcache. Everything you write to the cache is what I call "best efforts." If you want to keep Memcache in synch with a database, I have seen architecture at places like Facebook that seem like giant hacks to make this happen. I just spoke with an EC2 gaming company who said they do the following: read from cache, trust the cache value, and then when they go to write the DB, they quickly re-read the data in DB and throw a caching error back to the user. So, if Memcache thinks you have $10 in your wallet because it missed an $8 debit, you just check with the DB before debiting the wallet in Memcache. In short, I would say, this is not the point of Memcache but it can be worked around to some extent with classic code smells.
write-behind caching Write-behind caching in Terracotta is achieved again through the cache writer interface in Ehcache. The cache writer that allows you to hook in and write-thru to a system of record can be set into asynchronous mode. In this case, you register a callback that we will fire on the other side of a LinkedBlockingQueue at some time later. The asynch queue we write to is also capable of collation (remove old updates to the same key if they haven't flushed before a new update to that same key arrives) and several other kewl things. Note that in local-only (unclustered) Ehcache, the LBQ is not durable anywhere and can be lost in the case of a crash. With Terracotta clustering on, the LBQ is in the Terracotta Server Array and is guaranteed to not be lost. You also have to deal with idempotency since we guarantee not to remove items from the LBQ till they successfully complete w/o throwing an exception (at least once semantics as opposed to the slower "once and only once" guarantee we could try to provide). Again, not sure this is worth trying with Memcache. You could wire up a similar architecture to Ehcache / Terracotta by using a JMS queue and coding up your own writer-node listening on a topic coming from all the other JVMs in your cluster. Could work pretty well, I guess.
Cache timeouts In Ehcache, there is a mode named Nonstop you can turn on for particular caches. Nonstop augments the normal get( element ), put( element ) API with a timeout. The timeout is actually a setting in the cache manger / cache definition. This means you can turn on and off Nonstop in config w/o actually going through your code looking for all cache call-sites and updating the timeouts manually. However, it means for a particular cache, you cannot have different operations or calls to the cache run with different timeouts. For that, you will have to instantiate different caches, each with its own cache-wide timeout. Note that you can have 1 cache with Nonstop mode and another without it--not sure you would want to do that though. Having one part of your app capable of timing out and another part block forever on data store updates will still lead to an app that hangs when the datastore is down. BTW, there are several automated timeout handlers built in to the cache. Your options are to fail silently or to throw exception on timeout. When failing silently, one option is to return NULL on get() which things like Hibernate and other ORMs will like--they will simply go to the DB. You can also choose to fallback to local-only caching when timeouts occur (trust my local cache until you can get back into the cluster). This is a core Memcache strength -- and a bit of an implementation artifact (something forced on you, the developer, because of the implementation of Memcache) in my opinion. Unless you get the Facebook multi-threaded Memcache fork, you are working with a single threaded Memcache server and client. BTW, ever wonder why Memcache never uses much CPU? Because it is single-threaded. Since Memcache is sequential and blocking in everything it does, all operations must include a timeout or the server / client will risk freezing up for some unknown reason. This means every call you make, be it a CASOperation, or a simple get or put or getall or putall or what have you, you must pass in a timeout. Works well, but doesn't have any builtin timeout handlers like Terracotta that auto-return what you want in case of timeout. All your code that calls the cache has to also deal with potential timeouts.
Composing atomic operations With Terracotta and Ehcache, you have choices. I guess that would be a theme across this entire analysis. Lots of choices (sometimes too many). In this case, you can define atomic operations as with or without the inclusion of rollback capabilities. If you want atomicity but don't need the added overhead of rollback, use explicit locking API in Ehcache / Terracotta toolkit (btw, you can get a simple cluster lock w/ Ehcache or Quartz in Terracotta...using the toolkit directly). The basic pattern is lock(); get(); get(); get(); put(); put(); put(); unlock(). All changes will be visible together after the lock gets released. That said, the cache will not rollback if an exception is thrown. You may get part of the data pushed because in Terracotta an exception doesn't tell us to cancel the already-completed put() calls. If you want rollback, use transactional Ehcache. There are also some powerful CAS operations in Ehcache such as putifabsent() and multi-arg replace( old-element, new-element). These will give you atomic operations where you really are mutating only 1 element in the cache but it would otherwise take 2 calls to the "regular" Ehcache API to get that task done. With Memcache, you have CASOperations and regular get/put. There are also locking APIs but since locking is not a first class citizen in Memcache but instead just an empty cache entry that your code spins trying to write to, I would avise against trusting them. My question to Memcache folks who insist on locking and composing operations across multiple cache calls: what if you are granted a lock, in a critical memory section, and your JVM garbage collects (assuming a Java app, of course)? Your JVM will pick up right where it left off before the GC, thinking it still has the lock only to find out later another JVM was granted the lock when it timed out. Is this the end of the line for Memcache? No, not really. You go back to double writing everything you put() back to a DB and doing a sanity check before commit (in some designs, you can do what I have described using a read-repair algorithm). Problem is you can never trust Memcache and that lack of trust permeates not just your code but in mission-critical apps, will actually end up getting exposed to your end users too. Users will see errors like those on Facebook where one minute a person you are searching for is there and the next minute they are gone. For facebook, that's fine but for banking apps, not so much. BTW, I am not suggesting Facebook's use of Memcache leads to the aforementioned behavior...I don't know their architecture well enough to assert why stuff appears and disappears--I just know that it does.
Strong consistency With Ehcache, Consistency settings are built into the core and exposed programatically or via configuration. Strong consistency in Ehcache uses the Werner Voegels definition . I won't repeat it here. All I will add to his excellent papers and analysis on the topic is that with Ehcache running on top of a Terracotta Server Array, consistency is guaranteed across JVMs, across machines, across the network in a grid or cluster. with memcache, consistency is achieved in a simple fashion. Simple is always good but not always flexible. In this case, Spy client for memcache as well as the memcache server itself are both single-threaded and blocking. Operations are atomic and sequential. Flying the same set of operations at the cache twice in a row will always give the same result. This is good and simple. But it does not deal with locking at all. Given my previous discussion on locking and atomic operations, there really is no good way to guarantee that 2 nodes are not racing on data except for the CAS library. There again, the CAS library falls short because you are supposed to spin on a cache mutation till it succeeds. How many times should you loop, trying to update a cache entry while racing with another JVM? 10 times? 100 times? 10000? And what do you do when you reach the maximum iterations of your spin-logic? What do you tell the end user? And how long will you wait, spinning? In short, consistency is really not something you should worry about with Memcache. It can do atomicity in the central store but it cannot do much to protect against races up in the spy client across JVMs across the network. I see many people design around this by using a sticky load balancer and binding user traffic to an individual JVM. Then they can guarantee that the user will not be mutating cache entries across the cluster. This is good but it is not perfect in terms of user experience. What if that node goes down? And what if that node burps without going down (example: a garbage collection). The load balancer will inevitably send the same user to 2 nodes; it is just a matter of time.
Eventual consistency Eventual consistency is again stealing terminology from Werner Voegels. With Ehcache + Terracotta, you get monotonic reads, read your writes, and all that proper eventual stuff. It is way more than just being lock free or last-one-in-wins semantic. Eventual consistency is a valid mode of coordinating access and guaranteeing the state of data in a clustered store. Ehcache offers this setting which performs about an order of magnitude faster than strong consistency. It is set on a cache-by-cache basis in config or via API if constructing caches programatically not offered or possible in memcache
JTA / XA participation There are 3 modes but you can simplify it to 2. XA / JTA vs. Terracotta-only transactions. Updating multiple caches or elements with rollback? Use Terracotta-only transactions. They are faster. Updating the cache and another data store (queue or DB or what have you)? Use XA if you want. I personally prefer a 1.5 phased-commit which I have discussed in the past and will not cover here in detail. Anyways, you have all the capabilities you need to get the level of isolation you want and the level of simplicity you want without sacrificing ALL your performance. Not possible in Memcache
Durable-to-disk Ehcache can flush to disks on your local machine or it can persist the entire cache to disk. When Terracotta is present, the job of persistence pushes down to the Terracotta Server Array. Again, you can turn on disk if you choose. You can never fully turn off disk in Terracotta Servers because they are a 1st class data store. They do not want to go down for any reason other than operational maintenance. So, they will find a place on disk to spill over before they would ever OOME and exit. Memcache does not have disk persistence. For this, you have to go to membase, but that's an entirely different animal than Memcache--pretty much the polar opposite with stringent data guarantees and very little flexibility; no flexibility to enable you to avoid replication and persistence where you want to minimize overhead
Write 2 places If you want Ehcache to write to 2 places, use a Terracotta Server Array to back up Ehcache across the network, or use disk persistence. With Terracotta Server Arrays behind Ehcache, you get an Active and a Mirror and striping / partitioning / sharding as well. You should read up on Terracotta Arrays if you have never heard of them before, but I will explain in words here. Terracotta Arrays are pure Java software, running anywhere a JRE will run (pretty much). If you start one up, it will contain 100% of your data coming from your Ehcache nodes / Apps / JVMs across the network; in this base case it is like an NFS server. Start up 2 and they clone all your Ehcache data between them. They also heartbeat each other and automatically (sub-second timing) take over for each other on slow down or failure. Start up 4 and they will form 2 groups of 2 Terracotta nodes. The groups are called "stripes" (a.k.a. partitions) where each stripe holds 1/nth of the data and "mirror groups" where each mirror contains duplicate data. So 4 nodes would form 2 stripes each with 2 mirrors in it. 50% of the data is in each stripe and the stripes are each internally redundant. In essence, the mirror groups work to write all your Ehcache data to 2 places.

Writing 2 places can be achieved with unclusted / local-only Ehcache by turning on disk persistence. The two places you will be writing in this case are memory and disk so it is not off-process storage and is not really in the spirit of what you might be looking for w.r.t. this feature.

I have seen this delivered in memcache simply by instantiating 2 spymemcache clients. The basic construct is to write to a memcache you designate as primary and write to a secondary as well. Both put() calls will include a timeout and either one, or both could fail. Since you write 2 places, the chances are one will succeed. Always read from the first one though and consider it your memcache-of-record. Never check in both Memcaches and diff because they will often have different data. Only use the secondary memcache if you decide the 1st is down over an extended period, or if the 1st times out on a single call. Since you cannot write to 2 Memcache servers under an atomic operation safely, you will still want to guard yourself from a dirty cache using the DB re-read trick or hacking memcache to block certain operations.
Lock mgmt Maybe this one should have come sooner. We have discussed atomicity, transactionality (atomicity with rollback), Compare and Swap, and even timeout-based operations. All those topics are best understood when we understand the notion of first class locking. First class locking exists in many distributed caching platforms, including ours. "First class" means that you have an API called lock() and unlock() and not just get() and put(). But it means more than this. lock() and unlock() need to not be implemented as a cache.put() call. This is cheating. Writing a cache entry and then blocking others from writing that cache entry till you twiddle a bit in that cache entry marking yourself as releasing the logical lock doesn't work. This is just like reference counting garbage collectors. In reference counting, common expectation amongst developers is you will lose references in weird edge cases and races (such as a crashed JVM that will never release its locks). In pseudo-locking you will end up with similar races and lock leaks where caches corrupt or fail to mutate and then never unlock. In Ehcache + Terracotta Arrays, locks can be smart. They keep statistics about how often they are hopping around your Ehcache nodes and then the system starts to grant locks in a less and less fair mode while simultaneously guaranteeing to avoid lock starvation. You cannot implement this with simple / naive "cache entries as locks" model.

Further, locks need to be released when a JVM dies. Cache entries do not have the notion of being acquired or released. Locking in Terracotta is separate of caching and is tied straight to the communication layer. If locks are held without timeout, that's fine. You will not get stuck when a JVM crashes. the communication layer in the Terracotta Array automatically releases all locks on behalf of a JVM that dies or hangs (or GCs for too long).

We have explained earlier that Memcache uses spinning on cache entries to approximate locking. Locks in memcache cannot get stuck because they always timeout. Locks in memcache have the opposite problem of getting stuck...you never know if you actually have them exclusively or if another thread elsewhere is in a protected code segment alongside you, racing. In short, locking and Memcache are orthogonal and you should avoid mixing the two. I think almost every memcache-based app uses CAS-based locking though. I guess that is pretty scary."
Multi-tenancy Ehcache's Automated Resource Control is being designed into Terracotta 3.6 / Ehcache 2.5 and beyond to make multi-tenancy safe. Safe multi-tenancy centers around control of storage and network. Storage resources need to be protected on a usecase-by-usecase basis. I mean that if you have a memory pool of 10 GBs across a few machines, and you have 2 apps, each of which needs 5GBs of cache, it is important that at runtime one does not end up with 9.9GB and the other with .1GB. Resource Controls in Ehcache and Terracotta ensure that one cache can steal from another inside the same app to keep things running whereas apps cannot steal from each other so as to be safe. Automated Resource Control has a ways to go before it has _all_ the features we are planning and we plan to add to it over the next 3 or 4 releases after 3.6 but it is the only fencing (like Solaris Zones or mainframe lpars, etc.) in the caching space about which I am aware.

Separate of ARC, Terracotta Arrays can be dedicated per use case. This is the opposite of multi-tenancy in that each use case gets its own storage resources, dedicated to it, but such an approach will work just fine for most folks. Such an approach will also work for most distributed caches, including Memcache. The problem with this approach, separate of the inability to fully utilize your data store instances, is that this not cloud-friendly. If you could just start up a 25-node Terracotta array and point all your apps at it and grow the array as you go, adding use cases pointing to one static array, you would get the effect of Amazon EC2-style resources like EBS, RDS, or S3 which just seem to scale indefinitely. If you have to partition use cases to separate data stores, you add complexity and get less "blackbox" nature to your services in your datacenter.

Memcache will deliver multi-tenancy via manual partitioning. Putting multiple use cases into the same Memcache farm will yield all the challenges and problems that doing so with any non-fencing data store would yield. It is not impossible to do. It just is.
Multi-datacenter Ehcache has built-in WAN replication. There are several models of replication and replication is configured at a cache-level, not at a system-level. You can WAN replication cache A and not cache B. You can replicate cache A synchronously, meaning all writes to the cache block till all datacenters apply the update. You can replicate cache A asynchronously meaning you write locally and queue the change to be sent to other datacenters as soon as possible. When asynchronous, cache replication can automatically resolve conflicts to cache entries across datacenters or you can plug in your own conflict resolver that chooses which update to apply when 2 datacenters change the same entry at the same time.

Ehcache also has multi-datacenter replication technology so you can have 3, 4 or 10 datacenters replicating changes to each other, synchronously or asynchronously, on a cache-by-cache basis.

Memcache has no such facility. Some people write to a local and a remote memcache server sequentially to approximate WAN replication. Other people write cache changes to Memcache and then JMS to replicate asynchronously to another datacenter. WAN replication is possible but requires manual implementation (at least until someone implements JMS replication wrappers for spy client out on github or something like that).


| TrackBack (0) | Permalink

October 16, 2010

SpringOne + Chicago Jug

posted by ari

I am in Chicago this coming week for SpringOne. I have a talk there about Big Memory--learn how we got 350GB pauseless JVMs running over 500K tps each.

I am also speaking at the Chicago JUG. More of a primer on Ehcache and all the pieces and parts and how to use them best.

Would love to see you there :)

| TrackBack (0) | Permalink

September 16, 2010

Pedal to the Metal. How Car Town Grew to One Million Users in Days on EC2

posted by ari

Jeff Barr, the Car Town team, and I are doing a joint webinar. I think you might find it valuable...so here's the description and the link.

October 13, 11:00am PST
Ever wondered how an application can scale from zero to millions of users in a matter of weeks? Cie Games, the makers of Car Town, will tell you how they did it running on Amazon EC2 and using Java and Enterprise Ehcache to scale on-demand. Cie Games' lead-level engineers will share their experience for growing Java apps running on EC2. The discussion will cover RDS, EBS, S3, ELB, Cloudfront, Ehcache, Terracotta, and how to choose between and optimize your use of these powerful scale-out technologies.


The following link will take you straight to the event registration page. There are other events there too if you just go to terracotta.webex.com.

https://terracotta.webex.com

Cheers,

--Ari

| TrackBack (0) | Permalink

September 15, 2010

BigMemory Explained a bit...

posted by ari

Ok. Its important to explain why we did what we did relative to market need and other vendors' state of the art.

First you need some context. BigMemory is designed to provide what we call a tiered storage engine. Tierd storage looks like this:

What it implies is a system that automatically and transparently works behind the cache to keep data as close to the app as possible. You can have a cache with 350GB of data in it, keeping the most frequently used (calculated at runtime) portion in memory with read / write latency under 1 microsecond. You can keep the remaining 350GB in memory but a bit further away from Java heap, and hidden from the Garbage Collector so that it never causes a pause in the JVM while it sits there resident, and that 350GB can be accessed at 100 microseconds. You have seen this before in the hardware. In that sense it is not new. On-chip cache is faster than RAM and on-chip has multiple levels (L1, L2, L3). And, the caches transparently pull and push data from RAM to the core that needs the data. It works.

In Ehcache there is the disk which you can use to snapshot your 350GB cache locally and avoid having to re-warm the cache if you restart the JVM (with BigMemory running inside it). So BigMemory is in-RAM. It is in-process with the JVM. And it is hidden from the garbage collector--won't be collected or examined at all. Several folks seemed to miss that and a few even asserted that BigMemory writes to disk. We shall ignore their incorrect assumptions and move on. BigMemory is in RAM, in-process and 100% pure Java.

Just remember. if you want 1GB or 10GB or 100GB of BigMemory for Ehcache, what you do is set -Xmx and -Xms to, say, 500MB and then in Ehcache.xml you ask for a 1, 10, or 100GB cache. The operating system process will report as being 1G + 500MB or 10G+500M or 100G + 500M meaning as large as 101GB of physical memory being consumed. But the JVM will only be GCing 500MB! That's right. At 500MB you can pretty much not worry about GC tuning anymore, right? Well, almost. You could have other memory problems in your app other than the data inside Ehcache...I understand that.

Next, you need to remember that BigMemory is a product from Terracotta, just like our server array or Ehcache. BigMemory is a plug-in to everything else we do. In other words, you can get Ehcache with BigMemory plugged in. You can get your Terracotta server array with BigMemory plugged in. And, maybe soon, you can get HTTP session replication with BigMemory--imagine your session store inside your container not bothering your garbage collector at all. Wouldn't that be nice? So do you need to cluster via Terracotta to get BigMemory? No.

As I explained last night to the Java User Group, you could have a 1TB dataset you want to cache and run 1 JVM with a 1TB BigMemory store or 1000 JVMs each with a 1GB BigMemoryStore (still hiding the cache from the collector), or you can run anywhere in between. This is important. I was talking to a huge Telecom company last week and leaked them the news about BigMemory. They have 48GB heaps right now and talked to one of the datagrid vendors and to us about fixing their pause times. The machines they use are Sun X4800's with 256GB of RAM. With 48GB heaps, this Telecom user was seeing Java pause times over 5 minutes. The datagrid guys said to rewrite the entire app to use their grid. Then, run 80 JVMs at 2GB each after tuning for 2 weeks.

We came in and suggested offloading the 48GB of index data into Ehcache on 1 JVM. The customer was intrigued. Surely 1 JVM is better than 80 from a management perspective, they thought. Then they said, "I don't want 1 JVM...its a single point of failure." So we offered them 10GB JVMs...2 of them running Ehcahe with the 48GB dataset stored in a smallish Terracotta array using BigMemory to keep the data all in RAM and faulting in only 10GB of the index into any Ehcache node at any point in time. They went this route.

So now we are on the same page. BigMemory is in-memory and just a hair slower than Java heap. Its in process with the JVM so there is no management complexity and it is pure Java so there is no deployment complexity or sensitivity to JVM version, etc. No tricks. And no significant restrictions--save the fact that it works behind Ehcache, not all Java objects in the heap. Put another way, if it is not in cache, we cannot help yet.

What value is all this?

Interestingly enough, when I talk to Java User Groups, not all people know about 64-bit JVMs and the nightmare these things can cause. Contrary to what other vendors are claiming, most shops such as Unibet, PartyPoker, Expedia, Sabre Holdings, Intercontinental Hotels Group, JP Morgan, Goldman, and more will tell you a 64-bit JVM pauses unpredictably and for minutes at a time. even when a 64 bit JVM is small (<2GB) it takes 30+% more RAM than a 32-bit equivalent JVM running under the same app.

The pause times (stop-the-world) can be minutes and we have even seen hours in some cases. So while our Dell and HP and Cisco UCS machines are shipping with upwards of 16GB of RAM, VMWare and Xen are offering us to chop that ram into 4 or 6 virtual machines where we run smallish 2GB instances of our apps. Its a vicious cycle where the HW is now outpacing the JVM and can provide way more RAM to us than we can effectively use. So we virtualize and run many copies which adds complexity. So we buy more HW to run management, elastic, and other services as we scale out. More HW more HW more HW, all because Java cannot use the RAM that comes in the boxes we can buy off the shelf!

So, the value of BigMemory is:

1. Simplicity...write to Ehcache and flip on the BigMemory switch. It will take care of hiding all your cache data from the GC on its own.

2. Minimal GC tuning...if you have ever tuned GC, you know it is a black art. And the CPU has only a fixed amount of cycles in 1 second with which to do work. The more time you spend in GC, the less you spend on your app logic. People tune and tune and tune for weeks or months to get rid of GC pauses. Last night, I heard from our team that a customer struggling with GC issues for 6 months dropped in BigMemory alpha and the pauses went away. Of course this was an app with a 6GB heap and 4GB+ of that was cached data, but still, the point is there.

3. Density...only with EHcache on BigMemory can you chose to run 1 JVM @ 1TB of cache in RAM or 1000 at 1GB each, all the while not having to tune GC. This means you can buy 32GB of RAM per box and use all that ram with only 1 JVM running on the OS! That's density. That's operational simplicity. That's a capability you haven't had before.

4. Clustered or unclustered. Terracotta--a scaling company--now lets you scale up and out as you see fit. You should no longer wait till you have a big multi-JVM cluster to call upon our technology. If you have Ehcache (as most of you do), you can snap this in and just scale your JVM up until the point of breaking and then scale it out as well. Don't worry, our server array uses BigMemory too which means our cluster won't need any GC tuning, just like your app. Randy Shoup -- distinguished engineer at EBay and all around smartie -- loves to tell people to always weight the trade-offs and do a proper cost analysis of any architectural decision. Don't just cluster because you can. Don't just shard because you ran out of capacity in that server you own. Weigh the cost of new hardware vs. app changes and added complexity. And make sure you weigh all the costs from equipment to development to operations. If you follow Randy's advice you will find BigMemory a critical weapon in your kit. Scaling up can often be the fastest, easiest path to more capacity.

We took BigMemory and made sure it really solves the GC problem in the context of caching inside your JVM. To that end we needed to make sure it made customer applications that we had previously manually tuned run smoothly even when we removed all the delicate GC settings. We needed to make sure it handled heavy read/write scenarios and that our MemoryManager didn't just push the garbage collection problem from JVM-GC to our own. This second point is where we focused much testing effort.

See, as some have rightly questioned, the GC is years-mature now and has to clean up behind apps. Given that our cache is used in read/write scenarios, don't we essentially have to reinvent the GC-wheel? No. Caches are not as powerful as generic object heaps. You can't just store arbitrary data in a cache that is interconnected. I mean, if (k1, v1) is in cache in Ehcache BigMemory, it has no references to (k2, v2). They cannot refer to each other. Which means I don't have to walk the cache looking for cross-references in the data like a GC does. No mark-and-sweep. I merely wait for the dev to call cache.remove( element ) or for the evictor to evict stale data. The evictor and the remove() API are signaling my MemoryManager explicitly just like new() and delete() or malloc() and free(). There is no notion of garbage or collection. Only memory re-use and fragmentation over time.

If I read from the cache 7MM times / second I miss the point by doing a read-only test. 7MM reads per second might as well be 7 trillion. reads don't create any garbage or any challenge for anything but the CPU. But if I write to the cache as fast as I can, replacing (k,v) pairs with other (k,v) pairs of significantly varying sizes, then I create holes all over the heap that have to be reorganized so that I don't end up with a huge, fragmented heap that takes forever to write any objects. This was our test. Write, write write. And change the size of the payload causing different-sized holes in RAM to be managed by our system. And we can do 500,000 writes per second with a 350GB heap and we can do it for 24 hours straight without a single full pause and with the variance in transactions per second essentially ZERO. The duration is important too BTW. ConcurrentMarkSweep collectors can run for hours w/ no full pauses and then suddenly stop the world for 30 minutes trying to catch up to the mess they have made. Don't believe anyone who tells you they routinely run 10GB+ JVMs that never have pause problems. Only once have I seen a 54GB heap with CMS collector with no full pauses and even Sun engineers cannot explain to me how that's possible except if the objects are all of a certain data type (I will spare you the details here).

In Summary

We built BigMemory to help Java catch up to hardware. We got sick of customers asking us to help them max out 32GB machines and forcing them to run 2, 4, or 10 JVMs per machine to do it. We got sick of the GC pausing the JVM in which sensitive clustering code was running and thus causing all sorts of failure scenarios that wouldn't exist in a predictable realtime or near-realtime runtime environment. So we solved the problem for ourselves and our customers; then we realized we were on to something. We realized we didn't have to force people to cluster / scale-out to get more performance and more data into Java apps.

Our customers' apps proved to us (taking months of tuning and being able to delete it all and go back to a vanilla JVM configuration) that we solved the problem. And our benchmark is a well-crafted benchmark that proves we didn't punt the GC problem to another subsystem. It is something no vendor has done before in the distributed caching space--it is a heavy-write benchmark with massively varying object sizes. And, most critically, we built BigMemory to offer people the most important capability we can: scale up or out. It is now your choice.

Forget grids of thousands of machines--unless you are NASA Ames or a Wall St. firm. Avoid the need to scale to 100's of machines and instead run on 1, 2, or as many JVMs as you want. And, buy a Dell R910 with >200GB of RAM like I did and cache your 190GB data table from your MySQL box, as we did in a single JVM. That's the point. Its new. Its different. And its what people needed.

Try it out today: http://www.terracotta.org/bigmemory

Comments (13) | TrackBack (0) | Permalink

September 14, 2010

OK...so the news is out on BigMemory

posted by ari

BigMemory == d178021a17b9c8800d8f45da53b3bd04

How? that MD5 sum is of the integer number value for 256GB. 256GB is the maximum size of cache we had tested up to the time Greg Luck and I decided to start this little campaign. Since then we have gone past 350GB. But ah well. 350GB is a HUGE cache to load in a single JVM, without any full GC pauses and without any disk I/O (swap / flush to disk), and without any C / C++ hacks or weak reference tricks or what have you.

I will blog more about the value but for now alls [sic] you need to know is that we can load hundreds of gigabytes of cached data into a JVM without affecting GC at all. And, we can do it inside Ehcache and inside Terracotta servers. That's right. Both Ehcache and Terracotta transparently offer off-heap in-process storage for their cache data. Guess what else...that makes our server nearly pauseless.

More later on how to make a tiered storage system between heap, off-heap, local disk, and Terracotta arrays that let's you balance performance vs. offload and scale up vs. scale out!

--Ari

Comments (2) | TrackBack (0) | Permalink

September 7, 2010

What is d178021a17b9c8800d8f45da53b3bd04?

posted by ari

More news coming soon!

--Ari

| TrackBack (0) | Permalink

August 5, 2010

Terracotta versus Hadoop

posted by ari

I have had at least 10 companies ask me to compare and contrast Terracotta with Hadoop in the past 30 days. This question, to me, is an easy trap in which to fall.

Hadoop is recursion-based (MapReduce). You can't express all programatic constructs in recursion very easily, in my opinion.

Terracotta and Ehcache work in Java and .Net. Those languages are full programming languages, devoid of query constructs but rich in data structure constructs.

To me, the question is not how many TPS each of us can deliver or how much data one can handle versus the other. It is a question of right tool for the job.

My answer: most shops should have both Hadoop and Terracotta in use. Think about it this way. Terracotta is generally part of read/write business systems that you might call transactional (not meaning XA / 2-phased commit). Hadoop is generally part of read-only analytics systems you might call business intelligence. Hadoop is used to create cool things like Yahoo's predictive-typing feature in their search engine. Terracotta is used in financial services to execute more trades per hour or run algorithmic trading solutions faster or in gaming to handle more gamers per day. Terracotta is used to make sure no state is lost if Java or .Net apps stop running mid-day. Hadoop is used to crunch massive datasets looking for patterns.

I say use both. What do you think? Do you want more info from me on my perspective? I am happy to put technical detail behind these claims. Just want to see if anyone out there is contemplating a "A vs. B" decision like I have been seeing. If everyone agrees with me, then I will just stop here.

Let me know what you want to hear.

Cheers,

--Ari

Comments (3) | TrackBack (0) | Permalink

May 19, 2010

Ehcache 2.1 just released

posted by ari

We just released 2.1. Its pretty amazing. CAP theorem capabilities. Very very easy to use. And, as usual, more throughput than before.

I am speaking about it tonight @ Oakland JUG.

Come on by!

--Ari

| TrackBack (0) | Permalink

April 22, 2010

@Scale 2010

posted by ari

%40Scale.png

I am very excited to announce the @Scale 2010 one day conference. It's pronounced "At Scale" and it represents a chance for you to get to spend time with all the leaders in virtualization, and scale in one place for an entire day.

The date is September 24th, 2010. It is 1 day after Java One / Oracle Open World so if you are in San Francisco already, why not pop over for an extra day to discuss the real low down and real world stuff going on. The conference is downtown only about 6 blocks from Moscone center!

I am super excited to be announcing the speakers for the conference as well.


  • Jeff Barr of Amazon

  • Simon Crosby, CTO of Citrix

  • Rich Wolski, founder of Eucalyptus

  • Adrian Cole, found of JClouds

  • Paul Querna of CloudKick and libcloud fame

  • Jos Bowmans of Canonical--the Ubuntu guys who also do UEC

  • Greg Luck, James House, and myself from Terracotta

and more. We've got high scale customers who will be describing their use case in detail. And we've got integrations to demo to you. You will see full stack cloud solutions in action including Xen, Eucalyptus, Terracotta, Tomcat, MySQL and more.

Essentially, we will cover the latest and greatest in all aspects of application scale from monitoring and management to application frameworks to virtualization and automation. By the end of the day, you will have a great understanding of these concepts:

  • Elastic Data
  • Elastic Infrastructure

Get your ticket early. The early bird discounts are significant. And, for all the content we will be providing, I think the prices are very reasonable.

BTW, I also want to extend an open invite to the NoSQL leadership (Cassandra, Voldemort, Mongo, Couch, etc.) Everyone with a role in building out Clouds @ Scale or Enterprise apps @ Scale is welcome to be a sponsor or speaker.

Thanks everyone and I am very excited to see you all there!

The URL that you can distribute via twitter, etc is: http://atscale2010.eventbrite.com/

--Ari

| TrackBack (0) | Permalink