June 18, 2008

Caching Doesn't Work

posted by ari

I just finished a discussion where someone asserted that developers can be frustrating because they make blanket assertions such as "caching doesn't work" or "the only thing that works for storage is the database." Of course, it is not in my best interest to agree with these statements, but I can see where they are coming from.

Here's the problem though. I have seen several use cases where people take streams of data (gigabytes an hour), shove the streams into Oracle and then want to report on random slices of the stream.

This is tough stuff. It is tough because the database can barely keep up with the insert volume, if it keeps up at all. The ad hoc query and reporting workload then breaks the database's back.

So people look up and say, "cache the queries." The problem is that you end up with what I call a "long tail": every query is somewhat different, so a long tail of seemingly one-off queries misses the cache and ends up hitting the db.
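To make the long tail concrete, here is a minimal sketch of a result cache keyed on the exact SQL text. Everything in it is hypothetical and for illustration only: the SqlResultCache and LongTailDemo classes, the events table, and the ts column are my assumptions, and the "database" is just a function that echoes the query back.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical result cache keyed on the exact SQL string (illustration only).
final class SqlResultCache {
    private final Map<String, String> entries;
    private int hits;
    private int misses;

    SqlResultCache(final int capacity) {
        // An access-ordered LinkedHashMap gives simple LRU eviction.
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached result, or runs the query against the "database" on a miss.
    String query(String sql, Function<String, String> database) {
        String cached = entries.get(sql);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        String result = database.apply(sql);
        entries.put(sql, result);
        return result;
    }

    int hits() { return hits; }
    int misses() { return misses; }
}

// Simulates ad hoc reporting: every report asks for a slightly different time slice.
final class LongTailDemo {
    static SqlResultCache run(int queries) {
        SqlResultCache cache = new SqlResultCache(100);
        for (int i = 0; i < queries; i++) {
            // Assumed table and column names; each slice differs, so each SQL string differs.
            String sql = "SELECT count(*) FROM events WHERE ts >= " + (i * 60)
                    + " AND ts < " + (i * 60 + 300);
            cache.query(sql, q -> "result of " + q); // stand-in for a real database call
        }
        return cache;
    }
}
```

Calling LongTailDemo.run(1000) leaves the cache with 1000 misses and no hits, because no two SQL strings are identical. A bigger cache does not help; only a repeated query would.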

Solution? Process the data in-stream. Pre-generate the object-oriented representation in memory using Terracotta (you get access to lots of memory outside the scope of your Java heap, and you get durability in case of a system crash because everything is on disk). Essentially, if all your data is made up of events and you have to analyze those events, you might as well do the analysis as the data is flowing by rather than only when the user pulls the report.

The difference is that reports on data stored in Oracle can be cached, but the cache is only useful if the same analysis is executed twice (the exact same SQL). Pre-analyzing the data, storing the summaries and roll-ups (a parse tree, if you will), and sharing those is not about caching. This approach can handle much higher rates of change because the ad hoc analysis workload is eliminated. Furthermore, you can store a fine-grained analytical tree of data that can support many types of ad hoc queries from memory.
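Here is a rough sketch of what rolling up in-stream might look like in plain Java. It is not Terracotta's API: the StreamRollup class, its method names, and the event fields (a timestamp in seconds, a source name, a byte count) are assumptions I made up for illustration. The clustering and on-disk durability that Terracotta would add are not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical in-memory roll-up, updated as each event flows by (illustration only).
final class StreamRollup {
    // minute bucket -> (source -> total bytes seen in that minute)
    private final ConcurrentSkipListMap<Long, ConcurrentHashMap<String, LongAdder>> bytesByMinute =
            new ConcurrentSkipListMap<>();

    // Called once per event on the ingest path; no database round trip involved.
    void onEvent(long epochSeconds, String source, long bytes) {
        long minute = epochSeconds / 60;
        bytesByMinute
                .computeIfAbsent(minute, m -> new ConcurrentHashMap<>())
                .computeIfAbsent(source, s -> new LongAdder())
                .add(bytes);
    }

    // Ad hoc query: total bytes for one source over a time slice, answered from memory.
    // Granularity is one minute; the start minute is inclusive, the end minute exclusive.
    // Assumes fromSeconds <= toSeconds.
    long totalBytes(String source, long fromSeconds, long toSeconds) {
        long total = 0;
        for (Map<String, LongAdder> perSource
                : bytesByMinute.subMap(fromSeconds / 60, toSeconds / 60).values()) {
            LongAdder sum = perSource.get(source);
            if (sum != null) {
                total += sum.sum();
            }
        }
        return total;
    }
}
```

Any time slice the user asks for is assembled from the same per-minute buckets, so it no longer matters whether two reports use the same query. The trade-off is that you have to choose the roll-up granularity and dimensions up front (per minute and per source in this sketch).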

So, while it is ludicrous to assert as a rule that caching does not work, it is just as ludicrous to attempt to cache a long-tail problem. Go at it another way.

--Ari
