
June 2008 Archives

June 1, 2008

The Book is Done!

From the publisher...

Hi guys:
I wanted to let you know that your book shipped today. It's technically a
day late but I talked our Manufacturing Director into counting it as still
being on time for our Friday ship date. I know this was a long haul, so
congratulations and thank you very much for your efforts.

Your book's official publication date is June 23rd, so I expect you'll
receive your [author] copies by the end of June or in the first week of July.

Thanks again and safe travels in the coming months.
Apress team.

June 13, 2008

When requirements can lead you astray

First an aside:

I was reading European Car (June 2008) magazine this morning. Okay. Okay. I am an über geek. I don't just geek on software and infrastructure but on all things engineering. I admit it.

Anyway, the contributing editor's letter was all about top-down vs. bottom-up design. Until last week he thought that self-driving highways were a near impossibility for now because we would need perfectly precise GPS, huge compute horsepower to keep track of all the cars everywhere in real time, etc. He then drove a BMW that was using radar and cameras to follow the car in front (this is a production car available at your local BMW dealer...nothing secret or experimental). He realized immediately that the cars can work as a mesh of self-piloting little brains, each working with a few inputs and handling special cases like emergency stops internally, all on their own.

I found it relevant to software design from a requirements gathering perspective. If the requirement from the business is stated as "make all the cars drive on their own from Los Angeles to Las Vegas and make sure no one gets killed," how are we as engineers supposed to know whether to use a centralized or a decentralized brain for the implementation? The editor called this a top-down vs. bottom-up sort of thing but he was wrong. It's not about agile, top-down, requirements first, or anything else. It's about good engineering. One engineer can take the requirement and come up with a centralized approach while another can do exactly the opposite.

The point? Yesterday I got a requirement from a customer to store 1TB of data churning every 3 hours and then enable users to report on it, generating ad hoc queries. So, I dug around a bit and figured out that (a) the reports are canned and (b) the reports are all about statistical analysis of the data (mins, maxes, averages, what have you). So we flipped the requirements to packing the data into report-ready form sort of like cubing in an OLAP system. We then did away with the raw data saying to the users that if you install a new reporting dimension into the system, it will be ready to view 3 hours later after it has been populated.

Exact same problem as the self-driving car thing. The user asked, "How do I retain a window of 3 hours of data for users to report against?" The answer turns out to be "generate the reports in-stream as the data is flowing by. Then forget about the raw data."
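To make "generate the reports in-stream" a bit more concrete, here is a minimal Java sketch of the idea. It is not the customer's code and it does not use Terracotta; the class name `StreamingRollup`, the `onEvent` method, and the "latencyMs" dimension are all made up for illustration. It keeps only a running min, max, and mean per report dimension and never holds on to the raw events. Expiring summaries after the 3-hour window is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-stream roll-up: keeps min/max/mean per dimension, never the raw events. */
class StreamingRollup {
    /** Running summary for one report dimension. */
    static final class Summary {
        long count;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;

        void add(double value) {
            count++;
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
        }

        double mean() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    private final Map<String, Summary> byDimension = new HashMap<>();

    /** Called once per event as it flows by; the event itself is not retained. */
    void onEvent(String dimension, double value) {
        byDimension.computeIfAbsent(dimension, d -> new Summary()).add(value);
    }

    /** Returns the current summary, or null if no event was seen for the dimension. */
    Summary report(String dimension) {
        return byDimension.get(dimension);
    }

    public static void main(String[] args) {
        StreamingRollup rollup = new StreamingRollup();
        rollup.onEvent("latencyMs", 12.0);
        rollup.onEvent("latencyMs", 30.0);
        rollup.onEvent("latencyMs", 18.0);
        Summary s = rollup.report("latencyMs");
        System.out.println("min=" + s.min + " max=" + s.max + " mean=" + s.mean());
    }
}
```

The memory cost depends on the number of dimensions, not on the volume of data flowing by, which is the whole point of flipping the requirement.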

The punchline: the user was struggling with a distributed in-memory SQL engine on top of Terracotta when none was required.

Don't jump too quickly to design and implementation. IMHO, it is important to keep asking questions long after you think you understand the use case. Poke at all the requirements. And if you want a more formal framework, I would say, if you know what is hard and easy based on lots of QA and stress testing and performance analysis, use that info to poke and prod on the requirements. Cast a business problem toward your strengths, not your weaknesses, and never assume that the business cannot relax a rule or two.

--Ari

June 18, 2008

Caching Doesn't Work

I just finished a discussion where someone asserted that developers can be frustrating because they make blanket assertions such as "caching doesn't work" or "the only thing that works for storage is the database." Of course, it is not in my best interest to agree with these statements, but I can see where they are coming from.

Here's the problem though. I have seen several use cases where people take streams of data (gigabytes an hour), shove the streams into Oracle and then want to report on random slices of the stream.

This is tough stuff. It is tough because the database can barely keep up with the insert volumes if at all. Then the ad hoc query and reporting workload breaks the database's back.

So people pick up their heads and say, "cache the queries." Problem is, you end up with what I call a "long tail": every query is somewhat different, and thus a long tail of seemingly one-off queries misses the cache and ends up hitting the db.

Solution? Process the data in-stream. Pre-generate the object-oriented representation using Terracotta in memory (you get access to lots of memory outside the scope of your Java heap and you get durability in case of system crash--everything is on disk). Essentially, if all your data is made up of events and you have to analyze those events, you might as well do the analysis as the data is flowing by rather than only when the user pulls the report.

The difference is that reporting on data stored in Oracle can be cached but the cache is only useful if the same analysis is executed twice (exact same SQL). Pre-analyzing the data and storing the summaries / roll-ups (parse tree, if you will) and sharing those is not about caching. It can handle much higher rates of change because the ad hoc analysis workload is eliminated. Furthermore, you can store a fine-grained analytical tree of data that can support many ad hoc types of queries from memory.
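Here is a rough Java sketch of what I mean by a fine-grained tree of roll-ups that can answer ad hoc questions without going back to the raw events. Everything in it is an assumption for illustration: the `RollupTree` class, the one-minute bucket size, and the plain `HashMap`/`TreeMap` structure (in a real deployment the shared structure might live in Terracotta, but nothing below depends on it). Events are folded into per-key, per-minute buckets as they arrive, and a query over any time range just merges the buckets it covers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical roll-up tree: key -> (bucket start time -> summary). */
class RollupTree {
    /** Summary of all events that fell into one time bucket. */
    static final class Bucket {
        long count;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;

        void add(double value) {
            count++;
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
        }

        void merge(Bucket other) {
            count += other.count;
            min = Math.min(min, other.min);
            max = Math.max(max, other.max);
            sum += other.sum;
        }
    }

    // Assumed granularity: one-minute buckets.
    static final long BUCKET_MILLIS = 60_000L;

    private final Map<String, TreeMap<Long, Bucket>> tree = new HashMap<>();

    /** Folds one event into its bucket; the raw event is not kept. */
    void onEvent(String key, long timestampMillis, double value) {
        long bucketStart = Math.floorDiv(timestampMillis, BUCKET_MILLIS) * BUCKET_MILLIS;
        tree.computeIfAbsent(key, k -> new TreeMap<>())
            .computeIfAbsent(bucketStart, b -> new Bucket())
            .add(value);
    }

    /**
     * Answers an ad hoc range query by merging the buckets whose start time
     * lies in [fromMillis, toMillis). Requires fromMillis <= toMillis.
     */
    Bucket query(String key, long fromMillis, long toMillis) {
        Bucket result = new Bucket();
        TreeMap<Long, Bucket> buckets = tree.get(key);
        if (buckets != null) {
            for (Bucket b : buckets.subMap(fromMillis, true, toMillis, false).values()) {
                result.merge(b);
            }
        }
        return result;
    }
}
```

Two users asking for different time ranges never share a cache entry, but both are served from the same in-memory buckets. The trade-off is that answers are only as precise as the bucket size.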

So, while it is ludicrous to assert that a rule might exist where caching does not work, it is just as ludicrous to attempt to cache a long tail problem. Go at it another way.

--Ari

June 19, 2008

Second book on Terracotta

Time to get started on the next book. Steve Harris (our head of engineering) had a good idea. "Clustered Design Patterns." I think he is just the man to help write it too. Any interest out there in such a book?

--Ari

June 20, 2008

The Terracotta Book is in the Mail

The following items have been shipped to you by Amazon.com:
---------------------------------------------------------------------
Qty Item Price Shipped Subtotal
---------------------------------------------------------------------
Amazon.com items (Sold by Amazon.com, LLC):
1 The Definitive Guide to Te... $29.69 1 $29.69

Shipped via USPS (estimated arrival date: 26-June-2008).

Get it for yourself.

June 23, 2008

Avoiding the Palm / Treo Mistake

I was thinking on the flight back from Scotland to London today (mostly because several of us were geeking out about our phones and the OSes on those phones):

Palm Treo almost cornered the market on smartphones / PDAs. What happened? Well, apart from the bulkiness and slowness of the physical device, the OS was unstable. I remember that it couldn't even get a semaphore implemented in a stable fashion. My Treo would regularly lock up trying to pull mail while I was in the middle of a phone conversation. The radio can be used for either data or voice at any one time, so the OS has to lock the radio to keep the two from confusing the device. But they couldn't seem to get it right.
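For the non-phone geeks, the kind of lock I am talking about looks roughly like this in Java. This is only an illustration of the concept and has nothing to do with how Palm OS was actually written; the `Radio` class and its method names are invented. A single-permit semaphore guards the radio, and the mail fetch backs off instead of blocking when a call holds it.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical radio guarded by a single-permit semaphore: voice or data, never both. */
class Radio {
    private final Semaphore permit = new Semaphore(1);

    /** Returns true if the caller now owns the radio, false if it is already in use. */
    boolean tryAcquire(String purpose) {
        return permit.tryAcquire();
    }

    /** Must only be called by the current owner after a successful tryAcquire. */
    void release() {
        permit.release();
    }

    public static void main(String[] args) {
        Radio radio = new Radio();
        boolean call = radio.tryAcquire("voice");  // phone call takes the radio
        boolean mail = radio.tryAcquire("data");   // mail fetch is refused, not hung
        System.out.println("call=" + call + " mail=" + mail);
        radio.release();                           // call ends
        System.out.println("mail retry=" + radio.tryAcquire("data"));
    }
}
```

The non-blocking `tryAcquire` is the design choice that matters here: the losing side gets a "busy" answer and can retry later, rather than locking up the device.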

What killed Treo in my opinion was largely its instability. I think it is safe to assume Palm had a hardcore QA team. After all they were building devices, OSes, and apps at some point in their past. So what went wrong? The wrong type of QA. They needed to test concurrent applications, high stress (lots of email in memory), and slow network connectivity. While I know that Palm regularly sent users into the field with pre-release versions of systems, a more explicit framework was required.

Clustered application QA is a hot-button issue for us here at Terracotta. I expect to put a few documents together to help explain things but for now, here's a quick set of rules:

1. Functionally QA without Terracotta in the mix (since it is transparent).
2. Then functionally QA clustering with Terracotta in the mix. By example: if you have a web-based workflow, run through the functional flow on a single JVM. Then rerun the flow while proxying through a round-robin load balancer across 4 JVMs. Then do it with multiple simulated users. This will confirm your business data is coherent and shared. (Yes, I have seen use cases where the data is not shared even though the application team integrated Terracotta into the application.)
3. Stress test at 1X, 2X, 5X, 10X, and 100X your production workload. Can't afford a production-scale stress lab? Then push a scaled-down cluster to the same per-server TPS targets. Example: if production sees 10 TPS per server at peak, then test up to 1000 TPS per server, even if you have only 2 servers in rotation. It's not perfect but it will teach you a lot.
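The arithmetic behind rule 3 is simple enough to sketch. The snippet below is just a worked example in Java, with a made-up `StressPlan` class; the multipliers and the 10 TPS / 2 server figures come from the rule above, everything else is an assumption.

```java
/** Hypothetical helper that turns a peak per-server TPS into stress-test targets. */
class StressPlan {
    // Multipliers from the rule: 1X, 2X, 5X, 10X, and 100X of production.
    static final int[] MULTIPLIERS = {1, 2, 5, 10, 100};

    /** Per-server TPS target for each multiplier. */
    static double[] perServerTargets(double peakTpsPerServer) {
        double[] targets = new double[MULTIPLIERS.length];
        for (int i = 0; i < MULTIPLIERS.length; i++) {
            targets[i] = peakTpsPerServer * MULTIPLIERS[i];
        }
        return targets;
    }

    /** Total load the generator must produce for a lab with the given server count. */
    static double clusterTarget(double perServerTarget, int serversInLab) {
        return perServerTarget * serversInLab;
    }

    public static void main(String[] args) {
        double peak = 10.0;     // production peak TPS per server
        int labServers = 2;     // servers available in the scaled-down lab
        double[] targets = perServerTargets(peak);
        for (int i = 0; i < targets.length; i++) {
            System.out.println(MULTIPLIERS[i] + "X: " + targets[i] + " TPS per server, "
                + clusterTarget(targets[i], labServers) + " TPS total");
        }
    }
}
```

With 2 lab servers, the 100X step means driving 2000 TPS in total, which tells you how big a load generator you need before you start.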

This should help you to get started producing stable clustered applications over and over. It will make your boss and the line of business to which he and you are accountable very happy. BTW, these rules have nothing to do with Terracotta. They are just good practice. In general, I also recommend something our head of engineering always reminds us to do. And that is to stay high level. Don't only worry about whitebox QA. Think of user stories and scenarios and walk your application through the scenario, from top to bottom.

Let me know what rules you adhere to to avoid the Treo mistake.

About June 2008

This page contains all entries posted to POJO Mojo in June 2008. They are listed from oldest to newest.
