June 23, 2008
Avoiding the Palm / Treo Mistake
posted by ari
I was thinking on the flight back from Scotland to London today (mostly because several of us were geeking out about our phones and the OSes on those phones):
Palm's Treo almost cornered the market on smartphones / PDAs. What happened? Well, apart from the bulkiness and slowness of the physical device, the OS was unstable. I remember that it couldn't even implement a semaphore reliably. My Treo would regularly lock up trying to pull mail while I was in the middle of a phone conversation. The radio can carry either voice or data at any one time, so the OS has to lock the radio while one is in use to keep the other from confusing the device. But they couldn't seem to get it right.
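To make the radio-locking idea concrete, here is a minimal Java sketch. It is not how Palm OS worked internally (that isn't known here); the `RadioGuard` class and its method names are hypothetical, invented for illustration. It assumes one shared radio, that voice calls may wait for it, and that a background mail sync should give up after a timeout rather than hang the device.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a radio that can carry voice or data, never both.
class RadioGuard {
    // One permit means one owner at a time; "fair" serves waiters in arrival order.
    private final Semaphore radio = new Semaphore(1, true);

    /** Voice takes priority: wait until the radio is free, then hold it. */
    void startCall() {
        radio.acquireUninterruptibly();
    }

    /** Hand the radio back when the call ends. */
    void endCall() {
        radio.release();
    }

    /**
     * Background mail sync must never hang the device, so it gives up after
     * the timeout. Returns true if the sync ran, false if the radio was busy.
     */
    boolean trySyncMail(long timeoutMillis, Runnable sync) {
        try {
            if (!radio.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // radio busy; the caller can retry later
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
        try {
            sync.run();
            return true;
        } finally {
            radio.release(); // always release, even if the sync throws
        }
    }

    public static void main(String[] args) {
        RadioGuard guard = new RadioGuard();
        guard.startCall();
        System.out.println("sync during call: " + guard.trySyncMail(50, () -> {}));
        guard.endCall();
        System.out.println("sync after call: " + guard.trySyncMail(50, () -> {}));
    }
}
```

Running `main` prints `sync during call: false` and then `sync after call: true`. The point of the timeout is that the lower-priority job backs off instead of blocking forever, which is the lock-up behavior described above.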
What killed the Treo, in my opinion, was largely its instability. I think it is safe to assume Palm had a hardcore QA team. After all, they had built devices, OSes, and apps at various points in their past. So what went wrong? The wrong type of QA. The conditions that broke the device were concurrent applications, high stress (lots of email in memory), and slow network connectivity. While I know that Palm regularly sent users into the field with pre-release versions of systems, a more explicit framework was required.
Clustered application QA is a hot-button issue for us here at Terracotta. I expect to put a few documents together to help explain things but for now, here's a quick set of rules:
1. Functionally QA without Terracotta in the mix (since it is transparent).
2. Then functionally QA clustering with Terracotta in the mix. For example: if you have a web-based workflow, run through the functional flow on a single JVM. Then rerun the flow while proxying through a round-robin load balancer across 4 JVMs. Then do it with multiple simulated users. This will confirm your business data is coherent and shared. (Yes, I have seen use cases where the data is not shared even though the application team integrated Terracotta into the application.)
3. Stress test at 1X, 2X, 5X, 10X, and 100X your production workload. Can't afford a production-scale stress lab? Then push a scaled-down cluster to the same per-server TPS as production, and on up through the multiples. Example: if production sees 10 TPS per server at peak, then test to 1000 TPS per server (the 100X mark), even if you have only 2 servers in rotation. It's not perfect but it will teach you a lot.
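The arithmetic in rule 3 can be sketched in a few lines of Java. The `StressPlan` class and its method names are made up for this illustration; the multipliers and the 10 TPS / 2 servers figures come from the rule itself. Treat it as a starting point for a test plan, not a load generator.

```java
// Hypothetical helper that turns a measured peak into per-server stress targets.
class StressPlan {
    // The multiples of production load named in rule 3.
    static final int[] MULTIPLIERS = {1, 2, 5, 10, 100};

    /** Target TPS for one server at the given multiple of its production peak. */
    static long targetTpsPerServer(long peakTpsPerServer, int multiplier) {
        return Math.multiplyExact(peakTpsPerServer, (long) multiplier);
    }

    /** Aggregate TPS the load generator must produce for the whole test cluster. */
    static long targetTpsForCluster(long peakTpsPerServer, int multiplier, int servers) {
        return Math.multiplyExact(targetTpsPerServer(peakTpsPerServer, multiplier), (long) servers);
    }

    public static void main(String[] args) {
        long peakTpsPerServer = 10; // production peak from the example in rule 3
        int testServers = 2;        // size of the scaled-down lab cluster
        for (int m : MULTIPLIERS) {
            System.out.printf("%dX -> %d TPS per server, %d TPS across %d servers%n",
                    m,
                    targetTpsPerServer(peakTpsPerServer, m),
                    targetTpsForCluster(peakTpsPerServer, m, testServers),
                    testServers);
        }
    }
}
```

With a 10 TPS peak, the 100X row comes out at 1000 TPS per server, matching the example. The small lab cluster keeps the per-server target unchanged and only lowers the total load you have to generate.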
This should help you get started producing stable clustered applications over and over. It will make your boss and the line of business to which he and you are accountable very happy. BTW, these rules have nothing to do with Terracotta. They are just good practice. In general, I also recommend something our head of engineering always reminds us to do: stay high level. Don't worry only about white-box QA. Think of user stories and scenarios and walk your application through each scenario, from top to bottom.
Let me know what rules you follow to avoid the Treo mistake.