« FUD of the Week: Terracotta has no API | Main | FUD OF THE WEEK: Object Identity is akin to cheating »

September 6, 2007

A Story About Tuning DB / SAN and Application bottlenecks

posted by ari

Let me tell you a story about a pattern I used to notice but stopped thinking about till just now. This story is about John, the DBA. John is a good DBA. He knows Oracle inside and out. John has a strong partner in Sam who is the storage admin. John and Sam were hell-bent on tuning Oracle underneath a largescale Java application without involving source code changes in the app, itself. This is the tale of their efforts.

The first step was to do classical db table optimizations in line with the query patterns reported in the Oracle V$ tables. John and Sam had the advent of being trained by one of the fathers of the filesystem and storage industry. He explained how queries can manifest on the disk controller, and on down to the disk heads. You can sort of picture his lessons visually because he had db tuning down so clearly a child could do it.

"Picture the disk like a record player and the disk head, the needle," their teacher said. "Now if you have a lot of I/O wait, that disk is spinning at a constant speed but the needle is racing as fast as it can back and forth to random locations on the platter. What you need is to ensure that the needle is floating back and forth at a constant speed across the platter and that the writes your db is trying to do get dropped onto the disk occur only when the needle happens to be over the correct location on the platter. Never let the disk seek to a single location just to write one block! Regular I/O avoids the latency of travel of the needle over the record and, thus, improves throughput."

Sam and John got it. They would partner together to get this db tuned. And so they started with queries that ran for a long duration, each. With slow queries, they examined whether the queries were a full table scan or running off of an index. Step one was to add an index to all FTS's. If an index was insufficient, John could go add hints to help the index get used more often, or John could partition the table with Sam onto multiple disks so that aggregate bandwidth to disk was increased. This made sense because these long running queries were one of the types causing the needle to move too much on the platter and /or causing it to return too much data relative to the needle's capacity.

Next to optimize were queries that in aggregate were taking a long time. John said to Sam, "so, we have tuned the disk underneath the big expensive queries. But what about the little expensive queries?" Sam asked, "What is a LITTLE EXPENSIVE query? That makes no sense?" John retorted, "well, if the query only takes 2ms, but it runs 20 million times a day, it is more a problem than the query that runs for 2 hours!" So their quest entered a new stage.

They hunted for such expensive little queries (not too hard in Oracle or most modern DBs) and their tuning method? If the queries were SELECTS, they would pin those tables in the SAN's cache (their SAN had a 64GB cache so this was viable). This was preferable to just allocating a giant SGA that would help with read hit ratios because all I/O was isolated away from the disk. And the SGA inside the operating system RAM is not as easily controled as the SAN for isolating a specific table to a specific I/O subchannel. In the analogy to the record player, they were taking this entire class of workload off of the needle so that it could focus on sequential writes of business transaction data such as new sales or new customer records. New business records, the two partners had figured out were writing sequentially to the redo logs and thus, were laying down onto the platter in a regular linear pattern as the disk head would pass by in a smooth flowing motion.


Now, they stepped back and marveled at their work. They had managed to eliminate about 25% of the DB's workload by more tightly baking in I/O assumptions into the table-to-disk relationships. Their boss gave them both spot bonuses and so John and Sam said to him, "Let us take a second pass at this. We can squeeze out more." Their boss immediately took them up on the offer and offered up vacation time if they could get another 10% out of the exercise. So John and Sam waited a week to build up a new statistics sample in the V$ tables and set about the exercise again.

This time, they found a culprit query that they wanted to tune but upon inspection, couldn't tell what to do with it. So they came to the Java application architect with their list. He asked them to explain the exercise and the results to date. He was puzzled as to why he wasn't involved. Being a non-political sort of chap he shrugged off this issue and focused on how he could help.

"Let me go grep the code for that SQL. Ah. I see it right here. We can totally change the app and help you out. This just doesn't need to be calling the database like this at all."

John immediately replied, "wait! Don't change the app. It will throw everything off, plus this was meant to be a totally blackbox exercise."

The problem was as follows. This particular web application has an alphabetical navigation widget in the servlets. To explain by example, if you are trying to list all the music CDs or books you have for sale, you can't list them all on one page, nor can you simply paginate to 10 items or even 100 items per page. There needs to be an efficient non-search-based navigational paradigm. So, imagine the entire alaphabet being displayed with each letter, 'A', 'B', 'C', and so on as a hyperlink if and only if there are products available under that letter. You can also help the user by showing a count of how many items are underneath that letter.

A (10 items)
B (29 items)
...
X (0 items)
...

Turns out the servlet was issuing 26 SELECT COUNT() queries to the database, one for each letter, for each user requesting this servlet. So pinning the table and optimizing the indexes and hints, etc. proved futile due to frequency and amount of data being crawled (over 1MM rows each query). The change to the app was quite simple though. And the Java architect eventually got his way. First, he changed the query to GROUP BY letters of the alphabet and could get all 26 data points in a single round trip to the DB under one cursor. But he didn't want to stop there.

He offered a simple (less than a day's work) change to the app where it wrote to a temporary DB table, the counts of each item under each letter in the alphabet so that the DB would scan the actual data only once and from there, the temp table would be queried directly for the precomputed integer counts.

The eventual solution was to hit the database only once and then rely on an in-memory array with 26 elements, which contained the counts per-letter.

This one change lowered DB utilization another 5%. And so the Java architect earned his stripes and John and Sam's partnership now included a third team member. The next step was to take the LITTLE EXPENSIVE queries and cache them in the application as well. Why hit the database at all?

And the last step was to create a clustered cache because the app tier was 100 nodes and having each node warm up its cache independently seemed a waste of DB resources.

In the end, Sam, John, and the Java architect pulled out 60% reduction in DB utilization. It actually saved their boss $5 million that he was planning to spend on upgrading the DB server in the next year from a 24 CPU machine to a 64 CPU machine.

The story is true. The lesson learned was to create a purpose-built way for JVMs to share memory and coordination without talking to each other through a database. Yes. That alaphabet widget query could have been cached in the database but why get a JDBC connection out of a pool, prepare a SQL statement and fire it off, waiting for TCP, SQL*NET, and the query optimizer to all fire, only for the DB to go to its SGA RAM cache and return the same result it returned 5 seconds ago? Why not store the alphabet array as a java.util.Array? Furthermore, why not have that array shared across JVMs so regardless whether our app servers number 10 or 10,000, we don't hit the database more than once.

This is the power of Terracotta's concept and how our Network Attached Memory approach gives you the power to just use the data structure you want, in your Java code (java.util.Array in this case) and to share that data and locking and more across JVMs without having to share the entire heap across JVMs.

By the way, Sam and John got that vacation their boss promised! Even though the app changed.

Trackback Pings

TrackBack URL for this entry:
http://blog.terracottatech.com/cgi-bin/mt/mt-tb.cgi/19

Comments

Post a comment




Remember Me?

(you may use HTML tags for style)