« Know your use case and optimize accordingly | Main | Sun's Wacky JavaOne idea...which I actually like »
April 18, 2009
Answering a reader's question: When to offload the DB with Terracotta
posted by ari
I have been remiss in not answering a reader's very intelligent question. Here's the question.
Write-behind would be wonderful for performance, but how do you handle db transaction errors? What if someone places a bet (in your real-life example), and the bet fails to commit? How do we recover from that? Was the user originally told (when the bet was marked dirty) that the bet was successfully placed (and committed)? Or is the user told that the bet is “posted” or something else that implies that the bet is not truly committed yet?If the application considers the objected committed once it is marked dirty (and/or placed on the write-behind queue), then I assume that there could be potential consistency issues with write-behind, where one box has the bet placed (on the queue, but not in the db), but the other boxes do not see the bet placed?
And even on the box that placed the bet, if the db is queried with a projection that includes the bet (rather than the Bet domain object; eg. Some report of all bets, joined in with the user and their account status), then I assume the query will return inconsistent results (ie. The report of bets will not include the bet on the write-behind queue)?
Not that any of this is an argument against write-behind; I just want to understand the use cases where this can be applied safely.
Thank you.
Now to attempt an answer.
Write-behind and detaching are not a panacea. I mean that it is true that this pattern cannot always be used. But at the same time it does not suffer from the issues you raise, at least not in this use case.
Always start from the top down as we would in this online betting use case. What is the user experience we want and why? Make design decisions from there. In the questions you ask, I sense a desire to keep the database as the master of all data which is a bottom-up approach. Let's cover the pitfalls of a bottom-up or database-up approach first.
As an example, I was on a panel with Brian Goetz and several "grid" vendors in 2008 at QCon San Francisco. A similar question to yours was asked--"Can I really always build asynchronous apps?" The answer from all the grid vendors was a thundering, "Yes!" They need you to believe you can. My answer was, "No. You can't always afford it. Asynchronous nature might be hard to add given your business requirements." Brian got much more concrete and asked the audience to think about EBay and Amazon. Amazon can be simplified down to an e-catalog site. This is not to take away from their monumental achievements in scale and reliability. I only mean to say that they have a massive need for caching where people buy maybe 5 - 10% of the time so 90% of visits are readers. EBay has a massive write rate and those writes are highly localized to auctions ending in the next hour or minute.
EBay and Amazon need to make almost entirely inverse architecture decisions yet both are called eCommerce sites. Both have a catalog. Both have a user account management function. And, both have payment clearing capabilities. Amazon also has fulfillment capabilities and warehousing concerns that EBay does not but that is beside the point I am making.
Amazon might tell you that Sleepycat or voldemort or memcache rule (they only use Sleepycat, BTW. Voldemort is an OSS clone of their Dynamo approach). EBay would tell you that partitioned Oracle works great. The two are both correct. EBay needed transactional ACID updates in many of their pageviews whereas Amazon needed a different optimization (read Amazon's Dynamo paper for their approach to eventually consistent data storage).
Back to our topic. In the case of betting, you have:
- a catalog of games / matches / things on which to place bets
- User account info
- payment clearing system
- placed bets which are like items you have sold that you have yet to fulfill / deliver
and more.
We can surmise together that when I place a bet and I get an email or web-based confirmation saying my bet has been placed for, let's say, $100, at 2:1 odds in the affirmative (meaning I think the results will be in favor of the bet direction), I fully expect to get paid $200 after the game if I win and to have my $100 debited if I lose. If I win and I do not get the $200, there will be no excuses the bookmaker can make that will have me come back. A bookmaker who does not know the commitments he has made will soon be out of business.
So as an architect, I start to think about coherence and transaction isolation, and databases sound really good. But are they necessary? No. What I really want is:
- all my commitments on disk so I never risk losing anything
- make that on more than 1 disk in case I even lose my disks
- coherent recording of the odds and the amounts of the bet
- a transactional view of my users' account balances so I don't extend credit when I do not intend to.
- a fast, efficient way to search for, display matches and also a coherent way to update odds as my staff learn new info (say the quarterback breaks his leg).
Mapping Terracotta to this business requirement I would suggest:
- well, Terracotta is disk-based just like the DB so I can in theory use it in more places than I would use a cache
- Yes, Terracotta server array will have at least 2 copies or more if I configure properly, on 2 or more separate _machines_. That's lots of redundancy, more than the DB in fact.
- Terracotta operations have isolation levels just like a DB. There are read locks, write locks, synch-write locks and concurrent locks. I would likely use a combination of these to record a bet that is being placed. Specifically, read lock the catalog to bet against the currently-available odds, and write-lock the user's account or my list of placed bets or both to debit the account and write the placed bet.
- I can use readwritelocks in util.concurrent to compose a transaction across a debit + credit operation against a user's account. But, I want the accounts recorded in the DB and cannot use Terracotta in a 2PC manner, so I might choose to cache the account in Terracotta but keep it in the DB, using instead a 1.5PC (I covered 1.5PC in the original blog and will not cover it again here).
- Perhaps put the catalog of games in the DB, and cache the catalog in Terracotta and when my back office content management tools change the DB, also send a JMS message to a node in my app cluster to update the in-memory cache as well, or I could just make my back-office updater a part of my production app cache in Terracotta.
In summary, cache all the read only data in Terracotta to offload the DB of its otherwise wasted usage under web-onlookers. Put user financial info in the DB and manage account balances there, using DB transactions to update. Then put the bets into Terracotta, lazily flushing to the DB so that a back office SQL script can run payouts and collections at the end of each match, against people's accounts. This will offload not just read-only onlookers but actual updates and business transactions as well, at least for a time. At Terracotta we call this "shaving the peak load" where the DB does not have to be sized large enough to handle traffic spikes.
To make this all work, we need a few more details though. We have to make sure that we write an "end of match" eventing system that makes sure to flush all placed bets to the DB so that we can clear payments accurately at the end of a match from entirely within the DB. Could you put accounting in Terracotta? Yes, but all your back office tools would have to be rewritten to work against Java web services of some sort instead of against the DB they most likely work against today.
As for a stable view of bets and odds that are constantly being changed by the back office team members who are trying to optimize company revenue and there are a few challenges. First, the odds. Easiest way to update them in a transaction with the DB is to have the back office app that changes the odds, work within a Terracotta transaction.
Now, the odds will be updated or fail and the business will know what risk the failure poses. Our employees will always get clear messages from the Office updater app that they successfully changed the odds for a match or that the change failed. They can even choose to stop all betting if something goes horribly wrong. But they will always know the status of the games and odds.
Now, let's secure our betting model. Bets are being placed into Terracotta and onto 2 disks so they are safe. And the odds at which they are placed were valid odds at the time given we made a good back office tool for updating odds. We just need to make sure all bets are flushed to the DB before the DB executes its payout process. Should be simple. When a match begins, flip the DB state to "INCOMPLETE" for that match in some table. the payout script will refuse to run payouts against incomplete matches. Now, have the app cluster flip the match to complete using a timer task of some sort. Put the COMPLETION marker in the asynch write-behind queue with bets. When the app tier decides to COMPLETE the match, all subsequent bet attempts will be rejected. And a COMPLETION marker will be placed in the queue that will flip the DB state. But it is placed in sequence or in order with all the bets. So all you need is an in-order asynch write behind and bets can be placed asynch with the DB commit. Let's take a look at in-order queuing in a picture. It is a kewl technique that can help offload the db with fewer worries.
We have 2 recipes. First, adding Terracotta and 1.5PC updates to the DB-based application. Second, using write behind queuing and sometimes guaranteeing the queue's processing order. There are more but I just want to give you a taste of the sort of thinking you must embark upon when offloading the DB using Terracotta, in a by-hand manner.
As you can see, these recipes won't apply in all use cases. And now I think we close in on an answer to your question. Bets are not shared data amongst multiple users. Bets are per-user, you see? And thus bets can be asynchronously written back to the DB and we can even build db tasks that should only run when the asynch queue is drained and have no polling. If bets were in fact as you suggest and predicated on what bets others have placed we couldn't do asynch write behind. We would instead have to do write through. If we were not a bookmaker, by the way, and instead a betting platform, then the pattern would shift from asynch write behind to matching engine. Matching engines can be built in Terracotta that are 100X faster than Oracle RDBMS, but that's a different blog entry. Nonetheless, I would still not need to write through to a DB nor do I have to give up on offloading the db in such use cases where I share data amongst user threads. I would just need a different pattern. In fact, I hear from many customers that they have moved highly contended multi-user write operations to Terracotta from the DB because the DB lock manager deadlocks regularly and Terracotta can handle much much higher write rates (check out the quote from Guy Moller, CTO of Brands4Friends on our terracottatech.com homepage--that is one such use case).
If I were to give you 1 rule it is this: use the freeze/thaw method. If you are using an ORMapper and thawing data out of a DB record back into memory every time you use that data in the app, consider keeping that data thawed in Terracotta. If the thaw occurs, but the data is only thawed once a day, or once a week or once in a month or once in a year, you could leave it frozen. Thawed data format is good for data being accessed once a second or once a minute or once an hour. Once you leave data in thawed form, you eventually have to freeze it back if your database is supporting reporting or other back office operations that your app cannot support on its own. Freezing data that has been thawed for a long time is a write-behind pattern.
Hope this helps. If not, consider just using Hibernate, Spring, and Terracotta together as in our Examinator reference app. You can then plug Terracotta in as a 2nd level cache providor and offload the DB as much as possible w/o major app surgery.
Cheers,
--Ari
Trackback Pings
TrackBack URL for this entry:
http://blog.terracottatech.com/cgi-bin/mt/mt-tb.cgi/88
Comments
Thanks Ari for the very detailed response. It definitely helped me better understand where you coming from, especially when you suggest that write-behind and detaching are not a panacea (unfortunately).
>> In the questions you ask, I sense a desire to keep the database as the master of all data which is a bottom-up approach. Let's cover the pitfalls of a bottom-up or database-up approach first.
On the contrary, I would love to dispense with the database (as the system of record), but I was assuming in my question that the db was the system of record. If the app is, then most of my transactional concerns disappear.
Do you think it’s safe to assume that if a write-behind pattern is used, then the app *must* be the system of record for those objects?
I also assume that even if the db is secondary, the write-behinds should still be coherent (well ordered). If that is true, then what if a db write-behind transaction fails? The db in general may be failing for system reasons, or maybe the failure is due to some technical mapping issue. In either case I assume that the write-behind queue could not drain until the transaction error was corrected? I’m trying to think of the practical implications of this. I would have to stop any write-behinds on failure and send an alert, and the queue would continue to fill up while the problem is resolved? In the worst case the app, or at least some piece of it, would need to be turned off to fix the issue that is causing the db transaction to fail?
Ever since I first read of Terracotta I have been tantalized by the thought of completely removing the need for a database. For some applications I’m sure that is being done today. But for business apps I have several problems. Note that I’m not trying to suggest that Terricotta is claiming to support this usecase. It’s just so close, that it is useful to consider where the gaps are.
- need for querying.
- In practice there are many cases where non-trivial collections of objects need to be queried, either for business logic, or for UI rendering. Most cases this also done over large collections of objects, so iterating over a large object graph for search may not be practical.
- Manually creating many specialized indexes and keeping them all transactionally consistent is possible, but complex and hard to maintain.
- schema/object browsing and adhoc querying.
- This may seem non-essential, but in practice this is done all time, especially for analysis purposes and troubleshooting.
- reporting.
- Reporting often requires non-trivial queries, and reporting tools often require (or work more effectively) with a jdbc interface.
- Many reports can be offloaded to a warehouse, but there are also cases where real-time operational reports are necessary.
- Even for the warehouse, how do we synchronize app state into the warehouse? Messaging is possible, but I can not think of a complete solution that is not very complex, particularly recovering from lost messages.
- support for other tools that interface with jdbc.
- eg. ETL for warehousing, OLAP, or some other forms of analysis.
Because of these issues I have concluded that the database is still necessary for me (unfortunately).
But maybe the app can still (mostly) be the system of record? In that case I may be able to write behind to a db for many/most scenarios?
Querying consistent data is still an issue, but your in/complete pattern may help solve that, but I’m not sure if that will scale to a general solution (where any general sql query will know how long to wait until the db is consistent with the query semantics), or is it best when used tactically for particular use cases?
>> If I were to give you 1 rule it is this: use the freeze/thaw method. If you are using an ORMapper and thawing data out of a DB record back into memory every time you use that data in the app, consider keeping that data thawed in Terracotta.
I would honestly still be very happy to settle for having the database as the system of record, and cache/thaw the domain model as much as possible (ideally all objects), maximizing locality-of-reference, and using write-through “1.5” transactions.
But if I have a rich (non-transient) domain model (deep object graph) and if I use an ORM such a Hibernate, I run into many issues that make caching detached objects difficult.
- Hibernate does not maintain Java object identity for detached objects.
- Likely a side-effect of Hibernate’s serialized second-level cache.
- Problems with lazy loading detached references.
- We either eagerly load the whole data model (no lazy load), or we need to reattach to sessions with Lock() before navigating the graph.
- If we lazy load, then inconsistent copies are possible in the detached cache because of the lack of object identity.
- Reattaching for update is heavy-weight.
- The object graph cascades updates causing a database update or select for every object (regardless if it is has changed).
- Otherwise the app must track which objects have changed and manually update of each object individually.
- On failed transactions how do we rollback state changes to the in-memory (detached) objects?
- I think the normal Hibernate pattern is to drop the cached object and reload the pre-transaction object state.
- But if some other detached object holds a reference to it, then what (similar to the lazy load problem)?
I am no hibernate expert, so some of these issues may have solutions, or I may be incorrect.
I think a fundamental problem is trying to cache without support for object identity. So another option is finding an extension to Hibernate to support object identity, or use some other ORM with object identity support.
One possible option here is Eclipselink (Toplink). It’s cache architecture is designed to maintain java object identity. I have never personally tested Eclipselink to validate this behaviour, so there may be issues I’m not aware of, but on paper it seems like a great match for Terracotta and it seems to solve most of the issues that make it difficult to do long term caching of rich domain models.
I actually suggested a Terracotta/Eclipselink integration some time ago on your Jira.
https://jira.terracotta.org/jira/browse/CDV-1088
http://www.nabble.com/Eclipselink-coordinated-cache-integration-with-Terracotta.-td21116004.html
Thank you Ari.
Posted by: Marko at April 28, 2009 4:09 PM