« Good article on the rise of Glassfish | Main | Finally: transparently clustered JRuby »
November 11, 2008
Tearing down the relational database
posted by ari
I was thinking last week...the database can prove extremely hard to live with for the application developer when working with hierarchical data. This is something with which many experts and seasoned veterans of enterprise Java app development to whom I know readily agree. But, when spending time out in the market with customers and users, once in a wile I land upon a few folks asking, "why Terracotta? Why can't the database keep up?" Here's an attempt at an answer to the underlying question of where the database doesn't work and, as a result, where Terracotta can help the otherwise database-centric application.
The following diagram is of a classic computer science state machine. These devices are sometimes used to define languages / grammars or business processes and workflow. In our case, let's treat the diagram as both.

Starting with a language specification, let's use the diagram together for a second. Assume we have a method named
boolean valid( final String s );
| input | returns |
|---|---|
| abe | true |
| abcde | true |
| ac | false |
valid( "ac" ) returns false because there is no arrow connecting 'a' directly to 'c' in the state machine. A few longer ones would include:
| input | returns |
|---|---|
| abcbcbcbcbcdbe | true |
| abcdbcdbcdc | false |
So, implementing valid() in Java naively would look like this (I am sure I have typos...I didn't try to compile this so I apologize up front):
int STARTED = 0; // only triggered if 'a' is encountered
int ENDED = 0; // only triggered if 'e' is encountered
int A = 0; // last token was 'a'
int B = 1; // last token was 'b'
int C = 2; // last token was 'c'
int D = 3; // last token was 'd'
int e = 4; // last token was 'e'
int lastToken = -1;boolean valid( final String s ) {
int index = 0;
while( index < s.length() ) {
char c = s.charAt( index++ );
switch( c ) {
case 'a': if( STARTED == 1 ) {
System.out.println( "error at" + index ); break; }
else { STARTED = 1; lastToken = A; }
case 'b': if( STARTED == 0 || ENDED == 1 ) {
System.out.println( "error at" + index ); break; }
else lastToken = B;
case 'c': if( STARTED == 0 || ENDED == 1 || lastToken != B ) {
System.out.println( "error at" + index); break; }
else { lastToken = C; }
case 'd': if( STARTED == 0 || ENDED == 1 || lastToken != C ) {
System.out.println( "error at" + index); break; }
else { lastToken = D; }
case 'e': if( STARTED == 0 || ENDED == 1 || lastToken == C || lastToken == A ) {
System.out.println( "error at" + index); break; }
else { lastToken = E; ENDED = 1; }
}
}return ENDED;
}
Okay. That was ugly but it sorta works. The next step is to implement this same logic in a database with SQL. I don't know how. More accurately I know several ways which I have no intention of trying to write down as the solution requires several parts. Part one--a table with some columns to track intermediate states. Part two--some fancy SQL UPDATES with JOINs onto oneself and maybe some validation rules. Perhaps I would create a limitation of, say, 256 characters on the input to keep things simple. Then I would create 256 columns in a table and I could insert the String one character at a time, looking at the previous column's value in a query before inserting the current column. Then the String would only make it into the table in its entirety if the String passed my parsing logic. This way I don't need much work and I could insert / update a new row for each input and thus report on which Strings passed and failed validation by looking for rows that do not end in 'e'. Surely my easiest option would be to write a stored procedure in the database--in Java even--to represent this state machine but that is somehow cheating.
Let's add a twist. If I want to analyze how a particular String made its way through the state machine (say I want to draw the state transitions and changes over time), in the Java version, the String is itself the history of the String. Example: "abcbe" tells me that the state started at "a" and then added a "b", "c", back to "b", and then ended in "e". I just need to take characters off the String in order to see how they got there.
I believe this state machine suggests that in an object-oriented world, the object graph itself is the hierarchy--the relationship--and I can easily represent the object's history too. I recently helped a user create a VersionedTree by hiding LinkedLists at each level in the Tree. Every time I update a level in the Tree, I create a new entry in the LinkedList representing the new path in the Tree. There VersionedTree with Copy-On-Write semantics. Kewl, right?
In the database, history or state transitions have to be modeled separately. I have to write a transaction log or history table and I have to restrict all developers to write to it via my interfaces and rules or I have to update this history via triggers. I have to do lots of work to shoehorn my "machine" into the database.
Hierarchies and directed graphs like a state machine (which are hierarchies that are allowed to loop on themselves randomly) are hard to represent in the database. Hopefully I have proven my point even though I used more than a little hand-waving.
If hierarchies and directed graphs are hard to represent and manipulate in the database, then a few things come to mind:
1. ORMappers can't help because the SQL is not related to schema or metadata and cannot be generated automatically.
2. The database round-trip is not the only thing the slows performance but so does all the weird querying the database has to do to try to assemble flow from a tabular storage format.
3. The only reason to put state machines / workflows into a database is because it is "durable" meaning it stores everything to disk.
Where I used to work, we fought with the database developers who demanded that all these workflows be modeled in stored procedures. This meant that all states and transitions needed to be modeled in tabular form. No Java-based flow or state. Their argument was that the database is persistent and workflows that crash / stop can be repaired and restarted using SQL command-line tools; in Java these broken flows are stuck in memory and must be dropped on the floor. I couldn't fight this easily before Terracotta existed but now I can:
Terracotta is persistent like a database but without the burden of ORM and tables. Entities in the database, I get. Customers and users, products and sales transactions, and other such objects belong in the database. When the database cannot keep up, then we can partition the database or cache that database. Workflow in the database, I don't believe is going to work though.
Just a random thought.
--Ari
Trackback Pings
TrackBack URL for this entry:
http://blog.terracottatech.com/cgi-bin/mt/mt-tb.cgi/74