POJO has been so prevalent over the past year that I think, even fear, that my grandmother knows what a POJO is. But then again, I have to wonder if I even know what one is. Recently, POJO seems to have come to mean “not EJB”. We as a community need a better POJO definition: one that comes complete with a razor, like Occam's, with which we can rapidly conclude what is and is not a POJO. So, bear with me while I set out to define the components of a POJO, an accompanying razor, and a software test harness that only POJOs can work under.
I asked a friend what he thought about the POJO hype, and he suggested that it boils down to object identity. He has a long explanation, but this excerpt summarizes the need for POJO quite well:
I think the problem lies implicitly in the language. equals is very nearly a blemish on the language.
Here is the problem - how can two separate distinct heap objects be "equal" as in the sense of equals? What does that mean? For example, if a.equals(b) && a != b is true, then we have some kind of mysterious object here. If I call a.set(foo), does that mean that a.equals(b) continues to hold? Or has it now flipped from being true to being false??!? For "true" objects, where it continues to hold, I would argue that the programmer's expectations were met, but for ones where it does not hold, the programmer's expectations are horribly broken.
The problem is that a.equals(b) should be symmetric, per the javadocs (b.equals(a) also must hold), but mutations of state do not carry over from one object to the other - thus a.set(foo) or b.set(foo) invalidates the prior "equality" of a and b when they are not a == b.
You cannot do this, for example:
    a.equals(b); // true
    c = b;
    c == b;      // true
    c.set(foo);
    c == b;      // true
    a.equals(b); // false!
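To make that fragment concrete, here is a minimal runnable sketch of the same sequence. The Box class, its single field, and the demo class name are my own illustrative assumptions, not part of my friend's explanation; any mutable class with a state-based equals would behave the same way.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical mutable class with a state-based equals(); illustrative only.
class Box {
    private String value;

    Box(String value) { this.value = value; }

    void set(String value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Box && Objects.equals(value, ((Box) o).value);
    }

    @Override
    public int hashCode() { return Objects.hashCode(value); }
}

class EqualsDriftDemo {
    // Returns {a.equals(b) before, c == b before, c == b after, a.equals(b) after}.
    static boolean[] run() {
        Box a = new Box("x");
        Box b = new Box("x");                 // distinct heap object, same state
        boolean equalBefore = a.equals(b);
        Box c = b;                            // c is just another handle to b
        boolean sameBefore = (c == b);
        c.set("foo");                         // mutate through the alias
        boolean sameAfter = (c == b);         // identity is unaffected
        boolean equalAfter = a.equals(b);     // state-based equality has flipped
        return new boolean[] { equalBefore, sameBefore, sameAfter, equalAfter };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints [true, true, true, false]
    }
}
```

Identity (==) survives the mutation; equals does not. That asymmetry is the whole complaint.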
1. Components of POJO
From Martin Fowler, according to Wikipedia (http://en.wikipedia.org/wiki/POJO):
"We wondered why people were so against using regular objects in their systems and concluded that it was because simple objects lacked a fancy name. So we gave them one, and it's caught on very nicely."
...
As of November 2005, the term "POJO" is mainly used to denote a Java object which does not follow any of the (major) Java object models, conventions, or frameworks such as EJB.
So, here's a list of components of the definition of POJO that should matter. Note that in the rest of this discussion, any reference to "components of the definition" refers to this table.
| Name | Description |
| --- | --- |
| Object Identity | Simply put, the equality operator must work. |
| Clean Business Interface | No beans with getters and setters. No Serialization. Only interfaces that I choose to define are acceptable here. |
| Proxy-free | Proxies break .equals() and == and, thus, object identity. |
| [de]referencing | Following references, keeping a handle to them, etc. is all ok. If we don't have references, and everything is passed by value and treated as values-only, then how truly object oriented can our software be? |
| Annotation-optional | If annotations are used in place of configuration, that’s ok. If annotations are used to complete the functionality of the class and the class ceases to function w/o aspects, this is not POJO (see http://jonasboner.com/2006/04/24/domain-driven-pointcut-design). |
| System Aspects / Concerns dependency | System concerns are implementations of frameworks that are not simply an abstraction of objects on heap but require resources outside the heap, such as sockets, IPCs, or files, to function. Interestingly enough, there is a recursive aspect to this. If I write a POJO but that POJO depends on a system concern, I might feel the impact of that system concern (serialization of session attributes and all objects in the attribute map, for example). If I inherit from a class that is not a POJO, I am likely not going to be able to code a POJO. If I delegate to a non-POJO, however, I can absolutely remain POJO (dependency injection is an example here). |
| [Im/Ex]plicit Identity fields | Using an identity field to map an object to a store is not POJO. A POJO's identity is already defined in the JVM by that object's reference. Another notion of ID is inherently an indicator of lack of POJO nature. |
| No Manager | Having to get objects from a management context that manages their lifecycle and transparently calls APIs and interfaces on get() / put() is not POJO. |
| Free-typing | No restrictions on the data types I can use. If it only works for maps, or doesn't work for arrays because they are special-cased in bytecode, I shouldn't have to know. |
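The "Proxy-free" row is easy to demonstrate with nothing but the JDK. The following is a sketch under my own assumptions: the Priced interface, PlainItem class, and forwarding handler are hypothetical stand-ins for whatever a framework would generate, and real frameworks may handle equals() differently. It uses java.lang.reflect.Proxy with a handler that forwards every call to the target.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;

// Hypothetical business interface and implementation; illustrative only.
interface Priced {
    double getPrice();
}

class PlainItem implements Priced {
    public double getPrice() { return 47.99; }
}

class ProxyIdentityDemo {
    // Wraps the target in a JDK dynamic proxy that forwards every call,
    // including equals(), to the target.
    static Priced wrap(final Priced target) {
        InvocationHandler handler = (proxy, method, args) -> method.invoke(target, args);
        return (Priced) Proxy.newProxyInstance(
                Priced.class.getClassLoader(), new Class<?>[] { Priced.class }, handler);
    }

    // Returns {proxy == target, proxy.equals(target), target.equals(proxy), proxy.equals(proxy)}.
    static boolean[] run() {
        Priced target = new PlainItem();
        Priced proxy = wrap(target);
        return new boolean[] {
            proxy == target,       // two different heap objects
            proxy.equals(target),  // forwarded: target.equals(target)
            target.equals(proxy),  // Object.equals: identity comparison
            proxy.equals(proxy)    // forwarded: target.equals(proxy)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints [false, true, false, false]
    }
}
```

With this naive handler, equals is no longer symmetric, and the proxy is not even equal to itself. A more careful handler can paper over some of this, but the caller still holds a reference that is not the object it thinks it has.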
2. Accompanying Razor
All these components amount to breaking down POJO into "plain" meaning just the Java language without restrictions on its use, "Java" meaning the Java language, and "object" meaning object-oriented design with pass-by-reference semantics. The simplest razor I could come up with is based on the fact that, although the technical components of the definition of POJO seem overwhelming, they all focus on keeping our Java classes free of dependencies and as modular and well-factored as possible. Basically, it's about simplicity:
In order to be a POJO, a class must support strict object identity by operating directly on heap, cannot operate on system resources, and cannot expose system concerns.
Note that Hibernate fails this test, but that is okay. We want it to. We are not working with plain Java objects but intentionally working with database rows; they happen to be abstracted as Java objects. As for Hibernate's attempt to simplify the database, it absolutely succeeds.
Spring passes the POJO razor because dependency injection of POJOs into other POJOs implies that all those objects operate only on heap. When a non-POJO framework such as an O/R mapper, a clustering library like JGroups, or a JMS queue gets injected into an application, the exposed system concerns unavoidably spill into my application code. This is because the framework / library / queue does not operate on heap but on system resources such as sockets. The non-POJO nature comes from the framework and not from dependency injection itself.
Actually, to me both Spring and Hibernate are sort of orthogonal to this discussion in that they are designed to help developers factor complex business applications so as to remove the code smell of databases and scalability / tuning (all system concerns) from our code. They deliver well-factored code in that most of our code looks just like plain old Java, and all the dependencies and assumptions are abstracted into a handful of core classes and XML configuration (or annotations). This discussion will cover Spring and Hibernate as compared to a poorly factored sample, but it does not seek to pass judgment on those frameworks.
3. Software Test Harness
I think a good harness that can be manipulated to test all the above technologies is our own Inventory demo, located in the Terracotta download kit in $TC_HOME/samples/pojo/inventory. (It seems I have been obsessed with it lately, but let's ignore that for now.) The basic construct is a domain model in which I need trees of objects and maps at the same time. The business driver in the demo comes from the real-world need to update inventory by SKU (stock keeping unit, a unique ID for each product a store might sell) but to sell that inventory in multiple departments. In the example, we have a 1 gigabyte flash card that is in both Photography and Computers.
A product is defined as follows:
public class Product {
    public double price;
    public final String name;
    public final String sku;

    public Product(String n, double p, String s) {
        name = n;
        price = p;
        sku = s;
    }

    public void setPrice(double p) {
        synchronized (this) {
            price = p;
        }
    }

    public int hashCode() {
        return sku.hashCode();
    }
}
And with POJOs my store's domain model is as follows:
public class Store {
    public List departments = new ArrayList();
    public Map inventory = new HashMap();
    ...
In the demo, the Store constructor initializes our tiny little store for testing purposes:
 1  public Store() {
 2      Product warandpeace = new Product("War and Peace", 7.99, "WRPC");
 3      Product tripod = new Product("Camera Tripod", 78.99, "TRPD");
 4      Product usbmouse = new Product("USB Mouse", 19.99, "USBM");
 5      Product flashram = new Product("1GB FlashRAM card", 47.99, "1GFR");
 6
 7      Department housewares = new Department("B", "Books", new Product[]{warandpeace});
 8      Department photography = new Department("P", "Photography", new Product[]{tripod, flashram});
 9      Department computers = new Department("C", "Computers", new Product[]{usbmouse, flashram});
10
11      departments.add(housewares);
12      departments.add(photography);
13      departments.add(computers);
14
15      inventory.put(warandpeace.sku, warandpeace);
16      inventory.put(tripod.sku, tripod);
17      inventory.put(usbmouse.sku, usbmouse);
18      inventory.put(flashram.sku, flashram);
19  }
Note the reference to "flashram" above on lines 8 and 9. In a real store, this sort of thing happens all the time. For that matter, in real applications this will happen. Now, we want to start up copies of this application because, after all, scaling Java (and PHP, .Net, and most other languages) applications by running them on multiple machines is pretty commonplace now. What needs to happen to make this work with various POJO technologies?
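Before looking at clustering, it helps to see what the shared "flashram" reference buys us in a single JVM. The sketch below uses trimmed-down stand-ins for the demo's Product and Department classes (fewer fields, generics, varargs), so treat the class shapes as my assumptions rather than the sample's actual code.

```java
import java.util.HashMap;
import java.util.Map;

class SharedReferenceDemo {
    // Simplified stand-ins for the demo classes; not the sample's real definitions.
    static class Product {
        double price;
        final String sku;
        Product(String sku, double price) { this.sku = sku; this.price = price; }
    }

    static class Department {
        final String name;
        final Product[] products;
        Department(String name, Product... products) {
            this.name = name;
            this.products = products;
        }
    }

    // Updates the price through the inventory map and reports what each department sees.
    static double[] run() {
        Product tripod = new Product("TRPD", 78.99);
        Product usbmouse = new Product("USBM", 19.99);
        Product flashram = new Product("1GFR", 47.99);

        Department photography = new Department("Photography", tripod, flashram);
        Department computers = new Department("Computers", usbmouse, flashram);

        Map<String, Product> inventory = new HashMap<>();
        inventory.put(flashram.sku, flashram);

        // One update, made via the map...
        inventory.get("1GFR").price = 39.99;

        // ...is visible through both departments, because all three hold the same object.
        return new double[] { photography.products[1].price, computers.products[1].price };
    }

    public static void main(String[] args) {
        double[] seen = run();
        System.out.println(seen[0] + " " + seen[1]); // prints 39.99 39.99
    }
}
```

This is the behavior every approach below has to preserve once the application runs in more than one JVM.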
3.A. Serialization approach (any of DB blobs, proprietary clustering, JGroups, RMI, or JMS)
First, we turn everything serializable:
public class Product implements Serializable {
...
public class Store implements Serializable {
...
Now, in the body of our main application code, anywhere we update a product or the store, we need to [de]serialize and [get or] send that change back to our storage / clustering mechanism:
 1  private void updatePrice() {
 2      Product p = null;
 3      {
 4          out.println("\nEnter SKU of product to update:");
 5          out.print("> ");
 6          out.flush();
 7          String s = getInput().toUpperCase();
 8          p = (Product) store.inventory.get(s);
 9          if (p == null) {
10              out.print("[ERR] No such product with SKU '" + s + "'\n");
11              return;
12          }
13      }
14      double d = -1;
15      out.println();
16      do {
17          out.println("Enter new price for '" + p.name + "': ");
18          out.print("> ");
19          out.flush();
20          String s = getInput().toUpperCase();
21          try {
22              d = Double.valueOf(s).doubleValue();
23          }
24          catch (NumberFormatException nfe) {
25              continue;
26          }
27          synchronized (p) {
28              p.setPrice(d);
29          }
30          ;
31      } while (d < 0);
32      out.println("\nPrice updated:");
33      printProduct(p);
34  }
We must change line 8 and lines 28/29. Specifically, line 8 has to change from a map.get() call to a lookup of some sort, perhaps a SQL SELECT query using the String s as a key and retrieving a serialized blob. Or, if we are using proprietary serialization, JGroups, or JMS, we would not have to change line 8. We would instead have some code elsewhere that asynchronously updates our inventory map so that we can trust our local map representation to be as accurate as we need it to be. Lines 28/29 need a SQL UPDATE call or some such code:
    // Lock the row for this product before overwriting its blob.
    PreparedStatement select = connection.prepareStatement(
        "SELECT * FROM INVENTORY_TABLE WHERE PRODUCT_ID = ? FOR UPDATE");
    try {
        select.setString(1, p.sku);
        select.execute();
    } catch (SQLException e) { }
    try {
        // Serialize the updated product into a byte array.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(p);
        oos.close();
        byte[] buf = bos.toByteArray();
        // Write the serialized bytes back to the product's row.
        PreparedStatement update = connection.prepareStatement(
            "UPDATE INVENTORY_TABLE SET BLOB=? WHERE PRODUCT_ID=?");
        update.setBytes(1, buf);
        update.setString(2, p.sku);
        update.execute();
    } catch( ...
Without going into the rest of the gory details, you can see that we added lots of code to snapshot our changes to a product back down to storage, or to snapshot those changes around our cluster. The important thing to note, though, is not just the changes to class Main, which does all the input / output of changes to our domain model, but the changes to the Store design. The store is made up of an ArrayList of departments and a HashMap of inventory (a map of products). So the above code does not even work, because we have only updated the product in inventory and ignored the references to it in the ArrayList. (Look back at the code where "flashram" is added to the Store in its constructor, in both "photography" and "computers.") So flashram breaks: when we update its price using the above code fragment, we would not see any change in the two departments.
So, I guess I should now redefine lines 7 - 9 of my Store constructor to add not product references to the departments but product.sku (a String), which I can use as a pseudo-reference to look up products by ID / SKU. But this means I have to rewrite all of Main.java to get departments out of the ArrayList and then work with Strings representing product keys, which I then go get from the inventory HashMap. It might look like this (printDepartments is an actual method in Main.java in the sample):
 1  private void printDepartments() {
 2      out.println("+-----------------------+");
 3      out.println("| Inventory Listing by Departments |");
 4      out.println("+-----------------------+");
 5      out.println();
 6      for (Iterator i = store.departments.iterator(); i.hasNext(); ) {
 7          Department d = (Department) i.next();
 8          out.println("Department: " + d.getName());
 9          String[] product_skus = d.getProductKeys();
10          for (int j = 0; j < product_skus.length; j++) {
11              Product nextProduct = (Product) store.inventory.get(product_skus[j]);
12              printProduct(nextProduct);
13          }
14          out.println();
15      }
16  }
That works. Good. And I only had to alter lines 9 - 12 to use my inventory HashMap to look up actual Serializable product references. So, I can definitely make this approach work, but it leaves a code smell based on my scalability architecture (proprietary or OSS clustering, or database blob storage). And this is clearly not POJO by the razor's definition: without the database, JMS provider, JGroups, etc. that gets wired in to updatePrice() and all my setter methods, I cannot run this application.
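The broken-reference problem that forced this redesign can be reproduced without any database or cluster. The sketch below is my own illustration, not code from the sample: it round-trips a simplified Product stand-in through Java serialization in memory, which is roughly what any of the blob / JGroups / JMS approaches do on the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class SerializedCopyDemo {
    // Simplified stand-in for the demo's Product; not the sample's real definition.
    static class Product implements Serializable {
        private static final long serialVersionUID = 1L;
        double price;
        final String sku;
        Product(String sku, double price) { this.sku = sku; this.price = price; }
    }

    // Serializes and deserializes in memory, standing in for a trip to storage or the network.
    static Product roundTrip(Product p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(p);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (Product) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean isSameObjectAfterRoundTrip() {
        Product flashram = new Product("1GFR", 47.99);
        return roundTrip(flashram) == flashram;
    }

    // Returns {price seen via the department's reference, price seen via the inventory map}.
    static double[] run() {
        Product flashram = new Product("1GFR", 47.99);
        Product[] photography = { flashram };          // the department holds a reference
        Map<String, Product> inventory = new HashMap<>();
        inventory.put(flashram.sku, flashram);

        Product copy = roundTrip(inventory.get("1GFR")); // what comes back is a new object
        copy.price = 39.99;
        inventory.put(copy.sku, copy);                   // the map now points at the copy

        return new double[] { photography[0].price, inventory.get("1GFR").price };
    }

    public static void main(String[] args) {
        System.out.println(isSameObjectAfterRoundTrip()); // prints false
        double[] seen = run();
        System.out.println(seen[0] + " " + seen[1]);      // prints 47.99 39.99
    }
}
```

The department still sees the old price because it holds the original object, while the map holds the deserialized copy. That is exactly why the departments had to be rewritten to hold SKU strings.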
3.B. Spring
Spring has several values, one of which is removing all of the code smell and implementation dependencies from the serialization-type approach. I can actually take all my getters and setters where products are added and deleted, and where pricing and inventory info is changed, and inject a Product instance as a Spring Bean whose lifecycle is abstracted from the getters and setters. I can change my Store constructor and populate it via dependency injection so that the issue with passing references between my ArrayList and HashMap is hidden; Spring can actually map my String lookups to beans on the fly so that I don't have to see the impacts of my scalability abstractions (database, JGroups, JMS, etc.) in my code.
So Spring's dependency injection engine seems to get all the above frameworks to pass the POJO razor. In reality, this is perception. The code still behaves as in the naive serialized blob example in section 3.A. in that all my objects are getting serialized and passed across the network to a database or another application instance. The difference is that I can factor the smell such that it is not visible in Main.java, Store.java, Product.java, etc. This is important because without Spring, this application will not function on a single node, nor will it scale out to multiple nodes. If the dependencies cannot get injected, then the instances and references will all be null at runtime. Thus, by our POJO razor, the Spring version of this app, when clustered using a database, JGroups, or JMS, is no more POJO than it was when hand-coded. It is far superior in maintainability, extensibility, and more, but it is no more POJO than it ever was. (Note that I am in no way suggesting that Spring violates POJO, but more on that later.)
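The "null at runtime" point can be shown in a few lines of plain Java. This is a hand-rolled stand-in for what a container does, not Spring's API: the ProductLookup interface, PriceDesk class, and InMemoryLookup implementation are all hypothetical names I made up for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

class InjectionDemo {
    // Hypothetical collaborator; in a clustered app this might be backed by a database or JMS.
    interface ProductLookup {
        Double priceOf(String sku);
    }

    // Looks like plain Java: no framework imports, just a setter waiting to be called.
    static class PriceDesk {
        private ProductLookup lookup;   // populated by a container, never by this class

        void setLookup(ProductLookup lookup) { this.lookup = lookup; }

        Double quote(String sku) { return lookup.priceOf(sku); }
    }

    static class InMemoryLookup implements ProductLookup {
        private final Map<String, Double> prices = new HashMap<>();
        InMemoryLookup() { prices.put("1GFR", 47.99); }
        public Double priceOf(String sku) { return prices.get(sku); }
    }

    // Without something playing the container's role, the collaborator is null.
    static boolean failsWithoutInjection() {
        try {
            new PriceDesk().quote("1GFR");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Here the caller plays the container's role and wires the dependency by hand.
    static Double quoteWithInjection() {
        PriceDesk desk = new PriceDesk();
        desk.setLookup(new InMemoryLookup());
        return desk.quote("1GFR");
    }

    public static void main(String[] args) {
        System.out.println(failsWithoutInjection()); // prints true
        System.out.println(quoteWithInjection());    // prints 47.99
    }
}
```

PriceDesk itself is as plain as Java gets; whether the application as a whole passes the razor depends entirely on what sits behind the injected interface.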
3.C. O/R Mappers
This is where things get interesting. Specifically, object proxying and lazy-loading of object relationships seem like they would keep this application better factored than the naive-serialization approach. And, in fact, they do. The following bit of Hibernate config:
<class name="demo.Inventory.Product" table="INVENTORY_TABLE">
    <id name="id" column="PRODUCT_ID">
        <generator class="native"/>
    </id>
    <property name="price"/>
    <property name="name"/>
    <property name="sku"/>
    <set name="departments" table="DEPARTMENT_TABLE">
        <key column="DEPARTMENT_ID"/>
        <many-to-many column="PRODUCT_ID" class="demo.Inventory.Product"/>
    </set>
</class>
now implies that my departments can remain an ArrayList and that the multiple references to the "flashram" product will get resolved correctly by PRODUCT_ID. Again, like Spring, this is great. But it still fails the POJO razor because all the calls we will be making to Hibernate.getSessionFactory().* will actually not run outside the presence of Hibernate. While our application will be very well factored and the database dependency will be modularly tucked away (either in some Spring config or in just a few lines of extra code in our getters and setters) it will not run without its underlying database and data tables.
This is not to say that the Inventory example in the Terracotta download kit should not store its data in a database. In fact, I believe it should. This is merely to say that the razor holds true to my expectations that an application that uses Hibernate and O/R-mapping to scale to multiple application instances by sharing a common database instance is not a POJO app.
3.D. Terracotta
With Terracotta, we have one key line of configuration. It is as follows (visit http://www.terracotta.org/ to learn more):
<root>
    <field-name>demo.inventory.Main.store</field-name>
</root>
It says that the field named "store" in Main.java should be clustered. That's it. Which means Terracotta has a chance of passing the razor. But not quite yet because our getters and setters just naively update products with no assumption that Terracotta needs to be told that the objects changed. In other words, the code in Main.java all assumes that object references and identity are not getting violated and if I do something like:
    String s = "1GFR";
    Product p = (Product) store.inventory.get(s);
    double newPrice = 12.34;
    p.setPrice(newPrice);
then I do not need to do store.inventory.put(s, p), because p is already in the inventory HashMap. This is true with Terracotta because it plugs in to the JVM and replicates field-level changes at the heap level. It does not require object serialization, and it works with normal Java thread coordination like the synchronized block on line 27 of updatePrice() in one of the code samples above. In fact, here is the configuration snippet that makes Terracotta work with that code, natively. This configuration dictates that my application's use of synchronization will be used by Terracotta to push heap changes from my app to Terracotta and around to my other app instances as they need it. Basically, it tells Terracotta to use the synchronized blocks in all methods as lock acquisition and release points:
<locks>
    <autolock>
        <method-expression>* *..*.*(..)</method-expression>
    </autolock>
</locks>
So, by the razor's edge, Terracotta is POJO because the app compiles with /usr/bin/javac and it runs whether or not Terracotta is present. When Terracotta is present, then many instances of this application demo will work together on a shared Inventory HashMap and shared departments ArrayList. If Terracotta is not present, each copy will run stand-alone and changes to pricing in one JVM will not impact any others.
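To underline the "runs whether or not Terracotta is present" claim, here is a stand-alone sketch that uses only the JDK. Nothing in it is Terracotta API, and it does not demonstrate any cross-JVM behavior; the Product stand-in and the two-thread setup are my own assumptions, chosen to show the kind of ordinary synchronized code that the autolock configuration above would treat as lock boundaries.

```java
class SynchronizedUpdateDemo {
    // Simplified stand-in for the demo's Product; not the sample's real definition.
    static class Product {
        double price;
        Product(double price) { this.price = price; }
    }

    // Two threads each raise the price 1000 times, coordinating only via synchronized.
    static double run() {
        final Product p = new Product(0.0);
        Runnable raise = () -> {
            for (int i = 0; i < 1000; i++) {
                synchronized (p) {      // plain Java monitor; no framework call anywhere
                    p.price += 1.0;
                }
            }
        };
        Thread t1 = new Thread(raise);
        Thread t2 = new Thread(raise);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (p) {
            return p.price;
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2000.0
    }
}
```

The code compiles and runs with nothing but javac and java on the path, which is the property the razor cares about.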
4. Aside: Why Vendors Say “POJO” When they are not
So if so many things get cut by the POJO razor's edge, why is POJO important? The value of POJO is in simplicity, and control. When the developer is in control of his object graph, from data types through object passing and references, he is in control of his domain model. Any framework that calls itself a “POJO framework” does so to connote simplicity and control. In the example above, Hibernate gave us control of our data types but we couldn't pass object references around. We needed to allow Hibernate to maintain the relationship between Departments and Inventory. In the example above, Spring gave us a way to factor out the impacts of serialization on our code, but we still could not pass objects by reference. We had to rely on dependency injection and Spring Beans to do the heavy lifting for us. And, when writing clustering code by hand, the code began to look nothing like its original form and we fear the long term maintainability and extensibility of that code base.
Without passing judgement on the value behind or the validity of any framework, most frameworks are not POJO because most frameworks fail the razor. Most frameworks are, however, trying to copy Spring's success in the market and assert that they help deliver cleanly factored code. The reality is that most frameworks that help factor out infrastructure and operational concerns do so with a combination of Spring and Hibernate, both of which fail the razor. Frameworks that use Spring or Hibernate do not produce any greater POJO-ness in applications than Spring or Hibernate themselves can. Quite the contrary. Spring makes non-POJO and otherwise leaky framework abstractions appear to be as POJO as any other Spring application.
| Framework | POJO? | Gaps |
| --- | --- | --- |
| Clustering Summary | NO | - |
| JCache implementations | No | Fails on all components of the POJO definition |
| JGroups | No | Same as above |
| JavaSpaces implementations | No | Object Identity, Free-typing, Identity Fields, Clean Business Interface |
| O/R Mapping Summary | NO | - |
| Hibernate | No | Identity Fields |
| iBatis | No | Identity Fields |
| OpenJPA | No | Identity Fields, annotations |
| Dependency Injection | Can Be | |
| Spring | Can Be | Proxy-free (before Spring 2.0) |
| Others… | | |
| Messaging | No | ALL |
| App Server Clustering (Tomcat, WLS, WAS) | No | Object Identity, Clean Business Interface, No Manager |
| Terracotta | Yes | |
It would seem that the "P" in POJO now tends to stand for "pretend" Java object. Most of the pretenders are vendors and frameworks that produce tools that abstract system concerns. There is no longer any regard for dependencies or for the quality of application factoring that a framework can provide. More specifically, if good design requires flexibility, reuse, and lack of fragility, most non-POJO frameworks that wrap system concerns in fact introduce a rigid nature to our application. It seems that the basic plan is this: if my framework can be dependency injected (like when Spring wraps a framework), it gets to call itself POJO. The problem, of course, is that such frameworks are the opposite of POJO. Dependency injection only hides their bootstrap and boilerplate code, not the dependencies themselves.
The question I have is when do we as a community adopt something such as the POJO razor and hold our entire community to that yardstick? Or do we even bother? Does claiming POJO matter as much as _being_ POJO? Or should we treat it like "Free trial software" versus "Open Source"? A bait and switch is made up of 2 parts: bait and switch. Bait in this case is saying dependency injection begets POJO. Switch in this case is the reality of framework dependencies and tight coupling. I suppose time will tell. One thing is for certain and this is the fact that POJO in Martin Fowler's definition is highly valuable in keeping our day-to-day as developers sane.