When scale out goes wrong...
I met with someone last week who has a massively scaled-out system. The application stores user and document information. The system is designed for linear scalability today, and this is achieved mostly by hand.
With a series of load balancers and a good partitioning scheme, the architecture delivers about 500 partitions, each holding fewer than 10,000 users. The system scales linearly, all on top of a series of relational database instances. The nasty truth is that scale out has become too expensive.
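To make the numbers concrete, here is a minimal sketch of what a partitioning scheme like this could look like. The hash-based routing is my assumption for illustration; the actual system's scheme was built by hand and may work quite differently. The names `partition_for` and `capacity_upper_bound` are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 500               # "about 500 partitions" (figure from the post)
MAX_USERS_PER_PARTITION = 10_000   # each partition holds fewer than this

def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user ID to a partition index with a stable hash.

    Assumption: hash-based routing. A hand-built scheme might use
    lookup tables or ID ranges instead.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def capacity_upper_bound(num_partitions: int = NUM_PARTITIONS,
                         max_users: int = MAX_USERS_PER_PARTITION) -> int:
    """Upper bound on total users: partitions times the per-partition cap."""
    return num_partitions * max_users
```

With these figures the whole fleet tops out at under 5,000,000 users, so each database instance serves a fairly small slice of the user base. Every further block of up to 10,000 users means another instance.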
The goal: get more out of each partition. How? By offloading the DB altogether: detaching from it and keeping the transient data transient, in memory where it belongs.
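One way to picture keeping transient data transient is a small in-memory store that only touches the database for data flagged as durable. This is a rough sketch under my own assumptions, not the design of the system described above. `TransientStore` and the `persist` callback are hypothetical names, and a real deployment would also need replication or some other answer to node failure.

```python
from typing import Any, Callable

class TransientStore:
    """Keeps all data in memory; only durable keys are ever written out."""

    def __init__(self, persist: Callable[[str, Any], None]) -> None:
        self._data: dict[str, Any] = {}
        self._dirty: set[str] = set()
        # Hypothetical callback that writes one record to the database.
        self._persist = persist

    def put(self, key: str, value: Any, durable: bool = False) -> None:
        """Store a value in memory; mark it for later persistence if durable."""
        self._data[key] = value
        if durable:
            self._dirty.add(key)

    def get(self, key: str, default: Any = None) -> Any:
        """Read from memory only; the database is never consulted here."""
        return self._data.get(key, default)

    def flush(self) -> int:
        """Write pending durable records to the database; return the count."""
        written = 0
        for key in sorted(self._dirty):
            self._persist(key, self._data[key])
            written += 1
        self._dirty.clear()
        return written
```

In this sketch, session-style data written with `put(key, value)` never reaches the database at all, while records written with `durable=True` are batched until `flush()` is called. Fewer database round trips per user is what would let each partition carry more users.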
Have you hit the scalability wall even though your scalability is linear? Tell me more.
--Ari