Sunday, February 01, 2009

Asynchronous Write Behinds and the Repository Pattern

The following is a typical implementation of a service method in an application's domain model. The Repository is injected and is used to persist domain objects or look them up in the underlying store. The storage and the mechanics of retrieval are abstracted entirely within the DAO / Repository layer.

public class RestaurantServiceImpl implements RestaurantService {

  @Autowired
  public RestaurantServiceImpl(..) {
    //..
  }

  // injected
  private final RestaurantRepository restaurantRepo;

  public void storeRestaurants(List<Restaurant> restaurants) {
    restaurantRepo.store(restaurants);
  }
}


In a typical layered architecture, the database often proves to be the hardest layer to scale. In the above implementation, restaurantRepo.store() is a synchronous method that blocks the caller until the data has been persisted through every layer of your architecture, down to the bits and pieces of the underlying relational store. Of course it can be any other store as well - after all, the repository is an abstraction, so it doesn't matter to the application whether you use a relational database, a native file system or a document database underneath. But you get the idea: synchronous communication with the database / hard disk often turns out to be the bottleneck here.

Terracotta provides a nice option for virtualizing your interaction with the database. The Async TIM (Terracotta Integration Module) provides asynchronous write behind to the database, while the application works on in-memory data structures. Terracotta offers network attached memory with transparent JVM clustering that allows data structures to be *declaratively* clustered. The value proposition here is that the user can work on the object model, using POJOs, and delegate the concerns of persistence to an asynchronous Terracotta process.

Here is an example of the above service extended to handle asynchronous write behinds ..

public class AsyncRestaurantServiceImpl extends RestaurantServiceImpl {

  // need to be clustered
  @Root
  private final AsyncCoordinator<Restaurant> asyncCommitter =
    new AsyncCoordinator<Restaurant>(new RestaurantAsyncConfig(), new NeverStealPolicy<Restaurant>());

  // dependency injected
  private final RestaurantCommitHandler handler;

  @Autowired
  public AsyncRestaurantServiceImpl(..) {
    super();
    asyncCommitter.start(handler, ..);
  }

  @Override
  public void storeRestaurants(List<Restaurant> restaurants) {
    asyncCommitter.add(restaurants);
  }

  //.. other methods
}


The AsyncCoordinator<> is the agent that handles the persistence asynchronously in the background. The class RestaurantCommitHandler contains the actual code that writes the collection of Restaurants to the database. RestaurantCommitHandler implements ItemProcessor<> - the work handed to an ItemProcessor gets bucketed and throttled asynchronously for database commits, while the application continues by adding the objects to be persisted to a POJO.

@Service
public class RestaurantCommitHandler implements ItemProcessor<Restaurant> {
  //..
}


Now, we can take this one step further. The Repository is supposed to abstract the handling of storage and retrieval - so why not abstract the asynchronous persistence within the repository itself and keep the service implementation clean? Enabling asynchrony at the service layer then becomes simply a matter of injecting the proper repository ..

interface RestaurantRepository {
  void store(List<Restaurant> restaurants);
}

class RestaurantRepositoryImpl implements RestaurantRepository {
  public void store(..) {
    //.. standard DAO based implementation
  }
}

class AsyncRestaurantRepositoryImpl implements RestaurantRepository {
  @Root
  private final AsyncCoordinator<Restaurant> asyncCommitter =
    new AsyncCoordinator<Restaurant>(new RestaurantAsyncConfig(), new NeverStealPolicy<Restaurant>());

  // dependency injected
  private final RestaurantCommitHandler handler;

  public AsyncRestaurantRepositoryImpl() {
    super();
    asyncCommitter.start(handler, ..);
  }

  public void store(List<Restaurant> restaurants) {
    asyncCommitter.add(restaurants);
  }

  //.. other methods
}
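
To make the shape of this concrete, here is a minimal, self-contained sketch of a write behind repository built only on java.util.concurrent. It is not the Terracotta API: QueueingRestaurantRepository, the Consumer-based writer and the stand-in Restaurant class are hypothetical names introduced for illustration, and a plain in-JVM queue has none of the clustering or durability that the Async TIM is meant to provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for the domain class used in the post.
final class Restaurant {
  final String name;
  Restaurant(String name) { this.name = name; }
}

interface RestaurantRepository {
  void store(List<Restaurant> restaurants);
}

// Write behind repository (assumption: single JVM, no clustering).
// store() only enqueues; one background thread drains the queue and hands
// each batch to the writer, which stands in for the slow database call.
final class QueueingRestaurantRepository implements RestaurantRepository, AutoCloseable {
  private final BlockingQueue<List<Restaurant>> pending = new LinkedBlockingQueue<>();
  private final Consumer<List<Restaurant>> writer;
  private final Thread worker;
  private volatile boolean running = true;

  QueueingRestaurantRepository(Consumer<List<Restaurant>> writer) {
    this.writer = writer;
    this.worker = new Thread(this::drain, "restaurant-write-behind");
    this.worker.setDaemon(true);
    this.worker.start();
  }

  @Override
  public void store(List<Restaurant> restaurants) {
    // Defensive copy so later changes by the caller do not affect the batch.
    pending.add(new ArrayList<>(restaurants));
  }

  // Runs on the worker thread until close() is called and the queue is empty.
  // Note: an unchecked exception from the writer would end this thread.
  private void drain() {
    try {
      while (running || !pending.isEmpty()) {
        List<Restaurant> batch = pending.poll(50, TimeUnit.MILLISECONDS);
        if (batch != null) {
          writer.accept(batch);
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Stops accepting the "running" state and waits for queued batches to be written.
  @Override
  public void close() {
    running = false;
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

A service that depends only on RestaurantRepository needs no change; wiring in this implementation instead of the synchronous one is what switches on the asynchrony.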


I have not yet used the above in any production application. But decoupling the main processing from the underlying database should decrease the write latency of domain objects. Couple this with Terracotta's original value proposition of cluster-wide, in-process, distributed coherent caching, and I think it can prove to be a really wicked cool platform for scaling out your application. The system of record (SOR) is now closer to the application, and the database can act as a snapshot for audit trails and reporting purposes. Of course, this asynchronous write behind is not suitable as a plug-in for an existing architecture where lots of loosely coupled systems are interconnected through databases. But I guess there are many use cases for which it can be a viable solution.

However, looking at the current state of the Terracotta async write behind framework, one area that concerns me is the lack of out-of-the-box support for cases where the database may be down for an extended period. The framework leaves it to the client to implement any such failover support. The ItemProcessor is a non-clustered local instance - hence the user can very well catch the ProcessingException and act upon it according to business needs. Still, it would be nice to have some support from the framework, whereby the application can continue to run in-memory and sync up later when the database comes back up.
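
As a rough illustration of what such client-side failover could look like, the sketch below buffers items in memory while writes are failing and replays them in order once the store accepts writes again. Everything here is an assumption made for the example: Sink and BufferingWriter are hypothetical names, not Terracotta classes, and the backlog lives in ordinary heap memory, so it would be lost if the JVM crashed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sink; write() is assumed to throw while the database is unreachable.
interface Sink<T> {
  void write(T item) throws Exception;
}

// Keeps items in arrival order and only removes an item after a successful write.
final class BufferingWriter<T> {
  private final Sink<T> sink;
  private final Deque<T> backlog = new ArrayDeque<>();

  BufferingWriter(Sink<T> sink) {
    this.sink = sink;
  }

  // Queues the item, then tries to write everything that is waiting.
  // Returns the number of items still in the backlog.
  synchronized int submit(T item) {
    backlog.addLast(item);
    return flush();
  }

  // Writes the backlog in order and stops at the first failure, so ordering
  // is preserved. Returns the number of items still in the backlog.
  synchronized int flush() {
    while (!backlog.isEmpty()) {
      T head = backlog.peekFirst();
      try {
        sink.write(head);
      } catch (Exception e) {
        break; // store still failing: keep head and everything behind it
      }
      backlog.removeFirst();
    }
    return backlog.size();
  }
}
```

A caller would invoke flush() periodically (for example from a timer) to sync up after an outage. Because an item is removed only after write() returns, a crash between the write and the removal could lead to a duplicate write, so the sink would need to tolerate that.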

Would love to hear some real life stories from anyone with experience to share on using the Terracotta Async module ..

7 comments:

Sergio Bossa said...

Hi Debasish,

very nice post about DDD concepts + scalability patterns with Terracotta.

Just a minor correction: I don't see in your example where you actually add the list of restaurants to the in-memory store, you just add them to the async coordinator ... am I missing something?

Cheers,

Sergio B.

Unknown said...

Actually the snippet posted is not the complete method. For brevity I have excluded that part which adds the list to the in-memory data structure. Thanks for pointing this out .. I should have indicated that .. will fix.

Thanks.

Tushar Khairnar said...

The Terracotta sample app, Examinator, is a good example of tim-async.

It's not a real live application, but close to one.

And the other features you discussed are closer to an in-memory data grid than to the Asynchronous Write Behind pattern.

Let's hope we see those features soon in Terracotta too.

regards,
Tushar

Alex Miller said...

Nice post, Debasish! In the Examinator reference web app, we use tim-async to do async commits of the user's final exam results to the database. That means that if a big group of users completes a test at the same time, they can all create their results, asynchronously commit, and return a valid page to the user.

The cool thing is that the way tim-async is written, state will stay in Terracotta (and thus be persistent and highly available) until it is dropped in the database. There is a little bit of extra work with Hibernate to ensure that you can handle the narrow window where the data is in both Terracotta and the database. If the system crashes at that point, you have to have a way to ensure that you don't recommit later. We do that by having the commit handler create the id and remember it in Terracotta before we write to the db. That way, we can check whether the key already exists in the db and avoid a re-commit.

It's also worthwhile to note that Hibernate actually does asynchronous writes most of the time behind the scenes, but without this extra layer of reliability.

Anonymous said...

Most DataGrids do write behind out of the box with a flag, so while cool, this is kind of table stakes now. It should just do this with no coding.

Unknown said...

@Anonymous : While it is true that most data grids offer similar capabilities, the primary focus of the post is to highlight how this paradigm fits nicely into the DDD stack's Repository pattern. The way the repository behaves can be abstracted completely from the application layer. And I like the non-invasiveness of the approach that Terracotta offers.

Unknown said...

This post has been cross-posted in Javalobby .. some interesting comments there as well ..