Thursday, December 24, 2009

A case for hybrid SQL - NoSQL stack

Alex Popescu talks about Drizzle replication in his MyNoSql column. He makes a very interesting observation in his post regarding Drizzle's replication capabilities into a host of NoSQL storage backends ..

"Leaving aside the technical details — which are definitely interesting .., the solution using the Erlang AMQP .. implementation RabbitMQ .. — I think this replication layer could represent a good basis for SQL-NoSQL hybrid solutions".

We are going to see more and more of such hybrid solutions in the days to come. Drizzle does it at the product level. Using RabbitMQ as the transport, Drizzle can replicate data as serialized Java objects to Voldemort, as JSON marshalled objects to Memcached or as a hashmap to column family based Cassandra.

Custom implementation projects have also started using a hybrid stack of persistent stores. When you are dealing with really high volumes and access patterns where you cannot use joins, you need to denormalize anyway, and ORMs cease to be part of your solution toolset. Data access patterns also vary widely across the profile of clients using your system. If you are running an ecommerce suite and your product takes off, you may see explosive use of the shopping cart module. It makes every sense to move the shopping cart out of the single relational data store where it has been lying around and serve it through a more appropriate data store that lives up to the scalability requirements. It's not that you need to throw away the relational database that has served you so long. Like Alex mentioned, you can always go along with a hybrid model.

In one of our recent projects, we were using Oracle as the main relational database for a securities trading back office solution. The database load had been estimated up front, based on the calculations done at the start of the project. At a very late stage of the project a new requirement came up that needed heavy processing and storage of semi-structured data and meta-data from an external feed. Both the data and the meta-data were extensible, which meant that it was difficult to model them with a fixed schema.

We could not afford frequent schema changes, since they would entail long downtimes of the production database. But there was also the requirement that, after processing, much of this semi-structured data would have to be made available in the production database. We could have modeled it following the key/value paradigm in Oracle itself, which we were using anyway as the primary database. But that would again be the age old hammer and nail story.

We decided to supplement the stack with another data store that fits the bill for this specific use case. We used MongoDB, which gave us phenomenal performance for our requirements. We took the feed from the external data sources and loaded our MongoDB database with all the semi-structured data and meta-data. All the necessary processing was done on that data in MongoDB, and the relevant information was pushed to JMS based queues for consumption by appropriate services that copied the data asynchronously to our Oracle servers.
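
The relay itself is conceptually a small piece. Here is a minimal sketch of its shape in Scala - not the actual project code - assuming an ActiveMQ broker for the JMS transport and the MongoDB Java driver; the database, collection and queue names are invented for the example.

import com.mongodb.{Mongo, DBObject}
import javax.jms.Session
import org.apache.activemq.ActiveMQConnectionFactory

object FeedRelay {
  def main(args: Array[String]) {
    // processed feed documents sitting in MongoDB
    val mongo = new Mongo("localhost", 27017)
    val feedDocs = mongo.getDB("backoffice").getCollection("processed_feeds")

    // a JMS producer on the queue that the Oracle-side services listen on
    val factory = new ActiveMQConnectionFactory("tcp://localhost:61616")
    val connection = factory.createConnection()
    val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
    val producer = session.createProducer(session.createQueue("oracle.feed.queue"))
    connection.start()

    // push each relevant document as a JSON text message for async consumption
    val cursor = feedDocs.find()
    while (cursor.hasNext) {
      val doc: DBObject = cursor.next()
      producer.send(session.createTextMessage(doc.toString))
    }

    session.close()
    connection.close()
    mongo.close()
  }
}

The services on the Oracle side are then plain JMS consumers on that queue and persist the payloads at their own pace.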

What did we achieve with the above architecture ?

  1. Kept Oracle free to do what it does the best.

  2. Took away unnecessary load from production database servers.

  3. Introduced a document database to serve a requirement tailor made for its use - semi-structured data, mainly reads, no constraints, no overhead of ORM ceremony. MongoDB supports a very clean programming model and a very decent query interface, is simple to use and is easy to sell to your client.

  4. Used message based mapping to sync up data ASYNCHRONOUSLY between the NoSQL MongoDB store and the SQL based Oracle database. Each of the data stores was doing what it does best, keeping us clear of the hammer and nail trap.


With more and more NoSQL stores coming up, message based replication is going to play a very important role. Even within the NoSQL datastores we are seeing SQL based storage backends being offered as choices - Voldemort offers MySQL as one of its storage backends, so the hybrid model starts right up there. It's always advisable to use multiple stores that fit your use cases rather than trying to force-fit everything into a single paradigm.

Monday, December 14, 2009

No Comet, Hacking with WebSocket and Akka

Over the weekend I upgraded my Chrome to the developer channel and played around a bit with Web Sockets. It's really really exciting and for sure will make us think of Web applications in a different way. Web Sockets have been described as the "TCP for the Web" and will standardize bidirectional communication technology for Web applications.

The way we think of push technology today is mostly through Comet based implementations that use "long polling". The browser sends a request to the server, which holds it open until an event happens and only then sends a response. The client, on getting the response, consumes it, closes the socket and opens a new long polling connection to the server. The communication is not symmetric, and connections keep dropping off and on between the client and the server. Also, the fact that Comet implementations are not portable across Web containers makes them an additional headache for application portability.

Atmosphere offers a portable framework for implementing Ajax/Comet based applications for the masses. But it still does not make the underlying technology painless.

Web Sockets will make applications much more symmetric - instead of long polling it's all symmetric push/pull. Once you get a Web Socket connection, you can exchange data between the browser and the server through the send() method and an onmessage event handler.

I used this snippet to hack up a bidirectional communication channel with an Akka actor based service stack that I was already using in an application. I needed to hack up a very, very experimental Web Socket server, which did not take much effort. The actors could exchange messages with the Web Socket and communicate further downstream to a CouchDB database. I already had all the goodness of transaction processing using Akka STM. But now I could get rid of much of the client code that looked like magic and replace it with something that's more symmetric from the application point of view. It just feels like a natural extension to socket based programming on the client. Also, the fact that Javascript is ideally suited to an event driven programming model makes the client code much cleaner with the Web Socket API.
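
A rough sketch of what such a bridge can look like: the WebSocketConnection trait below is hypothetical, standing in for whatever the experimental server exposes, and the actor follows the same Akka style as the other snippets on this blog (imports elided as elsewhere) ..

// hypothetical stand-in for whatever the experimental Web Socket server exposes
trait WebSocketConnection {
  def send(message: String): Unit           // push a frame to the browser
  def onMessage(handler: String => Unit): Unit   // callback for incoming frames
}

case class ToBrowser(payload: String)

// bridges the socket and the downstream (CouchDB-facing) actors
class SocketBridgeActor(conn: WebSocketConnection, downstream: Actor) extends Actor {
  // incoming frames flow straight into the actor based service stack
  conn.onMessage { msg => downstream ! msg }

  // anything the services want pushed goes out symmetrically over the same socket
  def receive: PartialFunction[Any, Unit] = {
    case ToBrowser(payload) => conn.send(payload)
  }
}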

Talking about event based programming, we all saw the awesomeness that Ryan Dahl demonstrated a few weeks back with evented I/O in Node.js. Now people have already started developing experimental implementations of the Web Socket protocol for Node.js. And I noticed just now that Jetty also has a WebSocket server.

Tuesday, November 03, 2009

DSLs In Action - Updates on the Book Progress

DSLs In Action has been out in MEAP for a month now, with the first 3 chapters published. DSL is an emerging topic and I am getting quite a bit of feedback from readers who have already purchased the MEAP edition. Thanks for all the feedback.

Writing a book also has lots of similarities with coding, particularly the refactoring part. The second attempt to articulate a piece of thought is almost always better than the first, much like coding. You cannot imagine how many times I have discarded my first attempt and rewritten the stuff, only to get a better feel for what I am trying to express to my readers.

Anyway, this post is not about book writing. Since the MEAP is out, I thought I should post a brief update on the progress since then .. here they are more as a list of bullet points :-

I delivered Chapter 4 and have already got quite a bit of feedback on it. Chapter 4 starts the DSL Implementation section of the book. From here onwards expect lots of code snippets and zillions of implementation techniques that you can try out in your IDE. Chapter 4 deals with Implementation Patterns of Internal DSLs. I discuss some of the common idioms that you will be using while implementing internal DSLs. As per the premise of the book, the snippets and idioms are in Ruby, Groovy, Clojure and Scala. The focus is more on the strengths of each of these languages that help you implement well designed DSLs. This chapter prepares the launching pad for some more extensive DSL implementations in Chapter 5. It's not in MEAP yet, but here's the detailed ToC for Chapter 4.

Chapter 4 : Internal DSL Implementation Patterns

4.1. Building up your DSL Toolbox
4.2. Embedded DSL - Patterns in Meta-programming
4.2.1. Implicit Context and Smart APIs
4.2.2. Dynamic Decorators using Mixins
4.2.3. Hierarchical Structures using Builders
4.2.4. New Additions to your Toolbox
4.3. Embedded DSL - Patterns with Typed Abstractions
4.3.1. Higher Order Functions as Generic Abstractions
4.3.2. Explicit Type Constraints to model Domain logic
4.3.3. New Additions to your Toolbox
4.4. Generative DSL - Boilerplates for Runtime Generation
4.5. Generative DSL - One more tryst with Macros
4.6. Summary
4.7. Reference

Chapter 5 is also in the labs right now. But almost complete. I hope to post another update soon ..

Meanwhile .. Enjoy!

Monday, November 02, 2009

NOSQL Movement - Excited with the coexistence of Divergent Thoughts

Today we are witnessing a great bit of excitement with the NoSQL movement. Call it NoSQL (~SQL) or NOSQL (Not Only SQL), the movement has a mission. Not all applications need to store and process data the same way, and the storage should also be architected accordingly. Till today we have always been force-fitting a single hammer to drive every nail. Irrespective of how we process data in our application we have traditionally stored them as rows and columns in a relational database.

When we talk about really big write-scaling applications, relational databases suck big time. Normalized data, joins and ACID transactions are definite anti-patterns for write scalability. You may think sharding will solve your problems by splitting data into smaller chunks. But in reality, the biggest problem with sharding is that relational databases were never designed for it. Sharding takes away many of the benefits that relational databases have traditionally been built for. Sharding cannot be an afterthought - it intrudes into the business logic of your application, and joining data from multiple shards is definitely a non-trivial effort. As long as you can scale your data model vertically by increasing the size of your box, that's possibly the sanest way to go. But Moore .. *cough* .. *cough* .. Even if you are able to scale up vertically, try migrating a really large MySQL database. It will take hours, even days. That's one of the reasons why some companies are moving to schemaless databases when their applications can afford to.

If we sacrifice normalization, joins and ACID transactions for horizontal scalability, why should we use an RDBMS ? You don't need to .. Digg is moving from MySQL to Cassandra. It all depends on your application and the kind of write scalability you need to achieve in processing your data. For read scalability, you can still manage with read-only slaves that replicate everything coming to the master database in realtime, and a smart proxy router between your clients and the database.

The biggest excitement that the NOSQL movement has created today comes from the divergence of thoughts that each of the products promises. This is very much unlike the RDBMS movement, which started as a single hammer named SQL, capable of munging rows and columns of data based on the theory of mathematical set operations. Every application adopted the same storage architecture irrespective of how it processed the data within the application. One thing led to another, people thought they could solve this problem with yet another level of indirection .. and the strange thingy called the Object Relational Mapper was born.

At last it took the momentum of Web shaped data processing to make us realize that not all data is processed alike. The storage that works so well for your desktop trading application will fail miserably in a social application where you need to process linked data, more in the shape of a graph. The NOSQL community has responded with Neo4J, a graph database that offers easy storage and traversal of graph structures.

If you want to go big on write scalability, the only way out is decentralization and eventual consistency. The CAP theorem kicks in, and you need to compromise on at least one of consistency, availability and network partition tolerance. Riak and Cassandra offer decentralized data stores that can potentially scale indefinitely. If your application needs more structure than a key-value database, you can go for Cassandra, the distributed, peer-to-peer, column oriented data store. Have a look at the nice article from Digg which compares their use case between a relational storage and the columnar storage that Cassandra offers. For a document oriented database with all the goodness of REST and JSON, Riak is the option to choose. Also Riak offers linked map/reduce with the option to store linked data items, much in the way the Web works. Riak is truly a Web shaped data store.

CouchDB has yet another very interesting value proposition in this whole ecosystem of NOSQL databases. Many applications are inherently offline at times and need seamless and painless replication facilities. CouchDB's B-Tree based storage structure, append-only operations with an MVCC based model of concurrency control, lockless operations, REST APIs and incremental map/reduce operations give it a sweet spot in the space of local browser storage. Chris Anderson, one of the core developers of CouchDB, sums up the value of CouchDB in today's Web based world very nicely ..

"CouchApps are the product of an HTML5 browser and a CouchDB instance. Their key advantage is portability, based on the ubiquity of the html5 platform. Features like Web Workers and cross-domain XHR really make a huge difference in the fabric of the web. Their availability on every platform is key to the future of the web."

MongoDB, like CouchDB, is also a document store. It doesn't offer REST out of the box, but it is based on JSON storage. It has map/reduce as well, but also offers a strong suite of query APIs much like SQL. This is the main sweet spot of MongoDB, and it plays very well to people coming from a SQL background. MongoDB also offers master-slave replication and has been working towards auto-sharding based scalability and failover support.
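
To illustrate that last point, here is a small sketch of the query interface through the MongoDB Java driver, called from Scala; the database, collection and field names are invented for the example. The structure maps almost one-to-one to a SQL WHERE / ORDER BY / LIMIT.

import com.mongodb.{Mongo, BasicDBObject}

object MongoQuerySample {
  def main(args: Array[String]) {
    val orders = new Mongo("localhost", 27017).getDB("store").getCollection("orders")

    // all orders of a customer above a threshold, newest first - the shape
    // of the query reads much like the equivalent SQL
    val query = new BasicDBObject("customer", "john.s")
      .append("amount", new BasicDBObject("$gt", 1000))
    val cursor = orders.find(query)
      .sort(new BasicDBObject("orderDate", -1))
      .limit(10)

    while (cursor.hasNext) println(cursor.next)
  }
}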

There are quite a few other data stores that offer solutions to problems you face in everyday application design - caching, worker queues requiring atomic push/pop operations, processing activity streams, logging data etc. Redis and Tokyo Cabinet are nice fits for such use cases. You can think of Redis as a memcached backed by a persistent key/value database. It's single threaded, uses non-blocking IO and is blazing fast. Besides everyday key/value storage, Redis also offers lists and sets, along with atomic operations on each of them. Pick the one that fits your bill the best.
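
As a small illustration of the queue use case, here is a sketch in Scala. The RedisClient trait is hypothetical - substitute whatever Redis binding you use - but LPUSH, RPOP and SADD are standard Redis commands, each of them atomic on the server.

// hypothetical thin client - any Redis binding exposing these commands will do
trait RedisClient {
  def lpush(key: String, value: String): Unit   // LPUSH: push onto the head of a list
  def rpop(key: String): Option[String]         // RPOP: pop from the tail, None if empty
  def sadd(key: String, member: String): Unit   // SADD: add a member to a set
}

class ActivityQueue(redis: RedisClient) {
  // producers append activity events; LPUSH is atomic, so no client-side locking
  def publish(event: String) = redis.lpush("activity:stream", event)

  // a worker drains the other end; LPUSH + RPOP together make a simple FIFO queue
  def nextEvent(): Option[String] = redis.rpop("activity:stream")

  // de-duplicated tracking of users seen today, again a single atomic command
  def markSeen(userId: String) = redis.sadd("users:seen:today", userId)
}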

Another interesting aspect is the interoperability between these data stores. Riak, for example offers pluggable data backends - possibly we can have CouchDB as the data backend for Riak (can we ?). Possibly we will also see a Cassandra backend for Neo4J. It's extremely heartening to see that each of these communities has a deep sense of cooperation in making the entire ecosystem more meaningful and thriving.

Sunday, October 18, 2009

Are ORMs really a thing of the past ?

Stephan Schmidt has blogged about ORMs being a thing of the past. While he emphasizes ORMs' performance concerns and dismisses them as leaky abstractions that throw LazyInitializationException, he does not present any concrete alternative. In his concluding section on alternatives he mentions ..

"What about less boiler plate code due to ORMs? Good DAOs with standard CRUD implementations help there. Just use Spring JDBC for databases. Or use Scala with closures instead of templates. A generic base dao will provide create, read, update and delete operations. With much less magic than the ORM does."

Unfortunately, all these things work on small projects with a handful of tables. Throw in a large project with a complex domain model, requirements for relational persistence and the usual stack of requirements that today's enterprise applications carry, and you will soon discover that your home made, less boilerplated stuff goes for a toss. In most cases you will end up either rolling out your own ORM or building a concoction of domain models invaded by indelible concerns of persistence. In the former case, your ORM will obviously not be as performant or efficient as the likes of Hibernate. In the latter case, either you will end up building an ActiveRecord model with the domain object mirroring your relational table, or you may be even more unfortunate, with a bigger, unmanageable bloat.

It's very true that none of the ORMs in the market today is without its pains. You need to know their internals in order to make them generate efficient queries, you need to understand all the nuances to make use of their caching behaviors, and above all you need to manage all the reams of jars that they come with.

Yet, in the Java stack, Hibernate and JPA are still the best of options when we talk about big persistent domain models. Here are my points in support of this claim ..

  • If you are not designing an ActiveRecord based model, it's of paramount importance that you keep your domain model decoupled from the persistence model. And ORMs offer the most pragmatic way towards this approach. I know people will say that it's difficult to achieve this in the real world and that in typical situations compromises need to be made. Yet, I think if you need to compromise for performance or other reasons, it's only an exception. Ultimately you will find that the majority of your domain model is decoupled enough for a clean evolution.

  • ORMs save you from writing tons of SQL code. This is one of the compelling advantages I have found with an ORM - my Java code is not littered with SQL that's impossible to refactor when my schema changes. Again, there will be situations when your ORM may not churn out the best optimized SQL and you will have to write it manually. But, as I said before, that's an exception, and decisions cannot be made based on exceptions only.

  • ORMs help you virtualize your data layer. And this can bring huge gains in scalability. Have a look at how grids like Terracotta can use distributed caches like EhCache to scale out your data layer seamlessly. Without the virtualization of the ORM, you may still achieve scalability using vendor specific data grids, but that comes at the price of lots of $$ and vendor lock-in.


Stephan also feels that the future of ORMs is jeopardized by the advent of polyglot persistence and NoSQL data stores. The fact is that the use cases that NoSQL datastores address are largely orthogonal to those served by relational databases. Key/value lookups over semi-structured data, eventual consistency and efficient processing of web scale networked data backed by the power of map/reduce paradigms are not what your online transactional enterprise application with strict ACID requirements calls for. For so long we have been trying to shoehorn every form of data processing into the single hammer of relational databases. It's indeed very refreshing to see the onset of the NoSQL paradigm and to see it already in use in production systems. But ORMs will still have their role to play in the complementary set of use cases.

Tuesday, October 06, 2009

DSLs in Action : Sharing the detailed Table of Contents (WIP)

Just wanted to share the detailed Table of Contents of the chapters that have been written so far. Please send in your feedback either as comments on this post or in the Author Online Forum. The brief ToC is part of the book home page. Let me know of any other topic that you would like to see as part of this book.

Chapter 1. Learning to speak the Language of the Domain


1.1. The Problem Domain and the Solution Domain
1.1.1. Abstractions as the Core

1.2. Domain Modeling - Establishing a Common Vocabulary

1.3. Role of Abstractions in Domain Modeling

1.3.1. Minimalism publishes only what YOU promise
1.3.2. Distillation Keeps only what You need
1.3.3. Extensibility Helps Piecemeal Growth
1.3.3.1. Mixins - A Design Pattern for Extensibility
1.3.3.2. Mixins for extending MAP
1.3.3.3. Functional Extensibility
1.3.3.4. Extensibility can be Monkey Business too
1.3.4. Composability comes from Purity
1.3.4.1. Design Patterns for Composability
1.3.4.2. Back to Languages
1.3.4.3. Side-effects and Composability
1.3.4.4. Composability and Concurrency

1.4. Domain Specific Language (DSL) - It's all about Expressivity
1.4.1. Clarity of Intent
1.4.2. Expressivity's all about Well-Designed Abstractions

1.5. When do we need a DSL
1.5.1. The Advantages
1.5.2. The Disadvantages

1.6. DSL - What's in it for Non-Programmers?

1.7. Summary
1.8. Reference


Chapter 2. Domain Specific Languages in the Wild


2.1. A Motivating Example
2.1.1. Setting up the Common Vocabulary
2.1.2. The First Java Implementation
2.1.3. Externalize the Domain with XML
2.1.4. Groovy - a more Expressive Implementation Language
2.1.4.1. Executing the Groovy DSL

2.2. Classification of DSLs
2.2.1. Internal DSL Patterns - Commonality and Variability
2.2.1.1. Smart APIs, Fluent Interfaces
2.2.1.2. Code Generation through Runtime Meta-programming
2.2.1.3. Code Generation through Compile time Meta-programming
2.2.1.4. Explicit Abstract Syntax Tree manipulation
2.2.1.5. Pure Embedding of Typed Abstractions
2.2.2. External DSL Patterns - Commonality and Variability
2.2.2.1. Context driven String Manipulation
2.2.2.2. Transforming XML to Consumable Resource
2.2.2.3. Non-textual Representations
2.2.2.4. Mixing DSL with Embedded Foreign Code
2.2.2.5. Parser Combinator based DSL Design

2.3. Choosing DSL Implementations - Internal OR External

2.4. The Meta in the DSL
2.4.1. Runtime Meta-Programming in DSL Implementation
2.4.2. Compile time Meta-Programming in DSL Implementation

2.5. Lisp as the DSL

2.6. Summary
2.7. Reference


Chapter 3. DSL Driven Application Development


3.1. Exploring DSL Integration

3.2. Homogeneous Integration
3.2.1. Java 6 Scripting Engine
3.2.2. A DSL Wrapper
3.2.3. Language Specific Integration Features
3.2.4. Spring based Integration

3.3. Heterogeneous Integration with External DSLs
3.4. Handling Exceptions
3.5. Managing Performance

3.6. Summary
3.7. Reference


Chapter 4. Internal DSL Implementation Patterns


4.1 Building up your DSL Toolbox

4.2 Embedded DSL - Patterns in Meta-programming
4.2.1 Implicit Context and Smart APIs
4.2.2 Dynamic Decorators using Mixins
4.2.3 Hierarchical Structures using Builders
4.2.4 New Additions to your Toolbox

4.3 Embedded DSL - Patterns with Typed Abstractions
4.3.1 Higher Order Functions as Generic Abstractions
4.3.2 Explicit Type Constraints to model Domain logic
4.3.3 New Additions to your Toolbox

4.4 Generative DSL - Boilerplates for Runtime Generation
4.5 Generative DSL - one more tryst with Macros

4.6 Summary
4.7 References

Monday, October 05, 2009

DSLs in Action now in MEAP

My book DSLs in Action (see sidebar) is now available in MEAP (Manning Early Access Program). I have planned it as a book squarely for real world DSL implementers. It starts with a slow paced introduction to abstraction design, discusses principles for well designed abstractions and then makes a deep dive into the world of DSL based development.

The first part of the book focuses on usage of DSLs in the real world and how you would go about setting up your DSL based development environment. The second part is focused entirely on implementation techniques, patterns and idioms and how they map to the various features offered by today's programming languages. The book is heavily biased towards the JVM. The three most discussed languages are Scala, Ruby and Groovy, with some snippets of Clojure as well.

The book is still very much a WIP. Please send all your feedback to the Author's Forum. It can only make the quality better.

Enjoy!

Sunday, October 04, 2009

Pluggable Persistent Transactors with Akka

NoSQL is here. Yes, just as we use multiple programming languages, we are now thinking of applying the same paradigm to storage too. And why not? If we can use an alternate language to be more expressive for a specific problem, why not use an alternate form of storage that is a better fit for your requirement?

More and more projects are using alternate forms of storage for persisting the various kinds of data that the application needs to handle. Of course relational databases still have their very own place in this stack - the difference is that people today are not being pedantic about their use, and are not using the RDBMS as the universal hammer for every nail they see in the application.

Consider an application that needs durability for transactional data structures. I want to model a transactional banking system - basic debit and credit operations - with a message based model. But the operations have to be persistent. The balance needs to be durable and all transactions need to be persisted on disk. It doesn't matter what structures you store underneath - all I need is some key/value interface that allows me to store the transactions and the balances keyed by the transaction id. I don't even need to bother about what form of storage is used at the backend. It can be any database, any key/value store, Terracotta or anything. Will you give me the flexibility to make the storage pluggable? Well, that's a bonus!
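
Spelt out as code, the entire storage contract I am asking for is tiny - something like the following hypothetical trait, which any backend (an RDBMS table, a key/value store, Terracotta) could implement ..

// hypothetical - just the key/value contract the banking example needs;
// any backend can sit behind it
trait TxnStore {
  def putBalance(accountNo: String, balance: BigInt): Unit
  def getBalance(accountNo: String): Option[BigInt]
  def logTxn(txnId: String, entry: String): Unit
}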

Enter Akka .. and its pluggable persistence layer that you can nicely marry to its message passing actor based interface. Consider the following messages for processing debit/credit operations ..


case class Balance(accountNo: String)
case class Debit(accountNo: String, amount: BigInt, failer: Actor)
case class MultiDebit(accountNo: String, amounts: List[BigInt], failer: Actor)
case class Credit(accountNo: String, amount: BigInt)
case object LogSize



In the above messages, the failer actor is used to report failed operations in case the debit fails. We also want all of the above operations to be transactional, which we can specify declaratively in Akka. Here's the basic actor definition ..

class BankAccountActor extends Actor {
  makeTransactionRequired
  private val accountState = 
    TransactionalState.newPersistentMap(MongoStorageConfig())
  private val txnLog = 
    TransactionalState.newPersistentVector(MongoStorageConfig())
  //..
}



  • makeTransactionRequired makes the actor transactional

  • accountState is a persistent Map that plugs into a MongoDB based storage, as is evident from the config parameter. In a real life application this will be further abstracted out into a configuration file. Earlier I had blogged about the implementation of the MongoDB layer for Akka persistence. accountState offers the key/value interface that the actor will use to maintain a durable snapshot of all balances.

  • txnLog is a persistent vector, once again backed by MongoDB storage, which stores all the transaction logs that occur in the system


Let us now look at the actor implementation that receives these messages and processes the debit/credit operations ..

class BankAccountActor extends Actor {
  makeTransactionRequired
  private val accountState = 
    TransactionalState.newPersistentMap(MongoStorageConfig())
  private val txnLog = 
    TransactionalState.newPersistentVector(MongoStorageConfig())

  def receive: PartialFunction[Any, Unit] = {
    // check balance
    case Balance(accountNo) =>
      txnLog.add("Balance:" + accountNo)
      reply(accountState.get(accountNo).get)

    // debit amount: can fail
    case Debit(accountNo, amount, failer) =>
      txnLog.add("Debit:" + accountNo + " " + amount)
      val m: BigInt =
      accountState.get(accountNo) match {
        case None => 0
        case Some(v) => {
          val JsNumber(n) = v.asInstanceOf[JsValue]
          BigInt(n.toString)
        }
      }
      accountState.put(accountNo, (m - amount))
      if (amount > m)
        failer !! "Failure"
      reply(m - amount)

    //..
  }
}


Here we have the implementation of two messages -

  • Balance reports the current balance and

  • Debit does a debit operation on the balance


Note that the interfaces these implementations use are in no way dependent on MongoDB specific APIs. Akka offers a uniform key/value API across all supported persistent storage. And each of the above pattern matched message processing fragments offers transaction semantics. This is pluggability!

Credit looks very similar to Debit. However, a more interesting use case is the MultiDebit operation that offers a transactional interface. Just like your relational database's ACID semantics, the transactional semantics of Akka offers atomicity over this message. Either the whole MultiDebit will pass or it will be rolled back.


class BankAccountActor extends Actor {

  //..
  def receive: PartialFunction[Any, Unit] = {

    // many debits: can fail
    // demonstrates true rollback even if multiple puts have been done
    case MultiDebit(accountNo, amounts, failer) =>
      txnLog.add("MultiDebit:" + accountNo + " " + amounts.map(_.intValue).foldLeft(0)(+ _))
      val m: BigInt =
      accountState.get(accountNo) match {
        case None => 0
        case Some(v) => BigInt(v.asInstanceOf[String])
      }
      var bal: BigInt = 0
      amounts.foreach {amount =>
        bal = bal + amount
        accountState.put(accountNo, (m - bal))
      }
      if (bal > m) failer !! "Failure"
      reply(m - bal)
    
    //..
  }
}



Now that we have the implementation in place, let's look at the test cases that exercise them ..

First a successful debit test case. Note how we have a separate failer actor that reports failure of operations to the caller.

@Test
def testSuccessfulDebit = {
  val bactor = new BankAccountActor
  bactor.start
  val failer = new PersistentFailerActor
  failer.start
  bactor !! Credit("a-123", 5000)
  bactor !! Debit("a-123", 3000, failer)
  val b = (bactor !! Balance("a-123")).get.asInstanceOf[JsValue]
  val JsNumber(n) = b
  assertEquals(BigInt(2000), BigInt(n.toString))

  bactor !! Credit("a-123", 7000)
  val b1 = (bactor !! Balance("a-123")).get.asInstanceOf[JsValue]
  val JsNumber(n1) = b1
  assertEquals(BigInt(9000), BigInt(n1.toString))

  bactor !! Debit("a-123", 8000, failer)
  val b2 = (bactor !! Balance("a-123")).get.asInstanceOf[JsValue]
  val JsNumber(n2) = b2
  assertEquals(BigInt(1000), BigInt(n2.toString))
  assertEquals(7, (bactor !! LogSize).get)
}


And now the interesting MultiDebit that illustrates the transaction rollback semantics ..

@Test
def testUnsuccessfulMultiDebit = {
  val bactor = new BankAccountActor
  bactor.start
  bactor !! Credit("a-123", 5000)
  val b = (bactor !! Balance("a-123")).get.asInstanceOf[JsValue]
  val JsNumber(n) = b
  assertEquals(BigInt(5000), BigInt(n.toString))

  val failer = new PersistentFailerActor
  failer.start
  try {
    bactor !! MultiDebit("a-123", List(500, 2000, 1000, 3000), failer)
    fail("should throw exception")
  } catch { case e: RuntimeException => {}}

  val b1 = (bactor !! Balance("a-123")).get.asInstanceOf[JsValue]
  val JsNumber(n1) = b1
  assertEquals(BigInt(5000), BigInt(n1.toString))

  // should not count the failed one
  assertEquals(3, (bactor !! LogSize).get)
}


In the snippet above, the balance remains at 5000 when the debit fails while processing the final amount of the list passed to MultiDebit message.

Relational databases will always remain for the use cases they serve best - persistence of data that needs a truly relational model. NoSQL is gradually making its place in the application stack for the complementary set of use cases that need a much more loosely coupled model, a key/value database or a document oriented database. Apart from easier manageability, another big advantage of using these databases is that they do not need big ceremonious ORM layers between the application model and the data model. This is because what you see is what you store (WYSIWYS); there is no paradigm mismatch that needs to be bridged.

Sunday, September 27, 2009

The Thrush combinator in Scala

In his book To Mock a Mockingbird, Raymond Smullyan teaches combinatory logic using songbirds in a forest. He derives important results by combining various combinators, all using the birds of the enchanted forest. Combinators are an effective tool in designing abstractions with functional programming principles. They are reusable units that make your code very concise without losing expressivity. More than the implementation, the names can go a long way in establishing a common vocabulary of programming idioms and techniques. In an earlier post, I talked about Kestrel and its implementation in Scala for handling side-effects in an abstraction.

In this post, I look at Thrush, a permuting combinator. A Thrush is defined by the following condition: Txy = yx. Thrush reverses the order of evaluation. Raganwald talks about Thrush and its implementations in Ruby in an excellent post quite some time back.

Why would you want to reverse the order of evaluation in a computation ? Well, if you value readability of your code, and have been cringing at how a mix of expression oriented pipelines and nested function calls can make your code less readable, then a Thrush may be for you.

Consider the Ruby example that Raganwald discusses in his first example of Thrush implementation.

lambda { |x| x * x }.call((1..100).select(&:odd?).inject(&:+))

The argument to the proc is a pipeline expression that reads nicely from left to right: get the list of numbers from 1 to 100, select the odd ones and add them up. What the proc does is find the square of its input number. But the proc invocation is a function call which, though it is the last in the sequence to be executed, is the first one that you read. You can find the Ruby implementation in Raganwald's blog that transforms the above code into a left-to-right pipeline expression using Thrush.

Let's try to see how we can do the same in Scala ..

In Scala, I can write the above as ..

((x: Int) => x * x)((1 to 100).filter(_ % 2 != 0).foldLeft(0)(_ + _))

Almost the same as the Ruby code above, and has the similar drawback in readability.

Let's define the Scala Thrush combinator ..

case class Thrush[A](x: A) {
  def into[B](g: A => B): B = {
    g(x)
  }
}


Immediately we can write the above invocation as ..

Thrush((1 to 100)
  .filter(_ % 2 != 0)
  .foldLeft(0)(_ + _))
  .into((x: Int) => x * x)


A very simple combinator, a permuting one that pushes the function to where it belongs in the line of readability. If you want to be more succinct, commit the sin of defining an implicit for your use case ..

implicit def int2Thrush(x: Int) = Thrush(x)

and immediately the above transforms to ..

(1 to 100)
  .filter(_ % 2 != 0)
  .foldLeft(0)(_ + _)
  .into((x: Int) => x * x)


Does it read better ?

In fact with this implicit definition, you can go chaining into all the way ..

(1 to 100)
  .filter(_ % 2 != 0)
  .foldLeft(0)(_ + _)
  .into((x: Int) => x * x)
  .into(_ * 2)


While designing domain APIs that need to be expressive, this technique can often come very handy. Here's an example that uses the Thrush combinator to ensure a clean pipeline expression flowing into a piece of DSL.

//..
accounts.filter(_ belongsTo "John S.") 
        .map(_.calculateInterest)
        .filter(_ > threshold)
        .foldLeft(0)(_ + _)
        .into { x: Int =>
          updateBooks journalize(Ledger.INTEREST, x)
        }
//..

Sunday, September 13, 2009

Misconstrued Language Similarity Considered Harmful

Very frequently I come across posts of the form Language X for Language Y programmers. It's not that there is anything wrong with them, but, more often than not, the underlying tone of such posts is to highlight some apparent (and often misconstrued) similarities between the two languages.

Objects in Java and processes in Erlang have some similarity in the sense that both of them abstract some state of your application. But that's where the similarity ends - the rest is all gaping differences. Objects in a class oriented OO language like Java and processes in a concurrency oriented language like Erlang are miles apart in philosophy, usage, implementation and granularity. Read Joe Armstrong's Why OO sucks for details. But for the purpose of this post, the following statement from Joe suffices to nip in the bud any logic that claims objects in Java are similar to processes in Erlang ..

"As Erlang became popular we were often asked "Is Erlang OO" - well, of course the true answer was "No of course not" - but we didn't to say this out loud - so we invented a serious of ingenious ways of answering the question that were designed to give the impression that Erlang was (sort of) OO (If you waved your hands a lot) but not really (If you listened to what we actually said, and read the small print carefully)."

Similarity breeds Contentment

It's always comforting to have a base of similarity. It's human nature to look for a familiar base and transform one's thought process with respect to it. With programming languages, this works only if the two languages share the same philosophy and implement similar ideas. You can never learn a new language by pitching it against your favorite language and identifying apparent similarities. It is these apparent similarities that tend to influence how you think of the idioms of the new language, and you will be misled into believing and practising something that leads you down the path of antipatterns. Consider static typing and dynamic typing - a debate that has possibly been beaten to death. When you are making the change, learn to think in the new paradigm. It's foolish to think in terms of concrete types in a dynamically typed setting. Think in terms of the contracts that the abstraction implements and organize your tests around them. Ola Bini wrote a fantastic post on this in response to a twitter discussion that originated from me.

The starting point should always be the philosophy and the philosophical differences that the two languages imbibe. Haskell typeclasses may seem similar in many respects to polymorphism in object-oriented languages. In fact they are closer to parametric polymorphism, while the dominant form of polymorphism in OO is subtype polymorphism. And Haskell, being a functional language, does not support subtyping.
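
To make the distinction concrete in the language most used on this blog, here is a rough Scala rendering of the typeclass idea - an illustrative sketch, not Haskell. Show is defined outside the types it covers and instances are supplied through implicits; there is no inheritance relationship anywhere, which is exactly where the similarity with subtype polymorphism breaks down.

trait Show[A] {
  def show(a: A): String
}

object ShowInstances {
  implicit object IntShow extends Show[Int] {
    def show(a: Int) = "Int(" + a + ")"
  }
  implicit object StringShow extends Show[String] {
    def show(a: String) = "\"" + a + "\""
  }
}

object Describe {
  // the constraint sits on the type parameter, not on a class hierarchy
  def describe[A](a: A)(implicit s: Show[A]): String = s.show(a)
}

With import ShowInstances._ in scope, Describe.describe(42) resolves IntShow at compile time; no common supertype ties Int and String together.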

This post relates pattern matching in Erlang to conditionals in Java in the same vein. It's true that both offer some form of dispatch in program control. But the more significant and idiomatic difference that matters in this context is between conditional statements in Java and expressions in the functional setting of Erlang. It's this expression based programming that influences the way you structure your code in Erlang. Instead of highlighting upfront the similarity of both constructs as means of flow control, emphasize the difference in thought process that gives your program a different geometry. And finally, the most important use of pattern matching is in programming with algebraic data types, a whole new idiom that gets unveiled.
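
To illustrate the algebraic data type point in Scala terms (not Erlang, but the idea carries over), here is a minimal sealed hierarchy with an exhaustive match. The whole match is an expression that yields a value, which is the geometric difference from a chain of conditional statements ..

sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rectangle(width: Double, height: Double) extends Shape

object Geometry {
  // the match is an expression; the compiler also warns if a case is missing
  def area(s: Shape): Double = s match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }
}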

Ok, if you want to write a post on language X for prospective programmers of language Y, go ahead and highlight the differences in philosophy and idioms. And then try to implement your favorite feature from language Y in X. Misconstrued similarities often bias programmers new to language X the wrong way.

Sunday, September 06, 2009

Side-effects with Kestrel in Scala

Consider the following piece of logic that we frequently come across in codebases ..

  1. val x = get an instance, either create it or find it

  2. manipulate x with post-creation activities (side-effects)

  3. use x


Step 2 is only for some side-effecting operations, maybe on the instance itself or for some other purpose like logging, registering, writing to a database etc. While working on the serialization framework sjson, I have been writing pieces of code that follow exactly the above pattern to create Scala objects out of JSON structures. Now if you look at the above 3 steps, step 2 appears to belong to the namespace that calls the function which sets up x. But logically step 2 needs to be completed before we can use x. Which means that step 2 is also a necessary piece of logic that needs to be completed before we hand over the constructed instance x to the calling context.

One option of course is to make Step 2 a part of the function in Step 1. But this is not always feasible, since Step 2 needs access to the context of the caller namespace.

Let's look at an idiom in Scala that expresses the above behavior more succinctly and leads us into implementing one of the most popularly used objects in combinatory logic.

Consider the following method in Scala that creates a new instance of a class with the arguments passed to it ..


def newInstance[T](args: Array[AnyRef])(implicit m: Manifest[T]): T = {
  val constructor = 
    m.erasure.getDeclaredConstructors.first
  constructor.newInstance(args: _*).asInstanceOf[T]
}



I can use it like ..

newInstance[Person](Array("ghosh", "debasish"))

for a class Person defined as ..

case class Person(lastName: String, firstName: String)

It's often the case that I would like to have some operations on the new instance after its creation which will be pure side-effects. It may or may not mutate the new instance, but will be done in the context of the new instance. I can very well do that like ..

val persons = new ListBuffer[Person]()
val p = newInstance[Person](Array("ghosh", "debasish"))
persons += p  // register to the list
persons.foreach(mail("New member has joined: " + p)) // new member mail to all
//.. other stuff


It works perfectly .. but we can make the code more expressive if we can somehow be explicit about the context of the block of code that needs to go with every new instance of Person being created. Maybe something like ..

newInstance[Person](Array("ghosh", "debasish")) { p =>
  persons += p
  persons.foreach(mail("New member has joined: " + p))
  //.. other stuff
}


This clearly indicates that the side-effecting steps of adding to the global list of persons or sending out a mail to every member is also part of the creation process of the new person. The effect is the same as the earlier example, only that it delineates the context more clearly. Though at the end of it all, it returns only the instance that it creates.

Consider another example of a good old Java bean ..

class Address {
  private String street;
  private String houseNumber;
  private String city;
  private String zip;

  //..
  //.. getters and setters
}


Working with a reflection based library it's not uncommon to see code that instantiates the bean using the default constructor and then allow clients to set the instance up with custom values .. something like ..

var p = Person(..)
val pAddress =
  newInstance[Address](null) { a =>
    a.setStreet("Market Street")
    a.setHouseNumber("B/102")
    a.setCity("San Francisco")
    a.setZip("98032")
    p.address = a
    p.mail("Your address has been changed to: " + a)
  }


Once again the block is only for side-effects, and it can contain lots of other custom code that depends on the caller's context. Make it more concise and DSLish using Scala's object import syntax ..

var p = Person(..)
val pAddress =
  newInstance[Address](null) { a =>
    import a._
    setStreet("Market Street")
    setHouseNumber("B/102")
    setCity("San Francisco")
    setZip("98032")
    p.address = a
    p.mail("Your address has been changed to: " + a)
  }


Looks like a piece of idiom that can be effective as part of your programming repertoire. Here is the version of newInstance that allows you to make the above happen ..


def newInstance[T](args: Array[AnyRef])(op: T => Unit)(implicit m: Manifest[T]): T = {
  val constructor = 
    m.erasure.getDeclaredConstructors.first
  val v = constructor.newInstance(args: _*).asInstanceOf[T]
  op(v)
  v
}



Looking carefully at newInstance, I realized that it is actually the Kestrel combinator that Raymond Smullyan explains so eloquently in his amazing book To Mock a Mockingbird. A bird K is called a Kestrel if for any birds x and y, (Kx)y = x. And that's exactly what's happening with newInstance. It does everything you pass in the block, but ultimately returns the new instance that it creates. A nice way to plug in some side-effects. Reg has blogged about Kestrels in Ruby - the tap method in Ruby 1.9 and returning in Rails. These small code pieces may not be as ceremonious as the popularly used design patterns, but they are just as effective and provide deep insights into how our code needs to be structured for expressivity. Next time you discover any such snippet that you find useful for sharing, feel free to write a few lines of blog about it .. the community will love it ..
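
If you want the combinator by itself, outside of newInstance, a minimal generic kestrel in Scala is just a couple of lines - a sketch along the lines of Ruby's tap ..

// apply the side-effecting block to x, then return x itself - (Kx)y = x
def kestrel[T](x: T)(sideEffect: T => Unit): T = {
  sideEffect(x)
  x
}

// usage: the block runs purely for its side-effect, the value flows through
val sb = kestrel(new StringBuilder)(_.append("seeded"))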