Monday, January 6, 2014

ORM hierarchy design in Scala

I recently started working heavily with Scala. Scala is a very expressive language that combines the object-oriented and functional programming paradigms. I often think of it as the 21st-century evolution of Java.

One of the tasks I have been busy with is designing an ORM hierarchy in Scala to interface with a data store. The nature of the data store is irrelevant: it can be either a relational database or a NoSQL solution, and the issues I am facing are the same for either.

The requirements of my ORM hierarchy are simple:
  1. Each entity must be modeled by an immutable and final class;
  2. Each entity should be easily convertible to a map from field name to field value;
  3. Each entity should contain a "static" field indicating the table name;
  4. It would be nice to have the ability to pattern-match on the entities.
Having immutable objects is one of the staples of Scala programming, and it makes a lot of sense to require them for a Scala-based ORM entity. Immutable objects make code much easier to reason about and help ensure thread safety. On top of that, final entities cannot be subclassed, so their behavior cannot be altered by an unexpected override.

Being able to convert an entity to a map from field names to field values makes it easy to write methods that store the entity in a denormalized fashion, e.g., in a NoSQL data store such as MongoDB or Riak. Moreover, as the following examples show, it simplifies the code quite a lot.
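
As an illustration of why the map helps, here is a minimal sketch of a storage helper that needs nothing but a table name and the field map. InsertSketch and toInsert are hypothetical names, not part of any library; the sketch assumes a JDBC-style parameterized statement and does no quoting of table or column names.

```scala
object InsertSketch {
  // Hypothetical helper: turns a table name and an entity's field map into a
  // parameterized INSERT statement plus the values to bind, in column order.
  // Column names are sorted so the output is deterministic; no identifier quoting is done.
  def toInsert(tableName: String, fields: Map[String, Any]): (String, Seq[Any]) = {
    val columns = fields.keys.toSeq.sorted
    val placeholders = columns.map(_ => "?").mkString(", ")
    val sql = s"INSERT INTO $tableName (${columns.mkString(", ")}) VALUES ($placeholders)"
    (sql, columns.map(fields)) // a Map is also a function from key to value
  }
}
```

For example, toInsert("first", Map("x" -> "asdf", "z" -> 5324)) yields INSERT INTO first (x, z) VALUES (?, ?) with the values asdf and 5324; a NoSQL driver could instead take the map more or less as-is.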

The perfect candidates for ORM entities in Scala are case classes. Case classes are immutable, allow pattern matching, and come with compiler-generated accessors and compiler-generated hashCode and equals methods, which reduce boilerplate code and thus the chance of errors; last but not least, they are serializable out of the box. The only tricky part with case classes is the conversion to and from a map, as per requirement 2. This can be done using reflection or macros, but it's not trivial.
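
For comparison, on Scala 2.13 and later (which did not exist when this post was written) the case-class-to-map direction no longer needs reflection or macros, because Product.productElementNames returns the constructor field names. A minimal sketch, assuming the convention used in this post that absent optional fields are left out of the map; CaseClassToMap, toFieldMap and FirstRecord are hypothetical names:

```scala
object CaseClassToMap {
  // Requires Scala 2.13+: productElementNames yields the constructor field names
  // in the same order as productIterator yields the values.
  def toFieldMap(p: Product): Map[String, Any] =
    p.productElementNames
      .zip(p.productIterator)
      .flatMap {
        case (_, None)       => None            // absent optional field: omit the key
        case (name, Some(v)) => Some(name -> v) // present optional field: store the unwrapped value
        case (name, v)       => Some(name -> v) // mandatory field: store as-is
      }
      .toMap

  // Hypothetical case class mirroring FirstEntity's fields.
  final case class FirstRecord(x: String, y: Option[String], z: Option[Int])
}
```

This only covers the to-map direction; rebuilding a case class from a map still needs per-class code or a macro.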

There is, however, one big limitation of case classes that rules them out: the number of fields a case class can contain is limited to 22. This design limitation will be lifted in Scala 2.11. However, Scala 2.11 (currently available only as milestone releases) is not binary compatible with 2.10, so all dependent libraries must be recompiled for 2.11, which might not be an easy task for most. Although many would argue that this limitation can be worked around using nested fields, most ORM designers know that it's very easy to end up with wide denormalized tables with more than 22 fields, especially in legacy systems.

Mainly for this reason, and because of requirement 2, I came up with a different solution. Suppose we have a couple of entities, as in this code snippet:

 sealed trait GenericEntity {  
  val id: Option[String]  
  val tableName: String  
  val fields: Map[String, Any]  
 }  
 sealed trait FirstEntity extends GenericEntity {  
  val tableName = "first"  
  val x: String  
  val y: Option[String]  
  val z: Option[Int]  
 }  
 sealed trait SecondEntity extends GenericEntity {  
  val tableName = "second"  
  val w: Option[FirstEntity]  
  val t: Float  
  val i: Option[String]  
 }  

The entities are modeled as sealed traits that declare the members required by requirements 2 and 3 (fields and tableName). The traits are sealed so that the concrete implementations can only be defined in the same file, which supports requirement 1.
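
Sealing also pays off when consuming entities: since every subtype of GenericEntity is known at compile time, the compiler can check matches over entity kinds for exhaustiveness. A minimal sketch; Dispatch, First, Second and describe are hypothetical, and the traits are trimmed copies of the ones above so that the block compiles on its own:

```scala
object Dispatch {
  sealed trait GenericEntity {
    val id: Option[String]
    val tableName: String
    val fields: Map[String, Any]
  }
  sealed trait FirstEntity extends GenericEntity { val tableName = "first" }
  sealed trait SecondEntity extends GenericEntity { val tableName = "second" }

  // Hypothetical minimal implementations; in the post these live inside the companion objects.
  final case class First(id: Option[String], fields: Map[String, Any]) extends FirstEntity
  final case class Second(id: Option[String], fields: Map[String, Any]) extends SecondEntity

  // GenericEntity is sealed, so deleting either case makes the compiler warn
  // that the match may not be exhaustive.
  def describe(e: GenericEntity): String = e match {
    case _: FirstEntity  => s"first entity with ${e.fields.size} field(s)"
    case _: SecondEntity => s"second entity with ${e.fields.size} field(s)"
  }
}
```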

Now to the implementation of the entities. I will define a companion object for each of the two entities, in which the apply method is responsible for creating concrete instances of the trait. The unapply method enables pattern matching, although that only works for entities with fewer than 22 fields, because the extracted tuple also contains the id and Scala tuples stop at 22 elements. Here's the implementation of the two companion objects:

  object FirstEntity {  
   def apply(id: Option[String], x: String, y: Option[String] = None, z: Option[Int] = None): FirstEntity = new FirstEntityImpl(id, x, y, z)  
   def unapply(f: FirstEntity) = Some((f.id, f.x, f.y, f.z))  
   private final case class FirstEntityImpl(id: Option[String] = None, fields: Map[String, Any]) extends FirstEntity {  
    def this(id: Option[String], x: String, y: Option[String], z: Option[Int]) =  
     this(id, {  
      val m = collection.mutable.Map[String, Any]()  
      m("x") = x  
      addTo(m, "y", y)  
      addTo(m, "z", z)  
      Map() ++ m.toMap  
     })  
    val x: String = fields("x").asInstanceOf[String]  
    val y: Option[String] = fields.get("y").asInstanceOf[Option[String]]  
    val z: Option[Int] = fields.get("z").asInstanceOf[Option[Int]]  
   }  
  }  
  object SecondEntity {  
   def apply(id: Option[String], w: Option[FirstEntity] = None, t: Float, i: Option[String] = None): SecondEntity = new SecondEntityImpl(id, w, t, i)  
   def unapply(s: SecondEntity) = Some((s.id, s.w, s.t, s.i))  
   private final case class SecondEntityImpl(id: Option[String] = None, fields: Map[String, Any]) extends SecondEntity {  
    def this(id: Option[String], w: Option[FirstEntity], t: Float, i: Option[String]) =  
     this(id, {  
      val m = collection.mutable.Map[String, Any]()  
      m("t") = t  
      addTo(m, "w", w)  
      addTo(m, "i", i)  
      Map() ++ m.toMap  
     })  
    val w: Option[FirstEntity] = fields.get("w").asInstanceOf[Option[FirstEntity]]  
    val t: Float = fields("t").asInstanceOf[Float]  
    val i: Option[String] = fields.get("i").asInstanceOf[Option[String]]  
   }  
  }  
  private def addTo(map: collection.mutable.Map[String, Any], k: String, v: Option[Any]) = v.foreach(map(k) = _)  

The companion objects provide the concrete implementations of the entities. Each one is a simple case class that holds the entity id and a map of fields. The id field is there just for convenience; it could be kept in the map itself instead (or be absent). The immutable map representing the whole entity is created at construction time, and the entity fields (optional or mandatory) are then extracted from it. One flaw of this design is that every accessor needs a cast when the object is constructed. Another (minor) flaw is that the field keys appear twice in the code; defining them as private string constants would add a little safety at the cost of a little verbosity. With some legwork, the apply/unapply methods could be generated automatically by a macro from the trait and a map from val name to field name.
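
The private-constants variant mentioned above could look roughly like this. It is a sketch, not the code used in this post: KeyConstants, XKey, YKey and ZKey are hypothetical names, and, as a swapped-in simplification, the immutable map is built directly in apply instead of through a mutable map and an auxiliary constructor.

```scala
object KeyConstants {
  sealed trait GenericEntity {
    val id: Option[String]
    val tableName: String
    val fields: Map[String, Any]
  }
  sealed trait FirstEntity extends GenericEntity {
    val tableName = "first"
    val x: String
    val y: Option[String]
    val z: Option[Int]
  }
  object FirstEntity {
    // Each field key is spelled exactly once; apply and the accessors share it.
    private val XKey = "x"
    private val YKey = "y"
    private val ZKey = "z"

    // Builds the immutable map directly; absent optional fields are left out.
    def apply(id: Option[String], x: String, y: Option[String] = None, z: Option[Int] = None): FirstEntity =
      FirstEntityImpl(id, Map[String, Any](XKey -> x) ++ y.map(YKey -> _) ++ z.map(ZKey -> _))

    def unapply(f: FirstEntity) = Some((f.id, f.x, f.y, f.z))

    // The casts are still needed: the map is untyped, so each accessor restores its static type.
    private final case class FirstEntityImpl(id: Option[String], fields: Map[String, Any]) extends FirstEntity {
      val x: String = fields(XKey).asInstanceOf[String]
      val y: Option[String] = fields.get(YKey).asInstanceOf[Option[String]]
      val z: Option[Int] = fields.get(ZKey).asInstanceOf[Option[Int]]
    }
  }
}
```

This removes the duplicated string keys but not the casts; getting rid of those would still require generated code.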

Here's a full example of how to use these objects:

 object Main extends App {  
  val f: FirstEntity = FirstEntity(id = Some("123"), x = "asdf", z = Some(5324))  
  val g: FirstEntity = FirstEntity(id = Some("123"), x = "asdf", z = Some(5324))  
  assert(f == g)  
  assert(f.id.get == "123")  
  assert(f.x == "asdf")  
  assert(f.y.isEmpty)  
  assert(f.z.get == 5324)  
  assert(f.fields == Map("x" -> "asdf", "z" -> 5324))  
  val x: SecondEntity = SecondEntity(id = None, w = Some(f), t = 123.232f, i = Some("blah"))  
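  // Uncommenting this line fails with "recursive value f needs type": an untyped val named z makes
  // the argument z = Some(5324) above ambiguous (named argument or assignment to z?), hence the explicit types.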
  // val z = SecondEntity(id = None, w = Some(f), t = 123.232f, i = Some("blah"))  
  val y: SecondEntity = SecondEntity(id = None, w = Some(g), t = 123.232f, i = Some("blah"))  
  assert(x == y)  
  assert(x.fields == Map("w" -> f, "t" -> 123.232f, "i" -> "blah"))  
  doSomething(f)  
  def doSomething(f: FirstEntity): Unit = {  
   f match {  
    case FirstEntity(Some(id), xx, Some(yy), Some(zz)) => println(s"$id, $xx, $yy, $zz")  
    case FirstEntity(Some(id), xx, None, Some(zz)) => println(s"$id, $xx, $zz")  
    case _ => println("not matched")  
   }  
  }  
  sealed trait GenericEntity {  
   val id: Option[String]  
   val tableName: String  
   val fields: Map[String, Any]  
  }  
  sealed trait FirstEntity extends GenericEntity {  
   val tableName = "first"  
   val x: String  
   val y: Option[String]  
   val z: Option[Int]  
  }  
  sealed trait SecondEntity extends GenericEntity {  
   val tableName = "second"  
   val w: Option[FirstEntity]  
   val t: Float  
   val i: Option[String]  
  }  
  object FirstEntity {  
   def apply(id: Option[String], x: String, y: Option[String] = None, z: Option[Int] = None): FirstEntity = new FirstEntityImpl(id, x, y, z)  
   def unapply(f: FirstEntity) = Some((f.id, f.x, f.y, f.z))  
   private final case class FirstEntityImpl(id: Option[String] = None, fields: Map[String, Any]) extends FirstEntity {  
    def this(id: Option[String], x: String, y: Option[String], z: Option[Int]) =  
     this(id, {  
      val m = collection.mutable.Map[String, Any]()  
      m("x") = x  
      addTo(m, "y", y)  
      addTo(m, "z", z)  
      Map() ++ m.toMap  
     })  
    val x: String = fields("x").asInstanceOf[String]  
    val y: Option[String] = fields.get("y").asInstanceOf[Option[String]]  
    val z: Option[Int] = fields.get("z").asInstanceOf[Option[Int]]  
   }  
  }  
  object SecondEntity {  
   def apply(id: Option[String], w: Option[FirstEntity] = None, t: Float, i: Option[String] = None): SecondEntity = new SecondEntityImpl(id, w, t, i)  
   def unapply(s: SecondEntity) = Some((s.id, s.w, s.t, s.i))  
   private final case class SecondEntityImpl(id: Option[String] = None, fields: Map[String, Any]) extends SecondEntity {  
    def this(id: Option[String], w: Option[FirstEntity], t: Float, i: Option[String]) =  
     this(id, {  
      val m = collection.mutable.Map[String, Any]()  
      m("t") = t  
      addTo(m, "w", w)  
      addTo(m, "i", i)  
      Map() ++ m.toMap  
     })  
    val w: Option[FirstEntity] = fields.get("w").asInstanceOf[Option[FirstEntity]]  
    val t: Float = fields("t").asInstanceOf[Float]  
    val i: Option[String] = fields.get("i").asInstanceOf[Option[String]]  
   }  
  }  
  private def addTo(map: collection.mutable.Map[String, Any], k: String, v: Option[Any]) = v.foreach(map(k) = _)  
 }  

Using the companion object is convenient since it makes the pattern-matching syntax more intuitive. It's possible to keep the same syntax and have several different implementations of the same ORM hierarchy in the same file by wrapping "pseudo" companion objects in their own container object, as in:

 // sealed trait classes definitions...  
 object RDBMSEntities {  
  object FirstEntity {  
   // apply function/case class for RDBMS  
  }  
  object SecondEntity {  
   // apply function/case class for RDBMS  
  }  
 }  
 object NoSQLEntities {  
  object FirstEntity {  
   // apply function/case class for NoSQL DB  
  }  
  object SecondEntity {  
   // apply function/case class for NoSQL DB  
  }  
 }  

The different implementations can then be selected by importing one or the other, i.e., import RDBMSEntities._ or import NoSQLEntities._.
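
The container idea can be sketched end to end. This is a trimmed-down sketch under stated assumptions: ImportSwitch, Impl, rdbmsTable, noSqlTable, xOf and the names "first"/"first_docs" are all hypothetical, and each back end supplies its own tableName here (rather than the trait fixing it) only so that the difference is observable.

```scala
object ImportSwitch {
  // Shared, storage-agnostic contract (trimmed down from the post's FirstEntity).
  sealed trait FirstEntity {
    val id: Option[String]
    val tableName: String
    val fields: Map[String, Any]
    val x: String
  }

  // One container object per back end, each holding a "pseudo" companion named FirstEntity.
  object RDBMSEntities {
    object FirstEntity {
      def apply(id: Option[String], x: String): ImportSwitch.FirstEntity = Impl(id, Map("x" -> x))
      def unapply(f: ImportSwitch.FirstEntity) = Some((f.id, f.x))
      private final case class Impl(id: Option[String], fields: Map[String, Any]) extends ImportSwitch.FirstEntity {
        val tableName = "first" // hypothetical table name
        val x: String = fields("x").asInstanceOf[String]
      }
    }
  }

  object NoSQLEntities {
    object FirstEntity {
      def apply(id: Option[String], x: String): ImportSwitch.FirstEntity = Impl(id, Map("x" -> x))
      def unapply(f: ImportSwitch.FirstEntity) = Some((f.id, f.x))
      private final case class Impl(id: Option[String], fields: Map[String, Any]) extends ImportSwitch.FirstEntity {
        val tableName = "first_docs" // hypothetical collection name
        val x: String = fields("x").asInstanceOf[String]
      }
    }
  }

  // Same call-site syntax; only the import decides which implementation is used.
  def rdbmsTable: String = {
    import RDBMSEntities._
    FirstEntity(None, "a").tableName
  }

  def noSqlTable: String = {
    import NoSQLEntities._
    FirstEntity(None, "a").tableName
  }

  // Pattern matching goes through whichever pseudo-companion is imported.
  def xOf(e: FirstEntity): String = {
    import NoSQLEntities._
    e match { case FirstEntity(_, x) => x }
  }
}
```

Because both pseudo-companions build and extract the same sealed trait, an entity created through one import can still be matched through the other.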

To summarize, while we wait for Scala 2.11 to be released and to become popular enough that most third-party libraries are compiled against it, the approach described here can be a suitable solution for ORM design in Scala. More work is required to create macros that write much of the boilerplate for us, but the code in this post is a start. Here are the pros and cons.

Pros:
  • all entities are traits -> additional logic/mixins possible in the implementation;
  • compiler-generated equals and hashCode methods -> less boilerplate/errors;
  • pattern matching is possible (with fewer than 22 fields) -> nice to have;
  • immutable entities (case classes/immutable map) -> more safety;
  • the map can be used to easily store the entities in denormalized form, e.g., in MongoDB or an RDBMS -> less boilerplate/errors, no need for annotations;
  • the object corresponding to each trait might be generated using a Scala macro (TODO!) -> less boilerplate/errors;
  • all entities are in fact sealed and can't be extended outside their file -> more safety.
Cons:
  • need to write: apply (and unapply), accessors in case class, additional constructor -> more boilerplate (unless macro-generated);
  • pattern matching impossible for entities with 22 or more fields -> not a deal breaker;
  • requires one cast per accessor in the case class creation -> more boilerplate/less safety.
