23. 7. 2010

Specification pattern and complex queries over a repository

In the last post about the repository pattern we discussed the repository interface and its usage. We now know that the repository acts as an in-memory collection for the client and that it hides the complexity of loading data from and writing data to persistent storage.

Our first simple prototype of the interface looked like this:

public interface IPostRepository
{
  void Insert(Post post);
  Post GetById(int id);
  void Update(Post post);
  void Delete(Post post);
}

But what about the complex queries we need to deal with? In DDD, the solution to this problem is the Specification pattern.

The Specification pattern allows the client to specify which domain entities it wants, usually from the repository for that entity. It is a domain pattern, which means the method names and input/output types are strongly tied to the domain and its entities. This is in contrast to the Query pattern, which is a generalization of the Specification pattern and belongs to the category of generic design patterns. It is good practice to implement a specification using the fluent interface pattern.

Let's have a look at the following example:

// domain entity
public class Post
{
  public virtual int Id { get; set; }
  public virtual string Title { get; set; }
  public virtual string Text { get; set; }
  public virtual DateTime Published { get; set; }
  public virtual IList<Comment> Comments { get; set; }
}

Our functional specification from the customer requires that we display all posts between certain dates. In other words, we want posts that were published after a certain date and before a certain date. Remember the latter sentence and how nicely it maps to our fluent specification interface. It is also important that the business people and the developers share the same language so that they understand each other better.

A naive and wrong approach would be to extend the interface with a method that accepts two nullable DateTime parameters and returns a list of posts. This is what you will actually see in many examples and tutorials across the internet these days. Again: don't do it, it is wrong. It goes against object-oriented design (OOD) and the SOLID principles.

What is wrong with the naive approach? First of all, separation of concerns. You don't want dozens of methods in the interface, one for every single use case. Think about the implementers of such an interface: they would need to implement every single method, not to mention test it. In general, the repository acts as an in-memory collection of your entities; it should have no idea about the queries, the business requirements or, more correctly, the specifications you would like to apply to it. Each specification should reside in a dedicated class, effectively following the Command pattern with all its benefits, such as dependency injection. Why on earth would you change a working repository implementation in order to implement a new query? It smells bad and it is really not natural.

Correct solution
One of the right ways to do it is to make use of the Service Locator (or dependency injection) and Command patterns. Of course we have to extend the repository interface, but the extension is a single method that provides the client with specifications that can filter the requested entities from the repository. Specifying what we want is done via the Specification pattern.

At the interface level we provide the extension point as follows:

public interface IPostRepository
{
  ...
 
  // extension point to get specification
  TSpecification Specify<TSpecification>() where TSpecification : class;
}

The above method uses the underlying service locator (DI/IoC container) to locate the configured specification. We use generics here so we get a strongly typed result back. Let's have a look at how a simple specification that uses the fluent interface pattern could look:

public interface IPostSpecification
{
  IPostSpecification PostsAfter(DateTime dateTime);
  IPostSpecification PostsBefore(DateTime dateTime);
  IList<Post> ToList();
}


Note that all methods have strongly domain-oriented names. The filtering methods return the specification type itself, which is what makes the interface fluent. The method that finalizes the chain of specifications is ToList().

Connecting all the interfaces together, we get the following usage:
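To make the fluent interface more concrete, here is a minimal sketch of how such a specification could be implemented. It is only an illustration under stated assumptions: the class name InMemoryPostSpecification is hypothetical, it filters a plain in-memory sequence with LINQ, and a real implementation would more likely translate the filters into queries of the underlying data mapper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// trimmed-down domain entity, just enough for the sketch
public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public DateTime Published { get; set; }
}

public interface IPostSpecification
{
    IPostSpecification PostsAfter(DateTime dateTime);
    IPostSpecification PostsBefore(DateTime dateTime);
    IList<Post> ToList();
}

// Hypothetical in-memory implementation (an assumption, not part of any library)
public class InMemoryPostSpecification : IPostSpecification
{
    private IEnumerable<Post> posts;

    public InMemoryPostSpecification(IEnumerable<Post> source)
    {
        posts = source;
    }

    public IPostSpecification PostsAfter(DateTime dateTime)
    {
        // each call narrows the sequence and returns the same object,
        // which is what allows the calls to be chained
        posts = posts.Where(p => p.Published > dateTime);
        return this;
    }

    public IPostSpecification PostsBefore(DateTime dateTime)
    {
        posts = posts.Where(p => p.Published < dateTime);
        return this;
    }

    public IList<Post> ToList()
    {
        // finalizes the chain and materializes the result
        return posts.ToList();
    }
}
```

Because every filter is optional, calling ToList() without any filter simply returns all posts from the source.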

// assume that some posts are already in the repository
var postRepository = IoC.Resolve<IPostRepository>();

var posts = postRepository.Specify<IPostSpecification>()
          .PostsAfter(DateTime.Now.AddDays(-100))
          .PostsBefore(DateTime.Now.AddDays(-50))
          .ToList();

We have used the fluent interface to specify what we want and asked for the list of results. Note that the fluent interface allows us to specify just one date, or no date at all. The latter would result in a list of all posts.

It is also important to realize that we don't have to put all use cases into one specification. We can have as many specification interfaces as we like. Remember the divide and conquer paradigm? Here we are making use of it.

Now if we want to add a new specification to our application, we won't touch the repository at all! We define a new interface, provide an implementation with unit tests and configure it in the service locator (IoC container). That's it. Everything is clean, well separated and easy to test.

Initialization of specification
Hawk-eyed readers have spotted it already: how is the specification connected to the repository? We can specify our requirements in a nice fluent way, but what makes the connection? Of course the specification needs to be initialized. This is done in the repository itself. When it resolves the specification from the service locator, the repository should initialize the specification with something that connects it to the collection where the entities are stored. This can be the repository itself, or the underlying unit of work instance.

The best way to do it is to give the specification interface an Initialize method that takes the unit of work as an argument. Doing it the other way round will work, but it is not as flexible. In our simple example we will pass the whole repository to the specification and leave the responsibility for initialization to the specification itself. The specification then has to cast the repository to its concrete type (e.g. NHibernatePostRepository, to make use of ISession) and access the unit of work instance to filter the requested entities.

So it will look like this:

public interface IPostSpecification
{
  ...
  void Initialize(IPostRepository repository);
}


The concrete implementation of the specification is tightly coupled to the concrete implementation of the repository, but this is natural and OK. If possible, it can be improved with the IQueryable interface. As usual, the correct behaviour and configuration will be ensured by unit tests.

For a generic implementation and example, make sure you check out the generic repository project, especially interfaces like ISpecification, ISpecificationResult and ISpecificationLocator.
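The wiring described in this section can be sketched as follows. This is a rough illustration, not the implementation from the generic repository project: SpecificationLocator is a hypothetical stand-in for a real IoC container, InMemoryPostRepository and InMemoryPostSpecification are made-up names, and the interfaces are trimmed to the members the sketch needs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
    public DateTime Published { get; set; }
}

public interface IPostRepository
{
    void Insert(Post post);
    TSpecification Specify<TSpecification>() where TSpecification : class;
}

public interface IPostSpecification
{
    IPostSpecification PostsAfter(DateTime dateTime);
    IList<Post> ToList();
    void Initialize(IPostRepository repository);
}

// Hypothetical minimal service locator standing in for a DI/IoC container
public static class SpecificationLocator
{
    private static readonly Dictionary<Type, Func<object>> factories =
        new Dictionary<Type, Func<object>>();

    public static void Register<T>(Func<T> factory) where T : class
    {
        factories[typeof(T)] = () => factory();
    }

    public static T Resolve<T>() where T : class
    {
        return (T)factories[typeof(T)]();
    }
}

public class InMemoryPostRepository : IPostRepository
{
    private readonly List<Post> posts = new List<Post>();

    // exposed so that the matching specification can reach the entities
    public IEnumerable<Post> Posts { get { return posts; } }

    public void Insert(Post post) { posts.Add(post); }

    public TSpecification Specify<TSpecification>() where TSpecification : class
    {
        // resolve the configured specification, then connect it to this repository
        var specification = SpecificationLocator.Resolve<TSpecification>();
        var postSpecification = specification as IPostSpecification;
        if (postSpecification != null) postSpecification.Initialize(this);
        return specification;
    }
}

public class InMemoryPostSpecification : IPostSpecification
{
    private IEnumerable<Post> posts;

    public void Initialize(IPostRepository repository)
    {
        // the specification is coupled to its concrete repository type
        posts = ((InMemoryPostRepository)repository).Posts;
    }

    public IPostSpecification PostsAfter(DateTime dateTime)
    {
        posts = posts.Where(p => p.Published > dateTime);
        return this;
    }

    public IList<Post> ToList() { return posts.ToList(); }
}
```

The factory is registered once at application start, for example SpecificationLocator.Register<IPostSpecification>(() => new InMemoryPostSpecification()); after that, adding a new specification only means registering another factory, while the repository stays untouched.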

SpecificationResult
Sooner or later when you start using this approach, you will find that some functionality is repeated between specifications. Methods like ToList(), Single(), Take(int count) and Skip(int count) will appear in basically all specifications. Remember the don't repeat yourself (DRY) principle. The solution is easy: we pack the common functionality into an interface I call ISpecificationResult. The interfaces will change like this:

public interface IPostSpecification
{
  ...
  IPostSpecificationResult ToResult(); // replaces ToList and all common methods
}


public interface IPostSpecificationResult
{
  IPostSpecificationResult Take(int count);
  IPostSpecificationResult Skip(int count);
 
  IList<Post> ToList();
  Post Single();
}


Again you can see we are making use of a fluent interface. Now all specifications related to posts can concentrate on the business needs and requirements, and the common functionality is clearly separated.

All of the above can be implemented in a generic way. The open source generic repository project aims to provide you with all the necessary interfaces and base implementations so you can focus just on your domain and the business rules. It is worth checking out if you are looking for a robust and clean design for your application.
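A possible implementation of the result interface is sketched below. It is an assumption-laden illustration: PostSpecificationResult is a hypothetical class that wraps an in-memory sequence and delegates to LINQ, whereas a real implementation would more likely pass paging on to the data mapper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
}

public interface IPostSpecificationResult
{
    IPostSpecificationResult Take(int count);
    IPostSpecificationResult Skip(int count);

    IList<Post> ToList();
    Post Single();
}

// Hypothetical result object shared by all post specifications
public class PostSpecificationResult : IPostSpecificationResult
{
    private IEnumerable<Post> posts;

    public PostSpecificationResult(IEnumerable<Post> filteredPosts)
    {
        posts = filteredPosts;
    }

    public IPostSpecificationResult Take(int count)
    {
        posts = posts.Take(count);
        return this;
    }

    public IPostSpecificationResult Skip(int count)
    {
        posts = posts.Skip(count);
        return this;
    }

    public IList<Post> ToList()
    {
        return posts.ToList();
    }

    public Post Single()
    {
        // throws InvalidOperationException unless exactly one post remains
        return posts.Single();
    }
}
```

A specification's ToResult() would then return new PostSpecificationResult(...) over its filtered posts. In this sketch the order of the calls matters: Skip(1).Take(2) and Take(2).Skip(1) return different pages.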

22. 7. 2010

Repository pattern

What is the repository pattern and why does it make sense to apply it in our application? Let's start with a short look into the history of software design to better understand the need for the pattern these days.


History
In the old days, programs were quite simple. User interface, logic and data access code usually lived in one file or project, and in the end in one executable file. Complexity wasn't high, it somehow worked and everybody was happy. As requirements got more complex and programs grew bigger and bigger, we started to realize that in order to rule the chaos, several rules and patterns had to be introduced. We started to divide the functionality into logical pieces and to decouple those parts from each other. We also discovered the importance of unit testing, and that good code is also easy to test. We came to understand that it is a good thing to separate the user interface or views from the domain, and the domain from the data access code and the data storage. The repository pattern helps in exactly these scenarios.


The repository pattern is an interface between your domain model and the data mapping functionality. It acts like an in-memory collection of domain entities (objects).


What does the above mean in the real world? Consider this blog. Assume the data is stored in a SQL database. Let's say we are using the NHibernate data mapper to save our domain model to the database. What is our domain model, what are our domain entities? Nothing other than classes like Blog, Post and Comment. So far so good. I take an instance of, let's say, Comment, and using NHibernate I can very conveniently save it to the underlying data storage, in our example a SQL database. Now, let's repeat the definition of the pattern: an interface between the domain model (Post, Blog, Comment) and the data mapper (NHibernate, in the end the SQL database). Why do we need to put something between the domain and the data mapper if everything works so nicely?

When not to use it
If your domain model isn't very complex and your data mapper is designed to aid unit testing, then the benefit will be low compared to the cost of adding one more level of complexity. It is important to remember that the repository pattern adds another layer of indirection between the model and the data mapper. However, we all know that even a very simple one-button application can grow into a big enterprise application.

Clean model
However, if your model is bigger than that of a simple application, your application will benefit from applying the repository pattern. The important thing is that the repository interface hides the implementation of the data mapper (3rd party or home grown). The repository acts as a collection of domain entities (e.g. a collection of Comments in a Post in a Blog). Only this interface will be visible to your domain functionality and the other layers. This ensures that your domain won't get polluted with data mapper code that naturally does not belong to the domain functionality. It helps you keep things separated and under control.

Unit testing
Additionally, working with the interface rather than with concrete classes greatly helps unit testing. It is very easy to mock the interface or implement a dummy stub of it. Remember that code without tests is legacy code.

Loosely coupled design
The last thing, which is a nice side effect but which I also consider good practice, is to separate all (3rd party) tools from the application where possible. This allows us to swap one tool for a different one. Want to switch from NHibernate to Entity Framework in the .NET world? Not a very big problem. Of course, it depends on the complexity, but it still gives you the confidence that you can do it if it is really needed for whatever reason.

What does the interface look like
A repository is usually strongly domain oriented. In other words, we are usually working with an interface that works with a concrete domain entity, and its methods have strong domain names and input/output types. Imagine we have the domain entities (classes) Blog, Post and Comment. We would like to read posts from and write posts to the data storage (a database). The common term for this functionality is CRUD (create, read, update, delete). Every repository should support CRUD. Consider the following C# code:

public class Post
{
  public int Id { get; set; }
  public string Text { get; set; }
  public IList<Comment> Comments { get; set; }
}

public interface IPostRepository
{
  // create
  void Insert(Post post);
 
  // read
  Post GetById(int id);
 
  // update
  void Update(Post post);
 
  // delete
  void Delete(Post post);
}
 
The above interface is dedicated to the Post entity, and with this design you get strong typing. The client of the repository works with the interface as with an in-memory collection of data. It does not care (and does not want to) what is going on behind the scenes (translating the method to SQL, creating the SQL command, executing the command, converting the result to domain objects). This greatly simplifies the design of the client and ensures good testability.
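To illustrate the testability claim, here is a minimal sketch of a hand-written stub that keeps posts in a dictionary. The names StubPostRepository and PostPreviewService are hypothetical and exist only for this example; the Post class is trimmed to the properties the sketch uses.

```csharp
using System;
using System.Collections.Generic;

public class Post
{
    public int Id { get; set; }
    public string Text { get; set; }
}

public interface IPostRepository
{
    void Insert(Post post);
    Post GetById(int id);
    void Update(Post post);
    void Delete(Post post);
}

// Hypothetical stub for unit tests: no database, no data mapper
public class StubPostRepository : IPostRepository
{
    private readonly Dictionary<int, Post> posts = new Dictionary<int, Post>();

    public void Insert(Post post) { posts[post.Id] = post; }

    public Post GetById(int id)
    {
        Post post;
        return posts.TryGetValue(id, out post) ? post : null;
    }

    public void Update(Post post) { posts[post.Id] = post; }

    public void Delete(Post post) { posts.Remove(post.Id); }
}

// Hypothetical client that only knows the interface
public class PostPreviewService
{
    private readonly IPostRepository repository;

    public PostPreviewService(IPostRepository repository)
    {
        this.repository = repository;
    }

    // returns at most maxLength characters of the post text, or null if not found
    public string GetPreview(int id, int maxLength)
    {
        var post = repository.GetById(id);
        if (post == null) return null;
        return post.Text.Length <= maxLength
            ? post.Text
            : post.Text.Substring(0, maxLength);
    }
}
```

In a test, the service is constructed with the stub instead of a database-backed repository, so the test runs in memory and checks only the client's own logic.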


A domain-oriented repository is the suggested way of working with the repository pattern. It helps with the Dependency Injection (DI)/Inversion of Control (IoC) pattern and aids readability and strong typing.

Usually the data mappers provide the above functionality in some form. The implementation of the interface in most cases just wraps the data mapper's methods to provide domain-oriented functionality for the entity (e.g. inserting a new Post).

Complex queries
The above interface is nice and easy, but it is also very simple and for sure won't satisfy our needs. Usually we need to run complex queries and the interface needs to support them. This means nothing other than extending the repository interface with functionality through which the client can specify what entities it wants back. Of course the repository is limited by the functionality of the underlying data mapper, but the good news is that most data mappers these days support complex queries. The repository just has to wrap them in a domain-oriented way, which means exposing methods with strong domain names and types. This is called the Specification pattern (domain oriented), in contrast to the Query pattern (generic) that is usually implemented by the data mapper. But this is a topic for another post.

Generic Repository
For those hungry minds who don't want to wait, you can check out the Generic Repository .NET project at http://code.google.com/p/genericrepository/. It is a project that provides built-in interfaces for repositories, unit of work, transactions and so on. The goal is to provide base classes for various data mappers like NHibernate, Linq2Sql and Entity Framework, and for NoSQL mappers like RavenDb, Mongo or CouchDB, so you don't have to re-implement the same functionality over and over again in your projects (which also causes bugs and decreases the quality of the product). Accessing data storage with it is very fast. Java, Ruby and other developers can benefit from studying the interfaces. The main entry point is the base generic repository interface for all strongly domain-oriented repositories. Complex queries are handled using the specification pattern. The test projects provide simple usage examples and tests. More documentation is on the project page.