7. 9. 2010

Generic Repository

In this article I'd like to introduce you to the open source project Generic Repository.

The need for Generic Repository
The repository pattern is an important part of almost all (web) applications these days. You can see it spread across many applications, tutorials, examples, videos and demos. Everybody is trying to solve one common problem and everybody is doing it differently. Is there a way to unify this?

Generic Repository is an open source project that aims to unify the implementation of the repository pattern and the design of its interfaces and queries. Additionally, it provides out-of-the-box implementations for the most popular data mappers like NHibernate, Entity Framework, Linq2Sql or NoSQL mappers.

Generic repository is not a pattern. It is a unification of the implementation of the repository pattern. It provides a base for all the repositories you will need in your application. This way you can easily reuse code across different applications, as they will all use the same repository design. If other devs are already using generic repository, there is no learning curve because they already know the infrastructure, so it is very easy to adopt. As a nice side benefit, generic repository gives you a way to change the data mapper. That is not a trivial task, but with a unified interface you can do it without touching your business logic. It is even possible to integrate more than one data mapper and switch between them using one setting in a config file.

Example of usage can be found below:
var customer = new Customer { Name = "Peter Bondra", Age = 37 };

var specificationLocator =
  this.IoC.Resolve<ISpecificationLocator>();


using ( var unitOfWork = this.IoC.Resolve<IUnitOfWorkFactory>().BeginUnitOfWork() )
{
  ICustomerRepository cr = this.IoC.Resolve<ICustomerRepository>(unitOfWork, specificationLocator);

  using ( var transaction = unitOfWork.BeginTransaction() )
  {
    cr.Insert(customer);
    transaction.Commit();
  }
}

There are many implementations of the repository pattern out there, and most devs learn it from demo examples on the internet. One thing I'd like to point out is the misunderstanding of the pattern with respect to queries. I've seen a repository interface with hundreds of various methods to support all kinds of queries the application required (and yes, it was just a demo app). By the way, the implementation came directly from Microsoft. Most devs adopt this wrong approach in their applications, which are often enterprise applications (or will surely grow to enterprise level one day).

Generic repository solves this problem of many query methods in the interface with a divide and conquer approach (and with a little help from the inversion of control pattern).

ICustomerRepository customerRepository =
  this.IoC.Resolve<ICustomerRepository>(unitOfWork, specificationLocator);

IList<Customer> customers =
  customerRepository.Specify<ICustomerSpecification>()
  .NameStartsWith("Peter")
  .OlderThan(18)
  .ToResult()
  .Take(3)
  .ToList();

With a unified approach and a tested out-of-the-box implementation you can speed up your development and concentrate on the business logic and rules, instead of fighting with repositories and data mappers. Of course you will still have to write your mappings (unless you use a schemaless NoSQL database like Raven). But the repository layer has been solved and tested for you, and the effort (and the money) can go to other layers.

Check out the sources at http://code.google.com/p/genericrepository/ and let the authors know your feedback. There is no single way to do things, and if you have ideas for improvements or changes, you can leave a comment on the website or send them by mail.

9. 8. 2010

Specification pattern versus Query object pattern

Both the specification pattern and the query object pattern try to answer a simple question: which entities satisfy the specified rules? Both give us the possibility, for example, to filter a collection of customers for those older than 18.

What is the difference between the specification pattern and the query object pattern, and when should I use the former or the latter? The answer is, as usual, "it depends", and we will walk through the most common scenarios.

Design patterns (OOD) versus Domain patterns (DDD)
A pattern is a reusable solution to a problem in software design. The difference between the two groups is small, but it is important to remember that implementations of patterns from the design patterns group are normally general. Method names are generic and usually unrelated to the domain of the problem (although it is normal for the implementation to be influenced by the domain and model to improve readability and maintainability). On the other hand, implementations of patterns from the domain patterns group are domain oriented: the implementation contains strong domain-specific names and is tightly coupled to the specific domain. This ensures better readability, reduces the need for comments and also improves communication between devs, testers and the business (to get a decision from the business we very often need to clarify open questions with them that are technical and sometimes involve the names of methods or objects).

Specifications are usually built over query objects when a data mapper is used. This means specifications are a layer over the data mapper and its queries.

The above is a nice abstraction, but what does it mean in the real world? Let's examine both patterns and provide examples.

Query object pattern
A query object is an object that represents a database query. We don't have to go far for an example: every data mapper (e.g. NHibernate) contains an implementation of the pattern (e.g. NHibernate Criteria). With the LINQ functionality in C# 3.0+ we can consider language-integrated query as a sort of query object pattern.

At a lower level the query object is then translated to the language of the data storage (e.g. SQL). The pattern belongs to the group of design patterns, and you can query an object in whatever domain you like. You write generic queries to get your data, using generic AND, OR, LIKE, WHERE statements. Basically it is an object oriented implementation of the data storage query language (e.g. SQL). Eventually you can wrap such queries into a class or other functionality, but this is where the specification pattern comes into play.

Example (NHibernate Criteria):
var posts =
  session.CreateCriteria<Post>()
  .Add(Restrictions.Eq("Title","Specification pattern versus Query object pattern"))  
  .List<Post>();
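
As a rough illustration of the remark about LINQ, the same kind of generic query can be written with LINQ operators. This is only a sketch: the Post class, the FindByTitle helper and the in-memory sample data are assumptions made for the example, and with a LINQ-capable data mapper the IQueryable would come from the mapper instead of a list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, used only for this sketch.
public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public static class QueryObjectDemo
{
    // The query is built from generic, domain-neutral pieces (Where, ==),
    // which is what makes it a query object rather than a specification.
    public static List<Post> FindByTitle(IQueryable<Post> posts, string title)
    {
        return posts.Where(p => p.Title == title).ToList();
    }

    public static void Main()
    {
        // In-memory data stands in for a data mapper here.
        IQueryable<Post> posts = new List<Post>
        {
            new Post { Id = 1, Title = "Repository pattern" },
            new Post { Id = 2, Title = "Specification pattern versus Query object pattern" }
        }.AsQueryable();

        List<Post> found = FindByTitle(posts, "Specification pattern versus Query object pattern");
        Console.WriteLine(found.Count);
    }
}
```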

Specification pattern
A specification also represents a database query, but as the pattern belongs to the domain patterns group, you specify which entities you want to get from the data storage in a domain oriented way. You write your specifications using strong, domain oriented names that are self-explanatory. Usually the implementation also supports a fluent interface to ensure better reusability.

Example
var posts = repository.Specify<IPostSpecification>()
  .WithTitle("Specification pattern versus Query object pattern")
  .ToList();

From the examples above it is clear that the functionality is more or less the same, and that really is true. A specification can be a wrapper over a query object that gives you domain-specific functionality on top of generic queries. Another benefit is better support for unit testing and for creating stubs or mocks of the specifications used.

If we take a data mapper that supports LINQ (NHibernate, Entity Framework, Linq2Sql, etc.), we can build our specifications on top of the LINQ functionality. A specification can contain several methods to improve the reusability of common functionality:

ICustomerRepository customerRepository =  
  this.IoC.Resolve<ICustomerRepository>(
    unitOfWork
    , specificationLocator
  );

IList<Customer> customers = customerRepository.Specify<ICustomerSpecification>()
  .NameStartsWith("Peter")
  .OlderThan(18)
  .ToResult()
  .Take(3)
  .ToList();
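
How such a specification might be built on top of LINQ can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the LinqCustomerSpecification class is hypothetical, ToResult() is simplified to return IQueryable<Customer> directly, and an in-memory list stands in for the data mapper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public interface ICustomerSpecification
{
    ICustomerSpecification NameStartsWith(string prefix);
    ICustomerSpecification OlderThan(int age);
    IQueryable<Customer> ToResult(); // simplification assumed for this sketch
}

// Hypothetical implementation: every domain-named method adds a generic
// LINQ filter to the underlying query and returns the same specification.
public class LinqCustomerSpecification : ICustomerSpecification
{
    private IQueryable<Customer> query;

    public LinqCustomerSpecification(IQueryable<Customer> source)
    {
        query = source;
    }

    public ICustomerSpecification NameStartsWith(string prefix)
    {
        query = query.Where(c => c.Name.StartsWith(prefix));
        return this;
    }

    public ICustomerSpecification OlderThan(int age)
    {
        query = query.Where(c => c.Age > age);
        return this;
    }

    public IQueryable<Customer> ToResult()
    {
        return query;
    }
}

public static class SpecificationDemo
{
    public static IList<Customer> Run()
    {
        IQueryable<Customer> source = new List<Customer>
        {
            new Customer { Name = "Peter Bondra", Age = 37 },
            new Customer { Name = "Peter Young", Age = 12 },
            new Customer { Name = "Anna Old", Age = 40 }
        }.AsQueryable();

        return new LinqCustomerSpecification(source)
            .NameStartsWith("Peter")
            .OlderThan(18)
            .ToResult()
            .Take(3)
            .ToList();
    }

    public static void Main()
    {
        foreach (Customer customer in Run())
        {
            Console.WriteLine(customer.Name);
        }
    }
}
```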

Complex queries can be wrapped into a single, dedicated specification (single responsibility principle, SRP) that is easy to test.

IList<Customer> customers = customerRepository.Specify<IVeryComplexSpecification>().ToList();

Summary
Both patterns solve the problem of hiding the storage-specific query. A query object implementation usually contains general methods like AND, OR, Equal, etc. to specify the query. A specification implementation has domain focused method names and can be a wrapper over a query object implementation.

The specification pattern usually adds another layer of indirection to your model when it is built as a wrapper over a query object. This brings additional complexity.

On the other hand, the clients of the code (devs) benefit from better flexibility, reusability, readability and communication with the business. Testability is improved as well, because your code deals with the specification in the form of an interface that is easy to mock.
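
The testability point can be sketched with a hand-written stub. Everything here is an assumption made for illustration: the shape of IPostSpecification follows the WithTitle example above, while StubPostSpecification and the PostFinder client are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public class Post
{
    public string Title { get; set; }
}

public interface IPostSpecification
{
    IPostSpecification WithTitle(string title);
    IList<Post> ToList();
}

// Stub used in tests: records what was asked and returns canned data,
// so no database or data mapper is needed.
public class StubPostSpecification : IPostSpecification
{
    public string RequestedTitle { get; private set; }

    public IPostSpecification WithTitle(string title)
    {
        RequestedTitle = title;
        return this;
    }

    public IList<Post> ToList()
    {
        return new List<Post> { new Post { Title = RequestedTitle } };
    }
}

// Hypothetical client code that only knows the interface.
public class PostFinder
{
    private readonly IPostSpecification specification;

    public PostFinder(IPostSpecification specification)
    {
        this.specification = specification;
    }

    public bool Exists(string title)
    {
        return specification.WithTitle(title).ToList().Count > 0;
    }
}

public static class StubDemo
{
    public static void Main()
    {
        var stub = new StubPostSpecification();
        var finder = new PostFinder(stub);

        Console.WriteLine(finder.Exists("Repository pattern"));
        Console.WriteLine(stub.RequestedTitle);
    }
}
```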

Check out the open source project Generic Repository at http://code.google.com/p/genericrepository/ for examples of specifications and their usage in a generic way.

23. 7. 2010

Specification pattern and complex queries over a repository

In the last post about the repository pattern we discussed the repository interface and its usage. We now know that the repository acts as an in-memory collection for the client and that it hides the complexity of loading data from and writing data to the persistent storage.

Our first simple prototype of the interface looked like this:

public interface IPostRepository
{
  void Insert(Post post);
  Post GetById(int id);
  void Update(Post post);
  void Delete(Post post);
}

But what about those complex queries we need to deal with? The solution to this problem in the DDD manner is the specification pattern.

The specification pattern allows the client to specify which domain entities it wants, usually from the repository for that entity. The specification pattern is a domain pattern, which means the method names and input/output types are strong with respect to the domain and its entities. In comparison, the query object pattern is a generalization of the specification pattern and belongs to the category of generic design patterns. It is good practice to implement a specification using the fluent interface pattern.

Let's have a look at the following example:

// domain entity
public class Post
{
  public virtual int Id { get; set; }
  public virtual string Title { get; set; }
  public virtual string Text { get; set; }
  public virtual DateTime Published { get; set; }
  public virtual IList<Comment> Comments { get; set; }
}

We have a requirement in the functional specification from our customer: he wants to display all posts between certain dates. In other words, we want Posts that were published after a certain date and before a certain date. Remember the latter sentence and how nicely it will map to our fluent specification interface. It is also important that the business people and the developers share the same language to better understand each other.

The naive and wrong approach would be to extend the interface with a method that accepts two nullable DateTime parameters and returns a list of posts. This is what you will actually see in many examples and tutorials across the internet these days. Again: don't do it, it is wrong. It goes against object oriented design (OOD) and the SOLID principles.

What is wrong with the naive approach? First of all, separation of concerns. You don't want to have dozens of methods in the interface, one for every single use case. Think about the usage of such an interface and about its users. They would need to implement every single method, not to mention the testing. In general, the repository acts as an in-memory collection of your entities; it should have no idea about the queries, the business requirements or, more correctly, the specifications you would like to apply to it. Each specification should reside in a dedicated class, effectively following the Command pattern with all its benefits, like dependency injection. Why the heck would you change a working repository implementation in order to implement a new query? It just smells bad and it is really not natural.

Correct solution
One of the right ways to do it is to make use of the Service locator (or dependency injection) and Command patterns. Of course we have to extend the repository interface, but this extension is one single method that provides the client with specifications that can filter the requested entities from the repository. Specifying what we want is done via the specification pattern.

On the interface level we provide an extension point as follows:

public interface IPostRepository
{
  ...
 
  // extension point to get specification
  TSpecification Specify<TSpecification>() where TSpecification : class;
}

The above method uses the underlying service locator (DI/IoC container) to locate the configured specification. We use generics here, so we get a strongly typed result back. Let's have a look at how such a simple specification that uses the fluent interface pattern could look:

public interface IPostSpecification
{
  IPostSpecification PostsAfter(DateTime dateTime);
  IPostSpecification PostsBefore(DateTime dateTime);
  IList<Post> ToList();
}


Note that all methods have strong, domain oriented names. The filtering methods return the same type; this is called a fluent interface. The method that finalizes the chain of specifications is ToList().

Connecting all the interfaces together, we get the following usage:

// assume that some posts are already in the repository
var postRepository = IoC.Resolve<IPostRepository>();

var posts = postRepository.Specify<IPostSpecification>()
          .PostsAfter(DateTime.Now.AddDays(-100))
          .PostsBefore(DateTime.Now.AddDays(-50))
          .ToList();

We have used the fluent interface to specify what we want and asked for the list of results. Note that the fluent interface allows us to specify just one date, or no date at all. The latter would result in a list with all posts.
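
To make the optional filters concrete, here is a minimal sketch of an implementation of the interface above. The InMemoryPostSpecification class, the in-memory list and the fixed sample dates are assumptions made for the example; a real implementation would build a query for the data mapper instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
    public DateTime Published { get; set; }
}

public interface IPostSpecification
{
    IPostSpecification PostsAfter(DateTime dateTime);
    IPostSpecification PostsBefore(DateTime dateTime);
    IList<Post> ToList();
}

// Hypothetical in-memory implementation: each call narrows the query,
// so skipping a call simply leaves that filter out.
public class InMemoryPostSpecification : IPostSpecification
{
    private IEnumerable<Post> query;

    public InMemoryPostSpecification(IEnumerable<Post> posts)
    {
        query = posts;
    }

    public IPostSpecification PostsAfter(DateTime dateTime)
    {
        query = query.Where(p => p.Published > dateTime);
        return this;
    }

    public IPostSpecification PostsBefore(DateTime dateTime)
    {
        query = query.Where(p => p.Published < dateTime);
        return this;
    }

    public IList<Post> ToList()
    {
        return query.ToList();
    }
}

public static class FluentSpecificationDemo
{
    // Returns the number of posts found with both dates, one date, and no date.
    public static int[] Counts()
    {
        var now = new DateTime(2010, 7, 23); // fixed date keeps the sketch deterministic
        var posts = new List<Post>
        {
            new Post { Id = 1, Published = now.AddDays(-120) },
            new Post { Id = 2, Published = now.AddDays(-75) },
            new Post { Id = 3, Published = now.AddDays(-10) }
        };

        int both = new InMemoryPostSpecification(posts)
            .PostsAfter(now.AddDays(-100))
            .PostsBefore(now.AddDays(-50))
            .ToList().Count;
        int onlyAfter = new InMemoryPostSpecification(posts)
            .PostsAfter(now.AddDays(-100))
            .ToList().Count;
        int noDate = new InMemoryPostSpecification(posts).ToList().Count;

        return new[] { both, onlyAfter, noDate };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Counts()));
    }
}
```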

It is also important to realize that we don't have to put all use cases into one specification. We can have as many specification interfaces as we like. Remember the divide and conquer paradigm? Here we are making use of it.

Now if we want to add a new specification to our application, we won't touch the repository at all! We define a new interface, provide an implementation with unit tests and configure it in the service locator (IoC container). That's it. Everything is clean, well separated and easy to test.
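
The flow described above can be sketched with a tiny hand-rolled locator. This is an illustration under stated assumptions: the ISpecificationLocator signature shown here, DictionarySpecificationLocator and IDraftPostSpecification are hypothetical stand-ins, and a real application would use its IoC container instead.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of a locator; the real project's interface may differ.
public interface ISpecificationLocator
{
    TSpecification Resolve<TSpecification>() where TSpecification : class;
}

// Hypothetical minimal locator backed by a dictionary of factories.
public class DictionarySpecificationLocator : ISpecificationLocator
{
    private readonly Dictionary<Type, Func<object>> factories =
        new Dictionary<Type, Func<object>>();

    public void Register<TSpecification>(Func<TSpecification> factory)
        where TSpecification : class
    {
        factories[typeof(TSpecification)] = () => factory();
    }

    public TSpecification Resolve<TSpecification>() where TSpecification : class
    {
        return (TSpecification)factories[typeof(TSpecification)]();
    }
}

public class PostRepository
{
    private readonly ISpecificationLocator locator;

    public PostRepository(ISpecificationLocator locator)
    {
        this.locator = locator;
    }

    // The single extension point: the repository never changes
    // when a new specification is added.
    public TSpecification Specify<TSpecification>() where TSpecification : class
    {
        return locator.Resolve<TSpecification>();
    }
}

// A specification added later, without touching PostRepository.
public interface IDraftPostSpecification
{
    string Describe();
}

public class DraftPostSpecification : IDraftPostSpecification
{
    public string Describe()
    {
        return "unpublished posts";
    }
}

public static class LocatorDemo
{
    public static string Run()
    {
        var locator = new DictionarySpecificationLocator();
        locator.Register<IDraftPostSpecification>(() => new DraftPostSpecification());

        var repository = new PostRepository(locator);
        return repository.Specify<IDraftPostSpecification>().Describe();
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```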

Initialization of specification
Hawk-eyed readers have spotted it already: how is the specification connected to the repository? We can specify our requirements in a nice fluent way, but what makes the connection? Of course the specification needs to be initialized. This is done in the repository itself. When it resolves the specification from the service locator, the repository should initialize the specification with something that connects it to the collection where the entities are stored. This can be the repository itself, or the underlying unit of work instance.

The best way to do it is to provide the specification interface with an Initialize method that takes the unit of work as an argument. Doing it the other way round will work, but it is not as flexible. In our simple example we will pass the whole repository to the specification and leave the responsibility for the initialization to the specification itself. The specification then has to cast the repository to a concrete type (e.g. NHibernatePostRepository, making use of ISession) and access the unit of work instance to filter the requested entities.

So it will look like this:

public interface IPostSpecification
{
  ...
  void Initialize(IPostRepository repository);
}


The concrete implementation of a specification is tightly coupled to the concrete implementation of the repository, but this is natural and OK. If possible, it can be improved with the IQueryable interface. As usual, correct behaviour and configuration will be ensured by unit tests.
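
A minimal sketch of this initialization, assuming an in-memory repository instead of NHibernate. The IInitializable interface, the TitleContains method, InMemoryPostRepository and the Func-based resolver (standing in for the service locator) are all hypothetical names introduced for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public string Title { get; set; }
}

public interface IPostRepository
{
    TSpecification Specify<TSpecification>() where TSpecification : class;
}

// Hypothetical helper interface so the repository can initialize any specification.
public interface IInitializable
{
    void Initialize(IPostRepository repository);
}

public interface IPostSpecification : IInitializable
{
    IPostSpecification TitleContains(string text);
    IList<Post> ToList();
}

public class InMemoryPostRepository : IPostRepository
{
    public readonly List<Post> Items = new List<Post>();
    private readonly Func<Type, object> resolve; // stands in for the service locator

    public InMemoryPostRepository(Func<Type, object> resolve)
    {
        this.resolve = resolve;
    }

    public TSpecification Specify<TSpecification>() where TSpecification : class
    {
        var specification = (TSpecification)resolve(typeof(TSpecification));

        // Connect the freshly resolved specification to this repository.
        var initializable = specification as IInitializable;
        if (initializable != null)
        {
            initializable.Initialize(this);
        }
        return specification;
    }
}

public class InMemoryPostSpecification : IPostSpecification
{
    private IEnumerable<Post> query;

    public void Initialize(IPostRepository repository)
    {
        // Tightly coupled to the concrete repository, as described above.
        query = ((InMemoryPostRepository)repository).Items;
    }

    public IPostSpecification TitleContains(string text)
    {
        query = query.Where(p => p.Title.Contains(text));
        return this;
    }

    public IList<Post> ToList()
    {
        return query.ToList();
    }
}

public static class InitializeDemo
{
    public static IList<Post> Run()
    {
        var repository = new InMemoryPostRepository(type => new InMemoryPostSpecification());
        repository.Items.Add(new Post { Title = "Repository pattern" });
        repository.Items.Add(new Post { Title = "Holiday photos" });

        return repository.Specify<IPostSpecification>().TitleContains("pattern").ToList();
    }

    public static void Main()
    {
        Console.WriteLine(Run().Count);
    }
}
```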

For a generic implementation and examples, make sure you check out the generic repository project, especially interfaces like ISpecification, ISpecificationResult and ISpecificationLocator.

SpecificationResult
Sooner or later, when you start using this approach, you will find that some functionality repeats between specifications. Methods like ToList(), Single(), Take(int count) and Skip(int count) will appear in basically all specifications. Remember the don't repeat yourself (DRY) principle. The solution is easy: we pack the common functionality into an interface I call ISpecificationResult. The interfaces will change like this:

public interface IPostSpecification
{
  ...
  IPostSpecificationResult ToResult(); // replaces ToList and all common methods
}


public interface IPostSpecificationResult
{
  IPostSpecificationResult Take(int count);
  IPostSpecificationResult Skip(int count);
 
  IList<Post> ToList();
  Post Single();
}


Again you can see we are making use of a fluent interface. Now all specifications related to Posts can concentrate on the business needs and requirements, and the common functionality is clearly separated.
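
One possible way to implement such a result object is to wrap an IQueryable, sketched below. The PostSpecificationResult class and the in-memory sample data are assumptions for illustration, not the project's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
}

public interface IPostSpecificationResult
{
    IPostSpecificationResult Take(int count);
    IPostSpecificationResult Skip(int count);

    IList<Post> ToList();
    Post Single();
}

// Hypothetical implementation that forwards the common operations to LINQ.
public class PostSpecificationResult : IPostSpecificationResult
{
    private IQueryable<Post> query;

    public PostSpecificationResult(IQueryable<Post> query)
    {
        this.query = query;
    }

    public IPostSpecificationResult Take(int count)
    {
        query = query.Take(count);
        return this;
    }

    public IPostSpecificationResult Skip(int count)
    {
        query = query.Skip(count);
        return this;
    }

    public IList<Post> ToList()
    {
        return query.ToList();
    }

    public Post Single()
    {
        // Throws InvalidOperationException unless exactly one post matches.
        return query.Single();
    }
}

public static class SpecificationResultDemo
{
    public static IList<Post> SecondPage()
    {
        IQueryable<Post> posts = Enumerable.Range(1, 5)
            .Select(id => new Post { Id = id })
            .ToList()
            .AsQueryable();

        // Skip the first post, then take the next two.
        return new PostSpecificationResult(posts).Skip(1).Take(2).ToList();
    }

    public static void Main()
    {
        foreach (Post post in SecondPage())
        {
            Console.WriteLine(post.Id);
        }
    }
}
```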

All of the above can be implemented in a generic way. The open source generic repository project aims to provide you with all the necessary interfaces and base implementations so you can focus just on your domain and the business rules. It is worth checking out if you are looking for a robust and clean design for your application.

22. 7. 2010

Repository pattern

What is the repository pattern and why does it make sense to apply it in our application? Let's start with a short look into the history of software design to better understand the need for the pattern today.


History
In the old days, programs were quite simple. User interface, logic and data access code were usually hosted in one file or project, and in the end in one executable file. Complexity wasn't high, it somehow worked and everybody was happy. As requirements got more complex and programs bigger and bigger, we started to realize that in order to rule the chaos, several rules and patterns had to be introduced. We started dividing the functionality into logical pieces and decoupling those parts from each other. We also discovered the importance of unit testing, and that good code is also easy to test. We understood that it is a good thing to separate the user interface or views from the domain, and the domain from the data access code and the data storage. The repository pattern helps in exactly these scenarios.


Repository pattern is an interface between your domain model and data mapping functionality. It acts like an in-memory collection of domain entities (objects).


What does the above mean in the real world? Consider this blog. Assume the data is stored in a SQL database. Let's say we are using the NHibernate data mapper to save our domain model to the database. What is our domain model, what are our domain entities? Nothing other than classes like Blog, Post and Comment. So far so good. I take an instance of, let's say, Comment, and using NHibernate I can very conveniently save it to the underlying data storage, in our example a SQL database. Now, let's repeat the definition of the pattern: an interface between the domain model (Post, Blog, Comment) and the data mapper (NHibernate, in the end a SQL database). Why do we need to put something between the domain and the data mapper if everything works so nicely?

When not to use it
If your domain model isn't very complex and your data mapper is designed to aid unit testing, then the benefit will be low compared to the cost of adding one more level of complexity. It is important to remember that the repository pattern adds another layer of indirection between the model and the data mapper. However, we all know that even a very simple one-button application can grow into a big enterprise application.

Clean model
If your model is bigger than that of a simple application, however, your application will benefit from applying the repository pattern. The important thing is that the repository interface hides the implementation of the data mapper (3rd party or home grown). The repository acts as a collection of domain entities (e.g. a collection of Comments in a Post in a Blog). Only this interface will be visible to your domain functionality and the other layers. This ensures that your domain won't get poisoned with data mapper code that naturally does not belong to the domain functionality. It helps you keep things separated and under control.

Unit testing
Additionally, working with an interface and not with concrete classes greatly helps unit testing. It is very easy to mock the interface or implement a dummy stub of it. Remember that code without tests is legacy code.
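
As a rough sketch of what such a stub could look like: a dictionary-backed fake repository lets domain code be exercised without any database. The interface mirrors the simple CRUD repository used in this post, while FakePostRepository and the PostEditor service are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Post
{
    public int Id { get; set; }
    public string Text { get; set; }
}

public interface IPostRepository
{
    void Insert(Post post);
    Post GetById(int id);
    void Update(Post post);
    void Delete(Post post);
}

// Fake used in unit tests: behaves like an in-memory collection.
public class FakePostRepository : IPostRepository
{
    private readonly Dictionary<int, Post> posts = new Dictionary<int, Post>();

    public void Insert(Post post) { posts.Add(post.Id, post); }

    public Post GetById(int id)
    {
        Post post;
        return posts.TryGetValue(id, out post) ? post : null;
    }

    public void Update(Post post) { posts[post.Id] = post; }

    public void Delete(Post post) { posts.Remove(post.Id); }
}

// Hypothetical domain service under test; it only sees the interface.
public class PostEditor
{
    private readonly IPostRepository repository;

    public PostEditor(IPostRepository repository)
    {
        this.repository = repository;
    }

    public void ChangeText(int postId, string text)
    {
        Post post = repository.GetById(postId);
        post.Text = text;
        repository.Update(post);
    }
}

public static class FakeRepositoryDemo
{
    public static string Run()
    {
        var repository = new FakePostRepository();
        repository.Insert(new Post { Id = 1, Text = "draft" });

        new PostEditor(repository).ChangeText(1, "final");
        return repository.GetById(1).Text;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```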

Loosely coupled design
The last thing is a nice side effect, but I consider it good practice: separate all (3rd party) tools from the application if possible. This allows us to swap one tool for another. Want to switch from NHibernate to Entity Framework in the .NET world? Not a very big problem. Of course it depends on the complexity, but it still gives you the confidence that you can do it if it is really needed for whatever reason.

What does the interface look like
A repository is usually strongly domain oriented. In other words, we usually work with an interface that deals with a concrete domain entity and whose methods have strong domain names and input/output types. Imagine we have the domain entities (classes) Blog, Post and Comment. We would like to read posts from and write posts to the data storage (a database). The common term for this functionality is CRUD (create, read, update, delete). Every repository should support CRUD. Consider the following C# code:

public class Post
{
  public int Id { get; set; }
  public string Text { get; set; }
  public IList<Comment> Comments { get; set; }
}

public interface IPostRepository
{
  // create
  void Insert(Post post);
 
  // read
  Post GetById(int id);
 
  // update
  void Update(Post post);
 
  // delete
  void Delete(Post post);
}
 
The above interface is dedicated to the Post entity, and with this design you get strong typing. The client of the repository works with the interface as with an in-memory collection of data. It does not care (and does not want to care) what is going on behind the scenes (translating the method to SQL, creating the SQL command, executing the command, converting the result to domain objects). This greatly simplifies the design of the client and ensures good testability.


A domain oriented repository is the suggested way of working with the repository pattern. It helps with the Dependency Injection (DI)/Inversion of Control (IoC) pattern and aids readability and strong typing.

Usually the data mappers provide the above functionality in some form. In most cases the implementation of the interface just wraps the data mapper's methods to provide domain oriented functionality for the entity (e.g. inserting a new Post).
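
The wrapping can be sketched as follows. To keep the example self-contained it does not use a real data mapper: IDataMapper, IEntity and InMemoryDataMapper are hypothetical stand-ins for whatever session or context object the chosen mapper offers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical marker so the stand-in mapper can find an entity's key.
public interface IEntity
{
    int Id { get; }
}

public class Post : IEntity
{
    public int Id { get; set; }
    public string Text { get; set; }
}

// Hypothetical generic mapper API; real mappers expose something similar in spirit.
public interface IDataMapper
{
    void Save(IEntity entity);
    T Load<T>(int id) where T : class, IEntity;
    void Delete(IEntity entity);
}

public class InMemoryDataMapper : IDataMapper
{
    private readonly Dictionary<string, IEntity> rows = new Dictionary<string, IEntity>();

    private static string Key(Type type, int id)
    {
        return type.FullName + "#" + id;
    }

    public void Save(IEntity entity)
    {
        rows[Key(entity.GetType(), entity.Id)] = entity;
    }

    public T Load<T>(int id) where T : class, IEntity
    {
        IEntity entity;
        return rows.TryGetValue(Key(typeof(T), id), out entity) ? (T)entity : null;
    }

    public void Delete(IEntity entity)
    {
        rows.Remove(Key(entity.GetType(), entity.Id));
    }
}

public interface IPostRepository
{
    void Insert(Post post);
    Post GetById(int id);
    void Update(Post post);
    void Delete(Post post);
}

// The repository only translates domain-named calls into generic mapper calls.
public class PostRepository : IPostRepository
{
    private readonly IDataMapper mapper;

    public PostRepository(IDataMapper mapper)
    {
        this.mapper = mapper;
    }

    public void Insert(Post post) { mapper.Save(post); }
    public Post GetById(int id) { return mapper.Load<Post>(id); }
    public void Update(Post post) { mapper.Save(post); }
    public void Delete(Post post) { mapper.Delete(post); }
}

public static class WrappingDemo
{
    public static string Run()
    {
        IPostRepository repository = new PostRepository(new InMemoryDataMapper());
        repository.Insert(new Post { Id = 1, Text = "Hello" });
        return repository.GetById(1).Text;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```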

Complex queries
The above interface is nice and easy, but it is also very simple and surely won't satisfy all our needs. Usually we need to run complex queries and the interface needs to support them. This means nothing other than extending the repository interface with functionality that lets the client specify which entities it wants back. Of course the repository is limited by the functionality of the underlying data mapper, but the good news is that most data mappers these days support complex queries. The repository just has to wrap them in a domain oriented way, which means exposing methods with strong domain names and types. This is called the Specification pattern (domain oriented), in contrast to the Query object pattern (generic) that is usually implemented by the data mapper. But this is a topic for another post.

Generic Repository
For those hungry minds who don't want to wait, you can check out the Generic Repository .NET project at http://code.google.com/p/genericrepository/. It is a project that provides built-in interfaces for repositories, unit of work, transactions and so on. The goal is to provide base classes for various data mappers like NHibernate, Linq2Sql and Entity Framework, and for NoSQL mappers like RavenDb, Mongo or CouchDB, so you don't have to re-implement the same functionality over and over again in your projects (which also causes bugs and decreases the quality of the product). With it, getting access to a data storage up and running is very fast. Java, Ruby and other developers can benefit from studying the interfaces. The main entry point is the base generic repository interface for all strong domain repositories. Complex queries are handled using the specification pattern. The test projects provide simple usage examples and tests. More documentation is on the project page.