Thursday, August 31, 2006

Caching and O/R mapping

I was forwarded an article titled "Why a cache in an O/R mapper doesn't make it fetch data faster" and felt the need to redirect its audience to some examples demonstrating the benefits of a well-designed caching system.

The author appears to be concerned that you can't know whether the data in the cache is dirty, but fails to point at the reason for this uncertainty.  There are many reasons this could be the case, but why not provide an API or services for external applications, forcing them through your O/R framework (perhaps in an abstracted way) and so ensuring the cache is kept up to date?  I've read several articles, and run my own tests on applications I've worked with, that demonstrate typical applications perform a greater percentage of reads than writes, further suggesting a cache is beneficial.

What’s a cache?

I’d like to refer to this definition, in particular: “A cache stores recently-used information in a place where it can be accessed extremely fast.” I disagree that the Identity Map (or uniquing) Frans refers to is necessarily the only cache used in O/R frameworks.

Caches and queries: more overhead than efficiency

Quoting from the author’s article,

So, when does this efficiency the myth talks about occur exactly?  Well, almost never.  In fact, using a cache is often less efficient.  I said almost, as there are situations where a cache can help, though these are minor or require a lot of consessions.

NHibernate (Hibernate ported to .NET) was developed with 2nd level caching at its core, allowing you to control the caching options per class, such as whether the cache writes through or invalidates its entry on a write to the database, which I think is important in demonstrating a cache’s benefits.  Additionally, NHibernate can cache query results as a list of IDs, furthering the benefit of its cache and increasing the hits.
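To make the difference between the two write behaviours concrete, here is a minimal sketch. It is not NHibernate's API: CachedStore, WritePolicy and DatabaseReads are hypothetical names, and a plain dictionary stands in for the database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout; this is an illustration, not NHibernate's API.
public enum WritePolicy { WriteThrough, Invalidate }

public class CachedStore
{
    // Stands in for the real database.
    private readonly Dictionary<int, string> database = new Dictionary<int, string>();
    private readonly Dictionary<int, string> cache = new Dictionary<int, string>();
    private readonly WritePolicy policy;

    // Counts how often a read had to go to the "database".
    public int DatabaseReads { get; private set; }

    public CachedStore(WritePolicy policy)
    {
        this.policy = policy;
    }

    public void Save(int id, string value)
    {
        database[id] = value;
        if (policy == WritePolicy.WriteThrough)
            cache[id] = value;   // keep the cached copy in step with the write
        else
            cache.Remove(id);    // drop the entry; the next read reloads it
    }

    public string Load(int id)
    {
        string value;
        if (cache.TryGetValue(id, out value))
            return value;        // cache hit: no database round trip
        DatabaseReads++;
        value = database[id];    // throws KeyNotFoundException for an unknown id
        cache[id] = value;
        return value;
    }
}
```

With WritePolicy.WriteThrough, a Load straight after a Save is served from the cache. With WritePolicy.Invalidate, the first Load after a Save goes to the database and later ones are cache hits. Either way, the cache only stays correct because every write goes through Save.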

The 2nd level cache is completely pluggable, so you have many options, including the ASP.NET cache. For even greater scalability, consider a distributed caching solution such as memcached or NCache. This runs counter to the point that ‘.NET’ doesn’t have support for cross-process or cross-server object awareness. True, it’s not built into the framework, but solutions certainly exist.

I now want to direct you to this article, Hibernate: Truly Understanding the Second-Level and Query Caches, which goes into all the detail you need to understand just how powerful a well-designed O/R framework and caching subsystem can be.

In the author’s example, we could create a named NHibernate query, with appropriate parameters, represented in HQL as

IQuery query = session.CreateQuery("from Customer C inner join C.Orders O where C.Orders.size > 5 and O.ModifyDate > :date").SetDateTime("date", DateTime.Now.AddDays(-30));

For the cache to be effective and accurate, all changes to the database must notify the cache, so we know the entry in the cache is current, and therefore do not need to go to the database.
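
As a rough sketch of that notification requirement applied to cached query results, the class below stores only the list of IDs a query returned and throws everything away when told about a write. IdQueryCache, GetIds and NotifyWrite are hypothetical names, not part of NHibernate, and clearing every entry on any write is a deliberate simplification.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the names are illustrative and not part of NHibernate.
public class IdQueryCache
{
    // Maps a query key (query text plus parameter values) to the IDs it returned.
    private readonly Dictionary<string, List<int>> results = new Dictionary<string, List<int>>();

    // Counts how often the underlying query actually ran.
    public int QueriesExecuted { get; private set; }

    // Returns the cached IDs for the query, or runs it once and remembers the IDs.
    public List<int> GetIds(string queryKey, Func<List<int>> runQuery)
    {
        List<int> ids;
        if (!results.TryGetValue(queryKey, out ids))
        {
            QueriesExecuted++;
            ids = runQuery();
            results[queryKey] = ids;
        }
        return ids;
    }

    // Every write path must call this; a write that bypasses it leaves stale results behind.
    public void NotifyWrite()
    {
        results.Clear();
    }
}
```

Repeated calls to GetIds with the same key run the query once. After NotifyWrite the next call runs it again, which is why external applications need to be routed through something that raises the notification.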
