Tutorial: Caching virtual attributes in the database

Ever since I released Searchlogic I’ve been getting a lot of really cool feedback. I love when people contact me with a unique way they used Searchlogic or a challenging / cool new feature they would like to see added. Challenges are what make programming fun, to see what you are capable of. So here is my challenge for the day:

I’ve been having a lot of conversations via email with people using Searchlogic, that go like the following:

Some person: Hi Ben, How do I order by an order’s total with Searchlogic?

Me: Cache the total in your database. Your idol, Ben

Some person: Before Searchlogic, I was just doing: Order.all(:order => “(quantity * price) ASC”). Is there a way to do this in Searchlogic?

Me: No, you shouldn’t order data like that. …long explanation of what to do…. Your idol, Ben

Some person: Thank you so much Ben, I can’t thank you enough for everything you have done for me. You truly are my idol. There should be a statue in your honor.

Maybe it didn’t go exactly like that, but you get the point. So instead of having to retype the “long explanation of what to do”, I can just link them to this post:

This can be simple or complicated depending on how you calculate your total. Let’s start with the simple:

Simple solution

Your total method looks something like to following. Dealing with only local attributes.

def total
    quantity * price
end

Your first thought might be to sort via the ruby sort method. This is wrong on so many levels, not to mention you already have a tool that is extremely good at sorting data: your database.

Awesome! We’ll just do something like this:

Order.all(:order => "(quantity * price) ASC")

Wrong again! This is bad for a number of reasons:

  1. Your code is no longer DRY. What if you want to add in tax later on down the road? You’ll have to do it in multiple places.
  2. (quantity * price) can not be indexed. The whole point of indexing is to sort.

Now you’re saying “WTF Ben?!?!?1!?11″. Relax, why not just cache the total in the database?

class Order < ActiveRecord::Base
    before_validation :cache_virtual_columns

    def total
        quantity * price
    end

    private
        def cache_virtual_columns
            self.total = total
        end
end

Now you can index the total column and order just like you would with any other column. Yay!

Don’t celebrate yet. Chances are your total probably isn’t that simple.

Advanced solution

Maybe your total method looks like:

def total
    line_items.sum(:total) + tax + shipping_cost
end

Oh no! What happens when the total of a line item changes? The total in the orders table is no longer accurate. Your total now depends on other models. This is where observers shine:

class OrderObserver < ActiveRecord::Observer
  observe :order, :line_item

  def before_validation(obj)
    case obj
    when Order
      obj.total = obj.total
    end
  end

  def after_save(obj)
    case obj
    when LineItem
      obj.order.update_attribute(:total, obj.order.total)
    end
  end
end

You could really set this up a number of ways. You could create a CacheObserver that handles all caching for all models. This could get messy and confusing really fast, which is why I recommend the above example. My philosophy on OO programming is that classes should be responsible for their self. If the Order class is calculating the total it should be responsible for keeping it up to date in the database too.

“But Ben, this looks messy and complicated”. I agree with you, to an extent, but there is no simple solution here, because there is no pattern. Your total method could be anything you want. For all I know, your total method could be outsourced to a team in India that calculates the total by hand.

Sometimes in programming you have to give up something to gain something. I feel like writing some simple code in an observer is worth having a cached value of the total in the database. It makes everything so much cleaner, easier, and simpler in your controllers and views. Models should do all of the heavy lifting anyways.

Obviously, this is not just limited to total. You could do this for pretty much any virtual attribute in your models.

What do you think? I am always interested in feedback.

  • Share/Save/Bookmark


12 Responses to “Tutorial: Caching virtual attributes in the database”

  1. Glenn says:

    My only problem with this approach is that it assumes your rails application is the only thing interacting with the database. It would be quite conceivable for some other dev/application to come along and update quantity and price, and there is no constraint in the DB to ensure that total is automatically updated. As a result, the data has become inconsistent and your rails app will be returning incorrect data.

    The Order.total(:order => "(quantity * price) ASC") style syntax never has that as a concern.

    Of course, if your rails app is the only interface to the database and will remain that way, the point is moot.

  2. Hongli says:

    Glenn: I can’t really agree with you there. Why not just document rules like this and make everybody who touch the database directly read the rules? It’s like API contracts: if you violate the precondition, then everything that goes wrong is your fault.

  3. Ben Johnson says:

    Glenn, thanks for your input, but I think your example demonstrates incorrect usage of rails. The whole point of models and rails robust support for resources is that everything has to travel through the app. That’s a big reason why rails is so great, because it makes doing this so easy. When you have another app that needs the same data, you interact with that data through a resource. If other apps are interacting directly with the database you have more serious problems, like making sure you data is consistent. What about making sure all data passes validations the other app is expecting it to pass? What about email notification and other "hooks"? Having multiple apps mess with the same data defies the basic principles behind rails. At that point everything needs to be at the database level to remain consistent, including validations, hooks, etc. Correct me if I’m wrong, but the proper way to share data between apps is through resources, not by sharing a database. The being said my above example should work fine, and the sky is the limit on how your data is managed, since all changes to the data have to pass through your app.

  4. jblanche says:

    The line
    obj.total = obj.total
    looks strange to me !
    Is it a typo ? If not, can you explain that ?

  5. nachokb says:

    jblanche,

    this snippet could be written as

    obj.total=(obj.total)

    notice that the method "total()" is custom defined and computes it (by way of "quantity * price"). The method "total=()" is the standard, non-overriden attribute writer.

    Even though the choice of their names is a little regretable, it’s valid.

    Perhaps a better alternative is naming the column/attribute in a more explicit way, e.g. "total_cache" so you would have "total_cache=()" and "total_cache()" methods, which are just attribute accessors, and "total()", which would compute it.

    What I still dislike is that there’s one concern which is not encapsulated: updating the cache.

    The Observer or Callback should mess only with WHEN to do it, not HOW. That should be delegated to the model. Something like "update_total_cache!()" would be my choice (leaving "total[=]" as the attribute).

    The only difference would be when you call "total()" and it still hasn’t been saved…

    nachokb

  6. Ben Johnson says:

    jblanch, nachokb is right. I am just updating the attribute variable behind the scenes so that ActiveRecord picks it up and saves it into the DB. It looks strange but its correct.

    nachokb, thanks for your input. You can set this up any way you want, this is just how I do it. In regards to a total_cache method. I’m not a big fan of doing that because when you are using an Order object you should not have any "insider" knowledge. You should just call obj.total and the Order class will do what it needs to do to make sure you get the correct value. Not overwriting total will return the value from the database, which could be stale, since quantity or price could change.

    Lastly, I agree on your comment with the observers, to an extent. But in my example the total method is calculating the total. Also, it’s one line of code. I don’t think its worth creating an entirely new method in the Order class for this. Another advantage of using an observer is that you can clean up and simplify your models.

  7. Interesting write-up but, far too complicated for something that belongs squarely in the domain of the database. Your line "(quantity * price) can not be indexed" is only true of toy databases that are basically SQL interfaces to a flat-file database. Here’s the solution in PostgreSQL:

    – create our table
    create table items (price int4, quantity int4);

    – index price * quantity
    create index items_total_idx on items ((price*quantity));

    – insert a bunch of values
    insert into items (price, quantity) values (1,2), (1,23), (134,2), (132,2), (1,234), (1,652);

    – behold the glory that is a real database
    rich=# explain select * from items order by price * quantity;
    QUERY PLAN
    ——————————————————————————
    Index Scan using items_total_idx on items (cost=0.00..12.36 rows=6 width=8)
    (1 row)

  8. Ben Johnson says:

    Rich, thanks for your comments. I agree, a nice database can be nice to use. But the benefits of the above approach extend beyond indexing. The MAIN purpose of the above approach is to keep your code DRY. The quantity * price example was just to make a point. I have yet to have a total method that was that simple. Have fun creating an index for:

    def total
    tickets.sum(:total) + packages.sum(:total) + suites.sum(:total) + tax + shipping_cost + promotion_discount
    end

    Then writing that equation in SQL all over the place for ordering, conditions, etc is just bad programming.

    Call me what you want, but all that I want my database to do is store and index data, nothing else. Anything beyond that should be in my models. Having multiple applications share the same database, or storing validations / hooks / triggers in the database is old school. Resources solve that problem, and resources let you do just about anything, because resources are written in ruby . How about emailing a receipt when a new order is created? There are a million blog posts and conversations about this and there are a million benefits to doing it this way.

  9. Glenn Gillen says:

    Rails does make things easy, saying that is the reason everything should travel through the app is naive at best though. There are countless examples I could provide where it is completely unrealistic to expect all communications to go via a REST interface. Databases aren’t just about storing data, they’re about doing so in an efficient fashion. Putting a REST interface over the top is killing that efficiency, to a level that is unacceptable for many low latency applications (high volume telephony immediately springs to mind).

    REST is just another re-incarnation of the SOA approach that came before it, albeit a much much much better implementation. Don’t let poor implementation decisions within ActiveRecord foster poor implementation decisions in your own project. Take a look at DataMapper for a framework that doesn’t make the same mistakes.

    Creating an application and believing that it will exist for all eternity as the sole gatekeeper of information will lead to problems. This logic belongs in the database, has existed there for a long time in every DB implementation I’ve used in production and is "old skool" for good reason. It’s the ONLY way to ensure data integrity.

  10. Ben Johnson says:

    Glenn, I appreciate your input. Let’s agree to agree that problems can be solved in a number of ways. Even with something like a high volume telephony system, the last thing I would want is add another layer of complexity and have multiple apps using the same database. Obviously this is an arbitrary project we are talking about, but it’s very easy to break complicated apps down into smaller parts making them easier to develop, more efficient, etc. Take setting up a credit card server and an e-commerce app. Obviously a much simpler example, but doing something like that makes things so much easier, cleaner, efficient, and more secure.

    Regardless, this is a little outside the scope of this post. The point of this post was to show that a virtual attribute can be "cached" in the database, which ultimately keeps the logic behind the attribute DRY, and makes writing queries extremely easy. Glenn, say you have a very complicated calculation for total and you did not take this approach, and you have 5 different applications all accessing the same database. How are you going to keep the logic behind the total DRY? How are you going to sort by it and have it be an efficient sort that is indexed? Write conditions on it: total > 100, etc. And do all of this across 5 different apps, that aren’t necessarily all written in ruby? Maybe the calculation behind the total requires something that you just can’t do in a database.

    The reason I called it "old school", is because it is old school. Programming languages, design styles, design patterns, and system integration have all advanced enough to offer an alternative to sharing a databases directly and putting a million rules, triggers, and hooks in your database. Plus databases are severely limited when you put them in a role that a programming language, like ruby, should be handling. So why give them that role in your application? In this day and age it just doesn’t make (as much) sense.

    With something like this it’s an all or nothing approach. You can’t say "I want the database to enforce everything" and then have another app keep the "total" value accurate. This compromises data integrity, and at this point you might as well move everything to the app controlling the "total" value and look at something like REST services.

    Lastly, here’s a brain twister for you: We can all agree that everything, in the programming world, is moving to a higher level. That is the definition of advancement, things get built on top of things that get built on top of other things, etc. Ruby is built on top of C, ActiveRecord is built on top of the various database gems, even searchgasm is built ontop of ActiveRecord, etc. If you really want to look at this from a unique angle, maybe one could say that REST services are built on top of databases and can be viewed as an "advancement" to databases. Obviously you are sacrificing performance, but give me an example where you aren’t.

    To drive my point home, if someone was to write a web app in C without considering alternatives like PHP and Ruby, I would call them old school. If someone was to share a database amongst multiple apps without considering an alternative like REST services, I would call them old school. Maybe that statement is a little premature, maybe it isnt.

  11. Glenn Gillen says:

    >Glenn, say you have a very complicated calculation for total and you did not take this approach, and you have 5 different
    >applications all accessing the same database. How are you going to keep the logic behind the total DRY? How are you
    >going to sort by it and have it be an efficient sort that is indexed? Write conditions on it: total > 100, etc"

    It’s called a view

    >To drive my point home, if someone was to write a web app in C without considering alternatives like PHP and Ruby, I would
    >call them old school. If someone was to share a database amongst multiple apps without considering an alternative like
    >REST services, I would call them old school. Maybe that statement is a little premature, maybe it isnt.

    Calling them old school would definitely be premature, as again there are a multitude of reasons where it would be a perfectly valid approach to write the web service in C (you should speak the 37 Signals guys to find out which bits of their systems they’ve had to implement in C because ruby was too slow). Likewise, REST should be no means EVER be considered a complete replacement for direct database access. Just because you can, doesn’t mean you should.

    I get the feeling your exposure or experience with databases is constrained to a fairly narrow set of use cases. "Plus databases are severely limited when you put them in a role that a programming language, like ruby, should be handling" doesn’t make any sense. You can program functions, triggers, and stored procedures at the DB level in Ruby, Python, C, Perl, etc. (depending on your vendor) just fine. You should take a deeper look someday, along with DataMapper, you’ll probably find that modern relational databases aren’t as restrictive or as ancient as you’ve been led to believe.

  12. Ben Johnson says:

    Glenn, I never said using C or resorting to the database was wrong. I was pretty clear on the fact that performance was sacrificed the higher level you get. Obviously if performance is a huge concern, then you shouldn’t use a higher level language / technology. In one of my recent projects performance was #1 priority. I got it working in ruby and rewrote all of the background processes / rake tasks in C. So I am very aware of situations that require lower level technologies.

    If you re-read my comments I never said writing in C or sharing a database was wrong. I just said it was old school without considering some newer alternatives like REST services. 5 years ago that alternative was not available. Hence the term "old school".

    I think we are taking about slightly different things. I am not talking about a single app using a single database, or even multiple apps using a database. I am talking about multiple apps all sharing data. My point was that using a database as the sole medium for sharing data is not a good idea. Databases are not designed to share data, they are designed to store it.

    Once you start sharing data through a database, there are all kind of fun hoops you get to jump through to ensure data integrity, because one app can’t trust another to validate data properly. Instead of creating a number of stored procedures, views, etc. Why not consider sharing the data via a REST service and let each app have its own database? You gain all of kinds of benefits, mainly a database / language agnostic API that allows for any kind of 3rd party integration, internally and externally. It makes it extremely easy to grow and create a group of apps that all work together efficiently. Things are much simpler, easier to scale, etc.. App A can modify and change its database any way it sees fit and not have to worry about how that will effect app B. Which leads me to my next point.

    Applications evolve and grow, and usually end up using data in a different way than it was initially intended. Having 2 apps evolving and growing with the same database just doesn’t make sense. One app might want to add fields that the other app doesn’t. Maybe app A adds a field that it requires to be present. Then adding that field into app B doesn’t make any sense. I can think of a million situations like this.

    Even if you share data via REST services it doesn’t mean it has to replace the database. Why not use it as a medium to sync data? Now the "total" column we have been talking about is controlled by a single app and then synced over to the other databases that your other apps are using.

    Lastly, I want to say I enjoy discussions like this and I appreciate you taking the time to come on here and share your thoughts. Hopefully you have not taken my responses as offensive and hopefully we both benefit from this discussion.