Improving on Related Entries

A little while back, I posted about how I was determining related entries for my site. That method worked, but once I redid my site and added my 250+ Flickr photos, it started to really slow down when finding related photos, because of the increase in tags and posts. The real issue was that I was doing most of the work in Ruby, when it really should have been done with SQL. So, I decided to rewrite it.

Note: If you haven’t looked at my previous entry on the subject, you might want to take a look at it, just for the general idea of what I’m trying to accomplish. Essentially, I’m trying to find related posts by comparing tags. Here’s my new Post#related code:

def related(limit=5)  
  return [] if tags.empty?

  join_array = tags.collect {|tag| "posts_tags.id = #{tag.id}"}

  tags_join = "AND (#{join_array.join(' OR ')})"

  self.class.find(:all,
                  :joins => "INNER JOIN taggings posts_taggings ON posts_taggings.taggable_id = posts.id" +
                            "INNER JOIN tags posts_tags ON posts_tags.id = posts_taggings.tag_id #{tags_join}",
                  :conditions => ["posts.id != ?", id], :group => "posts.id",
                  :order => "COUNT(*) DESC",
                  :limit => limit)
end

You can see, as I mentioned, that all the work is being done in the SQL now. I first create a list of tags from the current post, which I then feed into the query to search for other posts with similar tags. The SQL instructs the database to search for any posts with any of these tags, and then orders them based on how many tags match between the 2 posts. The SQL ended up being fairly complicated, with a lot of joins, but it’s now a whole lot faster, because I’m not creating a lot of overhead by dealing with the computation in Ruby. If you’re interested, here’s an example related entry query:

SELECT `posts`.* FROM `posts`
INNER JOIN taggings posts_taggings ON posts_taggings.taggable_id = posts.id
INNER JOIN tags posts_tags ON posts_tags.id = posts_taggings.tag_id
    AND (posts_tags.id = 695
        OR posts_tags.id = 192
        OR posts_tags.id = 195)
WHERE (posts.id != 4322) AND ( (`posts`.`type` = 'FlickrPhoto' ) )
GROUP BY posts.id
ORDER BY COUNT(*) DESC
LIMIT 5

Posted on April 13, 2009
Tagged with: , , , , , , ,

Leave A Comment

Ajax-loader