Entries tagged with “Tags”

Improving on Related Entries

A little while back, I posted about how I was determining related entries for my site. That method worked, but once I redid my site and added my 250+ Flickr photos, it started to really slow down when finding related photos, because of the increase in tags and posts. The real issue was that I was doing most of the work in Ruby, when it really should have been done with SQL. So, I decided to rewrite it.

Note: If you haven’t looked at my previous entry on the subject, you might want to take a look at it, just for the general idea of what I’m trying to accomplish. Essentially, I’m trying to find related posts by comparing tags. Here’s my new Post#related code:

def related(limit=5)  
  return [] if tags.empty?

  join_array = tags.collect {|tag| "posts_tags.id = #{tag.id}"}

  tags_join = "AND (#{join_array.join(' OR ')})"

  self.class.find(:all,
                  :joins => "INNER JOIN taggings posts_taggings ON posts_taggings.taggable_id = posts.id " +
                            "INNER JOIN tags posts_tags ON posts_tags.id = posts_taggings.tag_id #{tags_join}",
                  :conditions => ["posts.id != ?", id], :group => "posts.id",
                  :order => "COUNT(*) DESC",
                  :limit => limit)
end

As promised, all the work now happens in SQL. I first build a list of the current post’s tag ids, which I feed into the query to find other posts that share them. The SQL tells the database to find any post with at least one of these tags, then orders the results by how many tags each post shares with the current one. The SQL ended up being fairly complicated, with a couple of joins, but it’s now a whole lot faster, because I’m no longer paying the overhead of doing the computation in Ruby. If you’re interested, here’s an example related entry query:

SELECT `posts`.* FROM `posts`
INNER JOIN taggings posts_taggings ON posts_taggings.taggable_id = posts.id
INNER JOIN tags posts_tags ON posts_tags.id = posts_taggings.tag_id
    AND (posts_tags.id = 695
        OR posts_tags.id = 192
        OR posts_tags.id = 195)
WHERE (posts.id != 4322) AND ( (`posts`.`type` = 'FlickrPhoto' ) )
GROUP BY posts.id
ORDER BY COUNT(*) DESC
LIMIT 5
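As a side note, the chain of OR conditions in the join can also be written as a single IN list, which SQL treats the same way. Here’s a minimal sketch of building that condition in plain Ruby. The helper name tags_join_condition is made up for illustration and isn’t part of my Post model; the Integer() call is just a guard so nothing but numeric ids ends up in the SQL string.

```ruby
# Hypothetical helper (not part of the Post model above): builds the extra
# join condition from a list of tag ids, using IN instead of chained ORs.
def tags_join_condition(tag_ids)
  # Integer() raises unless every id is numeric, so only numbers reach the SQL.
  ids = tag_ids.map { |id| Integer(id) }
  "AND posts_tags.id IN (#{ids.join(', ')})"
end

puts tags_join_condition([695, 192, 195])
# prints: AND posts_tags.id IN (695, 192, 195)
```

The resulting string could be dropped in wherever tags_join is interpolated in the method above.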

Posted on April 13, 2009

Finding Related Entries Using Tags with Ruby on Rails

One of the cool features that I built into my new site is the “related” sidebar box on every entry and link. Using a not-so-sophisticated algorithm, my site automatically picks out other entries that seem to be related to the current entry, which hopefully helps readers navigate to my other content. It really wasn’t too difficult to implement, so I figured I’d go through my thought process and the code that makes it happen.

The Not So Fancy Algorithm

It took me a little while to come up with a way to determine if an entry is “related” that was both accurate and relatively efficient. I could have used some complicated tool that parses the content of my entries, but instead I decided to use something that’s a little simpler: tags. To understand how it works, pretend I have three entries:

  • Entry A - Tagged with: Turkey, Roast Beef, Cheese, Bread
  • Entry B - Tagged with: Bread, Baking
  • Entry C - Tagged with: Turkey, Cheese, Bread, Lettuce

Let’s say we’re looking for entries related to Entry A. From just looking at B and C, it’s clear that C should be closest, as A and C both have something to do with sandwiches, whereas B only talks about bread. To rank the entries programmatically, I first do a query to find entries with any of the tags from Entry A. Then, once I have that list, I sort the entries by how many of Entry A’s tags are used. So, in the above example, Entry C would have 3 matched tags (Turkey, Cheese, Bread), and Entry B would have 1 (Bread). It’s not a perfect system, but so far, it seems to be working pretty well.
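To make the ranking concrete, here’s a small sketch in plain Ruby that runs the three example entries through the same idea: keep the entries that share at least one tag, then sort by the number of shared tags. The ENTRIES hash and the rank_related method are made up for this illustration; the real code works against the database.

```ruby
# Hypothetical in-memory data mirroring the three example entries.
ENTRIES = {
  "A" => ["Turkey", "Roast Beef", "Cheese", "Bread"],
  "B" => ["Bread", "Baking"],
  "C" => ["Turkey", "Cheese", "Bread", "Lettuce"]
}

# Returns [name, matched_tag_count] pairs for every other entry that shares
# at least one tag with the named entry, best match first.
def rank_related(entries, name)
  own_tags = entries.fetch(name)
  candidates = entries.reject { |other, _| other == name }
  scored = candidates.map { |other, tags| [other, (tags & own_tags).size] }
  scored.select { |_, count| count > 0 }.sort_by { |_, count| -count }
end

p rank_related(ENTRIES, "A")  # => [["C", 3], ["B", 1]]
```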

The Code

So, here’s the code that’s performing all the magic:

class Post < ActiveRecord::Base
  def related(limit=5)
    @related ||=
      returning self.class.find_tagged_with(tag_list, :conditions => ['posts.id != ?', self.id], :limit => limit) do |posts|
        sorted = posts.sort_by do |p|
          matched_tags = p.tags.find_all {|t| self.tags.include?(t)}
          matched_tags.size
        end.reverse
        # returning hands back the original array, so write the sorted order into it
        posts.replace(sorted)
      end
  end
end

The real meat of the method is in the returning block, so let’s take a look at that:

returning self.class.find_tagged_with(tag_list, :conditions => ['posts.id != ?', self.id], :limit => limit) do |posts|
  sorted = posts.sort_by do |p|
    matched_tags = p.tags.find_all {|t| self.tags.include?(t)}
    matched_tags.size
  end.reverse
  # returning hands back the original array, so write the sorted order into it
  posts.replace(sorted)
end

What’s happening here is I’m first searching for any entries tagged with the current entry’s tags (I’m using acts_as_taggable_on_steroids), then, with the data that’s returned, I use Ruby to sort the entries by the number of “matched” tags, which then gets returned from the method. Conventional wisdom suggests moving the matched tags part into SQL, since MySQL is more efficient than Ruby at handling data. However, I’m using relatively small sets of information, and I haven’t run into any performance issues yet.
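One detail worth knowing about returning: it yields the object to the block and then hands back that same object, so whatever the block itself evaluates to is ignored. Any reordering therefore has to change the array in place. The snippet below uses a stand-in definition with the same shape as the helper so it runs without Rails; treat it as a sketch of the behaviour rather than the library’s source.

```ruby
# Stand-in with the same shape as ActiveSupport's returning helper
# (defined here only so the snippet runs without Rails).
def returning(value)
  yield value
  value
end

# The block's result is thrown away, so the array comes back untouched.
unsorted = returning([1, 3, 2]) { |list| list.sort.reverse }

# Replacing the contents in place is what makes the new order stick.
in_place = returning([1, 3, 2]) { |list| list.replace(list.sort.reverse) }

p unsorted  # => [1, 3, 2]
p in_place  # => [3, 2, 1]
```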

Overall, this method’s working pretty well for me, but I’m sure as I accumulate more posts, I’ll need to refine it some. I’d really like to incorporate some sort of popularity ranking, based on number of comments and views, but that’s not something I’m too worried about at the present.

Posted on November 11, 2008 2 Comments