Finding Related Entries Using Tags with Ruby on Rails
One of the cool features that I built into my new site is the “related” sidebar box on every entry and link. Using a not-so-sophisticated algorithm, my site automatically picks out other entries that seem to be related to the current entry, which hopefully helps readers navigate to my other content. It really wasn’t too difficult to implement, so I figured I’d go through my thought process and the code that makes it happen.
The Not So Fancy Algorithm
It took me a little while to come up with a way to determine if an entry is “related” that was both accurate and relatively efficient. I could have used some complicated tool that parses the content of my entries, but instead I decided to use something that’s a little simpler: tags. To understand how it works, pretend I have three entries:
- Entry A - Tagged with: Turkey, Roast Beef, Cheese, Bread
- Entry B - Tagged with: Bread, Baking
- Entry C - Tagged with: Turkey, Cheese, Bread, Lettuce
Let’s say we’re looking for entries related to Entry A. From just looking at B and C, it’s clear that C should be closest, as A and C both have something to do with sandwiches, whereas B only talks about bread. To rank the entries programmatically, I first do a query to find entries with any of the tags from Entry A. Then, once I have that list, I sort the entries by how many of Entry A’s tags are used. So, in the above example, Entry C would have 3 matched tags (Turkey, Cheese, and Bread), and Entry B would have 1 (Bread). It’s not a perfect system, but so far, it seems to be working pretty well.
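As a rough illustration, the ranking idea can be sketched in plain Ruby without Rails or a database. The Entry struct and the related_entries method are made up for this example and aren’t part of the site’s code; tags are just arrays of strings here.

```ruby
# Hypothetical stand-in for a post: a title plus a list of tag names.
Entry = Struct.new(:title, :tags)

# Returns the entries that share at least one tag with `target`,
# ordered from most shared tags to fewest.
def related_entries(target, entries, limit = 5)
  candidates = entries.reject { |e| e.equal?(target) }
  scored = candidates.map { |e| [e, (e.tags & target.tags).size] }
  scored.select { |_, score| score > 0 }
        .sort_by { |_, score| -score }
        .first(limit)
        .map { |entry, _| entry }
end

a = Entry.new("Entry A", ["Turkey", "Roast Beef", "Cheese", "Bread"])
b = Entry.new("Entry B", ["Bread", "Baking"])
c = Entry.new("Entry C", ["Turkey", "Cheese", "Bread", "Lettuce"])

related_entries(a, [a, b, c]).map(&:title)
# => ["Entry C", "Entry B"]
```

Array intersection (&) does the tag matching, so Entry C scores 3 and Entry B scores 1.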
The Code
So, here’s the code that’s performing all the magic:
class Post < ActiveRecord::Base
  # Returns up to `limit` other posts that share tags with this one,
  # ordered from most shared tags to fewest. The result is memoized.
  def related(limit=5)
    @related ||= begin
      posts = self.class.find_tagged_with(tag_list, :conditions => ['posts.id != ?', self.id], :limit => limit)
      posts.sort_by do |p|
        matched_tags = p.tags.find_all {|t| self.tags.include?(t)}
        matched_tags.size
      end.reverse
    end
  end
end
The real meat of the method is the query and the sort that follows it, so let’s take a look at that:
posts = self.class.find_tagged_with(tag_list, :conditions => ['posts.id != ?', self.id], :limit => limit)
posts.sort_by do |p|
  matched_tags = p.tags.find_all {|t| self.tags.include?(t)}
  matched_tags.size
end.reverse
What’s happening here is that I first search for any entries tagged with the current entry’s tags (I’m using acts_as_taggable_on_steroids). Then, with the data that’s returned, I use Ruby to sort the entries by the number of “matched” tags, and that sorted list is what the method returns. Conventional wisdom suggests moving the matched-tags part into SQL, since the database can usually count and sort matches more efficiently than Ruby can after loading every candidate. However, I’m working with relatively small sets of data, and I haven’t run into any performance issues yet.
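To make the SQL idea concrete, here is one way the counting could be pushed into the database. This is only a sketch under assumptions: related_posts_sql is a hypothetical helper, and the taggings table with its tag_id, taggable_id, and taggable_type columns is assumed from the plugin’s usual schema, so check the names against your own database first.

```ruby
# Hypothetical helper: builds a query that counts shared tags in the database.
# Assumes a `posts` table and a polymorphic `taggings` join table with
# tag_id, taggable_id, and taggable_type columns.
def related_posts_sql(post_id, limit = 5)
  post_id = Integer(post_id) # raises on non-numeric input before interpolation
  limit   = Integer(limit)

  <<~SQL
    SELECT posts.*, COUNT(*) AS matched_tags
    FROM posts
    INNER JOIN taggings
      ON taggings.taggable_id = posts.id
     AND taggings.taggable_type = 'Post'
    WHERE posts.id != #{post_id}
      AND taggings.tag_id IN (
        SELECT mine.tag_id
        FROM taggings AS mine
        WHERE mine.taggable_id = #{post_id}
          AND mine.taggable_type = 'Post'
      )
    GROUP BY posts.id
    ORDER BY matched_tags DESC
    LIMIT #{limit}
  SQL
end

puts related_posts_sql(42)
```

In a Rails model, the resulting string could be handed to Post.find_by_sql. One difference from the Ruby version is that the database ranks every candidate before the limit is applied, rather than ranking only the rows that survived the limit.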
Overall, this method’s working pretty well for me, but I’m sure that as I accumulate more posts, I’ll need to refine it some. I’d really like to incorporate some sort of popularity ranking, based on number of comments and views, but that’s not something I’m too worried about at present.

2 Comments
owen 26 Nov 2008 at 10:28AM
I get what you’re doing, but that Ruby code looks awful compared to SQL. I’ll have to look into that Ruby stuff more before I understand it.
Thought about tags but could not see a real benefit to them.
Kyle 27 Nov 2008 at 12:00PM
Owen: I definitely agree. It would probably be better to put this in SQL rather than Ruby. However, I’m not too worried about it since it hasn’t been a huge performance hit yet. Once I get more tags and entries, I’ll likely need to change my approach.