Entries tagged with “Ruby”
Improving on Related Entries
A little while back, I posted about how I was determining related entries for my site. That method worked, but once I redid my site and added my 250+ Flickr photos, it started to really slow down when finding related photos, because of the increase in tags and posts. The real issue was that I was doing most of the work in Ruby, when it really should have been done with SQL. So, I decided to rewrite it.
Note: If you haven’t looked at my previous entry on the subject, you might want to take a look at it, just for the general idea of what I’m trying to accomplish. Essentially, I’m trying to find related posts by comparing tags. Here’s my new Post#related code:
def related(limit=5)
  return [] if tags.empty?
  # Match any of this post's tags on the joined tags table.
  join_array = tags.collect {|tag| "posts_tags.id = #{tag.id}"}
  tags_join = "AND (#{join_array.join(' OR ')})"
  self.class.find(:all,
    # Note the trailing space, so the two JOIN fragments do not run together.
    :joins => "INNER JOIN taggings posts_taggings ON posts_taggings.taggable_id = posts.id " +
              "INNER JOIN tags posts_tags ON posts_tags.id = posts_taggings.tag_id #{tags_join}",
    :conditions => ["posts.id != ?", id], :group => "posts.id",
    :order => "COUNT(*) DESC",
    :limit => limit)
end
As you can see, all the work is now done in SQL. I first build a list of the current post’s tags, which I then feed into the query to search for other posts that share them. The SQL tells the database to find any post with at least one of these tags, then orders the results by how many tags each post has in common with the current one. The SQL ended up being fairly complicated, with a couple of joins, but it’s now a whole lot faster, because the database does the matching and counting instead of Ruby. If you’re interested, here’s an example related entry query:
SELECT `posts`.* FROM `posts`
INNER JOIN taggings posts_taggings ON posts_taggings.taggable_id = posts.id
INNER JOIN tags posts_tags ON posts_tags.id = posts_taggings.tag_id
  AND (posts_tags.id = 695
    OR posts_tags.id = 192
    OR posts_tags.id = 195)
WHERE (posts.id != 4322) AND ( (`posts`.`type` = 'FlickrPhoto' ) )
GROUP BY posts.id
ORDER BY COUNT(*) DESC
LIMIT 5
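To make the GROUP BY and COUNT(*) steps a bit more concrete, here is a small plain-Ruby simulation of what the query computes. It is only a sketch: the TAGGINGS rows, the post ids, and the related_post_ids helper are made up for illustration, and the tie-break on post id is my own addition (the SQL above leaves the order of ties up to the database).

```ruby
# Hypothetical join rows: one [post_id, tag_id] pair per tagging.
TAGGINGS = [
  [1, 695], [1, 192], [1, 195],   # post 1 shares all three tags
  [2, 695], [2, 300],             # post 2 shares one tag
  [3, 192], [3, 195],             # post 3 shares two tags
  [4, 695], [4, 192], [4, 195]    # post 4 is the current post
].freeze

# Mirrors the query: keep rows for the wanted tags, skip the current post,
# group by post id, order by match count (descending), then apply the limit.
def related_post_ids(taggings, tag_ids, current_id, limit = 5)
  counts = Hash.new(0)
  taggings.each do |post_id, tag_id|
    next if post_id == current_id
    counts[post_id] += 1 if tag_ids.include?(tag_id)
  end
  # Break ties by post id so the result is deterministic (an assumption,
  # not something the SQL guarantees).
  counts.sort_by { |post_id, count| [-count, post_id] }
        .first(limit)
        .map { |post_id, _count| post_id }
end

puts related_post_ids(TAGGINGS, [695, 192, 195], 4).inspect
```

The point of doing this in SQL rather than Ruby is that the database only hands back the handful of rows that survive the LIMIT, instead of every candidate post and its tags.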
Making Better Use of named_scope
In Ruby on Rails 2.1, a great little feature called named_scope was added that really makes complicated finds a whole lot easier. I’m going to walk through one way you can use scopes to clean up your code, and if you want more information, Ryan Daigle’s post is a good starting point.
The Situation
For a project I’m working on, users can submit reviews, and I needed a way to access a user’s friends’ reviews. At first, I considered adding a method like the following to my User model:
class User < ActiveRecord::Base
  def friends_reviews
    Review.find(:all,
      # Qualify user_id: both reviews and friendships have that column.
      :joins => "JOIN friendships ON friendships.user_id = #{self.id}",
      :conditions => "friendships.friend_id = reviews.user_id"
    )
  end
end
Named Scopes to the Rescue
While this method would work fine, there are several issues. For one, what if I want to change parameters on the find, like limiting it to 5 entries, or sorting by date? I would have to add parameters to the method, and it would start to get complicated. Instead, I created a scope in my Review model:
class Review < ActiveRecord::Base
  named_scope :by_friends_of, lambda { |user|
    {
      # Qualify user_id: both reviews and friendships have that column.
      :joins => "JOIN friendships ON friendships.user_id = #{user.id}",
      :conditions => "friendships.friend_id = reviews.user_id"
    }
  }
end
Now, to get the reviews I want, I can call Review.by_friends_of(user) and it will get me reviews by user’s friends. What’s even better, since it’s a scope, I can modify it. For instance, Review.by_friends_of(user).all(:limit => 10, :order => "created_at DESC") will limit it to 10 reviews, sorted by creation date. I can even use other scopes I might have created, like: Review.by_friends_of(user).published.all(:limit => 20), and so on.
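If the chaining seems a little magical, here is a toy sketch of the general idea in plain Ruby: each call returns a new object carrying merged options, and nothing is turned into SQL until the end. To be clear, this is not how Rails implements named_scope; ToyScope and its where, order, limit, and to_sql methods are hypothetical names invented for this example.

```ruby
# A toy stand-in for a chainable scope. Not the Rails implementation.
class ToyScope
  attr_reader :options

  def initialize(options = {})
    @options = options
  end

  # Each builder returns a new scope, leaving the receiver untouched.
  def where(condition)
    merged = Array(options[:conditions]) + [condition]
    ToyScope.new(options.merge(:conditions => merged))
  end

  def order(clause)
    ToyScope.new(options.merge(:order => clause))
  end

  def limit(count)
    ToyScope.new(options.merge(:limit => count))
  end

  # Only at this point are the accumulated options turned into SQL.
  def to_sql(table)
    sql = "SELECT * FROM #{table}"
    conditions = Array(options[:conditions])
    sql += " WHERE " + conditions.map { |c| "(#{c})" }.join(" AND ") unless conditions.empty?
    sql += " ORDER BY #{options[:order]}" if options[:order]
    sql += " LIMIT #{options[:limit]}" if options[:limit]
    sql
  end
end

published = ToyScope.new.where("published = 1")
recent    = published.order("created_at DESC").limit(10)
puts recent.to_sql("reviews")
```

Because every step returns a fresh object, the published scope can be reused for other queries without being affected by the order and limit added to recent.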
Scopes are super useful, and recently I’ve started to use them for just about everything. Definitely try them out; they’ll make your code a lot cleaner and more flexible.
Finding Related Entries Using Tags with Ruby on Rails
One of the cool features that I built into my new site is the “related” sidebar box on every entry and link. Using a not-so-sophisticated algorithm, my site automatically picks out other entries that seem to be related to the current entry, which hopefully helps readers navigate to my other content. It really wasn’t too difficult to implement, so I figured I’d go through my thought process and the code that makes it happen.
The Not So Fancy Algorithm
It took me a little while to come up with a way to determine whether an entry is “related” that was both accurate and relatively efficient. I could have used some complicated tool that parses the content of my entries, but instead I decided to use something a little simpler: tags. To understand how it works, pretend I have three entries:
- Entry A - Tagged with: Turkey, Roast Beef, Cheese, Bread
- Entry B - Tagged with: Bread, Baking
- Entry C - Tagged with: Turkey, Cheese, Bread, Lettuce
Let’s say we’re looking for entries related to Entry A. From just looking at B and C, it’s clear that C should be closest, as A and C both have something to do with sandwiches, whereas B only talks about bread. To rank the entries programmatically, I first do a query to find entries with any of the tags from Entry A. Then, once I have that list, I sort the entries by how many of Entry A’s tags they use. So, in the above example, Entry C would have 3 matched tags (Turkey, Cheese, and Bread), and Entry B would have 1 (Bread). It’s not a perfect system, but so far, it seems to be working pretty well.
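The ranking for the sandwich example can be sketched in a few lines of plain Ruby, with no database involved. The ENTRIES hash and the rank_related helper are names made up for this illustration, and breaking ties by entry name is an assumption of mine.

```ruby
# Tags for the three example entries.
ENTRIES = {
  "A" => ["Turkey", "Roast Beef", "Cheese", "Bread"],
  "B" => ["Bread", "Baking"],
  "C" => ["Turkey", "Cheese", "Bread", "Lettuce"]
}.freeze

# Returns [name, matched_count] pairs for every other entry that shares
# at least one tag with the source entry, most matches first.
def rank_related(entries, source_name)
  source_tags = entries.fetch(source_name)
  matches = entries.reject { |name, _tags| name == source_name }
                   .map { |name, tags| [name, (tags & source_tags).size] }
                   .select { |_name, count| count > 0 }
  # Ties are broken by name to keep the order stable (an assumption).
  matches.sort_by { |name, count| [-count, name] }
end

rank_related(ENTRIES, "A").each do |name, count|
  puts "Entry #{name}: #{count} matched tag(s)"
end
```

Array intersection (&) does the counting here: the size of the overlap between two tag lists is the number of matched tags.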
The Code
So, here’s the code that’s performing all the magic:
class Post < ActiveRecord::Base
  def related(limit=5)
    @related ||=
      returning self.class.find_tagged_with(tag_list, :conditions => ['posts.id != ?', self.id], :limit => limit) do |posts|
        # returning hands back the original array, so reorder it in place.
        sorted = posts.sort_by do |p|
          matched_tags = p.tags.find_all {|t| self.tags.include?(t)}
          matched_tags.size
        end.reverse
        posts.replace(sorted)
      end
  end
end
The real meat of the method is in the returning block, so let’s take a look at that:
returning self.class.find_tagged_with(tag_list, :conditions => ['posts.id != ?', self.id], :limit => limit) do |posts|
  # returning hands back the original array, so reorder it in place.
  sorted = posts.sort_by do |p|
    matched_tags = p.tags.find_all {|t| self.tags.include?(t)}
    matched_tags.size
  end.reverse
  posts.replace(sorted)
end
What’s happening here is I’m first searching for any entries tagged with the current entry’s tags (I’m using acts_as_taggable_on_steroids), then, with the data that’s returned, I use Ruby to sort the entries by the number of “matched” tags, which then gets returned from the method. Conventional wisdom suggests moving the matched tags part into SQL, since MySQL is more efficient than Ruby at handling data. However, I’m using relatively small sets of information, and I haven’t run into any performance issues yet.
Overall, this method’s working pretty well for me, but I’m sure as I accumulate more posts, I’ll need to refine it some. I’d really like to incorporate some sort of popularity ranking, based on number of comments and views, but that’s not something I’m too worried about at the present.
Posting to Brightkite using ActiveResource and REST
The other day, I came across Brightkite’s REST API. After taking a look at it, I decided it was the perfect opportunity to try out ActiveResource, the dead simple way to consume RESTful resources. In 20 lines, I was able to put together a simple script to find your most recent check-in on Brightkite and then post to that place.
A Quick Intro To Active Resource
First off, here’s an idea of just how easy it is to connect to a REST API using ActiveResource. Take, for instance, the “places” resource in Brightkite; these URLs all have the base http://brightkite.com/places. To interface with places, all it takes is this:
class Place < ActiveResource::Base
  self.site = 'http://brightkite.com'
end
That’s it. To get all the places, just do Place.find(:all). To create a new place, all it takes is Place.new followed by a call to save. Amazing, to say the least. There’s a lot more that’s possible, so I recommend you check out the ActiveResource page on the Rails wiki.
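For a rough picture of what those calls turn into on the wire, here is a plain-Ruby sketch of the URL convention, assuming the XML format that ActiveResource used by default in the Rails 2.x era. ToyResource and its methods are illustrative names rather than the library's internals, the id 42 is made up, and I'm not claiming Brightkite's API matched every one of these routes.

```ruby
# Illustrative only: maps resource actions to the HTTP verb and URL that a
# REST client following the Rails 2.x ActiveResource convention would use.
class ToyResource
  def initialize(site, collection)
    @site = site
    @collection = collection
  end

  def collection_url
    "#{@site}/#{@collection}.xml"
  end

  def element_url(id)
    "#{@site}/#{@collection}/#{id}.xml"
  end

  # Returns a [verb, url] pair for the given action.
  def request_for(action, id = nil)
    case action
    when :find_all then ["GET", collection_url]
    when :find     then ["GET", element_url(id)]
    when :create   then ["POST", collection_url]
    when :update   then ["PUT", element_url(id)]
    when :destroy  then ["DELETE", element_url(id)]
    else raise ArgumentError, "unknown action: #{action}"
    end
  end
end

places = ToyResource.new("http://brightkite.com", "places")
puts places.request_for(:find_all).join(" ")
puts places.request_for(:find, 42).join(" ")
```

The class name supplies the collection segment of the URL and site supplies the host, which is why the three-line Place class is all the configuration needed.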
The code
Below are the 20 lines necessary to post a note to Brightkite through the API. Just change USERNAME and PASSWORD to your username and password and change the note text at the bottom, and you’re ready to go. While this is a pretty simple example, it shows just how powerful a well constructed REST API can be.
Follow Euro 2008 on Twitter
After my friend Paul asked me to text him updates from a Euro 2008 game while he was at work, I came up with the idea of setting up a Twitter bot that posts updates from Soccernet’s gamecasts. After a couple hours of work, using Hpricot and the Twitter gem, I was able to build a Ruby script that worked pretty well. I’m planning on writing a complete post on how I did it, but first I want to work some of the kinks out. There are three Twitter accounts you can follow if you’re interested:
- @euro2008cup - All updates from the gamecast (about one post per minute in the game). Tweets are shortened with a link to the full update.
- @euro2008scores - Only updates from the gamecast that are marked as “alerts” (goals, for the most part). Like @euro2008cup, updates are shortened.
- @euro2008sms - This has the same updates as @euro2008scores, but updates longer than 140 characters span across multiple tweets. This is best if you want to follow on your phone, as you’ll get the full update instead of having to click on a link.
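Spreading a long update across multiple tweets, as @euro2008sms does, could be done along these lines. This is only a sketch and not the bot's actual code: split_update and TWEET_LIMIT are names I made up, and splitting on word boundaries is an assumption about how you'd want the pieces to read.

```ruby
TWEET_LIMIT = 140

# Splits text on whitespace into chunks of at most `limit` characters.
# A single word longer than the limit is hard-split so it still fits.
def split_update(text, limit = TWEET_LIMIT)
  chunks = []
  current = ""
  text.split.each do |word|
    # Break up any word that could never fit in one chunk on its own.
    word.scan(/.{1,#{limit}}/).each do |piece|
      candidate = current.empty? ? piece : "#{current} #{piece}"
      if candidate.length <= limit
        current = candidate
      else
        chunks << current
        current = piece
      end
    end
  end
  chunks << current unless current.empty?
  chunks
end

update = (["goal"] * 60).join(" ")
split_update(update).each_with_index do |chunk, index|
  puts "#{index + 1}: #{chunk.length} characters"
end
```

Each chunk would then be posted as its own tweet, in order.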
Hopefully these will help someone out. Like I said, I’ll be posting a how-to on making your own Twitter bot and screen scraper, so keep an eye out for that.
