Query More in Rails with Arel and Virtual Attributes
This is my first ever (engineering) blog post! I hope to share my experiences in these posts so that they can help others. If they aren’t helpful to others, they’re still a collection of my experiences and that’s a good enough reason for me to write them. With that being said, feel free to give me any feedback (good or bad!) at suraj.gupta@uwaterloo.ca.
The problem
Recently, I had to solve a problem of the following form: suppose you have a Rails app with a User model and a Post model,
where each (blog) post belongs to a user and a user has many posts. Also, suppose each post has a num_likes
attribute (of type Integer)
which represents how many times that post was liked (think “claps” on Medium). Find the maximum number of likes a user
has gotten on any given blog post written by them, for every single user.
Immediately, you might be wondering why one might want to solve such a silly problem. Well, in this case, it could be a good indicator of how popular the user is. However you’d like to interpret it, the example here is only provided to materialize the solution. Realistically, this scenario can also be extended to more practical questions such as
Given a list of researchers \(R\), what is the maximum number of times any one of their publications has been cited, for each researcher in \(R\)?
and the solution discussed below would still apply!
The solution
The easiest way to solve the given exemplar problem in Rails would be to do something like
User.all.map{ |user| { user.id => user.posts.maximum(:num_likes) } }
but this will cause an N+1 problem of the form
SELECT `users`.`id` FROM `users`
SELECT MAX(`posts`.`num_likes`) FROM `posts` WHERE `posts`.`author_id` = <user_id_1>
SELECT MAX(`posts`.`num_likes`) FROM `posts` WHERE `posts`.`author_id` = <user_id_2>
...
SELECT MAX(`posts`.`num_likes`) FROM `posts` WHERE `posts`.`author_id` = <user_id_n>
We can solve this N+1 problem with a join query. In MySQL, this might look like
SELECT `users`.`id`, MAX(`posts`.`num_likes`) AS `max_num_likes_on_posts`
FROM `users`
LEFT OUTER JOIN `posts` ON `posts`.`author_id` = `users`.`id`
GROUP BY `users`.`id`
In Rails, we can write this concisely as
User.all.left_joins(:posts).group(User.arel_table[:id]).maximum(Post.arel_table[:num_likes])
while will return a hash where the keys are user IDs and the values are the maximum number of likes that user has gotten on any given blog post written by them. This is exactly what we want!
Well, not really. In the context of a Rails app, we might want to persist this max_num_likes_on_posts
attribute
as part of the instantiated User objects in memory. This is useful if the methods that will use this attribute
consume a User ActiveRecord relation as an argument.
Or, you may want to serialize the entire User record along with this computed attribute. For example, on the index page for users, we might want to show this computed attribute along with all the other attributes of a user. To achieve this, we could execute two separate queries
SELECT `users`.* FROM `users`
- the join query we discussed above
but this doesn’t solve the problem of persisting the attribute with the rest of the User object.
Also, this requires two roundtrips to the database. We’d like to be as efficient as possible and only make one database roundtrip to reduce latency for the end user.
Changing schema and caching
To persist this attribute and fetch everything in only one database query, we could cache the max_num_likes_on_posts
attribute onto the users
table.
But this solution requires a migration. And more importantly, this solution begs the question of whether or not
such an attribute warrants a first-class column on the users
table.
If it does, we have to carefully maintain the cached value. Whenever a blog post written by author \(a\) is liked,
we have to recompute the max_num_likes_on_posts
for author \(a\).
We also have to perform this computation whenever a blog post written by author \(a\) is deleted.
We could maintain this state with before_*
and after_*
callbacks, but nonetheless, this solution is certainly complex.
However, a simple select *
query on the users
table now allows us to fetch the data we need in a single query.
If you go with this solution, the performance bottleneck is now on updating a blog post,
rather than querying the max_num_likes_on_posts
per user.
Almost always, the use case won’t justify such a migration so let’s discuss another solution which
also persists the max_num_likes_on_posts
attribute and only uses a single query without any changes to the schema.
Using Arel and virtual attributes
Let’s first observe that we can query all the existing user attributes as well as the max_num_likes_on_posts
attribute
with a single query by select all the existing user attribute from the users
table and subsequently
joining the aforementioned query as a subquery to get the max_num_likes_on_posts
attribute.
That sounds a little complicated so let’s see what this query could look like in SQL
SELECT `users`.*, `max_subquery`.`max_num_likes_on_posts`
FROM `users`
INNER JOIN (
SELECT `users`.`id` AS `user_id`, MAX(`posts`.`num_likes`) AS `max_num_likes_on_posts`
FROM `users`
LEFT OUTER JOIN `posts` ON `posts`.`author_id` = `users`.`id`
GROUP BY `users`.`id`
) `max_subquery` ON `max_subquery`.`user_id` = `users`.`id`
Great! We can now fetch all the attributes of a user along with the max number of likes they’ve gotten on any post, in a single query!
But, how do we write this in Rails? More specifically, how do we write such a subquery in Rails, and then how do we capture
the max_num_likes_on_posts
attribute as part of a Rails model since this attribute is not a column on the users
table?
To achieve this in Rails, we need two tools: Arel and virtual attributes.
Arel (A Relational Algebra)
You may have already seen Arel show up in this blog post! Recall the following piece of code
User.all.left_joins(:posts).group(User.arel_table[:id]).maximum(Post.arel_table[:num_likes])
For the sake of simplicity, I previously omitted the explanation of arel_table
.
But before we talk about what arel_table
is and how to write the query in Arel, let’s quickly discuss what Arel really is.
Arel is a library used by Rails to power ActiveRecord. All your favourite ActiveRecord methods are essentially wrappers around Arel methods. Arel internally represents a raw SQL query as an abstract-syntax-tree that one can incrementally build, providing the power of SQL at your fingertips, all in Rails. Queue evil laughter.
So, User.arel_table
is an Arel representation of the users
table in our database.
Rather than boring you with the details of Arel, I’ll instead refer you to
this excellent guide
and
this cheat sheet
that will do a much better job at explaining what Arel is and getting you started with it.
I’ll also refer you to the arel-helpers
gem
(and related docs)
that helps get rid of some of the boilerplate code that comes with using Arel.
Assuming you are now familar with Arel, let’s write the above SQL query in Rails (without using arel-helpers
)
users = User.arel_table
posts = Post.arel_table
subquery = users
.project(users[:id].as('user_id'))
.project(posts[:num_likes].maximum.as('max_num_likes_on_posts'))
.join(posts, Arel::Nodes::OuterJoin).on(posts[:author_id].eq(users[:id]))
.group(users[:id])
.as('max_subquery')
subquery_join = users
.join(subquery, Arel::Nodes::InnerJoin).on(subquery[:user_id].eq(users[:id]))
.join_sources
User.all
.select(users[Arel.star])
.select(subquery[:max_num_likes_on_posts])
.from(users)
.joins(subquery_join)
The last expression is bridging the Arel abstraction with ActiveRecord so that we can actually query the database (recall that before this last statement, we were just building up an abstract query and so we never actually queried the database).
Now that we’ve written the query in Arel, let’s talk about capturing the max_num_likes_on_posts
attribute
and persisting it in a User object. And to solve this problem, we’ll use virtual attributes.
Virtual attributes
A “virtual attribute” is just a plain old class attribute in Ruby.
You can attach these attributes to your Rails model as well (after all, a Rails model is a Ruby class). It’s as simple as
# app/models/user.rb
class User < ActiveRecord::Base
...
attribute :max_num_likes_on_posts
...
end
In the context of a Rails model, these attributes are known as “virtual attributes” because they do not represent any persisted data in your database. Their value and lifetime is directly tied to the User object in memory.
Putting it all together
Now that we have the necessary tools to solve the problem, let’s quickly discuss where this code might exist before showing the implementation.
You can execute such a query anywhere that you have a User ActiveRecord relation.
I personally think such logic is best housed in the model as a scope
.
A scope
is a great way to execute any queries on an ActiveRecord relation. Although a scope
is often used to filter down a set of records, it can also be used to join relations, reorder the relation with an order clause
or … add columns to the select clause!
Here’s how I would write the scope
# app/models/user.rb
class User < ActiveRecord::Base
...
INVALID_MAX_NUM_LIKES_ON_POSTS = -1
MAX_NUM_LIKES_ON_POSTS_KEY = :max_num_likes_on_posts
attribute MAX_NUM_LIKES_ON_POSTS_KEY, :integer, default: INVALID_MAX_NUM_LIKES_ON_POSTS
scope :with_max_num_likes_on_posts, -> {
users = User.arel_table
posts = Post.arel_table
subquery = users
.project(users[:id].as('user_id'))
.project(posts[:num_likes].maximum.as(MAX_NUM_LIKES_ON_POSTS_KEY.to_str))
.join(posts, Arel::Nodes::OuterJoin).on(posts[:author_id].eq(users[:id]))
.group(users[:id])
.as('max_subquery')
subquery_join = users
.join(subquery, Arel::Nodes::InnerJoin).on(subquery[:user_id].eq(users[:id]))
.join_sources
select(users[Arel.star])
.select(subquery[MAX_NUM_LIKES_ON_POSTS_KEY].as(MAX_NUM_LIKES_ON_POSTS_KEY.to_str))
.from(users)
.joins(subquery_join)
}
...
end
Note: I would recommend using an appropriate default
in the domain like I’ve done above so that you can easily
check if you are dealing with a user record that has a valid max_num_likes_on_posts
before actually using it.
I chose -1
as an appropriate default because nil
is already used to represent a user who doesn’t have any blog posts
and 0
already represents a user who has one or more blog posts but has yet to receive likes.
And finally, you can use this scope when, for example, serializing a set of users
# app/controllers/user_controller.rb
class UserController < ActionController::Base
...
def index
@users = User.all.with_max_num_likes_on_posts # query everything once
# @users[k].max_num_likes_on_posts => <max_num_likes_on_posts_for_user_k> without executing a query
end
...
end
The takeaway
Although this post focused on finding the max of an attribute on an association to a particular model and persisting that data, it can be abstracted to persist any arbitrary queried data on a Rails model for the lifetime of the instantiated records. And this is all thanks to Arel and virtual attributes!
I hope this blog post helps you efficiently solve some of your querying problems in Rails.
We discussed a couple of advanced Rails and SQL concepts here so please feel free to shoot me an email at suraj.gupta@uwaterloo.ca if you have any questions about anything discussed in this post!
The obvious meta joke
@users = [ <User :id="surajg" :max_num_likes_on_posts=0 ] # because this blog doesn't support "likes" 😆
so feel free to share it instead!