Query More in Rails with Arel and Virtual Attributes

This is my first ever (engineering) blog post! I hope to share my experiences in these posts so that they can help others. If they aren’t helpful to others, they’re still a collection of my experiences and that’s a good enough reason for me to write them. With that being said, feel free to give me any feedback (good or bad!) at suraj.gupta@uwaterloo.ca.


The problem

Recently, I had to solve a problem of the following form: suppose you have a Rails app with a User model and a Post model, where each (blog) post belongs to a user and a user has many posts. Also, suppose each post has a num_likes attribute (of type Integer) which represents how many times that post was liked (think “claps” on Medium). Find the maximum number of likes a user has gotten on any given blog post written by them, for every single user.

Immediately, you might be wondering why one might want to solve such a silly problem. Well, in this case, it could be a good indicator of how popular the user is. However you’d like to interpret it, the example here is only provided to materialize the solution. Realistically, this scenario can also be extended to more practical questions such as

Given a list of researchers \(R\), what is the maximum number of times any one of their publications has been cited, for each researcher in \(R\)?

and the solution discussed below would still apply!


The solution

The easiest way to solve the given exemplar problem in Rails would be to do something like

User.all.map{ |user| { user.id => user.posts.maximum(:num_likes) } }

but this will cause an N+1 problem of the form

SELECT `users`.`id` FROM `users`
SELECT MAX(`posts`.`num_likes`) FROM `posts` WHERE `posts`.`author_id` = <user_id_1>
SELECT MAX(`posts`.`num_likes`) FROM `posts` WHERE `posts`.`author_id` = <user_id_2>
...
SELECT MAX(`posts`.`num_likes`) FROM `posts` WHERE `posts`.`author_id` = <user_id_n>

We can solve this N+1 problem with a join query. In MySQL, this might look like

SELECT `users`.`id`, MAX(`posts`.`num_likes`) AS `max_num_likes_on_posts`
FROM `users`
LEFT OUTER JOIN `posts` ON `posts`.`author_id` = `users`.`id`
GROUP BY `users`.`id`

In Rails, we can write this concisely as

User.all.left_joins(:posts).group(User.arel_table[:id]).maximum(Post.arel_table[:num_likes])

while will return a hash where the keys are user IDs and the values are the maximum number of likes that user has gotten on any given blog post written by them. This is exactly what we want!

Well, not really. In the context of a Rails app, we might want to persist this max_num_likes_on_posts attribute as part of the instantiated User objects in memory. This is useful if the methods that will use this attribute consume a User ActiveRecord relation as an argument.

Or, you may want to serialize the entire User record along with this computed attribute. For example, on the index page for users, we might want to show this computed attribute along with all the other attributes of a user. To achieve this, we could execute two separate queries

  1. SELECT `users`.* FROM `users`
  2. the join query we discussed above

but this doesn’t solve the problem of persisting the attribute with the rest of the User object.

Also, this requires two roundtrips to the database. We’d like to be as efficient as possible and only make one database roundtrip to reduce latency for the end user.


Changing schema and caching

To persist this attribute and fetch everything in only one database query, we could cache the max_num_likes_on_posts attribute onto the users table. But this solution requires a migration. And more importantly, this solution begs the question of whether or not such an attribute warrants a first-class column on the users table.

If it does, we have to carefully maintain the cached value. Whenever a blog post written by author \(a\) is liked, we have to recompute the max_num_likes_on_posts for author \(a\). We also have to perform this computation whenever a blog post written by author \(a\) is deleted. We could maintain this state with before_* and after_* callbacks, but nonetheless, this solution is certainly complex.

However, a simple select * query on the users table now allows us to fetch the data we need in a single query. If you go with this solution, the performance bottleneck is now on updating a blog post, rather than querying the max_num_likes_on_posts per user.

Almost always, the use case won’t justify such a migration so let’s discuss another solution which also persists the max_num_likes_on_posts attribute and only uses a single query without any changes to the schema.


Using Arel and virtual attributes

Let’s first observe that we can query all the existing user attributes as well as the max_num_likes_on_posts attribute with a single query by select all the existing user attribute from the users table and subsequently joining the aforementioned query as a subquery to get the max_num_likes_on_posts attribute. That sounds a little complicated so let’s see what this query could look like in SQL

SELECT `users`.*, `max_subquery`.`max_num_likes_on_posts` 
FROM `users`
INNER JOIN (
  SELECT `users`.`id` AS `user_id`, MAX(`posts`.`num_likes`) AS `max_num_likes_on_posts`
  FROM `users`
  LEFT OUTER JOIN `posts` ON `posts`.`author_id` = `users`.`id`
  GROUP BY `users`.`id`
) `max_subquery` ON `max_subquery`.`user_id` = `users`.`id` 

Great! We can now fetch all the attributes of a user along with the max number of likes they’ve gotten on any post, in a single query! But, how do we write this in Rails? More specifically, how do we write such a subquery in Rails, and then how do we capture the max_num_likes_on_posts attribute as part of a Rails model since this attribute is not a column on the users table?

To achieve this in Rails, we need two tools: Arel and virtual attributes.

Arel (A Relational Algebra)

You may have already seen Arel show up in this blog post! Recall the following piece of code

User.all.left_joins(:posts).group(User.arel_table[:id]).maximum(Post.arel_table[:num_likes])

For the sake of simplicity, I previously omitted the explanation of arel_table. But before we talk about what arel_table is and how to write the query in Arel, let’s quickly discuss what Arel really is.

Arel is a library used by Rails to power ActiveRecord. All your favourite ActiveRecord methods are essentially wrappers around Arel methods. Arel internally represents a raw SQL query as an abstract-syntax-tree that one can incrementally build, providing the power of SQL at your fingertips, all in Rails. Queue evil laughter.

So, User.arel_table is an Arel representation of the users table in our database. Rather than boring you with the details of Arel, I’ll instead refer you to this excellent guide and this cheat sheet that will do a much better job at explaining what Arel is and getting you started with it. I’ll also refer you to the arel-helpers gem (and related docs) that helps get rid of some of the boilerplate code that comes with using Arel.

Assuming you are now familar with Arel, let’s write the above SQL query in Rails (without using arel-helpers)

users = User.arel_table
posts = Post.arel_table

subquery = users 
  .project(users[:id].as('user_id'))
  .project(posts[:num_likes].maximum.as('max_num_likes_on_posts'))
  .join(posts, Arel::Nodes::OuterJoin).on(posts[:author_id].eq(users[:id]))
  .group(users[:id])
  .as('max_subquery')

subquery_join = users
  .join(subquery, Arel::Nodes::InnerJoin).on(subquery[:user_id].eq(users[:id]))
  .join_sources

User.all
  .select(users[Arel.star])
  .select(subquery[:max_num_likes_on_posts])
  .from(users)
  .joins(subquery_join)

The last expression is bridging the Arel abstraction with ActiveRecord so that we can actually query the database (recall that before this last statement, we were just building up an abstract query and so we never actually queried the database).

Now that we’ve written the query in Arel, let’s talk about capturing the max_num_likes_on_posts attribute and persisting it in a User object. And to solve this problem, we’ll use virtual attributes.

Virtual attributes

A “virtual attribute” is just a plain old class attribute in Ruby.

You can attach these attributes to your Rails model as well (after all, a Rails model is a Ruby class). It’s as simple as

# app/models/user.rb
class User < ActiveRecord::Base
  ...
  attribute :max_num_likes_on_posts
  ...
end

In the context of a Rails model, these attributes are known as “virtual attributes” because they do not represent any persisted data in your database. Their value and lifetime is directly tied to the User object in memory.


Putting it all together

Now that we have the necessary tools to solve the problem, let’s quickly discuss where this code might exist before showing the implementation.

You can execute such a query anywhere that you have a User ActiveRecord relation. I personally think such logic is best housed in the model as a scope. A scope is a great way to execute any queries on an ActiveRecord relation. Although a scope is often used to filter down a set of records, it can also be used to join relations, reorder the relation with an order clause or … add columns to the select clause!

Here’s how I would write the scope

# app/models/user.rb
class User < ActiveRecord::Base
  ...
  INVALID_MAX_NUM_LIKES_ON_POSTS = -1
  MAX_NUM_LIKES_ON_POSTS_KEY = :max_num_likes_on_posts 

  attribute MAX_NUM_LIKES_ON_POSTS_KEY, :integer, default: INVALID_MAX_NUM_LIKES_ON_POSTS 

  scope :with_max_num_likes_on_posts, -> {
    users = User.arel_table
    posts = Post.arel_table

    subquery = users 
      .project(users[:id].as('user_id'))
      .project(posts[:num_likes].maximum.as(MAX_NUM_LIKES_ON_POSTS_KEY.to_str))
      .join(posts, Arel::Nodes::OuterJoin).on(posts[:author_id].eq(users[:id]))
      .group(users[:id])
      .as('max_subquery')

    subquery_join = users 
      .join(subquery, Arel::Nodes::InnerJoin).on(subquery[:user_id].eq(users[:id]))
      .join_sources

    select(users[Arel.star])
      .select(subquery[MAX_NUM_LIKES_ON_POSTS_KEY].as(MAX_NUM_LIKES_ON_POSTS_KEY.to_str))
      .from(users)
      .joins(subquery_join)
  }
  ...
end

Note: I would recommend using an appropriate default in the domain like I’ve done above so that you can easily check if you are dealing with a user record that has a valid max_num_likes_on_posts before actually using it. I chose -1 as an appropriate default because nil is already used to represent a user who doesn’t have any blog posts and 0 already represents a user who has one or more blog posts but has yet to receive likes.


And finally, you can use this scope when, for example, serializing a set of users

# app/controllers/user_controller.rb
class UserController < ActionController::Base 
  ...
  def index
    @users = User.all.with_max_num_likes_on_posts # query everything once
    # @users[k].max_num_likes_on_posts => <max_num_likes_on_posts_for_user_k> without executing a query
  end
  ...
end

The takeaway

Although this post focused on finding the max of an attribute on an association to a particular model and persisting that data, it can be abstracted to persist any arbitrary queried data on a Rails model for the lifetime of the instantiated records. And this is all thanks to Arel and virtual attributes!


I hope this blog post helps you efficiently solve some of your querying problems in Rails.

We discussed a couple of advanced Rails and SQL concepts here so please feel free to shoot me an email at suraj.gupta@uwaterloo.ca if you have any questions about anything discussed in this post!


The obvious meta joke

@users = [ <User :id="surajg" :max_num_likes_on_posts=0 ] # because this blog doesn't support "likes" 😆

so feel free to share it instead!

rss facebook twitter github gitlab youtube mail spotify lastfm instagram linkedin google google-plus pinterest medium vimeo stackoverflow reddit quora quora