GraphQL Tutorial: Handling Aggregations with GraphQL Ruby

Pierre Marris, Software Engineer in the Student Relationships team, shows us how to handle aggregations with GraphQL Ruby.

GraphQL is great for straightforward CRUD operations, but it can be tricky when implementing non-standard queries. For example, handling operations that aren’t querying for a field on a GraphQL object that corresponds to a column on a database table. Aggregation is one of those cases where it’s not immediately obvious as to how to translate a relatively easy task in an ORM in a REST API to GraphQL. You can add a field to a model for the aggregation, but that has the risk of N+1 queries if you’re querying for multiple objects.

The first time we had a need for aggregation in GraphQL was to add a field containing a count to one of our GraphQL objects. On Handshake, students can leave reviews for jobs they’ve worked at, and we want to aggregate all of those reviews to show a count of how many there are for each employer. This will be used in a view where multiple employers are returned at once, so we also had to avoid making N+1 queries for the reviews.

Sounds simple enough, right?

Fortunately, there actually is a simple approach using Lazy Execution in GraphQL Ruby combined with the BatchLoader gem (Or one of the other gems that does the same thing.) The BatchLoader prevents N+1 queries from being made for the reviews for each employer, and returns an object back to GraphQL Ruby that can be lazily resolved.

This approach is applicable when you’re specifically trying to avoid making N+1 queries. If that’s not a concern, you can skip all of this and make the query directly with ActiveRecord as with any other field on a GraphQL object.

How it’s done

The first step is to create a new BatchLoader subclass:

require "batch-loader"
class SizeBatchLoader < BatchLoader::GraphQL
def sync
# The sync method in the superclass will return an ActiveRecord Relation with all of the objects that have been queried for – all this has class has to do is get the number of those objects
view raw 17_1.rb hosted with ❤ by GitHub

The file can be anywhere in your GraphQL project. Then, add this line into your GraphQL Schema class to tell GraphQL Ruby that when a SizeBatchLoader object is returned, it should be lazily resolved by calling its sync method:

lazy_resolve(CountBatchLoader, :sync)
view raw 17_2.rb hosted with ❤ by GitHub

The last step is to use the new BatchLoader subclass in the field on a GraphQL object where it’s required. If you’re already familiar with how the BatchLoader is used, this won’t be very different. Our reviews are stored in a JobReview model with the only relevant column for this case being employer_id. This is what we’ll be using the BatchLoader to query for in this example:

def job_reviews_count
SizeBatchLoader.for(object[:id]).batch do |employer_ids, loader|
# Query for all of the JobReviews for the employers we need, and only select the one column that will be used
job_reviews = employer_ids)
employer_ids.each do |employer_id|
# Filter the full set of results for each employer. This is what will be passed to the sync method in the new BatchLoader we created., job_reviews.filter { |job_review| job_review.employer_id == employer_id })
view raw 17_3.rb hosted with ❤ by GitHub

This approach isn’t without its downsides.

You may have noticed that we’re calling size on the result in our BatchLoader subclass and not count; this is because we can’t use a COUNT query in the database if we want to also avoid N+1 queries. We have to load all of the results into memory from a single query to count them using the BatchLoader.

This is a classic tradeoff between time complexity versus space complexity.

In our case, the maximum number of reviews for any one employer is going to be in the thousands, and we’re only selecting a single Integer column from the database. It’s a reasonable amount of memory usage and outweighs the cost of having to perform a separate query to count the reviews for each employer.

Photo by Christopher Gower on Unsplash