Skip to main content

SQL Patterns in Rails: What They Mean and How to Express Them Safely

Not every SQL pattern maps 1:1 to ActiveRecord. This page gives the translation and the gotchas.

Pattern-first approach: Start with the SQL pattern you need, understand its purpose, then choose the safest Rails implementation — whether that's ActiveRecord, Arel, or raw SQL.

Semi-Join (EXISTS)

Purpose: Find records in table A where a related record exists in table B, without returning duplicate rows from A.

SQL Pattern

SELECT * FROM users
WHERE EXISTS (
  SELECT 1 FROM posts
  WHERE posts.user_id = users.id
  AND posts.published = true
)

✓ Safe ActiveRecord Translation

User.where(id: Post.where(published: true).select(:user_id))

This generates an IN subquery which PostgreSQL optimizes similarly to EXISTS. No duplicates possible.

⚠️ Problematic Alternative

User.joins(:posts).where(posts: { published: true })

Problem: Returns duplicate User rows if a user has multiple published posts. You'd need .distinct, which can hide performance issues.

When to Verify

Check .to_sql to confirm it's using IN (subquery) or EXISTS, not a JOIN. Run both queries and compare row counts — they should match exactly.

Anti-Join (NOT EXISTS)

Purpose: Find records in table A where NO related record exists in table B.

SQL Pattern

SELECT * FROM users
WHERE NOT EXISTS (
  SELECT 1 FROM posts
  WHERE posts.user_id = users.id
)

Or equivalently with LEFT JOIN:

SELECT users.* FROM users
LEFT JOIN posts ON users.id = posts.user_id
WHERE posts.id IS NULL

✓ Safe ActiveRecord Translations

Option 1: Subquery (clearer intent)

User.where.not(id: Post.select(:user_id))

Option 2: LEFT JOIN + NULL check

User.left_joins(:posts).where(posts: { id: nil })

Both work. The subquery version is usually clearer. Verify .to_sql and check that users with no posts appear in the result.

✗ Common Mistake

User.joins(:posts).where.not(posts: { id: nil })

Problem: This returns users who DO have posts (because INNER JOIN filters out users without posts). Exactly the opposite of what you want!

Deduping Joins Without DISTINCT Band-Aids

Problem: JOIN produces duplicate rows, so you slap .distinct on it. This works but can hide a bigger issue.

The Symptom

# Returns 1000 duplicate user rows from database, dedups in memory
User.joins(:posts).joins(:comments).distinct  # Cartesian explosion!

Run this without .distinct and check .count vs .distinct.count. If they differ by 10×+, you have a problem.

Better Approaches

1. Use Subqueries (Semi-Join)

User.where(id: Post.select(:user_id))
    .where(id: Comment.select(:user_id))

2. Use EXISTS for Complex Conditions

User.where('EXISTS (SELECT 1 FROM posts WHERE posts.user_id = users.id)')
    .where('EXISTS (SELECT 1 FROM comments WHERE comments.user_id = users.id)')

3. Only Use DISTINCT When Legitimately Needed

If you're counting or selecting specific columns and duplicates are semantically correct, .distinct is fine. Just verify the performance.

Keyset Pagination

Purpose: Paginate large result sets efficiently without OFFSET (which gets slower as you go deeper).

SQL Pattern

-- First page
SELECT * FROM posts ORDER BY created_at DESC, id DESC LIMIT 10

-- Next page (using last seen created_at and id)
SELECT * FROM posts
WHERE (created_at, id) < ('2024-01-01 10:00:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 10

ActiveRecord Translation

# First page
posts = Post.order(created_at: :desc, id: :desc).limit(10)

# Next page
last_post = posts.last
Post.where('(created_at, id) < (?, ?)', last_post.created_at, last_post.id)
    .order(created_at: :desc, id: :desc)
    .limit(10)

This stays fast even at page 1000 because it uses an index seek, not a scan + offset.

Index Requirement

You MUST have a composite index on (created_at DESC, id DESC) for this to work efficiently.

add_index :posts, [:created_at, :id], order: { created_at: :desc, id: :desc }

Top-N Per Group

Purpose: Get the top N records for each group (e.g., 5 most recent posts per user).

SQL Pattern (PostgreSQL/Modern SQL)

SELECT *
FROM (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
  FROM posts
) ranked
WHERE rn <= 5

ActiveRecord Limitation

ActiveRecord has limited window function support (Rails 7+), but this specific pattern is awkward to express. You have two options:

Option 1: Raw SQL

sql = <<~SQL
  SELECT *
  FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
    FROM posts
  ) ranked
  WHERE rn <= 5
SQL

Post.find_by_sql(sql)

Option 2: N+1 in Ruby (for small N)

# Only acceptable if you have < 100 users
User.includes(:posts).each do |user|
  user.posts.order(created_at: :desc).limit(5)
end

Recommendation: Use raw SQL or create a database view for top-N-per-group queries. Don't fight ActiveRecord on this one.

CTEs (WITH Clauses)

Purpose: Break complex queries into readable, reusable parts. Also enables recursive queries.

SQL Pattern

WITH active_users AS (
  SELECT * FROM users WHERE active = true
),
recent_posts AS (
  SELECT * FROM posts WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT u.*, COUNT(p.id) as post_count
FROM active_users u
LEFT JOIN recent_posts p ON u.id = p.user_id
GROUP BY u.id

ActiveRecord Support

Rails 7.1+ added .with for CTEs, but it's limited and verbose for complex cases.

# Rails 7.1+
Post.with(
  active_users: User.where(active: true),
  recent_posts: Post.where('created_at > ?', 7.days.ago)
).from('active_users u')
  .joins('LEFT JOIN recent_posts p ON u.id = p.user_id')
  .group('u.id')
  .select('u.*, COUNT(p.id) as post_count')

Recommendation: For anything beyond trivial CTEs, use raw SQL or create a database view. The ActiveRecord syntax is harder to read than the SQL.

When to Use CTEs

  • • Recursive queries (org charts, threaded comments)
  • • Complex analytics with multiple aggregation steps
  • • Queries you want to be readable by future maintainers

Pattern Cheat Sheet Summary

✓ ActiveRecord Handles Well

  • • Semi-joins (EXISTS → WHERE id: subquery)
  • • Anti-joins (NOT EXISTS → WHERE.not)
  • • Keyset pagination
  • • Simple aggregations

✗ Use Raw SQL or Views For

  • • Window functions (ROW_NUMBER, etc.)
  • • Complex CTEs
  • • Top-N per group
  • • UNION, FULL OUTER JOIN

Related Resources

Examples

Copy-paste examples of these patterns with verification notes.

Performance Tips

Learn to diagnose slow queries and index problems.

Guide

Complete guide to safe SQL→ActiveRecord conversion.

Try the Converter

Convert your SQL patterns to ActiveRecord, then verify with .to_sql and this guide.