Semi-Join (EXISTS)
Purpose: Find records in table A where a related record exists in table B, without returning duplicate rows from A.
SQL Pattern
SELECT * FROM users
WHERE EXISTS (
SELECT 1 FROM posts
WHERE posts.user_id = users.id
AND posts.published = true
)
β Safe ActiveRecord Translation
User.where(id: Post.where(published: true).select(:user_id))
This generates an IN subquery which PostgreSQL optimizes similarly to EXISTS. No duplicates possible.
β οΈ Problematic Alternative
User.joins(:posts).where(posts: { published: true })
Problem: Returns duplicate User rows if a user has multiple published posts. You'd need .distinct, which can hide performance issues.
When to Verify
Check .to_sql to confirm it's using IN (subquery) or EXISTS, not a JOIN. Run both queries and compare row counts β they should match exactly.
Anti-Join (NOT EXISTS)
Purpose: Find records in table A where NO related record exists in table B.
SQL Pattern
SELECT * FROM users
WHERE NOT EXISTS (
SELECT 1 FROM posts
WHERE posts.user_id = users.id
)
Or equivalently with LEFT JOIN:
SELECT users.* FROM users
LEFT JOIN posts ON users.id = posts.user_id
WHERE posts.id IS NULL
β Safe ActiveRecord Translations
Option 1: Subquery (clearer intent)
User.where.not(id: Post.select(:user_id))
Option 2: LEFT JOIN + NULL check
User.left_joins(:posts).where(posts: { id: nil })
Both work. The subquery version is usually clearer. Verify .to_sql and check that users with no posts appear in the result.
β Common Mistake
User.joins(:posts).where.not(posts: { id: nil })
Problem: This returns users who DO have posts (because INNER JOIN filters out users without posts). Exactly the opposite of what you want!
Deduping Joins Without DISTINCT Band-Aids
Problem: JOIN produces duplicate rows, so you slap .distinct on it. This works but can hide a bigger issue.
The Symptom
# Returns 1000 duplicate user rows from database, dedups in memory
User.joins(:posts).joins(:comments).distinct # Cartesian explosion!
Run this without .distinct and check .count vs .distinct.count. If they differ by 10Γ+, you have a problem.
Better Approaches
1. Use Subqueries (Semi-Join)
User.where(id: Post.select(:user_id))
.where(id: Comment.select(:user_id))
2. Use EXISTS for Complex Conditions
User.where('EXISTS (SELECT 1 FROM posts WHERE posts.user_id = users.id)')
.where('EXISTS (SELECT 1 FROM comments WHERE comments.user_id = users.id)')
3. Only Use DISTINCT When Legitimately Needed
If you're counting or selecting specific columns and duplicates are semantically correct, .distinct is fine. Just verify the performance.
Keyset Pagination
Purpose: Paginate large result sets efficiently without OFFSET (which gets slower as you go deeper).
SQL Pattern
-- First page
SELECT * FROM posts ORDER BY created_at DESC, id DESC LIMIT 10
-- Next page (using last seen created_at and id)
SELECT * FROM posts
WHERE (created_at, id) < ('2024-01-01 10:00:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 10
ActiveRecord Translation
# First page
posts = Post.order(created_at: :desc, id: :desc).limit(10)
# Next page
last_post = posts.last
Post.where('(created_at, id) < (?, ?)', last_post.created_at, last_post.id)
.order(created_at: :desc, id: :desc)
.limit(10)
This stays fast even at page 1000 because it uses an index seek, not a scan + offset.
Index Requirement
You MUST have a composite index on (created_at DESC, id DESC) for this to work efficiently.
add_index :posts, [:created_at, :id], order: { created_at: :desc, id: :desc }
Top-N Per Group
Purpose: Get the top N records for each group (e.g., 5 most recent posts per user).
SQL Pattern (PostgreSQL/Modern SQL)
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM posts
) ranked
WHERE rn <= 5
ActiveRecord Limitation
ActiveRecord has limited window function support (Rails 7+), but this specific pattern is awkward to express. You have two options:
Option 1: Raw SQL
sql = <<~SQL
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM posts
) ranked
WHERE rn <= 5
SQL
Post.find_by_sql(sql)
Option 2: N+1 in Ruby (for small N)
# Only acceptable if you have < 100 users
User.includes(:posts).each do |user|
user.posts.order(created_at: :desc).limit(5)
end
Recommendation: Use raw SQL or create a database view for top-N-per-group queries. Don't fight ActiveRecord on this one.
CTEs (WITH Clauses)
Purpose: Break complex queries into readable, reusable parts. Also enables recursive queries.
SQL Pattern
WITH active_users AS (
SELECT * FROM users WHERE active = true
),
recent_posts AS (
SELECT * FROM posts WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT u.*, COUNT(p.id) as post_count
FROM active_users u
LEFT JOIN recent_posts p ON u.id = p.user_id
GROUP BY u.id
ActiveRecord Support
Rails 7.1+ added .with for CTEs, but it's limited and verbose for complex cases.
# Rails 7.1+
Post.with(
active_users: User.where(active: true),
recent_posts: Post.where('created_at > ?', 7.days.ago)
).from('active_users u')
.joins('LEFT JOIN recent_posts p ON u.id = p.user_id')
.group('u.id')
.select('u.*, COUNT(p.id) as post_count')
Recommendation: For anything beyond trivial CTEs, use raw SQL or create a database view. The ActiveRecord syntax is harder to read than the SQL.
When to Use CTEs
- β’ Recursive queries (org charts, threaded comments)
- β’ Complex analytics with multiple aggregation steps
- β’ Queries you want to be readable by future maintainers
Pattern Cheat Sheet Summary
β ActiveRecord Handles Well
- β’ Semi-joins (EXISTS β WHERE id: subquery)
- β’ Anti-joins (NOT EXISTS β WHERE.not)
- β’ Keyset pagination
- β’ Simple aggregations
β Use Raw SQL or Views For
- β’ Window functions (ROW_NUMBER, etc.)
- β’ Complex CTEs
- β’ Top-N per group
- β’ UNION, FULL OUTER JOIN
Related Resources
Examples
Copy-paste examples of these patterns with verification notes.
Performance Tips
Learn to diagnose slow queries and index problems.
Guide
Complete guide to safe SQLβActiveRecord conversion.
Try the Converter
Convert your SQL patterns to ActiveRecord, then verify with .to_sql and this guide.