Semi-Join (EXISTS)
Purpose: Find records in table A where a related record exists in table B, without returning duplicate rows from A.
SQL Pattern
SELECT * FROM users
WHERE EXISTS (
SELECT 1 FROM posts
WHERE posts.user_id = users.id
AND posts.published = true
)
✓ Safe ActiveRecord Translation
User.where(id: Post.where(published: true).select(:user_id))
This generates an IN subquery which PostgreSQL optimizes similarly to EXISTS. No duplicates possible.
⚠️ Problematic Alternative
User.joins(:posts).where(posts: { published: true })
Problem: Returns duplicate User rows if a user has multiple published posts. You'd need .distinct, which can hide performance issues.
When to Verify
Check .to_sql to confirm it's using IN (subquery) or EXISTS, not a JOIN. Run both queries and compare row counts — they should match exactly.
Anti-Join (NOT EXISTS)
Purpose: Find records in table A where NO related record exists in table B.
SQL Pattern
SELECT * FROM users
WHERE NOT EXISTS (
SELECT 1 FROM posts
WHERE posts.user_id = users.id
)
Or equivalently with LEFT JOIN:
SELECT users.* FROM users
LEFT JOIN posts ON users.id = posts.user_id
WHERE posts.id IS NULL
✓ Safe ActiveRecord Translations
Option 1: Subquery (clearer intent)
User.where.not(id: Post.select(:user_id))
Option 2: LEFT JOIN + NULL check
User.left_joins(:posts).where(posts: { id: nil })
Both work. The subquery version is usually clearer. Verify .to_sql and check that users with no posts appear in the result.
✗ Common Mistake
User.joins(:posts).where.not(posts: { id: nil })
Problem: This returns users who DO have posts (because INNER JOIN filters out users without posts). Exactly the opposite of what you want!
Deduping Joins Without DISTINCT Band-Aids
Problem: JOIN produces duplicate rows, so you slap .distinct on it. This works but can hide a bigger issue.
The Symptom
# Returns 1000 duplicate user rows from database, dedups in memory
User.joins(:posts).joins(:comments).distinct # Cartesian explosion!
Run this without .distinct and check .count vs .distinct.count. If they differ by 10×+, you have a problem.
Better Approaches
1. Use Subqueries (Semi-Join)
User.where(id: Post.select(:user_id))
.where(id: Comment.select(:user_id))
2. Use EXISTS for Complex Conditions
User.where('EXISTS (SELECT 1 FROM posts WHERE posts.user_id = users.id)')
.where('EXISTS (SELECT 1 FROM comments WHERE comments.user_id = users.id)')
3. Only Use DISTINCT When Legitimately Needed
If you're counting or selecting specific columns and duplicates are semantically correct, .distinct is fine. Just verify the performance.
Keyset Pagination
Purpose: Paginate large result sets efficiently without OFFSET (which gets slower as you go deeper).
SQL Pattern
-- First page
SELECT * FROM posts ORDER BY created_at DESC, id DESC LIMIT 10
-- Next page (using last seen created_at and id)
SELECT * FROM posts
WHERE (created_at, id) < ('2024-01-01 10:00:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 10
ActiveRecord Translation
# First page
posts = Post.order(created_at: :desc, id: :desc).limit(10)
# Next page
last_post = posts.last
Post.where('(created_at, id) < (?, ?)', last_post.created_at, last_post.id)
.order(created_at: :desc, id: :desc)
.limit(10)
This stays fast even at page 1000 because it uses an index seek, not a scan + offset.
Index Requirement
You MUST have a composite index on (created_at DESC, id DESC) for this to work efficiently.
add_index :posts, [:created_at, :id], order: { created_at: :desc, id: :desc }
Top-N Per Group
Purpose: Get the top N records for each group (e.g., 5 most recent posts per user).
SQL Pattern (PostgreSQL/Modern SQL)
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM posts
) ranked
WHERE rn <= 5
ActiveRecord Limitation
ActiveRecord has limited window function support (Rails 7+), but this specific pattern is awkward to express. You have two options:
Option 1: Raw SQL
sql = <<~SQL
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM posts
) ranked
WHERE rn <= 5
SQL
Post.find_by_sql(sql)
Option 2: N+1 in Ruby (for small N)
# Only acceptable if you have < 100 users
User.includes(:posts).each do |user|
user.posts.order(created_at: :desc).limit(5)
end
Recommendation: Use raw SQL or create a database view for top-N-per-group queries. Don't fight ActiveRecord on this one.
CTEs (WITH Clauses)
Purpose: Break complex queries into readable, reusable parts. Also enables recursive queries.
SQL Pattern
WITH active_users AS (
SELECT * FROM users WHERE active = true
),
recent_posts AS (
SELECT * FROM posts WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT u.*, COUNT(p.id) as post_count
FROM active_users u
LEFT JOIN recent_posts p ON u.id = p.user_id
GROUP BY u.id
ActiveRecord Support
Rails 7.1+ added .with for CTEs, but it's limited and verbose for complex cases.
# Rails 7.1+
Post.with(
active_users: User.where(active: true),
recent_posts: Post.where('created_at > ?', 7.days.ago)
).from('active_users u')
.joins('LEFT JOIN recent_posts p ON u.id = p.user_id')
.group('u.id')
.select('u.*, COUNT(p.id) as post_count')
Recommendation: For anything beyond trivial CTEs, use raw SQL or create a database view. The ActiveRecord syntax is harder to read than the SQL.
When to Use CTEs
- • Recursive queries (org charts, threaded comments)
- • Complex analytics with multiple aggregation steps
- • Queries you want to be readable by future maintainers
Pattern Cheat Sheet Summary
✓ ActiveRecord Handles Well
- • Semi-joins (EXISTS → WHERE id: subquery)
- • Anti-joins (NOT EXISTS → WHERE.not)
- • Keyset pagination
- • Simple aggregations
✗ Use Raw SQL or Views For
- • Window functions (ROW_NUMBER, etc.)
- • Complex CTEs
- • Top-N per group
- • UNION, FULL OUTER JOIN
Related Resources
Examples
Copy-paste examples of these patterns with verification notes.
Performance Tips
Learn to diagnose slow queries and index problems.
Guide
Complete guide to safe SQL→ActiveRecord conversion.
Try the Converter
Convert your SQL patterns to ActiveRecord, then verify with .to_sql and this guide.