- Home
- Blog
- Ruby & Rails Core
- ActiveRecord Query Looked Clean. Generated SQL Was Horrific. Learned to Check EXPLAIN Always.
ActiveRecord Query Looked Clean. Generated SQL Was Horrific. Learned to Check EXPLAIN Always.
Why beautiful Ruby code generates terrible SQL—and the EXPLAIN ANALYZE habit that prevents performance disasters
“User.includes(:posts).where(posts: { published: true }). Looks innocent. Generates death queries. Check the SQL.”
Rails is seductive: chain a few scopes, enjoy readable Ruby, and ship. But the database doesn’t care how pretty your method chain looks. It cares about the plan. In one production incident last quarter, a single page render ran an elegant ActiveRecord chain that ballooned into a 1.9 million–row LEFT OUTER JOIN with a DISTINCT on top. p95 API latency jumped from 320ms to 2.7s for 48k daily active users. After 40 minutes of triage, we rewrote the query, added an index, and the same endpoint fell to 210ms p95 with DB CPU dropping from 68% → 39%.
This post shows exactly how to build the habit that prevents this: always run EXPLAIN (and when safe, EXPLAIN ANALYZE) for non‑trivial queries. We’ll translate common Rails chains into their SQL, inspect the plan, and fix the real problems: joins, row explosions, sorting, and missing indexes.
The pretty chain that hurts: includes + where
The code (Rails 7.1+, Ruby 3.2):
# Intention: get active users who have at least one published post
# Why this is here: looks like preloading posts to avoid N+1 and filter by posts
users = User
.active
.includes(:posts) # we think: eager load to avoid N+1
.where(posts: { published: true }) # we think: filter by associated rows
.order('users.last_sign_in_at DESC')
.limit(50)
What actually happens (simplified):
SELECT DISTINCT "users".*
FROM "users"
LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id"
WHERE "users"."status" = 'active' AND "posts"."published" = TRUE
ORDER BY users.last_sign_in_at DESC
LIMIT 50;
The includes turns into a LEFT OUTER JOIN because you referenced the association in where. Rails adds DISTINCT to de-duplicate user rows. That DISTINCT usually implies a sort or hash aggregate over all joined rows before the LIMIT. On a 1:N relationship, that’s a row multiplier.
Plan symptoms (abridged EXPLAIN):
Sort (cost=... rows=1,900,000) → HashAggregate (…)
→ Hash Join
→ Seq Scan on posts (rows=1,200,000)
→ Index Scan using users_status_idx on users (rows=180,000)
Key insight:
includesis for loading,joinsis for filtering. Mixing them often forcesLEFT OUTER JOIN+DISTINCT—a classic footgun.
Fix 1: filter with joins + merge and preload separately
# Why: separate concerns—use JOIN to filter, PRELOAD to avoid N+1
users = User
.active
.joins(:posts)
.merge(Post.published) # keeps the Post predicate in the ON/WHERE
.preload(:posts) # performs a second, small query to load posts
.order(last_sign_in_at: :desc)
.limit(50)
Generated SQL (simplified):
SELECT "users".*
FROM "users"
INNER JOIN "posts" ON "posts"."user_id" = "users"."id" AND "posts"."published" = TRUE
WHERE "users"."status" = 'active'
ORDER BY users.last_sign_in_at DESC
LIMIT 50;
Result: no row explosion, no DISTINCT. On our incident, p95 query time dropped from 1.6s → 140ms. API p95 went 2.7s → 210ms. DB CPU fell 68% → 39%.
Always start with the plan: EXPLAIN vs EXPLAIN ANALYZE
-
EXPLAINshows the planner’s predicted cost and row counts—safe to run anywhere. -
EXPLAIN ANALYZEexecutes the query and returns actual timings/rows. Use in staging, locally, or in prod only during maintenance on read‑only/limited queries.
Rails console helpers:
# Why: see the actual SQL and check it into PRs when plans regress
scope = User.joins(:posts).merge(Post.published)
puts scope.to_sql
# For quick sanity checks in dev
ActiveRecord::Base.logger = Logger.new($stdout)
psql in staging:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT users.id
FROM users
JOIN posts ON posts.user_id = users.id AND posts.published = TRUE
WHERE users.status = 'active'
ORDER BY users.last_sign_in_at DESC
LIMIT 50;
What to look for:
- Seq Scan on big tables where an index scan should exist.
-
HashAggregate / Sort over millions of rows—often caused by
DISTINCT. -
Row estimates wildly wrong (e.g., estimated 1k, actual 1.2M): consider
ANALYZE, extended stats, or rewritten predicates.
“Beautiful Ruby is irrelevant if the plan is bad. Read the plan first; refactor Ruby second.”
Common Rails→SQL footguns and how to fix them
1) includes + filter ⇒ LEFT OUTER JOIN + DISTINCT
Use instead: joins(...).merge(...) + preload.
Trade-off: two queries (one for users, one for preloaded posts). That’s fine when the first query is selective (≤ 50–200 rows). If you need thousands, consider streaming or pagination with Pagy.
2) distinct to hide duplicates ⇒ unexpected sort
If you must deduplicate, prefer EXISTS (semi‑join) when you only need the parent rows.
# Why: EXISTS avoids row multiplication and DISTINCT sort
users = User
.active
.where(
Post.where(published: true)
.where("posts.user_id = users.id")
.arel.exists
)
.order(last_sign_in_at: :desc)
Plan hint: Index Only Scan on posts(user_id, published) and Bitmap Heap Scan can appear here, which are cheaper than DISTINCT over a join.
3) Sorting on associated columns without an index
Ordering by posts.created_at DESC while filtering on posts? Add a covering index.
-- Why: supports join + filter + sort in one index walk
CREATE INDEX CONCURRENTLY idx_posts_user_id_published_created_at
ON posts (user_id, published, created_at DESC);
On our incident, this single index cut the planner’s sort from ~900k rows down to 50.
4) Chaining scopes that hide ORs and functions
Rails scopes that embed LOWER(email) or OR conditions can kill index usage.
# Why: push work to indexed columns; avoid functions on the left side
scope :email_eq, ->(e) { where('email = ?', e.downcase) }
Trade-off: you may need a functional index if normalization isn’t possible:
CREATE INDEX CONCURRENTLY idx_users_lower_email ON users ((lower(email)));
5) Counting with includes
users.includes(:posts).count can produce a COUNT(DISTINCT users.id) + join. Prefer joins(:posts).distinct.count or where(...).select(:id).distinct.count or the EXISTS pattern.
A reproducible mini‑case: 2.1s → 140ms by changing the shape
Let’s reproduce the failure pattern on a dataset with ~1.2M posts and ~180k active users.
Bad:
# Why this fails: LEFT OUTER JOIN + DISTINCT + ORDER BY before LIMIT
User.active
.includes(:posts)
.where(posts: { published: true })
.order('users.last_sign_in_at DESC')
.limit(50)
.load_async # Rails 7.1 parallelizes, but doesn't fix the plan shape
Abridged plan:
Finalize GroupAggregate (actual time=2100..2200 ms)
-> Sort (rows=1,950,000) -- DISTINCT implied
-> Hash Join
-> Seq Scan on posts (rows=1,200,000)
-> Index Scan using users_status_idx on users (rows=180,000)
Better:
# Why it wins: INNER JOIN is selective; preload runs as a tiny second query
User.active
.joins(:posts)
.merge(Post.published)
.preload(:posts)
.order(last_sign_in_at: :desc)
.limit(50)
Plan after adding the index from earlier:
Limit (actual time=110..140 ms)
-> Index Scan using users_last_sign_in_at_idx on users
Filter: (status = 'active')
-> Nested Loop
-> Index Scan using idx_posts_user_id_published_created_at on posts
Outcome: p95 DB query 2.1s → 140ms, endpoint 1.8s → 220ms, worker saturation reduced (Sidekiq queue depth from 1,200 → 140 during peak).
We also saw RDS I/O reduced by ~35% during the peak hour. The change was a net negative diff of 9 lines.
Guardrails: make the plan visible in dev, CI, and prod
- PgHero: surfaces slow queries and missing indexes; budget 30 minutes weekly to clean top offenders.
-
Bullet: catches N+1 so you don’t reach for
includesdefensively everywhere. - Skylight/New Relic: APM trace shows which controller action owns the time; annotate PRs with screenshots.
- pg_stat_statements: enable it; it’s the source of truth for frequency and mean time.
Add a one‑liner safety net in specs:
# spec/support/explain_guard.rb
# Why: fails the build if a scope degrades into DISTINCT+JOIN on large tables
RSpec.configure do |config|
config.around(:each, :explain) do |ex|
sql = ex.metadata[:scope].to_sql
expect(sql).not_to match(/DISTINCT.+JOIN.+posts/i)
ex.run
end
end
Tag a few high‑risk scopes with :explain, scope: User.suspicious_scope to keep regressions out.
Production playbook (10 minutes max):
- Find the query in APM; copy SQL.
- Run
EXPLAINfirst. If needed and safe, runEXPLAIN (ANALYZE, BUFFERS)off‑peak. - Look for joins + distinct, big sorts, seq scans.
- Change shape first (JOIN/EXISTS), then indexes, then data access pattern.
When to drop to Arel or raw SQL
ActiveRecord is great for 80% of cases. The last 20% don’t deserve clever method chains.
# Why: explicit EXISTS with Arel keeps intent clear and plan stable
posts = Post.arel_table
users = User.arel_table
exists_published = posts
.project(Arel.sql('1'))
.where(posts[:user_id].eq(users[:id]).and(posts[:published].eq(true)))
User.active
.where(exists_published.exists)
.order(last_sign_in_at: :desc)
.limit(50)
Trade-off: readability for junior devs goes down; review burden goes up. Worth it for hot paths (> 5% of total request time or > 100 qps). For everything else, stick with conventional scopes.
“If you can’t explain the plan in a PR comment, you don’t understand the query yet.”
A mistake I won’t repeat
I once “optimized” a feed endpoint by sprinkling includes(:posts, :comments) everywhere. Bullet was green; N+1 disappeared; the page looked fast on my laptop. In prod, Postgres showed HashAggregate over 3.4M rows and a spill to disk (work_mem exhausted). We blew our p95 budget from 600ms → 2.4s and triggered auto‑scaling twice in 15 minutes. The fix was boring: replace includes with joins + merge, add an EXISTS semi‑join for presence checks, and create two composite indexes. Lesson: preloading is not filtering; EXPLAIN is.
Final Thoughts
- Start with the plan.
EXPLAINin PRs,EXPLAIN ANALYZEin safe environments. - Use
joins + mergeto filter; usepreload/includesto load. - Prefer
EXISTSoverDISTINCTwhen you only need to know if related rows exist. - Add covering indexes that match join + filter + sort in that order.
- Put guardrails in CI and track results in PgHero/APM.
Your Ruby can stay clean—just make sure the SQL isn’t horrific.
Was this article helpful?
Your feedback helps us improve our content
How We Verify Conversions
Every conversion shown on this site follows a strict verification process to ensure correctness:
- Compare results on same dataset — We run both SQL and ActiveRecord against identical test data and verify results match
-
Check generated SQL with
to_sql— We inspect the actual SQL Rails generates to catch semantic differences (INNER vs LEFT JOIN, WHERE vs ON, etc.) - Add regression tests for tricky cases — Edge cases like NOT EXISTS, anti-joins, and predicate placement are tested with multiple scenarios
- Tested on Rails 8.1.1 — All conversions verified on current Rails version to ensure compatibility
Last updated: February 22, 2026
Try These Queries in Our Converter
See the SQL examples from this article converted to ActiveRecord—and compare the SQL Rails actually generates.
Deep Dive into ActiveRecord
Raza Hussain
Full-stack developer specializing in Ruby on Rails, React, and modern JavaScript. 15+ years upgrading and maintaining production Rails apps. Led Rails 4/5 → 7 upgrades with 40% performance gains, migrated apps from Heroku to Render cutting costs by 35%, and built systems for StatusGator, CryptoZombies, and others. Available for Rails upgrades, performance work, and cloud migrations.
More on SQL to ActiveRecord
Read "Agile Web Development with Rails." Still Couldn't Write Queries. Needed Examples, Not Theory.
Books teach concepts. You need examples. See SQL vs ActiveRecord side-by-side, when to use scopes/Arel/SQL, and how to ship maintainable queries fast.
Taught SQL for 5 Years. Teaching ActiveRecord Broke All My Analogies. Had to Unlearn to Teach.
How to teach ActiveRecord to SQL developers: relations over strings, scopes, preloading vs joins, and when to use Arel/SQL—with concrete metrics and code.
DBA Learning Rails: Everything Felt Wrong. SQL Was Better. Then Maintenance Hell Changed My Mind.
From raw SQL to Rails ActiveRecord: when to use scopes, Arel, or SQL; how to kill N+1s; and why maintenance wins. Real numbers, trade-offs, and code.
Leave a Response
Responses (0)
No responses yet
Be the first to share your thoughts