ActiveRecord Query Looked Clean. Generated SQL

“User.includes(:posts).where(posts: { published: true }). Looks innocent. Generates death queries. Check the SQL.”

Rails is seductive: chain a few scopes, enjoy readable Ruby, and ship. But the database doesn’t care how pretty your method chain looks. It cares about the plan. In one production incident last quarter, a single page render ran an elegant ActiveRecord chain that ballooned into a 1.9 million–row LEFT OUTER JOIN with a DISTINCT on top. p95 API latency jumped from 320ms to 2.7s for 48k daily active users. After 40 minutes of triage, we rewrote the query, added an index, and the same endpoint fell to 210ms p95 with DB CPU dropping from 68% → 39%.

This post shows exactly how to build the habit that prevents this: always run EXPLAIN (and when safe, EXPLAIN ANALYZE) for non‑trivial queries. We’ll translate common Rails chains into their SQL, inspect the plan, and fix the real problems: joins, row explosions, sorting, and missing indexes.

The pretty chain that hurts: `includes` + `where`

The code (Rails 7.1+, Ruby 3.2):

# Intention: get active users who have at least one published post
# Why this is here: looks like preloading posts to avoid N+1 and filter by posts
users = User
  .active
  .includes(:posts)                   # we think: eager load to avoid N+1
  .where(posts: { published: true })  # we think: filter by associated rows
  .order('users.last_sign_in_at DESC')
  .limit(50)

What actually happens (simplified):

SELECT DISTINCT "users".*
FROM "users"
LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id"
WHERE "users"."status" = 'active' AND "posts"."published" = TRUE
ORDER BY users.last_sign_in_at DESC
LIMIT 50;

The includes turns into a LEFT OUTER JOIN because you referenced the association in where. Rails adds DISTINCT to de-duplicate user rows. That DISTINCT usually implies a sort or hash aggregate over all joined rows before the LIMIT. On a 1:N relationship, that’s a row multiplier.

Plan symptoms (abridged EXPLAIN):

Sort (cost=... rows=1,900,000)  → HashAggregate (…)
  → Hash Join
      → Seq Scan on posts  (rows=1,200,000)
      → Index Scan using users_status_idx on users (rows=180,000)

Key insight: includes is for loading, joins is for filtering. Mixing them often forces LEFT OUTER JOIN + DISTINCT—a classic footgun.

Fix 1: filter with `joins` + `merge` and preload separately

# Why: separate concerns—use JOIN to filter, PRELOAD to avoid N+1
users = User
  .active
  .joins(:posts)
  .merge(Post.published)      # keeps the Post predicate in the ON/WHERE
  .preload(:posts)            # performs a second, small query to load posts
  .order(last_sign_in_at: :desc)
  .limit(50)

Generated SQL (simplified):

SELECT "users".*
FROM "users"
INNER JOIN "posts" ON "posts"."user_id" = "users"."id" AND "posts"."published" = TRUE
WHERE "users"."status" = 'active'
ORDER BY users.last_sign_in_at DESC
LIMIT 50;

Result: no row explosion, no DISTINCT. On our incident, p95 query time dropped from 1.6s → 140ms. API p95 went 2.7s → 210ms. DB CPU fell 68% → 39%.

Always start with the plan: `EXPLAIN` vs `EXPLAIN ANALYZE`

EXPLAIN shows the planner’s predicted cost and row counts—safe to run anywhere.
EXPLAIN ANALYZE executes the query and returns actual timings/rows. Use in staging, locally, or in prod only during maintenance on read‑only/limited queries.

Rails console helpers:

# Why: see the actual SQL and check it into PRs when plans regress
scope = User.joins(:posts).merge(Post.published)
puts scope.to_sql

# For quick sanity checks in dev
ActiveRecord::Base.logger = Logger.new($stdout)

psql in staging:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT users.id
FROM users
JOIN posts ON posts.user_id = users.id AND posts.published = TRUE
WHERE users.status = 'active'
ORDER BY users.last_sign_in_at DESC
LIMIT 50;

What to look for:

Seq Scan on big tables where an index scan should exist.
HashAggregate / Sort over millions of rows—often caused by DISTINCT.
Row estimates wildly wrong (e.g., estimated 1k, actual 1.2M): consider ANALYZE, extended stats, or rewritten predicates.

“Beautiful Ruby is irrelevant if the plan is bad. Read the plan first; refactor Ruby second.”

Common Rails→SQL footguns and how to fix them

1) `includes` + filter ⇒ `LEFT OUTER JOIN` + `DISTINCT`

Use instead: joins(...).merge(...) + preload.

Trade-off: two queries (one for users, one for preloaded posts). That’s fine when the first query is selective (≤ 50–200 rows). If you need thousands, consider streaming or pagination with Pagy.

2) `distinct` to hide duplicates ⇒ unexpected sort

If you must deduplicate, prefer EXISTS (semi‑join) when you only need the parent rows.

# Why: EXISTS avoids row multiplication and DISTINCT sort
users = User
  .active
  .where(
    Post.where(published: true)
        .where("posts.user_id = users.id")
        .arel.exists
  )
  .order(last_sign_in_at: :desc)

Plan hint: Index Only Scan on posts(user_id, published) and Bitmap Heap Scan can appear here, which are cheaper than DISTINCT over a join.

3) Sorting on associated columns without an index

Ordering by posts.created_at DESC while filtering on posts? Add a covering index.

-- Why: supports join + filter + sort in one index walk
CREATE INDEX CONCURRENTLY idx_posts_user_id_published_created_at
ON posts (user_id, published, created_at DESC);

On our incident, this single index cut the planner’s sort from ~900k rows down to 50.

4) Chaining scopes that hide ORs and functions

Rails scopes that embed LOWER(email) or OR conditions can kill index usage.

# Why: push work to indexed columns; avoid functions on the left side
scope :email_eq, ->(e) { where('email = ?', e.downcase) }

Trade-off: you may need a functional index if normalization isn’t possible:

CREATE INDEX CONCURRENTLY idx_users_lower_email ON users ((lower(email)));

5) Counting with `includes`

users.includes(:posts).count can produce a COUNT(DISTINCT users.id) + join. Prefer joins(:posts).distinct.count or where(...).select(:id).distinct.count or the EXISTS pattern.

A reproducible mini‑case: 2.1s → 140ms by changing the shape

Let’s reproduce the failure pattern on a dataset with ~1.2M posts and ~180k active users.

Bad:

# Why this fails: LEFT OUTER JOIN + DISTINCT + ORDER BY before LIMIT
User.active
    .includes(:posts)
    .where(posts: { published: true })
    .order('users.last_sign_in_at DESC')
    .limit(50)
    .load_async # Rails 7.1 parallelizes, but doesn't fix the plan shape

Abridged plan:

Finalize GroupAggregate  (actual time=2100..2200 ms)
  ->  Sort (rows=1,950,000)  -- DISTINCT implied
        ->  Hash Join
              ->  Seq Scan on posts (rows=1,200,000)
              ->  Index Scan using users_status_idx on users (rows=180,000)

Better:

# Why it wins: INNER JOIN is selective; preload runs as a tiny second query
User.active
    .joins(:posts)
    .merge(Post.published)
    .preload(:posts)
    .order(last_sign_in_at: :desc)
    .limit(50)

Plan after adding the index from earlier:

Limit  (actual time=110..140 ms)
  ->  Index Scan using users_last_sign_in_at_idx on users
        Filter: (status = 'active')
        ->  Nested Loop
              ->  Index Scan using idx_posts_user_id_published_created_at on posts

Outcome: p95 DB query 2.1s → 140ms, endpoint 1.8s → 220ms, worker saturation reduced (Sidekiq queue depth from 1,200 → 140 during peak).

We also saw RDS I/O reduced by ~35% during the peak hour. The change was a net negative diff of 9 lines.

Guardrails: make the plan visible in dev, CI, and prod

PgHero: surfaces slow queries and missing indexes; budget 30 minutes weekly to clean top offenders.
Bullet: catches N+1 so you don’t reach for includes defensively everywhere.
Skylight/New Relic: APM trace shows which controller action owns the time; annotate PRs with screenshots.
pg_stat_statements: enable it; it’s the source of truth for frequency and mean time.

Add a one‑liner safety net in specs:

# spec/support/explain_guard.rb
# Why: fails the build if a scope degrades into DISTINCT+JOIN on large tables
RSpec.configure do |config|
  config.around(:each, :explain) do |ex|
    sql = ex.metadata[:scope].to_sql
    expect(sql).not_to match(/DISTINCT.+JOIN.+posts/i)
    ex.run
  end
end

Tag a few high‑risk scopes with :explain, scope: User.suspicious_scope to keep regressions out.

Production playbook (10 minutes max):

Find the query in APM; copy SQL.
Run EXPLAIN first. If needed and safe, run EXPLAIN (ANALYZE, BUFFERS) off‑peak.
Look for joins + distinct, big sorts, seq scans.
Change shape first (JOIN/EXISTS), then indexes, then data access pattern.

When to drop to Arel or raw SQL

ActiveRecord is great for 80% of cases. The last 20% don’t deserve clever method chains.

# Why: explicit EXISTS with Arel keeps intent clear and plan stable
posts = Post.arel_table
users = User.arel_table
exists_published = posts
  .project(Arel.sql('1'))
  .where(posts[:user_id].eq(users[:id]).and(posts[:published].eq(true)))

User.active
    .where(exists_published.exists)
    .order(last_sign_in_at: :desc)
    .limit(50)

Trade-off: readability for junior devs goes down; review burden goes up. Worth it for hot paths (> 5% of total request time or > 100 qps). For everything else, stick with conventional scopes.

“If you can’t explain the plan in a PR comment, you don’t understand the query yet.”

A mistake I won’t repeat

I once “optimized” a feed endpoint by sprinkling includes(:posts, :comments) everywhere. Bullet was green; N+1 disappeared; the page looked fast on my laptop. In prod, Postgres showed HashAggregate over 3.4M rows and a spill to disk (work_mem exhausted). We blew our p95 budget from 600ms → 2.4s and triggered auto‑scaling twice in 15 minutes. The fix was boring: replace includes with joins + merge, add an EXISTS semi‑join for presence checks, and create two composite indexes. Lesson: preloading is not filtering; EXPLAIN is.

Final Thoughts

Start with the plan. EXPLAIN in PRs, EXPLAIN ANALYZE in safe environments.
Use joins + merge to filter; use preload/includes to load.
Prefer EXISTS over DISTINCT when you only need to know if related rows exist.
Add covering indexes that match join + filter + sort in that order.
Put guardrails in CI and track results in PgHero/APM.

Your Ruby can stay clean—just make sure the SQL isn’t horrific.

Was this article helpful?

Your feedback helps us improve our content

Be the first to vote!

Report an Issue

|

Contact Us

|

Send Feedback

How We Verify Conversions

Every conversion shown on this site follows a strict verification process to ensure correctness:

Compare results on same dataset — We run both SQL and ActiveRecord against identical test data and verify results match
Check generated SQL with to_sql — We inspect the actual SQL Rails generates to catch semantic differences (INNER vs LEFT JOIN, WHERE vs ON, etc.)
Add regression tests for tricky cases — Edge cases like NOT EXISTS, anti-joins, and predicate placement are tested with multiple scenarios
Tested on Rails 8.1.1 — All conversions verified on current Rails version to ensure compatibility

Last updated: February 22, 2026

Try These Queries in Our Converter

See the SQL examples from this article converted to ActiveRecord—and compare the SQL Rails actually generates.

SQL to ActiveRecord Ruby & Rails Core SQL Patterns

0

63

Deep Dive into ActiveRecord

Complete Guides

Database-Specific

Leave a Response

Responses (0)

No responses yet

Be the first to share your thoughts

R

Raza Hussain

Full-stack developer specializing in Ruby on Rails, React, and modern JavaScript. 15+ years upgrading and maintaining production Rails apps. Led Rails 4/5 → 7 upgrades with 40% performance gains, migrated apps from Heroku to Render cutting costs by 35%, and built systems for StatusGator, CryptoZombies, and others. Available for Rails upgrades, performance work, and cloud migrations.

💼 15 years experience 📝 34 posts

ActiveRecord Query Looked Clean. Generated SQL Was Horrific. Learned to Check EXPLAIN Always.

Why beautiful Ruby code generates terrible SQL—and the EXPLAIN ANALYZE habit that prevents performance disasters

The pretty chain that hurts: `includes` + `where`

Fix 1: filter with `joins` + `merge` and preload separately

Always start with the plan: `EXPLAIN` vs `EXPLAIN ANALYZE`

Common Rails→SQL footguns and how to fix them

1) `includes` + filter ⇒ `LEFT OUTER JOIN` + `DISTINCT`

2) `distinct` to hide duplicates ⇒ unexpected sort

3) Sorting on associated columns without an index

4) Chaining scopes that hide ORs and functions

5) Counting with `includes`

A reproducible mini‑case: 2.1s → 140ms by changing the shape

Guardrails: make the plan visible in dev, CI, and prod

When to drop to Arel or raw SQL

A mistake I won’t repeat

Final Thoughts

Was this article helpful?

How We Verify Conversions

Try These Queries in Our Converter

Deep Dive into ActiveRecord

Complete Guides

Database-Specific

Leave a Response

Responses (0)

Raza Hussain

More on SQL to ActiveRecord

Read "Agile Web Development with Rails." Still Couldn't Write Queries. Needed Examples, Not Theory.

Taught SQL for 5 Years. Teaching ActiveRecord Broke All My Analogies. Had to Unlearn to Teach.

DBA Learning Rails: Everything Felt Wrong. SQL Was Better. Then Maintenance Hell Changed My Mind.

The pretty chain that hurts: includes + where

Fix 1: filter with joins + merge and preload separately

Always start with the plan: EXPLAIN vs EXPLAIN ANALYZE

Common Rails→SQL footguns and how to fix them

1) includes + filter ⇒ LEFT OUTER JOIN + DISTINCT

2) distinct to hide duplicates ⇒ unexpected sort

3) Sorting on associated columns without an index

4) Chaining scopes that hide ORs and functions

5) Counting with includes

A reproducible mini‑case: 2.1s → 140ms by changing the shape

Guardrails: make the plan visible in dev, CI, and prod

When to drop to Arel or raw SQL

A mistake I won’t repeat

Final Thoughts

Was this article helpful?

How We Verify Conversions

Try These Queries in Our Converter

Deep Dive into ActiveRecord

Complete Guides

Database-Specific

Leave a Response

Responses (0)

Raza Hussain

More on SQL to ActiveRecord

Read "Agile Web Development with Rails." Still Couldn't Write Queries. Needed Examples, Not Theory.

Taught SQL for 5 Years. Teaching ActiveRecord Broke All My Analogies. Had to Unlearn to Teach.

DBA Learning Rails: Everything Felt Wrong. SQL Was Better. Then Maintenance Hell Changed My Mind.

The pretty chain that hurts: `includes` + `where`

Fix 1: filter with `joins` + `merge` and preload separately

Always start with the plan: `EXPLAIN` vs `EXPLAIN ANALYZE`

1) `includes` + filter ⇒ `LEFT OUTER JOIN` + `DISTINCT`

2) `distinct` to hide duplicates ⇒ unexpected sort

5) Counting with `includes`