
SQL JOIN Made Sense. ActiveRecord includes() Confused Me for Weeks. Finally Clicked.

Why ActiveRecord is not SQL with Ruby syntax—and the object-oriented query approach that changes everything

Raza Hussain
· 7 min read

Your Rails page is slow. You “fixed” the N+1s with includes, but the query count didn’t drop and now pagination looks weird. If SQL JOINs felt crystal clear yet includes() kept betraying you, you’re not alone. I spent weeks porting hand-tuned SQL into Rails 7.1 and kept getting different results. Here’s what finally clicked: ActiveRecord is object/relationship‑oriented; SQL is set‑oriented. Once you switch mental models, includes, preload, eager_load, and joins line up—and your app gets faster for real.

The Mindset Shift: Associations First, Sets Second

SQL thinking: compose sets, then project rows.

ActiveRecord thinking: declare relationships, then fetch objects (with their associations) efficiently.

This difference explains most of the “why does includes behave differently than a JOIN?” questions:

  • joins builds a SQL JOIN into one result set you can filter on.
  • includes says “I’ll need these associations when I touch them,” and may use multiple queries or a LEFT OUTER JOIN depending on what you do next.

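Key idea: includes is a hint for eager loading, not a JOIN command. Rails picks the strategy from what the rest of the query references. If the query touches the association's table (a hash condition on it, or an explicit .references), Rails loads everything with one LEFT OUTER JOIN; otherwise it preloads with separate queries. It is a fixed rule, not a cost estimate.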
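To make that rule concrete, here is a plain-Ruby model of the choice. It is a simplification, not ActiveRecord's code: the QuerySpec struct and its fields are invented, the toy treats association and table names as identical, and the real check (which you can ask about in a console with relation.eager_loading?) also accounts for tables that are already joined.

```ruby
# Simplified model of how includes picks a loading strategy (not ActiveRecord's implementation).
QuerySpec = Struct.new(:own_table, :includes, :eager_load, :references,
                       :where_hash_tables, :where_sql_tables, keyword_init: true)

DEFAULTS = { own_table: :users, includes: [], eager_load: [], references: [],
             where_hash_tables: [], where_sql_tables: [] }.freeze

def spec(**overrides)
  QuerySpec.new(**DEFAULTS.merge(overrides))
end

def loading_strategy(query)
  return :eager_load if query.eager_load.any? # eager_load always joins
  return :none if query.includes.empty?       # nothing to eager load
  # Hash conditions such as where(orders: { ... }) count as references automatically.
  # SQL-string conditions (where_sql_tables) are deliberately ignored: Rails can't see
  # inside them, so they only count if you call .references yourself.
  referenced = (query.references + query.where_hash_tables) - [query.own_table]
  referenced.any? ? :eager_load : :preload
end

p loading_strategy(spec(includes: [:orders]))                               # → :preload
p loading_strategy(spec(includes: [:orders], where_hash_tables: [:orders])) # → :eager_load
p loading_strategy(spec(includes: [:orders], where_sql_tables: [:orders]))  # → :preload
p loading_strategy(spec(includes: [:orders], references: [:orders]))       # → :eager_load
```

The third case is the trap: a SQL-string filter on orders without .references leaves Rails on the preload path.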

Concrete result from a marketplace app at ~50K DAU: moving a dashboard from joins-everywhere to association‑first + targeted includes dropped p95 from 960ms → 210ms, and cut queries from 143 → 17 per request. Memory went from 420MB → 260MB at peak because we stopped duplicating rows via joins.

What includes Actually Does (and When It Joins)

includes(:association) prepares Rails to eager load. Strategy depends on your follow‑up:

  • If the query doesn’t reference the association’s table in WHERE/HAVING/ORDER, Rails preloads: one query for the parents, then one SELECT ... WHERE user_id IN (...) per included association. This avoids row duplication.
  • If it does reference that table, Rails switches to a single LEFT OUTER JOIN, the same plan eager_load always forces. Hash conditions like where(orders: { ... }) add the reference automatically; SQL strings need .references(:orders).
# Rails 7.1+, Ruby 3.2+
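# We render a user list and show each user's 3 most recent orders.
# includes avoids N+1s *without* duplicating users.
users = User.includes(:orders).order(created_at: :desc).limit(50)

users.each do |user|
  # Without includes, each user.orders call would run its own query (N+1).
  # The preloaded collection holds *all* of the user's orders, so pick the 3 newest in Ruby;
  # take(3) would return 3 orders in load order, not the latest ones.
  recent = user.orders.max_by(3, &:created_at)
  puts "#{user.email}: #{recent.map(&:total_cents).join(', ')}"
end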
# Observed (production sample, p95): 2 queries, 85ms total
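What those two queries amount to can be modeled in plain Ruby, with arrays standing in for the tables (the data and the User/Order structs are made up, and this is not how ActiveRecord is implemented internally). Note that preloading fetches every order for each user, so choosing the three newest happens in memory:

```ruby
# In-memory stand-ins for the users and orders tables (hypothetical data).
User  = Struct.new(:id, :email, :orders)
Order = Struct.new(:id, :user_id, :total_cents, :created_at)

USERS_TABLE  = [[1, "a@example.com"], [2, "b@example.com"], [3, "c@example.com"]].freeze
ORDERS_TABLE = [
  Order.new(1, 1, 500, 10), Order.new(2, 1, 700, 20),
  Order.new(3, 1, 900, 30), Order.new(4, 1, 1100, 40),
  Order.new(5, 2, 300, 15)
].freeze

queries = 0

# Query 1: SELECT users.* ... (one row per user, no duplication)
queries += 1
users = USERS_TABLE.map { |id, email| User.new(id, email, []) }

# Query 2: SELECT orders.* WHERE user_id IN (1, 2, 3)
queries += 1
ids = users.map(&:id)
orders_by_user = ORDERS_TABLE.select { |o| ids.include?(o.user_id) }.group_by(&:user_id)

# Stitch children onto parents in Ruby; users without orders get an empty array.
users.each { |u| u.orders = orders_by_user.fetch(u.id, []) }

# Rendering touches only memory: pick the 3 newest orders per user.
recent_totals = users.to_h { |u| [u.email, u.orders.max_by(3, &:created_at).map(&:total_cents)] }
recent_totals.each { |email, totals| puts "#{email}: #{totals.join(', ')}" }
puts "queries: #{queries}" # prints "queries: 2"
```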

If you filter by the child, you’ve crossed into JOIN territory:

# We need users who placed at least one order this week.
# Why: the WHERE mentions the orders table, so Rails must JOIN it into the main query.
users = User.includes(:orders)
           .where(orders: { created_at: 1.week.ago.. })
           .references(:orders) # redundant here (hash conditions add it); required with SQL-string conditions

# In practice: 1 LEFT OUTER JOIN + WHERE, ~120ms on 2M orders

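Watch out: .references only matters for SQL-string conditions. With where(orders: { ... }), Rails adds the reference itself. With a string such as where('orders.created_at > ?', 1.week.ago) and no .references(:orders), Rails stays on the preload path, so the main query mentions a table that isn’t in its FROM clause and the database raises an error. Rails never falls back to filtering in Ruby. One more surprise on the JOIN path: each user.orders then holds only the orders that matched the WHERE, not the user’s full history.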
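A plain-Ruby model of that JOIN path shows both effects of the WHERE: users without a recent order disappear, and the survivors keep only their matching orders. The data is hypothetical, and day numbers stand in for timestamps.

```ruby
User  = Struct.new(:id, :email, :orders)
Order = Struct.new(:id, :user_id, :created_day)

users_table  = [[1, "a@example.com"], [2, "b@example.com"], [3, "c@example.com"]]
orders_table = [Order.new(1, 1, 5), Order.new(2, 1, 28), Order.new(3, 2, 10)]
cutoff_day   = 23 # stands in for 1.week.ago if today is day 30 (made-up calendar)

# LEFT OUTER JOIN: one row per (user, order); a user with no orders gets one row with nil.
joined = users_table.flat_map do |id, email|
  matches = orders_table.select { |o| o.user_id == id }
  matches.empty? ? [[id, email, nil]] : matches.map { |o| [id, email, o] }
end

# WHERE orders.created_day >= cutoff: NULL never passes, so user 3 drops out as well.
filtered = joined.select { |_id, _email, order| order && order.created_day >= cutoff_day }

# Hydrate: one User per id, carrying only the order rows that survived the WHERE.
users = filtered.group_by { |id, _email, _order| id }.map do |id, rows|
  User.new(id, rows.first[1], rows.map { |_id, _email, order| order })
end

users.each { |u| puts "#{u.email}: orders #{u.orders.map(&:id).inspect}" }
# prints "a@example.com: orders [2]" even though user 1 owns orders 1 and 2
```

If you need each user's full order list, select the ids first and preload separately, as in the two-step pattern later in this article.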

When joins Is the Right Tool

Reach for joins when you need set operations, filtering, or aggregation in SQL:

# Top customers by lifetime value (LTV)
# Why joins: we need GROUP BY/HAVING on the child table, not just hydrated objects.
users = User.joins(:orders)
           .select('users.*, SUM(orders.total_cents) AS ltv')
           .group('users.id')
           .having('SUM(orders.total_cents) > ?', 100_00) # Ruby literal 100_00 == 10000; bound as a value
           .order('ltv DESC')

# Generated SQL (simplified)
# SELECT users.*, SUM(orders.total_cents) AS ltv
# FROM users INNER JOIN orders ON orders.user_id = users.id
# GROUP BY users.id HAVING SUM(orders.total_cents) > 10000
# ORDER BY ltv DESC

# Production snapshot: 1 query, 140–180ms on 2M orders, 250K users
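Two details in that query are easy to miss. The INNER JOIN means users without orders never reach the GROUP BY, and 100_00 is Ruby's digit separator, not SQL's: most SQL dialects won't read it as 10000 (PostgreSQL only accepts underscores in numeric literals from version 16), which is why the threshold goes in as a bind value. Here is the same pipeline modeled in plain Ruby with made-up totals:

```ruby
Order = Struct.new(:user_id, :total_cents)

orders    = [Order.new(1, 60_00), Order.new(1, 50_00), Order.new(2, 90_00), Order.new(3, 200_00)]
threshold = 100_00 # Ruby reads this as 10000 (cents)

# GROUP BY users.id + SUM(orders.total_cents) AS ltv.
# Only users present in orders appear at all, mirroring the INNER JOIN.
ltv_by_user = orders.group_by(&:user_id).transform_values { |os| os.sum(&:total_cents) }

above_threshold = ltv_by_user.select { |_user_id, ltv| ltv > threshold } # HAVING SUM(...) > 10000
top_customers   = above_threshold.sort_by { |_user_id, ltv| -ltv }       # ORDER BY ltv DESC

p threshold     # → 10000
p top_customers # → [[3, 20000], [1, 11000]]
```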

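joins returns User records by default, not arrays of joined rows. Two consequences follow. On a has_many, a user comes back once per matching order unless you group or call distinct. And fields from the child are only available if you select them explicitly as above; each record then exposes the alias as an attribute (user.ltv). If you also need to render the child collections later, combine patterns: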

# Step 1: get the specific user set via joins + aggregation
user_ids = User.joins(:orders)
               .group(:id)
               .having('COUNT(orders.id) > 3')
               .limit(500)
               .pluck(:id)

# Step 2: eager load for rendering to avoid N+1 on orders
users = User.includes(:orders).where(id: user_ids)
# Why two steps: aggregation needs SQL; rendering needs hydrated objects.

Performance note: One giant JOIN that duplicates rows often hurts pagination and memory. Split selection from hydration when lists get big.
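Here is the pagination half of that note as a plain-Ruby model with hypothetical order counts. It describes a bare joins(:orders) with a LIMIT; for eager_load plus limit, Rails compensates by selecting the distinct parent ids first, so the symptom there is extra work rather than a short page.

```ruby
# user_id => number of orders (hypothetical)
order_counts = { 1 => 3, 2 => 1, 3 => 2 }
page_size    = 3

# joins(:orders) without GROUP BY/DISTINCT: one row per (user, order).
joined_user_ids = order_counts.flat_map { |user_id, n| [user_id] * n }

# LIMIT 3 counts joined rows, not users.
page_from_join = joined_user_ids.first(page_size)
p page_from_join # → [1, 1, 1]  (three "results", one user)

# Split selection from hydration: LIMIT the parents first, then load children for that page.
page_ids = order_counts.keys.first(page_size)
p page_ids # → [1, 2, 3]
```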

includes vs preload vs eager_load (and strict_loading)

Rails offers three loading strategies, plus a guardrail. Know them:

  • includes: smart default. Rails picks separate queries or a JOIN based on whether the query references the association's table.
  • preload: always separate queries. Great when many parents have few children.
  • eager_load: always a single LEFT OUTER JOIN. Use it when you must ORDER/WHERE/HAVING by children without references gymnastics.
  • strict_loading (Rails 6.1+): call it on a relation (or strict_loading! on a single record) and lazy-loading an association you didn't load raises ActiveRecord::StrictLoadingViolationError. Handy guardrail in controllers and background jobs.
# Why preload: avoid row multiplication when most users have 0–3 orders
users = User.preload(:orders).limit(200)

# Why eager_load: we sort by an associated column
users = User.eager_load(:orders).order('orders.created_at DESC').limit(50)

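# Guardrail: raise if anything lazy-loads an association we didn't load.
# strict_loading (no bang) is the relation method; strict_loading! works on a single record.
users = users.strict_loading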

Numbers from production: switching a report to preload (from includes that chose a JOIN) cut memory per request 74MB → 31MB at 1K users/page because we avoided 6× row duplication. Another endpoint forced eager_load to support .order('orders.created_at DESC'), trading memory for correctness; p95 rose 210ms → 260ms, but the results were right and stable.
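To see where that duplication comes from, here is a back-of-the-envelope count in plain Ruby. The column widths and order counts are made up, and "cells" is only a rough proxy for memory. Rails does collapse joined rows into one User object per id, but the repeated user columns still have to be transferred and parsed before that happens.

```ruby
USER_COLUMNS  = 20 # hypothetical width of users.*
ORDER_COLUMNS = 8  # hypothetical width of orders.*
order_counts  = [0, 1, 2, 12, 3, 0] # orders per user on one page (made-up sample)

parents  = order_counts.size
children = order_counts.sum

# LEFT OUTER JOIN: every row carries all user columns plus all order columns,
# and a user with zero orders still yields one row.
join_rows  = order_counts.sum { |n| [n, 1].max }
join_cells = join_rows * (USER_COLUMNS + ORDER_COLUMNS)

# preload: one narrow row per user, then one narrow row per order.
preload_cells = parents * USER_COLUMNS + children * ORDER_COLUMNS

copies_per_user = join_rows.fdiv(parents)

puts "JOIN: #{join_rows} rows, #{join_cells} cells"        # prints "JOIN: 20 rows, 560 cells"
puts "preload: #{preload_cells} cells"                     # prints "preload: 264 cells"
puts format("each user's columns repeated %.2fx", copies_per_user) # prints "... 3.33x"
```

A page with a few heavy buyers is enough to push the repetition factor up quickly, which is the 6× effect described above.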

The Mistake I Shipped (and How to Avoid It)

I once combined or with includes on a staff dashboard and assumed Rails would “do the right thing.” It didn’t.

# Admins OR users active in the last 7 days
# includes + or blew up into a messy join with duplicate rows
users = User.includes(:orders)
            .where(role: :admin)
            .or(User.includes(:orders)
                   .where('last_sign_in_at > ?', 7.days.ago))

# Result in production:
# - Returned ~12,000 trial users instead of ~50 admins
# - Pagination counts were off
# - Request timed out after scanning ~2M rows

Fix: gather IDs separately, then preload once.

admin_ids  = User.where(role: :admin).pluck(:id)
recent_ids = User.where('last_sign_in_at > ?', 7.days.ago).pluck(:id)
users      = User.includes(:orders).where(id: (admin_ids + recent_ids).uniq)
# Why: explicit sets avoid accidental row duplication and broken counts.
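The selection step can be modeled in plain Ruby with made-up users and day numbers. Array#| performs the union and keeps each id once, which is why hydrating from the resulting id list returns each user exactly once.

```ruby
User = Struct.new(:id, :role, :last_sign_in_day)

today = 30 # made-up calendar day
users = [
  User.new(1, :admin,  29), # admin and recently active
  User.new(2, :admin,   1),
  User.new(3, :member, 28),
  User.new(4, :member,  2)
]

admin_ids  = users.select { |u| u.role == :admin }.map(&:id)
recent_ids = users.select { |u| u.last_sign_in_day > today - 7 }.map(&:id)

# Array#| is a set union: same result as (admin_ids + recent_ids).uniq
ids = admin_ids | recent_ids
p ids # → [1, 2, 3]

# Hydration step: each id maps to exactly one user, so counts stay honest.
page = users.select { |u| ids.include?(u.id) }
p page.size # → 3
```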

Real talk: or + includes is often a footgun. Separate the set logic from hydration or write the WHERE manually with joins/eager_load.

Tooling & Guardrails That Paid Off

  • Bullet (gem) to flag N+1s and unused eager loads. Saved us from shipping a preload that wasn’t used on a path (it caught 27 unnecessary includes in a week).
  • rack-mini-profiler to see SQL timings inline. We tuned a weekly report from 1.3s p95 → 320ms after spotting a cartesian explosion.
  • PgHero for slow query surfacing and index suggestions. It highlighted missing partial indexes on orders(status, created_at) which saved ~220ms per query under load.
  • StandardRB to keep scopes and query chains readable in code review.
# config/environments/development.rb (repeat in config/environments/test.rb so CI fails too)
# Why: fail loud on hidden lazy-loading; catch problems early
config.after_initialize do
  Bullet.enable        = true
  Bullet.bullet_logger = true
  Bullet.raise         = true # raise on N+1s instead of only logging them
end

Pro tip: Turn on strict loading by default for sensitive models (admin, billing) with self.strict_loading_by_default = true, and opt out where necessary with .strict_loading(false). Pair with Bullet in CI for a tight feedback loop.

Decision Guide (Copy/Paste)

  • Use joins when you need to filter, group, or aggregate on associated tables.
  • Use includes when you’re rendering parent objects and might traverse associations (let Rails choose strategy).
  • Use preload when associations are sparse or you’re loading multiple associations and want to avoid row multiplication.
  • Use eager_load when you must ORDER/WHERE/HAVING on the association and want a single SQL query.
  • Add .references(:table_name) when you filter or sort on the associated table with SQL strings after includes; hash conditions add it for you.
  • Add .strict_loading to relations (or strict_loading! to a record) to catch accidental lazy loads.
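If it helps to see the guide as logic, here is a hypothetical helper that encodes it. The method and keyword names are invented, and the rules are only the bullets above. One caveat from the JOIN path: filtering with eager_load trims the loaded children to the matching rows, so when you need full collections, select ids first and preload.

```ruby
# Hypothetical helper: the decision guide as code (not a Rails API).
def pick_loader(aggregate: false, filter_on_child: false,
                render_children: false, sparse_or_many_assocs: false)
  return :joins if aggregate                                         # GROUP BY / HAVING in SQL
  return (render_children ? :eager_load : :joins) if filter_on_child # WHERE/ORDER on the child
  return :none unless render_children                                # nothing to hydrate
  sparse_or_many_assocs ? :preload : :includes                       # avoid row multiplication
end

p pick_loader(aggregate: true)                                     # → :joins
p pick_loader(filter_on_child: true, render_children: true)        # → :eager_load
p pick_loader(render_children: true)                               # → :includes
p pick_loader(render_children: true, sparse_or_many_assocs: true)  # → :preload
```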

Final Thoughts

Treat the difference between ActiveRecord includes and joins as a mindset shift. Compose sets with joins when you need SQL power; hydrate objects with includes/preload/eager_load when you’re rendering. The trade‑off is predictability vs duplication: separate selection from hydration on big lists, and enforce guardrails (strict_loading, Bullet). Next step: audit your hottest endpoints and apply this playbook with rack-mini-profiler open.


How We Verify Conversions

Every conversion shown on this site follows a strict verification process to ensure correctness:

  • Compare results on same dataset — We run both SQL and ActiveRecord against identical test data and verify results match
  • Check generated SQL with to_sql — We inspect the actual SQL Rails generates to catch semantic differences (INNER vs LEFT JOIN, WHERE vs ON, etc.)
  • Add regression tests for tricky cases — Edge cases like NOT EXISTS, anti-joins, and predicate placement are tested with multiple scenarios
  • Tested on Rails 8.1.1 — All conversions verified on current Rails version to ensure compatibility

Last updated: February 22, 2026

Raza Hussain

Full-stack developer specializing in Ruby on Rails, React, and modern JavaScript. 15+ years upgrading and maintaining production Rails apps. Led Rails 4/5 → 7 upgrades with 40% performance gains, migrated apps from Heroku to Render cutting costs by 35%, and built systems for StatusGator, CryptoZombies, and others. Available for Rails upgrades, performance work, and cloud migrations.

💼 15 years experience 📝 34 posts