SQL JOIN Made Sense. ActiveRecord includes() Confused Me for Weeks. Finally Clicked.
Why ActiveRecord is not SQL with Ruby syntax—and the object-oriented query approach that changes everything
Your Rails page is slow. You “fixed” the N+1s with includes, but the query count didn’t drop and now pagination looks weird. If SQL JOINs felt crystal clear yet includes() kept betraying you, you’re not alone. I spent weeks porting hand-tuned SQL into Rails 7.1 and kept getting different results. Here’s what finally clicked: ActiveRecord is object/relationship‑oriented; SQL is set‑oriented. Once you switch mental models, includes, preload, eager_load, and joins line up—and your app gets faster for real.
The Mindset Shift: Associations First, Sets Second
SQL thinking: compose sets, then project rows.
ActiveRecord thinking: declare relationships, then fetch objects (with their associations) efficiently.
The difference explains 80% of “why does includes behave differently than a JOIN?” questions:
- joins builds a SQL JOIN into one result set you can filter on.
- includes says “I’ll need these associations when I touch them,” and may use multiple queries or a LEFT OUTER JOIN depending on what you do next.
Key idea: includes is a hint for eager loading, not a JOIN command. Rails picks the loading strategy from how the rest of the query references the association, not from a cost estimate.
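To make the difference concrete, here is a minimal plain-Ruby sketch with no database and no ActiveRecord. The User and Order structs, the sample data, and the inner_join_rows and preload_orders helpers are all made up for illustration; they model the shape of each result, not Rails internals.

```ruby
# Plain-Ruby model of the two strategies; no database or ActiveRecord involved.
User  = Struct.new(:id, :email)
Order = Struct.new(:id, :user_id, :total_cents)

USERS  = [User.new(1, "a@example.com"), User.new(2, "b@example.com")].freeze
ORDERS = [
  Order.new(10, 1, 500),
  Order.new(11, 1, 700),
  Order.new(12, 1, 900)
].freeze

# JOIN-style: one flat row per (user, order) pair. User 1 appears three times
# and user 2 (no orders) disappears, as it would under an INNER JOIN.
def inner_join_rows(users, orders)
  users.flat_map do |user|
    orders.select { |o| o.user_id == user.id }.map { |o| [user, o] }
  end
end

# Preload-style: fetch parents, then fetch children for those parent ids in a
# second pass, and attach each child list to its parent.
def preload_orders(users, orders)
  ids = users.map(&:id)
  by_user = orders.select { |o| ids.include?(o.user_id) }.group_by(&:user_id)
  users.to_h { |user| [user, by_user.fetch(user.id, [])] }
end

rows = inner_join_rows(USERS, ORDERS)
puts "join rows: #{rows.size}, distinct users: #{rows.map(&:first).uniq.size}"

hydrated = preload_orders(USERS, ORDERS)
hydrated.each { |user, orders| puts "#{user.email}: #{orders.size} orders" }
```

The joined result repeats user 1 once per order and drops user 2 entirely, while the preload-style pass keeps exactly one object per user, including the one with no orders.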
Concrete result from a marketplace app at ~50K DAU: moving a dashboard from joins-everywhere to association‑first + targeted includes dropped p95 from 960ms → 210ms, and cut queries from 143 → 17 per request. Memory went from 420MB → 260MB at peak because we stopped duplicating rows via joins.
What includes Actually Does (and When It Joins)
includes(:association) prepares Rails to eager load. Strategy depends on your follow‑up:
- If you don’t reference the association in WHERE/HAVING/ORDER, Rails will usually run two queries (SELECT parents, then SELECT children WHERE parent_id IN (...)). This avoids row duplication.
- If you do reference associated columns in the same query, Rails must materialize them via a LEFT OUTER JOIN, or you can force that path with eager_load.
# Rails 7.1+, Ruby 3.2+
# We render a user list and show each user's last 3 orders.
# includes avoids N+1s *without* duplicating users.
users = User.includes(:orders).order(created_at: :desc).limit(50)
users.each do |user|
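  # Accessing orders would have N+1'd; includes preloads them.
  # Sorting happens in memory on the preloaded records, so "last 3" adds no queries.
  puts [user.email, user.orders.sort_by(&:created_at).last(3).map(&:total_cents)]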
end
# Observed (production sample, p95): 2 queries, 85ms total
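The query-count difference is easy to see in isolation. The sketch below is plain Ruby with a hypothetical FakeDB class that only logs the SQL it would have run; the class, its methods, and the sample data are assumptions for illustration, not part of Rails.

```ruby
# Hypothetical in-memory "database" that logs every query it serves.
class FakeDB
  attr_reader :log

  def initialize(orders_by_user)
    @orders_by_user = orders_by_user
    @log = []
  end

  def users(limit:)
    @log << "SELECT * FROM users LIMIT #{limit}"
    @orders_by_user.keys.first(limit)
  end

  def orders_for(user_ids)
    @log << "SELECT * FROM orders WHERE user_id IN (#{user_ids.join(', ')})"
    @orders_by_user.slice(*user_ids)
  end
end

DATA = { 1 => [100, 250], 2 => [], 3 => [990] }.freeze

# Lazy loading: one query for the users, then one more per user.
def lazy_query_count(db)
  db.users(limit: 50).each { |id| db.orders_for([id]) }
  db.log.size
end

# Preloading: one query for the users, one for all of their orders.
def preloaded_query_count(db)
  ids = db.users(limit: 50)
  db.orders_for(ids)
  db.log.size
end

puts "lazy: #{lazy_query_count(FakeDB.new(DATA))} queries"
puts "preloaded: #{preloaded_query_count(FakeDB.new(DATA))} queries"
```

With three users the lazy path issues four queries and the preloaded path issues two; the lazy count grows with the page size, the preloaded count does not.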
If you filter by the child, you’ve crossed into JOIN territory:
# We need users who placed at least one order this week.
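# Why: filtering on orders makes Rails load the association through a LEFT OUTER JOIN.
# Side effect: user.orders then contains only the orders that matched the WHERE.
users = User.includes(:orders)
            .where(orders: { created_at: 1.week.ago.. })
            .references(:orders) # optional with a hash condition, required with a SQL string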
# In practice: 1 LEFT OUTER JOIN + WHERE, ~120ms on 2M orders
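Watch out: with a hash condition like the one above, Rails infers the join on its own. With a string condition (where('orders.created_at > ?', 1.week.ago)) and no .references(:orders), Rails plans separate queries, the parent query has no JOIN, and the database rejects the unknown orders table with ActiveRecord::StatementInvalid.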
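What a LEFT OUTER JOIN followed by a WHERE on the child does to the result can be sketched in plain Ruby. Everything here (the Order struct, the fixed TODAY date, the helper names) is hypothetical and only models the semantics of the SQL, not how Rails builds it.

```ruby
require "date"

Order = Struct.new(:id, :user_id, :created_on)

TODAY    = Date.new(2024, 6, 15)
USER_IDS = [1, 2, 3].freeze # user 3 has no orders
ORDERS   = [
  Order.new(10, 1, TODAY - 2),  # recent
  Order.new(11, 1, TODAY - 40), # old
  Order.new(12, 2, TODAY - 90)  # old
].freeze

# LEFT OUTER JOIN: every user appears at least once; users without orders
# get a row with a nil order.
def left_outer_join(user_ids, orders)
  user_ids.flat_map do |uid|
    matches = orders.select { |o| o.user_id == uid }
    matches.empty? ? [[uid, nil]] : matches.map { |o| [uid, o] }
  end
end

# WHERE on the child column runs after the join, so nil rows and old orders
# are both discarded.
def recent_rows(rows, since)
  rows.select { |_uid, order| order && order.created_on >= since }
end

rows     = left_outer_join(USER_IDS, ORDERS)
filtered = recent_rows(rows, TODAY - 7)
hydrated = filtered.group_by(&:first).transform_values { |rs| rs.map { |r| r.last.id } }

puts "joined rows: #{rows.size}"
puts "users after WHERE: #{hydrated.keys.inspect}"
puts "order ids for user 1: #{hydrated.fetch(1).inspect}"
```

Only user 1 survives the filter, and only the recent order is attached to it. The same thing happens to user.orders when includes takes the JOIN path with a condition on the child table.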
When joins Is the Right Tool
Reach for joins when you need set operations, filtering, or aggregation in SQL:
# Top customers by lifetime value (LTV)
# Why joins: we need GROUP BY/HAVING on the child table, not just hydrated objects.
users = User.joins(:orders)
            .select('users.*, SUM(orders.total_cents) AS ltv')
            .group('users.id')
            .having('SUM(orders.total_cents) > ?', 100_00) # bind the value; 100_00 is a Ruby literal, not SQL
            .order('ltv DESC')
# Generated SQL (simplified)
# SELECT users.*, SUM(orders.total_cents) AS ltv
# FROM users INNER JOIN orders ON orders.user_id = users.id
# GROUP BY users.id HAVING SUM(orders.total_cents) > 10000
# ORDER BY ltv DESC
# Production snapshot: 1 query, 140–180ms on 2M orders, 250K users
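For readers who think better in Ruby than in SQL, this is roughly what the GROUP BY / HAVING / ORDER BY above computes, written as an in-memory sketch. The Order struct, the sample totals, and top_customers are hypothetical; in a real app this work belongs in the database, which is the whole point of reaching for joins.

```ruby
Order = Struct.new(:user_id, :total_cents)

ORDERS = [
  Order.new(1, 60_00), Order.new(1, 55_00),  # user 1: 115_00
  Order.new(2, 99_00),                       # user 2:  99_00
  Order.new(3, 150_00), Order.new(3, 10_00)  # user 3: 160_00
].freeze

# Equivalent of GROUP BY user_id / HAVING SUM(total_cents) > threshold /
# ORDER BY ltv DESC, computed in memory.
def top_customers(orders, threshold_cents)
  orders
    .group_by(&:user_id)
    .transform_values { |os| os.sum(&:total_cents) }
    .select { |_user_id, ltv| ltv > threshold_cents }
    .sort_by { |_user_id, ltv| -ltv }
end

top_customers(ORDERS, 100_00).each do |user_id, ltv|
  puts format("user %d: $%.2f", user_id, ltv / 100.0)
end
```

User 2 falls below the threshold and is filtered out, and the remaining users come back sorted by lifetime value, highest first.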
joins returns User records by default, not arrays of joined rows. If you need fields from the child, select them explicitly as above. If you also need to render the child collections later, combine patterns:
# Step 1: get the specific user set via joins + aggregation
user_ids = User.joins(:orders)
               .group(:id)
               .having('COUNT(orders.id) > 3')
               .order(:id) # without an ORDER BY, LIMIT returns an arbitrary 500
               .limit(500)
               .pluck(:id)
# Step 2: eager load for rendering to avoid N+1 on orders
users = User.includes(:orders).where(id: user_ids)
# Why two steps: aggregation needs SQL; rendering needs hydrated objects.
Performance note: One giant JOIN that duplicates rows often hurts pagination and memory. Split selection from hydration when lists get big.
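Why pagination goes wrong is easiest to see on a raw joined result set. The sketch below is plain Ruby; the ORDER_COUNTS data and the helper names are hypothetical, and it models rows coming back from SQL rather than anything ActiveRecord does for you.

```ruby
# user_id => number of orders; hypothetical data for illustration.
ORDER_COUNTS = { 1 => 3, 2 => 1, 3 => 2, 4 => 1 }.freeze

# One row per (user, order) pair, as a JOIN would return.
def joined_rows(order_counts)
  order_counts.flat_map { |user_id, n| Array.new(n) { |i| [user_id, i + 1] } }
end

# LIMIT applied to joined rows: the page is counted in rows, not users.
def page_of_rows(order_counts, limit)
  joined_rows(order_counts).first(limit).map(&:first).uniq
end

# Selection first, hydration second: LIMIT applies to parent ids, then the
# children for exactly those ids are fetched separately.
def page_of_users(order_counts, limit)
  ids = order_counts.keys.first(limit)
  ids.to_h { |id| [id, order_counts.fetch(id)] }
end

puts "LIMIT 3 on joined rows -> users #{page_of_rows(ORDER_COUNTS, 3).inspect}"
puts "LIMIT 3 on users       -> users #{page_of_users(ORDER_COUNTS, 3).keys.inspect}"
```

A page of three joined rows contains a single user here, because that user's three orders fill the page. Limiting the parent ids first yields three users, each with its full set of children.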
includes vs preload vs eager_load (and strict_loading)
Rails offers three loading knobs, plus one guardrail. Know them:
- includes: smart default. Rails decides JOIN vs separate queries based on usage.
- preload: always separate queries. Great when many parents have few children.
- eager_load: force a LEFT OUTER JOIN. Required if you must ORDER/WHERE/HAVING by children without references gymnastics.
- strict_loading! (Rails 6.1+): raises ActiveRecord::StrictLoadingViolationError if you lazy‑load anything you didn’t preload. Handy guardrail in controllers and background jobs.
# Why preload: avoid row multiplication when most users have 0–3 orders
users = User.preload(:orders).limit(200)
# Why eager_load: we sort by an associated column
users = User.eager_load(:orders).order('orders.created_at DESC').limit(50)
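# Guardrails: fail fast if we missed an association
# (strict_loading is the documented relation method; strict_loading! is the per-record form)
users = users.strict_loading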
Numbers from production: switching a report to preload (from includes that chose a JOIN) cut memory per request 74MB → 31MB at 1K users/page because we avoided 6× row duplication. Another endpoint forced eager_load to support .order('orders.created_at DESC'), trading memory for correctness; p95 rose 210ms → 260ms, but the results were right and stable.
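Row multiplication gets worse once a single JOIN carries more than one has_many association, because each parent contributes the product of its child counts. A back-of-the-envelope sketch, assuming a hypothetical page of 1,000 users with 6 orders and 4 addresses each (the addresses association and both helper names are invented for the arithmetic):

```ruby
# Rows a single LEFT OUTER JOIN returns when eager loading several has_many
# associations at once: per parent, the product of each child count
# (a parent with zero children still contributes one row for that association).
def joined_row_count(child_counts_per_parent)
  child_counts_per_parent.sum do |counts|
    counts.map { |n| [n, 1].max }.reduce(1, :*)
  end
end

# Rows fetched with separate queries: one row per parent plus one per child.
def preloaded_row_count(child_counts_per_parent)
  child_counts_per_parent.size + child_counts_per_parent.sum(&:sum)
end

# Hypothetical page of 1,000 users, each with 6 orders and 4 addresses.
PAGE = Array.new(1_000) { [6, 4] }.freeze

puts "joined:    #{joined_row_count(PAGE)} rows"
puts "preloaded: #{preloaded_row_count(PAGE)} rows"
```

Under these assumptions the JOIN returns 24,000 wide rows, each repeating the user's columns, while separate queries return 11,000 narrow ones. Row count alone understates the gap, since every joined row also carries the parent and both children.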
The Mistake I Shipped (and How to Avoid It)
I once combined or with includes on a staff dashboard and assumed Rails would “do the right thing.” It didn’t.
# Admins OR users active in the last 7 days
# includes + or blew up into a messy join with duplicate rows
users = User.includes(:orders)
.where(role: :admin)
.or(User.includes(:orders)
.where('last_sign_in_at > ?', 7.days.ago))
# Result in production:
# - Returned ~12,000 trial users instead of ~50 admins
# - Pagination counts were off
# - Request timed out after scanning ~2M rows
Fix: gather IDs separately, then preload once.
admin_ids = User.where(role: :admin).pluck(:id)
recent_ids = User.where('last_sign_in_at > ?', 7.days.ago).pluck(:id)
users = User.includes(:orders).where(id: (admin_ids + recent_ids).uniq)
# Why: explicit sets avoid accidental row duplication and broken counts.
Real talk: or + includes is often a footgun. Separate the set logic from hydration or write the WHERE manually with joins/eager_load.
Tooling & Guardrails That Paid Off
- Bullet (gem) to flag N+1s and unused eager loads. Saved us from shipping a preload that wasn’t used on a path (it caught 27 unnecessary includes in a week).
- rack-mini-profiler to see SQL timings inline. We tuned a weekly report from 1.3s p95 → 320ms after spotting a cartesian explosion.
- PgHero for slow query surfacing and index suggestions. It highlighted missing partial indexes on orders(status, created_at), which saved ~220ms per query under load.
- StandardRB to keep scopes and query chains readable in code review.
# config/environments/development.rb
# Why: fail loud on hidden lazy-loading; catch problems early
config.after_initialize do
  Bullet.enable = true
  Bullet.bullet_logger = true
  Bullet.raise = true # raise on N+1s; repeat this block in config/environments/test.rb so CI fails too
end
Pro tip: Turn on strict_loading_by_default for the models behind sensitive areas (admin, billing) by setting self.strict_loading_by_default = true on each model, and allow exceptions where necessary. Pair with Bullet in CI for a tight feedback loop.
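If the guardrail feels abstract, the idea behind strict loading fits in a few lines of plain Ruby. This is a conceptual sketch only: LazyAssociation, StrictLoadingViolation, and preload! are hypothetical names, and this is not how Rails implements the feature.

```ruby
# Plain-Ruby sketch of the strict-loading idea. Class and method names are
# hypothetical; this is not how Rails implements the feature.
class StrictLoadingViolation < StandardError; end

class LazyAssociation
  def initialize(name, strict:, &loader)
    @name = name
    @strict = strict
    @loader = loader
    @loaded = false
  end

  # Eager path: load up front, so later reads are free.
  def preload!
    @records = @loader.call
    @loaded = true
    self
  end

  # Read path: a lazy load is allowed only when strict mode is off.
  def records
    return @records if @loaded
    raise StrictLoadingViolation, "#{@name} was lazily loaded" if @strict

    preload!
    @records
  end
end

orders = LazyAssociation.new(:orders, strict: true) { [101, 102] }

begin
  orders.records
rescue StrictLoadingViolation => e
  puts "caught: #{e.message}"
end

puts orders.preload!.records.inspect
```

The first read raises because nothing was preloaded; after an explicit preload the same read succeeds. That is the contract you opt into: every association a code path touches has to be declared up front.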
Decision Guide (Copy/Paste)
- Use joins when you need to filter, group, or aggregate on associated tables.
- Use includes when you’re rendering parent objects and might traverse associations (let Rails choose strategy).
- Use preload when associations are sparse or you’re loading multiple associations and want to avoid row multiplication.
- Use eager_load when you must ORDER/WHERE/HAVING on the association and want a single SQL.
- Add .references(:assoc) whenever you filter on the associated table after includes.
- Add strict_loading! to catch accidental lazy loads.
Final Thoughts
Treat the difference between includes and joins as a mindset shift, not a syntax detail. Compose sets with joins when you need SQL power; hydrate objects with includes/preload/eager_load when you’re rendering. The trade‑off is predictability vs duplication: separate selection from hydration on big lists, and enforce guardrails (strict_loading, Bullet). Next step: audit your hottest endpoints and apply this playbook with rack-mini-profiler open.
How We Verify Conversions
Every conversion shown on this site follows a strict verification process to ensure correctness:
- Compare results on same dataset: We run both SQL and ActiveRecord against identical test data and verify results match
- Check generated SQL with to_sql: We inspect the actual SQL Rails generates to catch semantic differences (INNER vs LEFT JOIN, WHERE vs ON, etc.)
- Add regression tests for tricky cases: Edge cases like NOT EXISTS, anti-joins, and predicate placement are tested with multiple scenarios
- Tested on Rails 8.1.1: All conversions verified on current Rails version to ensure compatibility
Last updated: February 22, 2026
Raza Hussain
Full-stack developer specializing in Ruby on Rails, React, and modern JavaScript. 15+ years upgrading and maintaining production Rails apps. Led Rails 4/5 → 7 upgrades with 40% performance gains, migrated apps from Heroku to Render cutting costs by 35%, and built systems for StatusGator, CryptoZombies, and others. Available for Rails upgrades, performance work, and cloud migrations.