You can think of select_related and prefetch_related as Django’s way of optimizing database queries when you’re dealing with relationships between your models. The most surprising thing is that neither of them actually optimizes your queries in the sense of making the SQL itself smarter; they optimize by reducing the number of queries you execute.
Let’s see this in action. Imagine you have two Django models, Author and Book, where an Author can have many Books.
from django.db import models
class Author(models.Model):
name = models.CharField(max_length=100)
def __str__(self):
return self.name
class Book(models.Model):
title = models.CharField(max_length=100)
author = models.ForeignKey(Author, related_name='books', on_delete=models.CASCADE)
def __str__(self):
return self.title
Now, let’s say you want to fetch all books and display their titles along with their author’s names. Without any optimization, your Django code might look like this:
books = Book.objects.all()
for book in books:
print(f"Book: {book.title}, Author: {book.author.name}")
If you run this with Django Debug Toolbar or by inspecting the SQL queries, you’ll see that for every single Book object, Django executes a separate SELECT statement to fetch the related Author’s data. If you have 100 books, you’ll get 100 queries for books and then 100 additional queries for their authors. This is the N+1 query problem, and it’s a performance killer.
This is where select_related comes in. If you modify the query like this:
books = Book.objects.select_related('author').all()
for book in books:
print(f"Book: {book.title}, Author: {book.author.name}")
Django will now execute a single SELECT statement. It achieves this by performing a SQL JOIN operation. Instead of fetching books and then authors separately, it fetches books and their associated author data in one go. The SQL might look something like this:
SELECT
"book"."id", "book"."title", "book"."author_id",
"author"."id", "author"."name"
FROM "book"
INNER JOIN "author" ON ("book"."author_id" = "author"."id")
Notice how it selects columns from both book and author tables in a single query. When you access book.author.name, Django doesn’t need to hit the database again; it already has the author’s name from the joined result. select_related is efficient for "one-to-one" or "foreign-key" relationships because it uses a single SQL JOIN. It’s essentially a shortcut to a JOIN.
Now, what about prefetch_related? This is designed for "many-to-many" or "reverse foreign-key" relationships, where a JOIN in a single query can become unwieldy or impossible. Let’s say you want to fetch all authors and then, for each author, list all their books.
Without optimization:
authors = Author.objects.all()
for author in authors:
print(f"Author: {author.name}, Books: {[book.title for book in author.books.all()]}")
This again leads to the N+1 problem: one query for authors, and then one query for each author’s books.
Using prefetch_related:
authors = Author.objects.prefetch_related('books').all()
for author in authors:
print(f"Author: {author.name}, Books: {[book.title for book in author.books.all()]}")
Here’s where prefetch_related differs fundamentally from select_related. It doesn’t use a JOIN. Instead, it performs two separate queries:
- A query to fetch all the
Authorobjects. - A second query to fetch all the
Bookobjects related to those authors. This second query uses anINclause.
For example, if you had authors with IDs 1, 5, and 12, the queries would look roughly like this:
-- Query 1: Fetch authors
SELECT "author"."id", "author"."name" FROM "author";
-- Query 2: Fetch all books for those authors
SELECT "book"."id", "book"."title", "book"."author_id"
FROM "book"
WHERE "book"."author_id" IN (1, 5, 12);
Django then takes these results and "joins" them in Python memory. It iterates through the authors and, for each author, finds all the books from the second query whose author_id matches that author’s ID. This is why it’s called "prefetching" – it prefetches all related items in a separate, efficient query.
The one thing most people don’t fully grasp is how prefetch_related handles multiple related lookups. If you have Author.objects.prefetch_related('books', 'books__publisher'), Django will execute one query for authors, one query for all books, and then a third query for all publishers related to those books. It doesn’t try to be clever and combine them into fewer queries if the relationships are complex. It simply executes the minimum number of queries needed to gather all the data for each distinct prefetch target.
Understanding this distinction between JOINs (for select_related) and separate queries with Python-level assembly (for prefetch_related) is key to optimizing your Django ORM usage.
The next concept you’ll likely run into is how to handle even more complex relationships or how to combine these optimizations with other ORM features.