Django’s ORM lets you write database queries in Python. "Advanced" here means pushing work such as counting, averaging, and correlated lookups into the database without dropping to raw SQL, which makes your app faster and your code more expressive.
The examples below use a simple Author and Book model:
from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=100)
    birth_date = models.DateField(null=True, blank=True)

    def __str__(self):
        return self.name


class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE, related_name='books')
    published_date = models.DateField()
    pages = models.IntegerField(default=0)

    def __str__(self):
        return self.title
Now, imagine you want to know how many books each author has written. The naive approach would be:
authors = Author.objects.all()
for author in authors:
    book_count = author.books.count()
    print(f"{author.name}: {book_count} books")
This works, but it’s inefficient. For N authors, it makes N+1 queries: one to get all authors, and then one query for each author to count their books. This is the N+1 query problem, and it’s a performance killer.
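To make the N+1 pattern concrete, here is a minimal sketch that counts the statements actually sent to the database. It bypasses the ORM and uses Python's built-in sqlite3 module with simplified, hypothetical author and book tables (not Django's real table names); the loop mirrors the queries the naive ORM code issues.

```python
import sqlite3

# Hypothetical, simplified tables standing in for the Author and Book models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO book (title, author_id) VALUES ('A1', 1), ('A2', 1), ('B1', 2);
""")

# Record every SQL statement the connection executes from here on.
statements = []
conn.set_trace_callback(statements.append)

# One query for the authors, then one COUNT query per author.
authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
naive_counts = {}
for author_id, name in authors:
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM book WHERE author_id = ?", (author_id,)
    ).fetchone()
    naive_counts[name] = count

naive_query_count = len(statements)
conn.set_trace_callback(None)

print(naive_counts)
print(f"{naive_query_count} statements for {len(authors)} authors")
```

The statement count is always the number of authors plus one, so it grows linearly with the table.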
Annotations are the ORM’s answer. Instead of fetching authors and then counting books separately, you can tell Django to add the book count to each author object in a single query.
from django.db.models import Count

authors_with_book_counts = Author.objects.annotate(num_books=Count('books'))
for author in authors_with_book_counts:
    print(f"{author.name}: {author.num_books} books")
This single annotate call generates SQL along these lines (table names are simplified here; by default Django prefixes them with the app label, and no ORDER BY is added unless you ask for an ordering):
SELECT "author"."id", "author"."name", "author"."birth_date", COUNT("book"."id") AS "num_books"
FROM "author"
LEFT OUTER JOIN "book" ON ("author"."id" = "book"."author_id")
GROUP BY "author"."id", "author"."name", "author"."birth_date"
See how COUNT("book"."id") is right there in the SELECT clause? Django handles the join, the GROUP BY, and the aggregation, all in one database round trip. num_books is now an attribute directly on the Author object.
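One detail of that SQL is worth seeing in action: the LEFT OUTER JOIN keeps authors who have no books, and counting the book id (rather than the rows) gives them a count of 0. The sketch below runs the same query shape with the built-in sqlite3 module; the tables and data are hypothetical stand-ins, not Django's actual schema.

```python
import sqlite3

# Hypothetical, simplified tables standing in for the Author and Book models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO book (title, author_id) VALUES ('A1', 1), ('A2', 1), ('B1', 2);
""")

# COUNT(book.id) skips the NULL produced by the outer join for authors
# without books; COUNT(*) counts the joined row itself.
rows = conn.execute("""
    SELECT author.name,
           COUNT(book.id) AS num_books,
           COUNT(*) AS joined_rows
    FROM author
    LEFT OUTER JOIN book ON author.id = book.author_id
    GROUP BY author.id, author.name
    ORDER BY author.id
""").fetchall()

for name, num_books, joined_rows in rows:
    print(f"{name}: num_books={num_books}, joined_rows={joined_rows}")
```

Cy has no books, so num_books is 0 while the join still produces one row for that author.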
Aggregations are similar but operate on the entire queryset, not on individual objects within it. If you wanted to know the total number of books across all authors, or the average number of pages per book, you’d use aggregation.
from django.db.models import Avg, Count

total_books = Book.objects.count()  # Shortcut for a COUNT(*) query
print(f"Total books: {total_books}")

average_pages = Book.objects.aggregate(avg_pages=Avg('pages'))
print(f"Average pages per book: {average_pages['avg_pages']:.2f}")

# Annotate each author with a book count, then average those counts
books_per_author = Author.objects.annotate(num_books=Count('books')).aggregate(
    avg_books_per_author=Avg('num_books')
)
print(f"Average books per author: {books_per_author['avg_books_per_author']:.2f}")
The aggregate function returns a dictionary. Note that Avg('books') on its own would not give the average number of books per author: it joins to the book table and averages the book primary keys. To average a count, first annotate each author with Count('books'), then aggregate over that annotation; Django wraps the annotated query in a subquery to do this. On an empty table Avg returns None, so guard the formatting in real code.
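At the SQL level, "average books per author" is a two-step computation: count the books for each author, then average those counts. The sketch below shows that shape with the built-in sqlite3 module, using hypothetical simplified tables; in ORM terms it corresponds to annotating a count and then aggregating over it. For contrast it also averages the joined book ids directly, which is a different (and meaningless) number.

```python
import sqlite3

# Hypothetical, simplified tables standing in for the Author and Book models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO book (id, title, author_id)
        VALUES (10, 'A1', 1), (20, 'A2', 1), (30, 'B1', 2);
""")

# Step 1 (inner query): one row per author with its book count.
# Step 2 (outer query): average those per-author counts.
(avg_books_per_author,) = conn.execute("""
    SELECT AVG(num_books) FROM (
        SELECT COUNT(book.id) AS num_books
        FROM author
        LEFT OUTER JOIN book ON author.id = book.author_id
        GROUP BY author.id
    )
""").fetchone()

# For contrast: averaging the joined column averages primary keys, not counts.
(avg_of_book_ids,) = conn.execute("""
    SELECT AVG(book.id)
    FROM author
    LEFT OUTER JOIN book ON author.id = book.author_id
""").fetchone()

print(f"Average books per author: {avg_books_per_author:.2f}")
print(f"Average of book ids: {avg_of_book_ids:.2f}")
```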
Subqueries are where things get really powerful. They allow you to use the result of one query as part of another. A common use case is filtering. Let’s say you want to find authors who have written more than 5 books. You’ve already seen how to annotate:
from django.db.models import Count
authors_with_many_books = Author.objects.annotate(num_books=Count('books')).filter(num_books__gt=5)
Because the filter refers to an aggregate, Django emits it as a HAVING clause, and this is often the most efficient approach. However, when a filter needs a value computed by a separate query for each row (for example, in a more complex filtering scenario where annotate alone isn’t sufficient), you’d use Subquery.
Let’s find books published after the author’s birth date. This simple case could also be written with an F expression, Book.objects.filter(published_date__gt=F('author__birth_date')), but it makes a compact illustration of how Subquery and OuterRef fit together.
from django.db.models import OuterRef, Subquery

# Look up the birth date of each book's author.
# OuterRef('author_id') refers to the author_id of the book row in the outer query.
books_published_after_birth = Book.objects.filter(
    published_date__gt=Subquery(
        Author.objects.filter(id=OuterRef('author_id')).values('birth_date')[:1]
    )
)
Conceptually, the subquery is evaluated for each row of the outer Book query, although the database is free to optimize how it runs. OuterRef('author_id') is crucial; it tells Django that the id in the inner query (Author.objects.filter(id=OuterRef('author_id'))) should be linked to the author_id of the Book currently being considered by the outer query. values('birth_date') restricts the subquery to a single column, and the [:1] slice limits it to a single row, so it yields one scalar value for the comparison. This generates SQL that might look something like:
SELECT ...
FROM "book"
WHERE "book"."published_date" > (
    SELECT "author"."birth_date"
    FROM "author"
    WHERE "author"."id" = "book"."author_id"
    LIMIT 1
)
This technique is invaluable when you need to compare a field in the outer query with a value derived from a related object within a filter condition, especially when that derived value isn’t easily obtained through annotate or aggregate. For instance, selecting books whose publication year is greater than the earliest publication year of any book by the same author.
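The "later than the author's earliest book" case can be sketched directly as a correlated subquery. The example below uses the built-in sqlite3 module with hypothetical simplified tables, and compares full ISO dates rather than years for brevity. In ORM terms it roughly corresponds to a Subquery over Book filtered by OuterRef('author_id'), ordered by published_date, and sliced with [:1].

```python
import sqlite3

# Hypothetical, simplified table standing in for the Book model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER, published_date TEXT);
    INSERT INTO book (title, author_id, published_date) VALUES
        ('A1', 1, '2001-01-01'),
        ('A2', 1, '2005-06-01'),
        ('A3', 1, '2010-03-15'),
        ('B1', 2, '1999-09-09');
""")

# The inner query is correlated: it refers to book.author_id from the
# outer query, so it yields a different minimum date for each author.
rows = conn.execute("""
    SELECT book.title
    FROM book
    WHERE book.published_date > (
        SELECT MIN(inner_book.published_date)
        FROM book AS inner_book
        WHERE inner_book.author_id = book.author_id
    )
    ORDER BY book.id
""").fetchall()

later_titles = [title for (title,) in rows]
print(later_titles)
```

Each author's first book is excluded, so an author with a single book contributes nothing to the result.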
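Subquery is not limited to filter: it can also be used in annotate and values clauses, which lets you bring scalar subquery results directly into your selected fields or use them in further calculations. For example, annotating each book with the average page count of books by its author:
from django.db.models import Avg, OuterRef, Subquery

# Inner query: average pages for the outer book's author, as a single column.
# order_by() clears any ordering so the GROUP BY uses author_id only.
avg_pages_for_author = (
    Book.objects.filter(author_id=OuterRef('author_id'))
    .order_by()
    .values('author_id')
    .annotate(avg_pages=Avg('pages'))
    .values('avg_pages')
)
books_annotated_with_author_avg_pages = Book.objects.select_related('author').annotate(
    author_avg_pages=Subquery(avg_pages_for_author)
)
for book in books_annotated_with_author_avg_pages:
    print(f'"{book.title}" by {book.author.name} (author avg pages: {book.author_avg_pages:.2f})')
This Subquery computes the average page count for the author of the current book. Calling aggregate() inside Subquery does not work: aggregate() runs immediately and returns a dictionary, and a queryset containing OuterRef cannot be evaluated on its own. Instead, values('author_id').annotate(...) groups the inner query by author, and the final values('avg_pages') leaves a single column for Subquery to return. The OuterRef('author_id') links the inner query to the outer Book’s author, and select_related('author') avoids an extra query per book when printing book.author.name.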
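In SQL terms, annotating each book with a per-author average is a scalar subquery in the SELECT list. The sketch below shows that shape using the built-in sqlite3 module; the table and data are hypothetical stand-ins, and the exact SQL Django emits will differ in naming and quoting.

```python
import sqlite3

# Hypothetical, simplified table standing in for the Book model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER, pages INTEGER);
    INSERT INTO book (title, author_id, pages) VALUES
        ('A1', 1, 100),
        ('A2', 1, 300),
        ('B1', 2, 250);
""")

# The scalar subquery in the SELECT list is correlated with the outer row
# through book.author_id and returns exactly one value per book.
rows = conn.execute("""
    SELECT book.title,
           (SELECT AVG(inner_book.pages)
            FROM book AS inner_book
            WHERE inner_book.author_id = book.author_id) AS author_avg_pages
    FROM book
    ORDER BY book.id
""").fetchall()

for title, author_avg_pages in rows:
    print(f"{title}: author avg pages {author_avg_pages:.2f}")
```

Both of author 1's books carry the same average, since the subquery depends only on the author.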
Understanding how annotate, aggregate, and Subquery (especially with OuterRef) translate to efficient SQL is key to mastering Django’s ORM. The next step is often exploring database functions and expressions for even more complex, database-side computations.