Greed: "Grabbing Every Relationship You Can"

In the previous article we discussed using entities as mere data containers. We mentioned that we probably started doing this because that’s how tutorials show it, or because that’s how the maker bundle generates entities for us. This sin probably has the same origins: the sin of defining every possible relationship under the sun because we can.

Let’s say we are managing a library and we have Author and Book entities. We also have Reservations for books that are taken by Readers, and each reservation affects a book in a particular Library. In Doctrine land, and keeping the domain model as simple as possible, this could look like this:

<?php

use Doctrine\Common\Collections\Collection;
use Doctrine\ORM\Mapping as ORM;
// Uuid is assumed to come from whichever UUID library backs the 'uuid' column type.

#[ORM\Entity]
class Author
{
    /**
     * @param Collection<int, Book> $books
     */
    public function __construct(
        #[ORM\Id, ORM\Column(type: 'uuid')]
        public readonly Uuid $id,
        #[ORM\Column(type: 'string')]
        private(set) string $name,
        #[ORM\OneToMany(targetEntity: Book::class, mappedBy: 'author')]
        private(set) Collection $books,
    ) {
    }
}

#[ORM\Entity]
class Book
{
    /**
     * @param Collection<int, Reservation> $reservations
     */
    public function __construct(
        #[ORM\Id, ORM\Column(type: 'uuid')]
        public readonly Uuid $id,
        #[ORM\Column(type: 'string')]
        private(set) string $title,
        #[ORM\ManyToOne(inversedBy: 'books')]
        private(set) Author $author,
        // Inverse side of Reservation::$book, hence OneToMany
        #[ORM\OneToMany(targetEntity: Reservation::class, mappedBy: 'book')]
        private(set) Collection $reservations,
    ) {
    }
}

#[ORM\Entity]
class Reader
{
    /**
     * @param Collection<int, Reservation> $reservations
     */
    public function __construct(
        #[ORM\Id, ORM\Column(type: 'uuid')]
        public readonly Uuid $id,
        #[ORM\Column(type: 'string')]
        private(set) string $name,
        #[ORM\OneToMany(targetEntity: Reservation::class, mappedBy: 'reader')]
        private(set) Collection $reservations,
    ) {
    }
}

#[ORM\Entity]
class Library
{
    public function __construct(
        #[ORM\Id, ORM\Column(type: 'uuid')]
        public readonly Uuid $id,
        #[ORM\Column(type: 'string')]
        private(set) string $name,
    ) {
    }
}

#[ORM\Entity]
class Reservation
{
    public function __construct(
        #[ORM\Id, ORM\Column(type: 'uuid')]
        public readonly Uuid $id,
        #[ORM\ManyToOne(inversedBy: 'reservations')]
        private(set) Reader $reader,
        #[ORM\ManyToOne(inversedBy: 'reservations')]
        private(set) Book $book,
        // Library declares no reservations collection, so this side is unidirectional
        #[ORM\ManyToOne]
        private(set) Library $library,
        #[ORM\Column(type: 'datetime_immutable')]
        private(set) \DateTimeImmutable $dueDate,
    ) {
    }
}

At first glance this seems reasonable. We have mapped all the relationships we can think of. However, sooner rather than later we will realize that this is not a good idea.

To understand why, we need to understand how Doctrine loads relationship data. By default, it does it lazily. This means that when you fetch a Book from the database, the author and reservations collections are not loaded. They are loaded only when you access them. This is done using a feature called lazy loading and is implemented using a technique called proxy objects.

The problem is that if you are accessing these fields in a loop, lazy loading doesn’t help very much because it will issue one query for every unloaded association you touch. This is known as the N+1 problem. If you have 100 books, each by a different author, and you access the author of each book, you will end up doing 101 database queries: one for the books and one for each author. But that’s just a small part of the problem, because the cost multiplies with every nested level you access: if you have a relationship tree of many levels, the number of queries can grow very quickly. This is a frequent occurrence when you are using Twig to render a table or a list, or when you are using Symfony Serializer to dump your entities into JSON.
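The arithmetic above can be reproduced without a database. The following is a minimal sketch in plain PHP, not Doctrine code: QueryCounter and LazyAuthor are hypothetical stand-ins for a connection and a proxy object, and each of the 100 books is assumed to have a different author.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a database connection: it only counts queries.
final class QueryCounter
{
    public int $count = 0;

    /**
     * Pretends to run $sql and returns the rows it was handed.
     *
     * @param list<array<string, mixed>> $rows
     * @return list<array<string, mixed>>
     */
    public function run(string $sql, array $rows): array
    {
        $this->count++;

        return $rows;
    }
}

// Hypothetical stand-in for a proxy object: it loads its data on first access.
final class LazyAuthor
{
    private ?string $name = null;

    public function __construct(
        private readonly QueryCounter $db,
        private readonly int $id,
    ) {
    }

    public function name(): string
    {
        if ($this->name === null) {
            // The first access triggers one query for this single author
            $rows = $this->db->run(
                'SELECT name FROM author WHERE id = ?',
                [['name' => "Author {$this->id}"]],
            );
            $this->name = $rows[0]['name'];
        }

        return $this->name;
    }
}

$db = new QueryCounter();

// One query fetches the 100 books; each row only carries the author's id
$bookRows = $db->run('SELECT id, author_id FROM book', array_map(
    static fn (int $i): array => ['id' => $i, 'author_id' => $i],
    range(1, 100),
));

// Each book gets a lazy author reference, which costs no query yet
$authors = array_map(
    static fn (array $row): LazyAuthor => new LazyAuthor($db, $row['author_id']),
    $bookRows,
);

// Rendering the list touches every author, so every iteration hits the database
foreach ($authors as $author) {
    $author->name();
}

echo $db->count, PHP_EOL; // 101: one query for the books plus one per author
```

Add one more lazy level under each author and the same loop would issue another query per author on top of these.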

So what can we do then? There is another mode of fetching called eager loading. This means that Doctrine will load up the whole relationship tree as efficiently as possible, by reducing the number of queries. However, this is not a silver bullet either, because now you are over-fetching data and paying the cost of hydrating and instantiating all that data, even if you are not going to use it. If you have made everything a relationship, you are almost certainly over-fetching data, and it could lead to massive performance issues.
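For reference, eager loading is something you opt into per association. The fragment below is only a sketch of how that might look with attribute mapping: the id and the other columns are omitted, and how Doctrine turns the setting into SQL depends on the ORM version and on how the entity is fetched.

```php
<?php

use Doctrine\ORM\Mapping as ORM;

#[ORM\Entity]
class Book
{
    // fetch: 'EAGER' asks Doctrine to load the author whenever a Book is loaded,
    // instead of handing out a lazy proxy
    #[ORM\ManyToOne(fetch: 'EAGER')]
    private Author $author;
}
```

Because the setting lives in the mapping, it applies to every code path that loads a Book, whether or not that path needs the author.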

Strategies for Solving Under-Fetching and Over-Fetching

So, we have seen that under-fetching your data (using lazy loading) leads to the N+1 problem, and over-fetching your data (via eager loading) leads to potentially requesting a ton of data that you are not going to use, wasting massive resources in the process. The solution is simple, right? Just fetch the data you need.

This is where I find loads of the criticism of ORMs comes from. It’s quite complicated to be selective on what you can do to load exactly the data that you want. Your ORM may not support this and get in your way, or the abstractions provided to do this may be too verbose or introduce some complexity.

However, let’s review what options Doctrine ORM gives us anyway.

Select what you need using DQL or QueryBuilder

This is fairly simple. You can map everything lazily, and then use DQL or the QueryBuilder class to join only the data you need for a particular view.

<?php

$qb = $entityManager->createQueryBuilder();
$books = $qb->select('b', 'a')  // Select book and author together
    ->from(Book::class, 'b')
    ->join('b.author', 'a')
    ->getQuery()
    ->getResult();

This is the most flexible option, but it’s also the most leaky one because it introduces database concerns into your domain layer.

For instance, if you are into DDD and your repositories are abstractions, then you need to communicate to your repository how you want to fetch the data. This means that you need to add a parameter to your repository method to indicate this fact to your repository implementation, and it looks terribly bad and leaky — an in-memory implementation would not need this information.

<?php

use Doctrine\ORM\EntityManagerInterface;

interface BookRepository
{
    /**
     * @return Book[]
     */
    public function allBooks(bool $withAuthor = true): array;
}

class DoctrineBookRepository implements BookRepository
{
    public function __construct(
        private readonly EntityManagerInterface $entityManager,
    ) {
    }

    public function allBooks(bool $withAuthor = true): array
    {
        $qb = $this->entityManager->createQueryBuilder()
            ->select('b')
            ->from(Book::class, 'b');

        if ($withAuthor) {
            // Fetch-join the author so it is hydrated in the same query
            $qb->addSelect('a')->join('b.author', 'a');
        }

        return $qb->getQuery()->getResult();
    }
}

class InMemoryBookRepository implements BookRepository
{
    /**
     * @param Book[] $books
     */
    public function __construct(
        private array $books = [],
    ) {
    }

    public function allBooks(bool $withAuthor = true): array
    {
        // $withAuthor is completely ignored: the objects are already in memory
        return $this->books;
    }
}

So while this “works,” it’s not the best solution: not only does it leak implementation details into your domain layer, but it also has the potential to become a maintenance nightmare as more and more of these boolean parameters are added.

Using Custom DTOs

This is a somewhat better option. You can map everything lazily, and then use DQL or the QueryBuilder class to select only the data you need and hydrate it into a DTO. This way you can keep your domain clean and your repositories abstract.

<?php

use Doctrine\ORM\EntityManagerInterface;

readonly class BookWithAuthorDetailsDto
{
    public function __construct(
        public Uuid $bookId,
        public Uuid $authorId,
        public string $bookTitle,
        public string $authorName,
    ) {
    }
}

class MyCustomReadModel
{
    public function __construct(
        private readonly EntityManagerInterface $entityManager,
    ) {
    }

    /**
     * @return BookWithAuthorDetailsDto[]
     */
    public function findBooksByAuthor(AuthorId $authorId): array
    {
        // NEW needs the fully qualified class name and takes its arguments
        // positionally, in the order of the DTO constructor
        return $this->entityManager->createQueryBuilder()
            ->select(sprintf(
                'NEW %s(b.id, a.id, b.title, a.name)',
                BookWithAuthorDetailsDto::class,
            ))
            ->from(Book::class, 'b')
            ->innerJoin('b.author', 'a')
            ->where('a.id = :authorId')
            ->setParameter('authorId', (string) $authorId)
            ->getQuery()
            ->getResult();
    }
}

But this is also a technique that must be used judiciously: these methods can quickly explode in complexity and in number, leaving your data access code hard to understand and maintain. Usually, if you go this route, you would probably want to separate this logic from the one living in your domain layer, since most likely you are doing this only for presentation reasons. But it’s better than the previous option for sure.

Also, this is not a silver bullet either: because you will most likely keep this logic out of your standard repository (it maps a DTO, not an entity), it becomes unnatural to use it in your domain layer, should you ever need to.

The Correct Approach

My preferred approach to all these complexities and tradeoffs is, by a mile, to not map every association you can think of. And it’s not only my preferred solution: even the Doctrine best practices recommend constraining relationships as much as possible. This means you need to resist the urge to map every relationship you find and perform a conscious analysis of why you need to map a field as a relationship and whether the reasons you have are valid or not.

The key in making the right decision is to ask yourself, “Do I need this relationship to query data, or do I need it to enforce a business invariant?”

99.9% of the time, if you need a relationship because you want to query data, then you don’t really need to map that relationship. If the only reason you have Reservations mapped as a relationship in Book is that you want to quickly get a list of all the reservations for a given Book, then you don’t need a relationship; you need a repository method:

interface BookRepository
{
    /**
     * The query itself lives in the concrete repository implementation.
     *
     * @return Reservation[]
     */
    public function findReservationsByBook(BookId $bookId): array;
}

If you need a relationship because you want to enforce a business invariant, then you might need a relationship. I say might because even in that case you can (and should) consider other options, like pushing the constraint to the application service layer, because a careful consideration of the DDD Trilemma pushes you to do so.

But anyway, let’s pretend we want to enforce the invariant that a book cannot be reserved if there is a pending reservation already. This is a terrible example, because I would never enforce this using a Doctrine relationship (because of the DDD Trilemma mentioned above), but for the sake of the example bear with me.

In that case, you would want to map the relationship because you want to ensure that when you reserve a book, the book is not already reserved. You would want to do something like this:

class Book
{
    // Mapping attributes and constructor omitted for brevity

    public function reserve(
        Uuid $reservationId,
        Reader $reader,
        Library $library,
        \DateTimeImmutable $dueDate,
    ): Reservation {
        if ($this->isReserved()) {
            throw new \DomainException('Book is already reserved');
        }

        // Argument order follows the Reservation constructor:
        // id, reader, book, library, due date
        $reservation = new Reservation($reservationId, $reader, $this, $library, $dueDate);
        $this->reservations->add($reservation);

        return $reservation;
    }

    public function isReserved(): bool
    {
        // Iterating initializes the whole lazy collection.
        // Reservation::isActive() is assumed to exist.
        foreach ($this->reservations as $reservation) {
            if ($reservation->isActive()) {
                return true;
            }
        }

        return false;
    }
}

This is a complete and pure domain model — albeit not a very performant one depending on the number of reservations a book has.

So at the end of the day, I always come to the same conclusion when it comes to mapping relationships: even when you do need to enforce a business invariant, you should still ask yourself if there is another way to do it that doesn’t involve mapping a relationship.
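One such alternative is the application-service route mentioned earlier. What follows is a minimal sketch of that idea under stated assumptions, not a definitive implementation: ReserveBook, ReservationRepository, hasActiveReservationFor() and the `active` flag are all hypothetical names, and identifiers are plain strings so the example runs on its own.

```php
<?php

declare(strict_types=1);

// Every name in this sketch is hypothetical and only illustrates the idea.
final class Reservation
{
    public function __construct(
        public readonly string $bookId,
        public readonly string $readerId,
        public readonly DateTimeImmutable $dueDate,
        public bool $active = true,
    ) {
    }
}

interface ReservationRepository
{
    public function hasActiveReservationFor(string $bookId): bool;

    public function add(Reservation $reservation): void;
}

// In-memory implementation. A Doctrine-backed one could answer the same
// question with a single targeted query instead of hydrating a collection.
final class InMemoryReservationRepository implements ReservationRepository
{
    /** @var list<Reservation> */
    private array $reservations = [];

    public function hasActiveReservationFor(string $bookId): bool
    {
        foreach ($this->reservations as $reservation) {
            if ($reservation->bookId === $bookId && $reservation->active) {
                return true;
            }
        }

        return false;
    }

    public function add(Reservation $reservation): void
    {
        $this->reservations[] = $reservation;
    }
}

// Application service: the invariant is checked here, not inside Book
final class ReserveBook
{
    public function __construct(
        private readonly ReservationRepository $reservations,
    ) {
    }

    public function __invoke(string $bookId, string $readerId, DateTimeImmutable $dueDate): Reservation
    {
        if ($this->reservations->hasActiveReservationFor($bookId)) {
            throw new DomainException('Book is already reserved');
        }

        $reservation = new Reservation($bookId, $readerId, $dueDate);
        $this->reservations->add($reservation);

        return $reservation;
    }
}

$reserveBook = new ReserveBook(new InMemoryReservationRepository());
$first = $reserveBook('book-1', 'reader-1', new DateTimeImmutable('+14 days'));

try {
    $reserveBook('book-1', 'reader-2', new DateTimeImmutable('+14 days'));
} catch (DomainException $e) {
    echo $e->getMessage(), PHP_EOL; // prints "Book is already reserved"
}
```

Note that the check and the insert are two separate steps here, so under concurrent requests you would still want a database-level guard such as a unique constraint or a lock. That is part of the tradeoff you accept when the invariant leaves the entity.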

In the end, in the projects I’ve worked on, and for the sake of simplicity and performance, my entities end up being sparsely mapped. This gives Doctrine the least amount of work and gives me maximum control. It also helps decouple the different concerns in my domain model. I have found that the only relationships I end up using are eagerly loaded ManyToOne ones, and I am very selective about them. I never use OneToMany or ManyToMany. In fact, none of my entity classes has a reference to Doctrine\Common\Collections\Collection.
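To make that concrete, a sparsely mapped version of the library model might look like the sketch below. It is an illustration under assumptions rather than the mapping from any real project: identifiers are plain strings to keep the example self-contained (the entities earlier in the article use Uuid objects), and which association is worth keeping is a judgment call.

```php
<?php

declare(strict_types=1);

use Doctrine\ORM\Mapping as ORM;

#[ORM\Entity]
class Author
{
    public function __construct(
        #[ORM\Id, ORM\Column(type: 'string')]
        public readonly string $id,
        #[ORM\Column(type: 'string')]
        public readonly string $name,
    ) {
    }
}

#[ORM\Entity]
class Book
{
    public function __construct(
        #[ORM\Id, ORM\Column(type: 'string')]
        public readonly string $id,
        #[ORM\Column(type: 'string')]
        public readonly string $title,
        // The one association kept here: an eagerly loaded ManyToOne
        #[ORM\ManyToOne(fetch: 'EAGER')]
        public readonly Author $author,
    ) {
    }
}

#[ORM\Entity]
class Reservation
{
    public function __construct(
        #[ORM\Id, ORM\Column(type: 'string')]
        public readonly string $id,
        // Other entities are referenced by identifier only, as plain columns,
        // so there is no collection and nothing for Doctrine to lazy-load
        #[ORM\Column(type: 'string')]
        public readonly string $bookId,
        #[ORM\Column(type: 'string')]
        public readonly string $readerId,
        #[ORM\Column(type: 'string')]
        public readonly string $libraryId,
        #[ORM\Column(type: 'datetime_immutable')]
        public readonly DateTimeImmutable $dueDate,
    ) {
    }
}
```

With this shape, "all reservations of a book" becomes a repository query filtered on bookId rather than a collection hanging off Book.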

Okay, But What About The UI?

We all know the classic DDD rule that the UI should not be a factor in designing your domain model or in how you structure your code. While this is extremely good advice, in the real world we have our dear frontend engineer friends, who are always keen to get as much data as possible from a single endpoint because they don’t like to manage state very much.

Apart from other techniques — like materialized views or projections as a specialized read model — if you do need to send the UI a big tree of data, you then need to construct it yourself. This is called the Presenter pattern: it’s basically a service that arranges data in a way that is easy to consume for the UI. It’s not a domain service, it’s not an application service, but rather is part of the infrastructure layer. It uses your repositories to manually hydrate custom trees of data that your UI needs.

For instance, in my repositories I have specialized methods that give me handy data structures that I can use in my presenters. Quite often, they look like this:

<?php

/**
 * Constructs a list of all the reservations of a Reader
 */
class ReservationsPresenter
{
    public function __construct(
        private readonly ReaderRepository $readers,
        private readonly Reservations $reservations,
        private readonly BookRepository $books,
    ) {
    }

    public function reservationsOfReader(ReaderId $readerId): array
    {
        $reader = $this->readers->ofId($readerId);

        // Collection is assumed to be a custom helper class here: Doctrine's
        // Collection has no from(), unique() or toMap() methods
        /** @var Collection<Reservation> $reservations */
        $reservations = Collection::from($this->reservations->ofReader($reader->id));

        // One batched lookup for all the books involved, indexed by id
        /** @var array<string, Book> $books */
        $books = $this->books
            ->ofId(...$reservations->unique(static fn(Reservation $r) => $r->bookId))
            ->toMap(static fn(Book $b) => $b->id->toString());

        $view = [];
        foreach ($reservations as $reservation) {
            $book = $books[$reservation->bookId->toString()];

            $view[] = [
                'id' => $reservation->id,
                'book_id' => $book->id->toString(),
                'book_title' => $book->title,
                'book_author' => $book->author->name,
                'due_date' => $reservation->dueDate->format('Y-m-d'),
            ];
        }

        return $view;
    }
}

Also, if you are into GraphQL, you can bypass your domain repositories completely and implement a GraphQL layer on top of Doctrine DBAL that gives your UI and frontend engineers exactly what they want. But please be advised that GraphQL is most certainly not a silver bullet: it’s just another tool in your toolbox. There are massive tradeoffs involved in using GraphQL and you need to know what you are giving up to use it. For instance, GraphQL is pretty much uncacheable server side and very complex to cache (and invalidate) client side. It also almost always leads to unnecessary requests when frontend engineers use it, because of Apollo Client’s idiomatic model of colocating queries next to the components that use them.

Conclusion

The main takeaway from this article is this: when dealing with ORM relationships, you are always choosing between convenience and flexibility. Mapping everything is very convenient, but it can lead to performance issues and unmaintainable code. Restraining your relationships to the bare minimum gives you full control, but it creates more manual work when you need to build specialized views of that data.

However, I would always choose the flexibility over the convenience. I’ve been bitten by my greed of mapping every relationship, thinking this makes my life easier, when in reality that greed will end up being my domain doom.