Developer Diary: Taming Doctrine’s 2000 Flushes

I’ve just started a new side project that will involve finding some creative solutions to tough problems. I thought it would be neat to share what I learn as I go, so here goes…

For my project I decided to use the Doctrine 2 ORM to manage my data layer. We also use it at work, so my biggest reason for choosing it was to learn more about Doctrine to help me in my job. But the decision also makes sense for the project itself: my entity relationships will likely be fairly straightforward for the most part, and an ORM should let me make a lot of progress very quickly without (I hope) causing me lots of trouble later on.

One of the odd little things about Doctrine is how it handles persistence. The rules are a bit complex and non-intuitive. Here’s a summary:

  • Changes are saved when you call flush() on the entity manager, and at no other time.
  • Objects that were loaded from the database and then changed will be updated at flush time with no special instructions.
  • New objects will not be saved by default.
  • If you want a new object saved, you need to persist() it.
  • If something would have saved and you don’t want it to, you need to detach() it.
  • Objects that are linked to each other through association fields can cause problems when one side is created or deleted and the other isn’t. Doctrine can throw an exception at flush time and close your entity manager.
  • You can avoid this by creating cascade rules in your model classes, but this can lead to updates happening that you didn’t intend if you aren’t careful!
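
To make those rules a bit more concrete, here’s a toy model I put together. It is not Doctrine and skips almost everything Doctrine actually does (associations, cascades, ID generation, change detection). ToyEntityManager and everything inside it are made up purely to illustrate the first five rules: nothing is written until flush(), loaded objects are tracked automatically, new objects need persist(), and detach() opts an object out.

```php
<?php

// Toy illustration only: a made-up class, not Doctrine's entity manager.
class ToyEntityManager
{
    public $database = [];   // id => name, stands in for a table
    private $managed = [];   // id => object currently being tracked

    // Loading an existing row makes the object managed automatically.
    public function find($id)
    {
        $obj = new stdClass();
        $obj->id = $id;
        $obj->name = $this->database[$id];
        $this->managed[$id] = $obj;
        return $obj;
    }

    // New objects are only tracked once they are persisted.
    public function persist($obj) { $this->managed[$obj->id] = $obj; }

    // Detached objects are ignored by later flushes.
    public function detach($obj) { unset($this->managed[$obj->id]); }

    // The only place where anything is written.
    public function flush()
    {
        foreach ($this->managed as $id => $obj) {
            $this->database[$id] = $obj->name;
        }
    }
}

$em = new ToyEntityManager();
$em->database = [1 => 'alice', 2 => 'bob'];

$alice = $em->find(1);
$alice->name = 'Alice';      // existing object: saved on flush, no persist() needed

$bob = $em->find(2);
$bob->name = 'temporary';
$em->detach($bob);           // opt out: this change will not be saved

$carol = new stdClass();
$carol->id = 3;
$carol->name = 'carol';      // new and never persisted: ignored by flush

$dave = new stdClass();
$dave->id = 4;
$dave->name = 'dave';
$em->persist($dave);         // new and persisted: written on flush

$before = $em->database;     // snapshot: nothing has been written yet
$em->flush();

print_r($em->database);      // rows 1 (Alice), 2 (bob, unchanged) and 4 (dave)
```

In the toy, carol never reaches the "database" and bob keeps his old name, which matches the rules in the list as I understand them.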

Clear as mud, right? We kind of have a love and hate (sometimes hate and hate) relationship with Doctrine at work because the hardest 20% of things we need to do with it tend to be very hard and annoying. One of the annoying things is that when a function needs to update an object, it’s not always obvious if we should flush. If we flush, maybe we’re committing a temporary change that shouldn’t be committed. Or maybe our data is still in an inconsistent state and the entity manager will crash. If we don’t flush, maybe we’re creating a burden on the controller layer to remember to do it for us (which isn’t what our controller layer should care about). We’ve sometimes resorted to having a flush flag parameter to a function to tell it whether to flush or not. This is complexity that we’re better off without!
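
For the record, the flush flag pattern looks something like the sketch below. The function and class names are invented, and StubEntityManager is a hypothetical stand-in that only counts flushes so the example runs on its own.

```php
<?php

// Hypothetical stub that counts flushes; not Doctrine.
class StubEntityManager
{
    public $flushCount = 0;

    public function flush()
    {
        $this->flushCount++;
    }
}

// Invented example: the $flush flag leaks a persistence decision into the signature.
function updateEmail(StubEntityManager $em, stdClass $user, $email, $flush = true)
{
    $user->email = $email;
    if ($flush) {
        $em->flush();
    }
}

$em = new StubEntityManager();
$user = new stdClass();

updateEmail($em, $user, 'first@example.com', false);  // caller promises to flush later
updateEmail($em, $user, 'second@example.com');        // flushes immediately

echo $em->flushCount, "\n";  // prints 1
```

Every caller now has to know whether someone further up the stack plans to flush, which is exactly the kind of knowledge I’d rather not spread around.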

It tends to be easier to solve these problems by adding way more flushes than we strictly need, which is not good for performance. The data in my side project is simple, but there’s a lot of it, so I wanted to mitigate this problem. This is my first stab at a solution:

First of all, I’m not using Doctrine’s entity manager directly. I’ve created an abstraction layer called the Data Manager which my code calls to perform all entity manager actions. This not only gives me better options to tame Doctrine’s rough spots, but gives me a fighting chance to yank Doctrine out and replace it if I decide it’s causing me significant problems.

<?php

class BB_Manager_Data
{
     static protected $instance = null;
     static protected $em = null;
     ...

If I want to go old school and just flush, I have a function for that. If I call it with no preliminaries, it’s just a pass-through. But I’ve borrowed the idea of SQL transactions to handle chunks of related changes. If I call startTransaction(), the flush() function will do nothing. When I call commit(), flush() is re-enabled and then called. This makes the idea of transactions more explicit and lets me be more flexible about how I commit changes without cluttering my method signatures. Here’s what the first version of my code looked like:

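// When true, flush() is a no-op until commit() is called.
static protected $disableFlush = false;

// Start a batch of related changes: suppress flushes until commit().
static public function startTransaction()
{
     self::$disableFlush = true;
}

// End the batch: re-enable flushing and write everything in one go.
static public function commit()
{
     self::$disableFlush = false;
     self::flush();
}

// Pass-through to the entity manager unless a batch is in progress.
static public function flush()
{
     if (!self::$disableFlush) {
          $em = self::getEntityManager();
          $em->flush();
     }
}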
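
To show what that buys me, here’s a minimal sketch of the wrapper in use. A few assumptions: FlushCountingEntityManager is a hypothetical stand-in I made up so the example runs without Doctrine (it only counts real flushes), DataManager is a trimmed-down copy of the functions above with a setter added for the sketch, and renameUser() is an invented helper.

```php
<?php

// Hypothetical stand-in for Doctrine's entity manager: it only counts real flushes.
class FlushCountingEntityManager
{
    public $flushCount = 0;

    public function flush()
    {
        $this->flushCount++;
    }
}

// Trimmed-down copy of the wrapper (class name shortened, setter added for the sketch).
class DataManager
{
    static protected $em = null;
    static protected $disableFlush = false;

    static public function setEntityManager($em) { self::$em = $em; }
    static public function getEntityManager() { return self::$em; }

    static public function startTransaction() { self::$disableFlush = true; }

    static public function commit()
    {
        self::$disableFlush = false;
        self::flush();
    }

    static public function flush()
    {
        if (!self::$disableFlush) {
            self::getEntityManager()->flush();
        }
    }
}

// Invented helper that flushes when it is done, without knowing who called it.
function renameUser(array &$user, $name)
{
    $user['name'] = $name;
    DataManager::flush();
}

$em = new FlushCountingEntityManager();
DataManager::setEntityManager($em);

$users = [['name' => 'a'], ['name' => 'b'], ['name' => 'c']];

DataManager::startTransaction();
foreach ($users as $i => &$user) {
    renameUser($user, 'user' . $i);   // each flush() in here is a no-op
}
unset($user);                         // break the reference left by foreach
DataManager::commit();                // one real flush for the whole batch

echo $em->flushCount, "\n";           // prints 1
```

Without the startTransaction()/commit() pair, the same loop would flush three times. One thing to watch with this simple boolean: a nested startTransaction()/commit() pair would re-enable flushing before the outer batch is finished.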

Ah, but Doctrine supports transactions! It’s just a matter of calling $em->getConnection()->beginTransaction(); to start and $em->getConnection()->commit(); to complete or $em->getConnection()->rollback(); to cancel. Each flush() inside the transaction still sends its SQL to the database, but nothing becomes permanent until commit() is called. So another version of the above would be:

static public function startTransaction()
{
     $em = self::getEntityManager();
     $em->getConnection()->beginTransaction();
}

static public function commit()
{
     $em = self::getEntityManager();
     $em->getConnection()->commit();
}

static public function flush()
{
     $em = self::getEntityManager();
     $em->flush();
}

Or you could just use Doctrine’s transactional functions outside of an abstraction layer.
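
Neither version above rolls back if something goes wrong partway through. Here’s a minimal sketch of that pattern, under some loud assumptions: RecordingConnection and RecordingEntityManager are stand-ins I made up that only log calls (so this runs without Doctrine), and inTransaction() is an invented helper name. The stand-ins reuse the method names mentioned above.

```php
<?php

// Hypothetical stand-ins that record calls; these are not Doctrine classes.
class RecordingConnection
{
    public $log = [];

    public function beginTransaction() { $this->log[] = 'begin'; }
    public function commit()           { $this->log[] = 'commit'; }
    public function rollback()         { $this->log[] = 'rollback'; }
}

class RecordingEntityManager
{
    private $conn;

    public function __construct(RecordingConnection $conn) { $this->conn = $conn; }
    public function getConnection() { return $this->conn; }
    public function flush() { $this->conn->log[] = 'flush'; }
}

// Invented helper: run $work inside a transaction and roll back if it throws.
function inTransaction($em, callable $work)
{
    $conn = $em->getConnection();
    $conn->beginTransaction();
    try {
        $work($em);
        $em->flush();
        $conn->commit();
    } catch (Exception $e) {
        $conn->rollback();
        throw $e;   // let the caller decide what happens next
    }
}

// Happy path: the work succeeds, so we flush and commit.
$okConn = new RecordingConnection();
inTransaction(new RecordingEntityManager($okConn), function ($em) {
    // make changes to entities here
});

// Failure path: the work throws, so nothing is flushed or committed.
$badConn = new RecordingConnection();
try {
    inTransaction(new RecordingEntityManager($badConn), function ($em) {
        throw new RuntimeException('something went wrong');
    });
} catch (RuntimeException $e) {
    // handle or report the failure
}

echo implode(',', $okConn->log), "\n";   // prints begin,flush,commit
echo implode(',', $badConn->log), "\n";  // prints begin,rollback
```

If I remember right, Doctrine’s entity manager has a transactional() method that wraps a callback in much the same way, which may make a helper like this unnecessary.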

Which of these versions is better depends on the details of how Doctrine implements its transactional functionality (which I haven’t dug into yet) and on what your particular needs are. My initial version may actually be more performant in some cases if Doctrine’s transactions are just pass-throughs to your DBMS’s transactions, since it skips the intermediate flushes entirely. But it may be a micro-optimization or none at all. I may dig a little further or do some A/B tests to gather some information about that.

I think the transactional capabilities are an underutilized tool in Doctrine 2’s toolbag that can provide useful flexibility, group related changes, and minimize database writes. And the transactional paradigm (whatever the implementation) goes well with many use cases to help tame those flushes and optimize database writes.

 
