Symfony 2 : Avoid doctrine’s memory leaks with commands

While running long import jobs through Doctrine from a Symfony 2 command, I noticed memory leaks.

I’m using Doctrine with MongoDB (the MongoDB ODM).

For example:
First import : 22 MB used
Last import (100k+): 500+ MB used

A couple of tips:

  1. Call $dm->clear() regularly to empty Doctrine’s in-memory object cache. Keep in mind that every object Doctrine was managing becomes detached, so reload anything you still need afterwards.
  2. This is the REAL tip: add --no-debug to your command line
php app/console --no-debug my:bundle:command

The first tip helps a little; the second is far more effective (presumably because debug mode keeps per-query logging data in memory). As a bonus, it also speeds up execution.
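
To make the first tip concrete, here is a minimal sketch of a batched import that flushes and clears every N objects. FakeDocumentManager is a hypothetical stand-in so the snippet runs without Doctrine; it only mimics persist(), flush() and clear(). importInBatches() and the batch size of 500 are assumptions for illustration, not values taken from the import above.

```php
<?php
// Hypothetical stand-in for Doctrine's document manager.
// It mimics only persist(), flush() and clear(); the real class does far more.
class FakeDocumentManager
{
    public $managed = array();  // objects currently held in memory
    public $flushed = 0;        // total number of objects written
    public $clearCalls = 0;     // how many times clear() was called
    private $pending = array(); // persisted but not yet flushed

    public function persist($doc)
    {
        $this->managed[] = $doc;
        $this->pending[] = $doc;
    }

    public function flush()
    {
        $this->flushed += count($this->pending);
        $this->pending = array();
    }

    public function clear()
    {
        $this->managed = array(); // everything becomes detached
        $this->clearCalls++;
    }
}

// Persist every row, flushing and clearing after each full batch.
function importInBatches($rows, $dm, $batchSize = 500)
{
    $count = 0;
    foreach ($rows as $row) {
        $dm->persist((object) $row);
        $count++;
        if ($count % $batchSize === 0) {
            $dm->flush(); // write the batch
            $dm->clear(); // release the managed objects
        }
    }
    $dm->flush(); // write the last, possibly partial, batch
    $dm->clear();
    return $count;
}

$rows = array();
for ($i = 0; $i < 1200; $i++) {
    $rows[] = array('id' => $i);
}
$dm = new FakeDocumentManager();
$imported = importInBatches($rows, $dm, 500);
```

With the real document manager, pass $dm in place of the stand-in. A smaller batch size lowers peak memory at the cost of more flushes.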

SF2 : How to create a Service in 5 minutes ? Why ?

In short: a service is a set of specialized functions, packaged into a dedicated bundle. Seen from the rest of the application, it is a component devoted to one task.

In my developments, I love services. Some of those I have written are in charge of:

  • SMS sending (using multiple providers, completely transparently)
  • Geocoding an address (using a cache, and again trying multiple geocoding engines)
  • Processing emails
  • Generating documents

Let’s start

  1. Let’s create a new bundle (called AB3/ServiceBundle).
    php app/console generate:bundle

    When the generator asks for the configuration format, I usually pick “annotation”.

  2. Create a new controller (ServiceController) in the AB3/ServiceBundle/Controller directory with a couple of methods:
    namespace AB3\ServiceBundle\Controller;
    class ServiceController {
      public function funcOne() { return "funcOne"; }
      public function funcTwo() { return "funcTwo"; }
    }
    
  3. Here is the main and most important part: declare your service. By default, Symfony2 uses an XML file, which I don’t like. So let’s use YAML instead, which is more convenient IMO.
    In AB3\ServiceBundle\DependencyInjection, edit “AB3ServiceExtension.php” and replace

    $loader = new Loader\XmlFileLoader($container, new FileLocator(__DIR__.'/../Resources/config'));
    $loader->load('services.xml');

    by

    $loader = new Loader\YamlFileLoader($container, new FileLocator(__DIR__.'/../Resources/config'));
    $loader->load('services.yml');
  4. In AB3\ServiceBundle\Resources\config, delete services.xml and create a new services.yml file instead, where we declare the service entry points as below:
    parameters:
        my.service.class: AB3\ServiceBundle\Controller\ServiceController
    
    services:
        my.service:
            class: "%my.service.class%"

    Two important notes:
    – Under parameters, we declare the class name.
    – The real work happens under services: there, we declare the service name (my.service) and the class that implements it.

  5. Then, to consume your service from any other controller:
    public function indexAction() {
    [...]
        $this->get('my.service')->funcOne();
        $this->get('my.service')->funcTwo();
    [...]
    }

    And that’s all !!!
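
To picture what a services.yml declaration does, here is a toy container in plain PHP. It is only an illustration, not Symfony’s implementation: TinyContainer and DemoService are hypothetical names, and the two arrays play the role of the parameters and services sections.

```php
<?php
// Toy container for illustration only; Symfony's real container is far richer.
class TinyContainer
{
    private $parameters;
    private $definitions;
    private $instances = array();

    public function __construct(array $parameters, array $definitions)
    {
        $this->parameters = $parameters;
        $this->definitions = $definitions;
    }

    // Return the service registered under $id, creating it on first use.
    public function get($id)
    {
        if (!isset($this->instances[$id])) {
            if (!isset($this->definitions[$id])) {
                throw new InvalidArgumentException("Unknown service: $id");
            }
            $class = $this->resolve($this->definitions[$id]['class']);
            $this->instances[$id] = new $class();
        }
        return $this->instances[$id];
    }

    // "%name%" points to an entry of the parameters section.
    private function resolve($value)
    {
        if (preg_match('/^%(.+)%$/', $value, $m)) {
            if (!isset($this->parameters[$m[1]])) {
                throw new InvalidArgumentException("Unknown parameter: {$m[1]}");
            }
            return $this->parameters[$m[1]];
        }
        return $value;
    }
}

// Hypothetical service class, standing in for the bundle's class.
class DemoService
{
    public function funcOne() { return "funcOne"; }
    public function funcTwo() { return "funcTwo"; }
}

$container = new TinyContainer(
    array('my.service.class' => 'DemoService'),
    array('my.service' => array('class' => '%my.service.class%'))
);
$result = $container->get('my.service')->funcOne();
```

The point of the indirection through parameters is that the class behind a service name can be swapped in configuration without touching the code that calls get().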

Symfony 2 : Using Beanstalk with “Command”

Synopsis
Installation, configuration and use (through a command line) of beanstalkd with Symfony 2.5+

1) Introduction

Beanstalk is a simple, fast work queue.

Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously.

What I like about beanstalk: it is a simple and highly efficient queue, and it makes it easy to build a cluster of “processors”.

Where I use it: when I need to “transmit” information to a set of “processors” that may or may not be online, and when I want that information to be processed only once (“Point-to-Point” processing).

For “Point-to-MultiPoint” processing, I use a message broker such as “Mosquitto” or RabbitMQ, or “Redis” (even if I prefer Redis in its core role: fast storage of key/value data).

2) Installation

I assume beanstalkd is installed. Otherwise, see http://kr.github.io/beanstalkd/

With Symfony 2.5+, I use leezy/pheanstalk-bundle. To install it, use Composer as always.
In composer.json, add

{ ...
  "require": {
    ...
    "leezy/pheanstalk-bundle": "2.*"
  }
}

Then,

composer update

It installs pheanstalk, then the leezy/pheanstalk bundle for Symfony2.

3) Configuration

In app/config/config.yml, configure the service. Here, for example, “server1” has a beanstalkd instance running.

# leezy pheanstalk
leezy_pheanstalk:
    enabled: true
    pheanstalks:
        server1:
            server: localhost
            port: 11300
            timeout: 60

See Leezy/Pheanstalk for more information on installation and configuration.

4) Example within a “Command”

<?php
namespace AB3\GlobalBundle\Command;

use Symfony\Bundle\FrameworkBundle\Command\ContainerAwareCommand;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;


class BeanstalkCommand extends ContainerAwareCommand {

    const TUBE_NAME = "test"; // Tube to Watch
    const WATCH_TIMEOUT = 60; // Watch Tube expires after 60s
    // OutputInterface
    protected $output;
    

    protected function configure() {
        $this   ->setName('ab3:command:beanstalk')
                ->setDescription('Pull jobs from Beanstalk');
    }
    
    protected function execute(InputInterface $input, OutputInterface $output) {
        $this->output = $output;
        $bsClient = $this->getContainer()->get("leezy.pheanstalk.server1");
        while (true) { // loop forever; stop the worker from outside
            // I recommend passing a timeout to reserve(): it avoids a tricky bug
            $job = $bsClient
                ->watch(self::TUBE_NAME)
                ->ignore('default')
                ->reserve(self::WATCH_TIMEOUT);
            if ($job) { // something to do
                $this->processJobData($job->getData()); // process job data
                $bsClient->delete($job); // remove job
            }
        }
    }

    protected function processJobData($data) {
        $this->output->writeln($data);
    }
}

Just a couple of notes:
*) If processing a job takes a long time, you should remove the job before processing it. So the

if ($job) { ... }

becomes

if ($job) {
  $jobData = $job->getData();
  $bsClient->delete($job); // remove job
  $this->processJobData($jobData);
}

Why? Mostly to release the reservation and remove the job before its time-to-run expires. See “What does DEADLINE_SOON mean?” for more details. The trade-off is that a job deleted up front is lost if processing then fails.
*) I always set a “Watch” timeout to avoid a possible freeze bug (I ran into it a year ago).
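
The command above only consumes jobs. For the producing side, a sketch could look like the following, assuming the client exposes Pheanstalk’s useTube() and put() methods. enqueueJob() and RecordingQueue are hypothetical: the latter is an in-memory stand-in so the snippet runs without a beanstalkd server, and JSON is just one possible payload encoding.

```php
<?php
// In-memory stand-in for the Pheanstalk client (hypothetical, for illustration).
// It mimics only useTube() and put().
class RecordingQueue
{
    public $jobs = array();   // tube name => list of payloads
    private $tube = 'default';
    private $nextId = 1;

    public function useTube($tube)
    {
        $this->tube = $tube;
        return $this; // allows chaining, as in the call below
    }

    public function put($data)
    {
        $this->jobs[$this->tube][] = $data;
        return $this->nextId++; // id of the job just queued
    }
}

// Encode the payload as JSON and push it into the given tube.
function enqueueJob($client, $tube, array $payload)
{
    return $client->useTube($tube)->put(json_encode($payload));
}

$queue = new RecordingQueue();
$jobId = enqueueJob($queue, 'test', array('id' => 42, 'action' => 'geocode'));
```

In a real application, the client would be the same leezy.pheanstalk.server1 service, and processJobData() would json_decode() the string it receives.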

THANKS TO…. All contributors of Beanstalkd, Pheanstalk, Leezy/Pheanstalk

Symfony 2 : A “parent” controller for performance – part 1

I’ve been using Symfony 2 (SF2) for a while (since the 2.0 release, in fact), especially with MongoDB.

A couple of things I have noticed:

  • Service “names” can change. This means, for example, that your ‘doctrine.blahbla.mongodb’ could one day become ‘doctrine.blah.mongo.db’. It happened between SF 2.1 and 2.6, meaning you have to rewrite every access to the “DocumentManager”, you know, the famous

    $dm = $this->get('doctrine.....mongodb')
  • Service retrieval is quite slow and can be sped up through “caching” and so on…
  • Translating strings from a controller pisses me off because $this->get('translator')->trans(...) is so long.

So all my controllers extend a “parent” controller instead of the usual Symfony Controller. It caches services and provides basic functions to make the code more readable.

Here it is (a couple of functions):

class AAController extends Controller {

  protected $_dm;    // cached Document Manager
  protected $_trans; // cached translator

  /**
   * Retrieve MongoDB Document Manager
   */
  public function getDM() {
    if (!isset($this->_dm)) $this->_dm = $this->get('doctrine....mongodb');
    return $this->_dm;
  }

  /**
   * Retrieve translator service
   */
   public function getTranslator() {
    if (!isset($this->_trans)) $this->_trans = $this->get('translator');
    return $this->_trans;
   }

   /**
    * Translate a string
    */
   public function trans($msg) {
     return $this->getTranslator()->trans($msg);
   }

}

Just to give you an idea of the benefits:

  1. Retrieving a service with “caching” is at least 10 times faster.
  2. If a service name changes, it is quite easy to upgrade: only the parent controller has to be edited.

And honestly, I wonder why the SF framework does not cache access to services natively. (End of part 1.)

Symfony 2 : Redirect to Referer

A little tip/reminder to redirect to the referer (i.e. the page the user came from) with Symfony 2.1+

This is something I often use after a “delete” action for example:

$referer = $this->getRequest()
                ->headers
                ->get('referer');
return $this->redirect($referer);

As I’m a little bit lazy, the global controller that all my other controllers extend has this method:

class AAController extends Controller { ... 
  public function redirectToReferer() {   
    return $this->redirect(
              $this->getRequest()
                   ->headers
                   ->get('referer')
           );
  }
}

and usually, it is called from an action through…

  return $this->redirectToReferer();
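
One caveat: browsers do not always send a Referer header, in which case headers->get('referer') returns null. A small guard, sketched here in plain PHP with a hypothetical helper name and fallback URL, picks a default target instead.

```php
<?php
// Return the referer when it is a non-empty string, otherwise the fallback.
// refererOrFallback() is a hypothetical helper, not part of Symfony.
function refererOrFallback($referer, $fallback)
{
    if (!is_string($referer) || trim($referer) === '') {
        return $fallback;
    }
    return $referer;
}

// Inside a controller, the call could become (assumed usage):
//   return $this->redirect(
//       refererOrFallback($this->getRequest()->headers->get('referer'), '/')
//   );

$withHeader = refererOrFallback('http://example.com/list', '/');
$withoutHeader = refererOrFallback(null, '/');
```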

MongoDB, Symfony2 and REST/JSON

Symfony2 is a powerful PHP framework, usually paired with Doctrine (ORM or, for MongoDB, ODM). MongoDB stores its documents as BSON, a binary form of JSON.

When using SF2 for REST/JSON requests, JSON exports of DB objects are often required. There are several ways to achieve that from an ORM PoV:

  • manually, by writing toJson() and fromJson() methods for each Entity/Document
  • with a serializer
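
As a rough idea of what the manual option involves, here is a plain PHP sketch. The City class and its fields are made up for illustration; a real Entity/Document would carry its Doctrine mapping as well.

```php
<?php
// Hypothetical document class with hand-written JSON conversion.
class City
{
    private $name;
    private $population;

    public function __construct($name, $population)
    {
        $this->name = $name;
        $this->population = $population;
    }

    public function getName() { return $this->name; }
    public function getPopulation() { return $this->population; }

    // Export the object as a JSON string.
    public function toJson()
    {
        return json_encode(array(
            'name' => $this->name,
            'population' => $this->population,
        ));
    }

    // Rebuild an object from a JSON string, rejecting incomplete input.
    public static function fromJson($json)
    {
        $data = json_decode($json, true);
        if (!is_array($data) || !isset($data['name'], $data['population'])) {
            throw new InvalidArgumentException('Invalid City JSON');
        }
        return new self($data['name'], (int) $data['population']);
    }
}

$city = new City('Sample City', 1000);
$json = $city->toJson();
$copy = City::fromJson($json);
```

Every new field has to be added to the constructor, toJson() and fromJson(), which is the maintenance cost a serializer removes.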

In SF2, I do recommend JMSSerializerBundle : it is simple to install, thanks to Composer, and very easy to use.

For example, to encode data within the controller:

$serializer = \JMS\Serializer\SerializerBuilder::create()->build();
$jsonContent = $serializer->serialize($data, 'json');

At this point, $jsonContent is a String. And to send it as an answer:

 $response =  new Response($jsonContent);
 $response->headers->set('Content-Type', 'application/json');
 return $response;

And that’s all.
To be a little bit more efficient, we can create a new “JsonResponse” class:

namespace AB3\ExampleBundle\Internal;

use Symfony\Component\HttpFoundation\Response;

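class JsonResponse extends Response {

   // Same signature as Response::create(), so the override stays compatible
   public static function create($jsonContent = '', $status = 200, $headers = array()) {
      $response = new static($jsonContent, $status, $headers); // a JsonResponse, not a plain Response
      $response->headers->set('Content-Type', 'application/json');
      return $response;
   }
}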

From my daily usage, I’d rather put this ‘create’ logic directly in a Root Controller that all my controllers extend, in order to simplify common tasks…

namespace AB3\ExampleBundle\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\Response;

class ARootController extends Controller {

   public function sendJsonOk($data) {
      // Leading backslash: resolve JMS from the global namespace, not this one
      $serializer = \JMS\Serializer\SerializerBuilder::create()->build();
      $jsonContent = $serializer->serialize($data, 'json');
      $response =  new Response($jsonContent);
      $response->headers->set('Content-Type', 'application/json');
      return $response;
   }
}

Then the call is quite simple from any controller action:

class MyController extends ARootController {

  public function exampleAction() {
    $data = ....; // get some data to send back to the user
    return $this->sendJsonOk($data);
  }
}