Wednesday, April 04, 2012

Implementing Namespaces with Memcached and PHP

    Memcached is a tool (with PHP extensions) for storing key-value pairs of data in RAM and it is used for boost applications where persistent storage is needed and fetching data from datasource (file, database, internet...) is more expensive than fetching from RAM (from performance point of view).

    Despite its advantages and the fact that by using it your application loads data much faster when it is in cache, it lacks any kind of keyword data removal nor any other capability to do so.

    It may not sound as a real problem, that can be a real problem making your application return invalid data in some scenarios.

    Here I will provide some scenarios where memcached can lead to problems and will post some approaches to resolve this limitation.

Scenario 1

We have pagination in our system, and keep that pagination in cache for better performance. When deleting, inserting or altering new data, that pagination becomes obsolete, and may be already cached without the posibility of deleting it because, for example, we don't have pagination cache's key id from the function that adds, deletes, or alters its data.

First approach

 
    The idea behind that (in which my work is based) is to emulate namespaces by adding some keys with version, are the official pseudo-namespace proposal, which allows to workarround invalid data returning by emulating namespaces and appending a version to the key. This way, if you have foo stored in cache, you update namespace to automatically try to get foo+latest_version, which may force cache regeneration.

    This approach have some troubles though: You can't bind data to several namespaces at the same time! And that was the original reason I extended this idea to a new approach which can be explained below:

Scenario 2

    Imagine we have a database with two tables, one for accounts, and the other for storing friend relationship related between them.

Table Account Table Friendship
id INTEGER,
username TEXT,
...
PRIMARY KEY (id)
owner INTEGER,
friend INTEGER,
PRIMARY KEY (owner,friend),
FOREIGN KEY (owner) REFERENCES Account(id),
FOREIGN KEY (friend) REFERENCES Account(id)

    In this scenario, suppose account id 3 have these friend ids: 1,2,5,6,8. When asking database for friend, it will return array (1,2,5,6,8) we are going to put into cache, but what about if account id 2 is deleted, or have its id changed?

    Then, our cache is immediatelly invalid but the real data isn't (supposing we provided mechanisms for automatically updating foreign keys in DB!).


    I started to work on a class for memcache to address this problem (and possibly other issues) at the cost of having a bit of namespace data inside cache.

    My approach is to store all the keys a namespace contains as an array for allowing the system to delete all those keys when a namespace has expired.

    For example, in Scenario 2, when we add a new friend, we can bind it to owner account namespace as well as friend account namespace.

    This way, when something changes, we can expire one of those two namespaces and will delete related data from cache.

Code

    You can grab the source code (PHP) from github

Requirements

    In order to use this code you will need:
  • Memcached (of course!)
  • pecl-memcache PHP extension
  • A config file where $cache_enabled is set to TRUE
Usage example

    Check github for an usage example

Benchmarks

    I tested it in my personal server, with postgreSQL, session handling and storing in DB and PHP 5.4.1_RC1 with the following results:
    Sessions are cached when user logs in via my DBCache class.
Without cache hit:
~35 SQL queries
~0.37 seconds page generation
With cache hits:
~0 SQL queries
~0.04 seconds page generation

No comments: