Wednesday, April 04, 2012

Implementing Namespaces with Memcached and PHP

    Memcached is a tool (with PHP extensions) for storing key-value pairs of data in RAM. It is used to speed up applications that rely on persistent storage, where fetching data from the datasource (file, database, internet...) is more expensive, performance-wise, than fetching it from RAM.

    Despite its advantages, and the fact that your application loads data much faster when it is in cache, it has no way of removing a group of related keys at once (by keyword, for example), nor any other capability to do so.

    This may not sound like a real problem, but it can be one: in some scenarios it makes your application return invalid data.

    Here I will describe some scenarios where memcached can lead to problems, and some approaches to work around this limitation.

Scenario 1

We have pagination in our system and keep the paginated results in cache for better performance. When data is inserted, altered or deleted, that pagination becomes obsolete, yet it may already be cached with no possibility of deleting it because, for example, the function that adds, deletes or alters the data doesn't know the cache key of the pagination.

First approach

 
    The idea behind this approach (on which my work is based) is the official pseudo-namespace proposal: emulate namespaces by storing a version for each namespace and appending that version to every key, which works around the problem of returning invalid data. This way, if you have foo stored in cache and you update the namespace version, the next lookup automatically asks for foo+latest_version, which misses and forces cache regeneration.
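    A minimal sketch of that version-key idea follows. It is not the code from my class: ArrayCache is a made-up in-memory stand-in for a memcached client so the example runs anywhere, and ns_key() and ns_invalidate() are hypothetical helper names.

```php
<?php
// Tiny in-memory stand-in for a memcached client (an assumption for this
// sketch); a real setup would call the cache extension's get/set instead.
class ArrayCache
{
    private $data = array();

    public function get($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set($key, $value)
    {
        $this->data[$key] = $value;
        return true;
    }
}

// Hypothetical helper: build the real cache key from namespace, version and key.
function ns_key($cache, $namespace, $key)
{
    $version = $cache->get("ns_version:$namespace");
    if ($version === false) {
        $version = 1;
        $cache->set("ns_version:$namespace", $version);
    }
    return "$namespace:v$version:$key";
}

// Hypothetical helper: "expire" a namespace by bumping its version.
// Old entries are not deleted; they simply become unreachable.
function ns_invalidate($cache, $namespace)
{
    $version = $cache->get("ns_version:$namespace");
    $cache->set("ns_version:$namespace", ($version === false ? 1 : $version) + 1);
}

$cache = new ArrayCache();
$cache->set(ns_key($cache, 'pagination', 'page_1'), array(10, 11, 12));
$before = $cache->get(ns_key($cache, 'pagination', 'page_1')); // found

ns_invalidate($cache, 'pagination'); // a row was inserted, altered or deleted
$after = $cache->get(ns_key($cache, 'pagination', 'page_1'));  // not found any more

var_dump($before !== false, $after === false);
```

    Note that nothing is actually deleted here: the stale entries stay in RAM until the cache evicts them, and each key belongs to exactly one namespace.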

    This approach has a drawback, though: you can't bind data to several namespaces at the same time! That was the original reason I extended this idea into a new approach, which is explained below:

Scenario 2

    Imagine we have a database with two tables, one for accounts and the other for storing the friend relationships between them.

Table Account
id INTEGER,
username TEXT,
...
PRIMARY KEY (id)

Table Friendship
owner INTEGER,
friend INTEGER,
PRIMARY KEY (owner,friend),
FOREIGN KEY (owner) REFERENCES Account(id),
FOREIGN KEY (friend) REFERENCES Account(id)

    In this scenario, suppose account id 3 has these friend ids: 1,2,5,6,8. When we ask the database for its friends, it returns the array (1,2,5,6,8), which we put into cache. But what happens if account id 2 is deleted, or has its id changed?

    Then our cache is immediately invalid, although the real data isn't (supposing we provided mechanisms for automatically updating foreign keys in the DB!).


    I started to work on a class for memcache to address this problem (and possibly other issues), at the cost of keeping a bit of namespace data inside the cache.

    My approach is to store, for each namespace, an array with all the keys it contains, so the system can delete all those keys when the namespace expires.

    For example, in Scenario 2, when we cache a friend list, we can bind it to the owner's account namespace as well as to each friend's account namespace.

    This way, when something changes, we can expire either of those namespaces and the related data will be deleted from cache.
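    The following is only a sketch of that idea under stated assumptions, not the class published on github: ArrayCache is again a made-up in-memory stand-in for memcached, and cache_set_bound() and ns_expire() are hypothetical names.

```php
<?php
// In-memory stand-in for memcached (an assumption so the sketch is runnable).
class ArrayCache
{
    private $data = array();

    public function get($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set($key, $value)
    {
        $this->data[$key] = $value;
        return true;
    }

    public function delete($key)
    {
        unset($this->data[$key]);
        return true;
    }
}

// Hypothetical helper: store a value and record its key in every given namespace.
function cache_set_bound($cache, $key, $value, array $namespaces)
{
    $cache->set($key, $value);
    foreach ($namespaces as $ns) {
        $keys = $cache->get("ns_keys:$ns");
        if ($keys === false) {
            $keys = array();
        }
        $keys[$key] = true; // array used as a set, so a key is listed only once
        $cache->set("ns_keys:$ns", $keys);
    }
}

// Hypothetical helper: delete every key bound to a namespace, then its key list.
function ns_expire($cache, $ns)
{
    $keys = $cache->get("ns_keys:$ns");
    if ($keys !== false) {
        foreach (array_keys($keys) as $key) {
            $cache->delete($key);
        }
    }
    $cache->delete("ns_keys:$ns");
}

$cache = new ArrayCache();

// Friend list of account 3, bound to the owner's namespace and to each friend's.
$friends = array(1, 2, 5, 6, 8);
$namespaces = array('account:3');
foreach ($friends as $id) {
    $namespaces[] = "account:$id";
}
cache_set_bound($cache, 'friends_of:3', $friends, $namespaces);

ns_expire($cache, 'account:2'); // account 2 was deleted or changed its id
var_dump($cache->get('friends_of:3'));
```

    With a real memcached server, updating the key list is a read-modify-write sequence, so two concurrent requests could overwrite each other's list; this sketch ignores that.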

Code

    You can grab the source code (PHP) from GitHub.

Requirements

    In order to use this code you will need:
  • Memcached (of course!)
  • pecl-memcache PHP extension
  • A config file where $cache_enabled is set to TRUE

Usage example

    Check GitHub for a usage example.

Benchmarks

    I tested it on my personal server, running PostgreSQL and PHP 5.4.1_RC1, with session handling and session storage in the DB, with the following results:
    Sessions are cached when the user logs in via my DBCache class.
Without cache hit:
~35 SQL queries
~0.37 seconds page generation
With cache hits:
~0 SQL queries
~0.04 seconds page generation

Decreasing production server downtime with kexec

    When managing a production server, one of the most important things is the tradeoff between server downtime and keeping the server's software updated.

    While most updates can be applied with little to no downtime, a kernel update is always problematic, since it typically requires a full reboot and significant downtime. To avoid that, many servers do not apply kernel updates as often as they should, especially cheap rented servers.

    On the other hand, there are servers whose owners like to boast of a high uptime. While that might look good, it is in fact quite the opposite: a high uptime means the server's software might not have been updated!

    So I will introduce kexec, along with a benchmark showing how it can reduce downtime by reducing reboot time. But first, let's look at how a Unix-like system boots and shuts down.

    In a typical boot/shutdown cycle, these are (approximately) the steps performed by the machine:

  • Boot
    1. BIOS stage
    2. Bootloader load
    3. Kernel load
    4. INIT
      1. Kernel init
      2. Hardware initialisation
      3. Checking and mounting partitions
    5. Start services
  • Shutdown
    1. Stop services
    2. Sync discs
    3. Unmount partitions
    4. Hardware stop
    5. Hardware power off
    By using kexec, some of those steps are skipped, since it switches kernels from a running system. These are (approximately) the steps for a kexec reboot:

  1. INIT
    1. Kernel init
    2. Checking and (re)mounting partitions
  2. Start services
    To prove that reboot time decreased, I wrote a small bash script to measure downtime (testTime.sh) and tested it on my personal server, which runs Gentoo.

    To use the provided script, you must run it after apache has been stopped, with:
time ./testTime.sh SERVER_WWW_URI > /dev/null 2>&1
    The commands I used for this benchmark are (via SSH):
Normal Reboot: /etc/init.d/apache2 stop && echo "Now you can exec time measurement script" && reboot
kexec reboot: kexec -l KERNELIMAGE --reuse-cmdline && /etc/init.d/apache2 stop && echo "Now you can exec time measurement script" && kexec -e

    These are the results I got:
Full reboot:
real    1m21.996s
user    0m3.241s
sys     0m2.833s
kexec reboot:
real    0m31.415s
user    0m1.872s
sys     0m1.684s
    So, to sum up: although a kernel update still takes time, the downtime is reduced significantly (from about 1m22s to about 31s in this test), so for most servers out there it is no longer an excuse for not keeping the system updated!

Friday, March 16, 2012

Adobe against FOSS and telling people which software to use

    This company has always done very bad things, abandoning its users whenever it wanted to. I'll try to explain it a little better for anyone who has never heard about some of them:


  1. A very long, long, long history of security vulnerabilities in its products (Flash), which even made Chrome vulnerable to an issue as serious as remote code execution. (Source: zdnet, Adobe).
  2. They drop 64bit support whenever they want, leaving users either with an old version of the Flash plugin or with nothing at all. A simple Google search is enough to demonstrate this. It may not seem important, but it really is: if my whole system is 64bit (Linux was the first to have a complete working 64bit system, by the way), who is Adobe to tell me to install a 32bit browser or to change my OS? Especially nowadays, when having more than 4 GiB of RAM is not that strange.
  3. Dropping Linux support for their Air framework, and stating that it could still be done if some of their partners coded it. (Source: phoronix)
  4. Dropping the Linux Flash Player browser plugin for everything except Chrome (Source: Adobe). Well, Adobe, it is good that you work with a company (Chrome is open source, but don't forget it is run by a company anyway) to improve things, but it is unacceptable that you simply drop all the other browsers just because they don't want to work with you or because they won't accept your guidelines.

    I am sure there are many, many other reasons I can't recall now, but there are several things I can say for sure and want to share with the world (hoping Adobe reads this sometime):


  1. You, Adobe, are the perfect reason not to tie my future as a developer to a closed source platform, because you have proven that you do what you want without caring even about your customers!
  2. You, Adobe, are NO ONE to say what software I should run. You may offer all you have, but you are not important enough to force me to use 32bit, to use Chrome, to use any other OS, or anything similar.
  3. As a company, you fail, because I will not change just because you offer a product (which I don't like) whose main use is embedded video players. As a company, you should consider that Linux, despite being a small % of your share, is still important, because your valuable programmers will hear complaints from their own people too, and because fortunately you have competition now: HTML5. So guess what: the only thing I lose without Flash is video players, and those can be done with HTML5 too, so who will lose?



Wednesday, March 07, 2012

About Facebook Antiprivacy Policy

    Facebook has always been very irresponsible about respecting people's privacy, and has done very bad things in the past in this matter.

    Some time ago, in response to statements by Mark Zuckerberg (Facebook's CEO) [1], I deleted my Facebook account for good, and I won't be back until that changes (which I see as very unlikely). Let me explain this, and why Facebook is evil.

  1. It has been shown that Facebook creates ghost profiles [2] and retains deleted data without the owner's consent. This is especially true when deleting an account, or when using the "find your friends on Facebook" feature, in which you type your email address and password and it searches for your possible friends on that social network.
  2. The friends-of-friends feature is totally wrong: if in real life the friends of my friends are not necessarily my friends, then why does Facebook act as if they were?
  3. Related to the previous point: if I set my account to be visible only to my friends, then when I post a comment on, or like, a photo of one of my friends, that content automatically becomes visible to friends we do not have in common. And the worst thing: my friend has to set this privacy option for me!
  4. You explicitly give Facebook permission in their TOS [3] to use, royalty-free, any content you upload, with no option to refuse. So if they want to use one of your personal pictures in a Coca-Cola advertisement, they can use it and make money from you, and you won't be able to complain (nor see any of the money).
  5. Many people blame Google for indexing their first and last names because of their Facebook account. This is not fair. As a webmaster, I know that Google respects the mechanisms you use to tell it not to crawl certain pages. So I blame Facebook, because they simply don't want to prevent that indexing, for purely economic reasons (I remind all of you that Facebook has a high page rank, and one of the reasons is having that much data indexed in Google).
    There are even more reasons to hate Facebook's idea of privacy, all because their CEO does not value their users' privacy, so I ask all my readers: do you still want to be in a place where you are not valued?
Unfortunately, in the USA privacy laws are not as strong as the ones we have in the EU. In Spain, for example, we have the LOPD (Ley Orgánica de Protección de Datos, or Data Protection Act in English).

    I am very pleased we have this law in Spain to ensure that companies treat my data correctly and to guarantee my rights. Let me explain it further, so you can understand why Facebook could be considered illegal in Spain or in the EU.

  1. The LOPD says the owner of the data is not the company running the DB; instead, the data belongs to the person it refers to.
  2. You have 4 undeniable rights:
    1. Access to ALL the data a company has about you.
    2. Alter any data to correct mistakes it may contain.
    3. Permanent deletion of part or all of your data, where permanent means you can ask for it to be effectively deleted from their database, and not just marked as invisible. Facebook does not do this: they don't delete anything, they just mark the data as invisible or something similar [4].
    4. You are entitled to set the visibility of your data, even within the company holding it. In other words, you can tell any company that it is forbidden to publish part (or all) of your data, even on its own website.
    The conclusion would be: I am against Facebook for those reasons, and I encourage people to stop using Facebook, or even to delete their accounts, until Facebook realizes that we have a right to privacy and that my data is mine.
It would not surprise me if some judge in the EU decided to give Facebook an ultimatum to change things, something that could be even worse than Ireland's request to Facebook [5].

Friday, September 23, 2011

Internet Explorer and its disrespect to W3C standards

    A long time ago, the W3C set several standards to govern the rendering of web pages and to unify criteria among browsers.

    This time, I am not referring to how IE treats margins in a completely different way from all the others; I am referring to how Internet Explorer does not seem to look at the declared charset of an X/HTML page.

    But this is not the first time Microsoft has gone its own way against the rest of the world. Let me mention a bit of history about encodings.

Some time ago, when there wasn't any formal definition for anything outside the ASCII encoding, several models were proposed by the International Organization for Standardization (ISO): for example, ISO-8859-XX for European languages, and let's not forget the most widely used encoding, UTF-8.

Well, while all the operating systems in the world (and when I say all, I mean all) adopted some (if not all) of those standard models, Microsoft came along with new, different and incompatible charsets: the windows-xxxx character sets (for example, windows-1252).

Not content with being the only one that broke the international standardization rules, some time later Internet Explorer arrived, parsing and rendering X/HTML elements as it pleases: sometimes correctly, and sometimes not.

    But let's get back to the main topic: what is wrong (this time) with Internet Explorer? The issue I will comment on here is a problematic one: it ignores the declared character set and uses Windows' default one! That said, this error does not always seem to be reproducible.

    This (like all IE problems) is not easy to deal with. One might think: then let people know about safer and better browsers and have them use one. But this won't be an accepted solution. So what problems does IE cause?

  • It forces developers (even those who follow the standards strictly) to put extra effort into development, which translates into more money spent on the project.
  • Since it is the default on Windows, an inexperienced user (a potential client) may use it and be unaware of other browsers.
  • Many more problems that I will not write about now.
    Conclusion: imagine a battery vendor (who is also bound to standards like size, voltage, etc...) and that you need some batteries for your TV's remote control.
You have to buy new batteries because the old ones ran out, and you discover that there is one vendor who is not following those standards, and the batteries you bought are a bit bigger and have more voltage than what you need. This is the question I want to ask everybody:
Will you replace your remote control with a new one (also provided by that vendor), or will you discard those batteries, keep your remote control and ask that vendor to follow the standards and produce items as it should?

    Now, think about Internet Explorer with the previous example in mind: why, in this case, do we need to change our code (the remote control) to make it work with Internet Explorer (the batteries), instead of doing what makes sense: just discard Internet Explorer and force Microsoft to program a quality browser?

Think about it...


Wednesday, August 17, 2011

Farewell KMail2

    I've been testing the new akonadi-based KMail2 and I was disappointed. The last version I tested was the one from KDE 4.7.0, and I have now decided to switch to other software for productivity reasons.

    The main reason for this change is that KMail fails to mark messages as read, and it fails to differentiate read messages from unread ones (the previous KMail did, for example, by using different colours for read and unread messages).

    I've already filed a bug about it (KDE bug #276893), but I haven't received any feedback from the developers as of today.

    So I finally decided to post an image which shows this problem: KMail does not mark ANY message as read, so it is hard to keep up with my mail, since I have to remember which messages I have read and which I have not.

    My last hope is that they take this severe showstopper bug seriously as soon as possible... In the meantime, I will be using other software to manage my email, so.... After several years... farewell KMail2

Monday, August 08, 2011

About SQL injection



    Although this kind of attack is very old, it is still the primary source of database and website attacks, so it really deserves a little discussion here and everywhere. In an ideal world, maybe this would not be worth a cent, but unfortunately we are not in an ideal world.

    Many people say that this is the fault of malicious people and/or hackers, but the blame lies on both sides: bad programmers and bad people.

    I'll explain this kind of attack with an example (in PHP, but it affects SQL independently of the language used).

    Suppose we have a login validation like this:
$sentence="SELECT COUNT(*) FROM Users WHERE username='".$_GET['user']."' AND pass='".$_GET['password']."'";
    (And later in your code you check whether result>0 to decide if the login was successful.)

    The problem here comes from not checking user input. In the good case, the statement will be nothing special. For example, if the user types bob and foo:

SELECT COUNT(*) FROM Users WHERE username='bob' AND pass='foo';


    But there is a problem: if the input is not sanitized, a user can transform this SQL statement into several SQL statements, which are then executed. For example, this can happen when the $_GET array is not sanitized:
(1) SELECT COUNT(*) FROM Users WHERE username='bob' OR 1=1;--' AND pass='foo';
or
(2) SELECT COUNT(*) FROM Users WHERE username='bob'; DROP TABLE Users;--' AND PASS='foo';
(3) SELECT COUNT(*) FROM Users WHERE username='bob'; SELECT creditcardnumber FROM Users;--' AND pass='foo';


    Those three statements are self-explanatory (the user input is everything between username=' and the -- comment marker): you can see how one statement can be modified to let you log in with an account which is not yours (1), to destroy data (2) or to gather other kinds of sensitive information (3).
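    To see it happen without touching a database, here is a small sketch that only builds the string. build_login_query() is a hypothetical name; it does the same concatenation as the login validation above, with plain arguments instead of $_GET:

```php
<?php
// Same string concatenation as the vulnerable login check, wrapped in a
// function (hypothetical name) so it can be fed arbitrary input.
function build_login_query($user, $password)
{
    return "SELECT COUNT(*) FROM Users WHERE username='" . $user
         . "' AND pass='" . $password . "'";
}

// Honest input: the statement keeps its intended shape.
echo build_login_query("bob", "foo"), "\n";

// Malicious input: the quote closes the string early and the rest of the
// input becomes SQL; everything after -- is treated as a comment.
echo build_login_query("bob' OR 1=1;--", "foo"), "\n";
```

    The second line printed is statement (1) from the list above: the password check ends up inside a comment.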

    Maybe you are thinking that, since this problem is old, it is not worth speaking about: WRONG!
From time to time, a huge company (Sony) or a government agency (CIA) is attacked using this method. This leads me to think: how poorly their hired programmers work, and how little companies value our personal and sensitive data!

    So, as this problem has proved to be relevant and important, let's investigate some solutions.
  • Prepared Statements and parametrized queries:
    This seems the best solution: an SQL statement is no longer crafted by joining two or more strings, and the SQL keywords are kept separate from the data (also, the full statement is not sent over the network every time it is executed). An example of a prepared statement (in postgres) could be:
SELECT COUNT(*) FROM Users WHERE username=$1 AND pass=$2

    This time, postgres already knows the types of $1 and $2 and parses them accordingly. More importantly, the $1 and $2 parameters are each treated as a complete string, eliminating the possibility of crafting another statement out of this one. So in this case a malicious input will be treated like:
SELECT COUNT(*) FROM Users WHERE username='bob''; DROP TABLE Users;--' AND pass='foo';

    In this case, as the whole username is treated as a single parameter, this statement will likely return 0 rows instead of harming data.
Try to AVOID hand-crafted SQL strings in your source code and use prepared statements!

  • Sanitize inputs: while this approach is good enough, I recommend applying it in conjunction with the previous one. It consists of checking user input and escaping any special characters so they cannot break the SQL statement (in case you can't avoid crafting the statement by hand).
$input=pg_escape_string($_GET['username']); and then use $input in the crafted SQL string. This way, any quote character that could break your SQL statement is escaped, preventing it from doing harm.
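    As a sketch of the prepared-statement approach from the first bullet, the example below uses PDO with an in-memory SQLite database. That choice is an assumption made only so the code needs no server (it requires the pdo_sqlite driver); with postgres you would connect through a different driver, and PDO writes placeholders as ? instead of $1 and $2. count_matching_users() is a hypothetical name.

```php
<?php
// Assumption: PDO with the pdo_sqlite driver is available. An in-memory
// SQLite database stands in for PostgreSQL so the sketch needs no server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE Users (username TEXT, pass TEXT)");
$db->exec("INSERT INTO Users VALUES ('bob', 'foo')");

// Hypothetical helper: the SQL text is fixed; user input only travels as
// parameters and is never spliced into the statement.
function count_matching_users(PDO $db, $user, $password)
{
    $stmt = $db->prepare('SELECT COUNT(*) FROM Users WHERE username = ? AND pass = ?');
    $stmt->execute(array($user, $password));
    return (int) $stmt->fetchColumn();
}

$ok      = count_matching_users($db, 'bob', 'foo');
$attack  = count_matching_users($db, "bob' OR 1=1;--", 'foo');
$dropTry = count_matching_users($db, "bob'; DROP TABLE Users;--", 'foo');

// The table is still there because the input was compared as plain text.
$stillThere = (int) $db->query('SELECT COUNT(*) FROM Users')->fetchColumn();

var_dump($ok, $attack, $dropTry, $stillThere);
```

    Only the honest credentials match a row; the two malicious inputs are looked up as (very odd) usernames and match nothing.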


    To sum up: even governments and big companies do not care enough about their databases to spend a little time checking their source code to avoid this kind of problem.