Closing the user's browser connection whilst keeping your PHP script running

written by jedi on October 7th, 2008 @ 11:11 PM

Here is an old but still relevant trick for PHP coders: closing the user's browser connection whilst keeping your PHP script running.
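The linked write-up is not reproduced here, so what follows is only a sketch of one commonly used way to do it, not necessarily the one the link describes. It assumes output compression is off and a server API that honours flush(); the function name is made up for the example.

```php
<?php
// Sketch: send the complete response, tell the browser it is complete,
// then keep working. finish_response_early() is a hypothetical helper.
function finish_response_early($body)
{
    ignore_user_abort(true);   // keep running after the client disconnects
    set_time_limit(0);         // lift the execution time limit

    ob_start();
    echo $body;
    header('Connection: close');
    header('Content-Length: ' . ob_get_length());
    ob_end_flush();            // flush our own output buffer
    flush();                   // push everything to the web server
}

finish_response_early("Request accepted.\n");

// The browser now has its full response; slow work (mail, resizing,
// cache warm-up...) can continue from here.
```

Under PHP-FPM, fastcgi_finish_request() serves the same purpose with less ceremony.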

Lesser-known PHP vulnerabilities

written by jedi on September 24th, 2008 @ 02:11 PM

Stefan Esser published his slides from the Zend Conference 2008, covering various common vulnerabilities in PHP applications and in PHP itself.

It's definitely worth a read.

Next Internet Explorer to pass the Acid 2 test ?

written by jedi on December 20th, 2007 @ 12:29 AM

According to the IE Blog, Internet Explorer 8 passed the Acid 2 test - more on this.

It will obviously take years before it is released and people actually leave IE 6 and 7 for it, but still, it shows that Microsoft seems to be on the right track.

So rejoice, web developers, maybe some day it will finally be possible to write XHTML/CSS code without ugly tricks to work around IE bugs and limitations. Maybe. There's hope.

Easily embed PHP within C++

written by jedi on December 16th, 2007 @ 07:07 PM

PHP is a jerky joke when it comes to writing standalone servers. A language like C++ is way more efficient for that kind of task, even though the pure web development side may still use PHP.

However, sharing data between both languages means that whenever a change is made, both versions have to be kept in sync.

Facebook has a nice answer to that: they embed the PHP interpreter in their C++ apps, as you would do with Lua. The C++ app just loads PHP files and then, PHP functions can be called almost as if they were C++ functions.

Facebook recently released a great BSD-licensed library that makes this task really trivial: PHPEmbed.

Bob Jenkins super fast hash for PHP

written by jedi on December 4th, 2007 @ 02:11 AM

PHP's built-in functions like md5() perform really poorly when it comes to using the hash for hash tables or for uniform load balancing.

Paul Hsieh's hash is a very fast and popular hash function for that kind of job.

Last year, Bob Jenkins compared a dozen hash functions and designed his own (actually the third revision of it), which performs as fast as Paul Hsieh's with fewer collisions.

To make a long story short: this is probably the most efficient hash function known yet, as long as it is not used for cryptographic purposes. And it's a pity that PHP doesn't implement it.

That's why I wanted to spend a few minutes creating a PHP extension implementing that hash function.

Download the Jenkins hash extension for PHP and give it a try.

Preliminary benchmarks show that the Jenkins hash, as in the PHP extension, outperforms MD5 by 4 times, while still remaining a pretty good unique identifier.
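To illustrate the load-balancing use case, here is a minimal sketch that maps keys to servers with a fast non-cryptographic hash. The extension's own function name is not given above, so this uses the 'joaat' algorithm of PHP's hash() (available since PHP 5.4), which is Jenkins' older one-at-a-time function and not the one the extension implements. The helper and server names are made up.

```php
<?php
// Hypothetical helper: map a key to one of $buckets slots.
// 'joaat' is Jenkins' one-at-a-time hash, used here only as a stand-in.
function pick_bucket($key, $buckets)
{
    $h = hexdec(hash('joaat', $key));   // 32-bit value, never negative
    return (int) fmod($h, $buckets);    // fmod() also works on 32-bit builds
}

$servers = array('cache1', 'cache2', 'cache3');   // illustrative names
foreach (array('user:1', 'user:2', 'user:3') as $key) {
    echo $key, ' -> ', $servers[pick_bucket($key, count($servers))], "\n";
}
```

The same key always lands on the same server, and a hash with good distribution spreads the keys evenly.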

Optimizing require_once with xend

written by jedi on November 27th, 2007 @ 06:51 PM

Although things are getting better with PHP 5.2, file inclusion (require_once) has always been slow with PHP.

Libraries like Zend Framework, AdoDB or Smarty have their source code divided into multiple files, in a very clean way. The drawback is that in order to use them, PHP has to include files that include other files, that include other files... and this introduces a noticeable startup delay. It even happens with accelerators like Xcache.

An obvious workaround is to concatenate every required file into a single file. Several people noticed that the Zend Framework was from 30% to 3 times faster that way.

There's an interesting project that automates that task : the Xend PHP extension.

Xend finds require_once() statements, inserts the content where it should be and then it automatically saves the one-file version into a new PHP file.

Just look at the benchmarks.

The Xend extension seems to be an easy way to speed up a lot of PHP applications without any code change. Unfortunately it's still at an early stage (at least on OpenBSD, it's as stable as nitroglycerine).

By the way, another tiny project that can be interesting is Dgx's PHP shrinker. Shrinking the source code reduces the time the accelerator (Xcache, APC...) needs to build the opcode cache for the file.

PHP 5.2.5 + Suhosin + FastCGI = unstable trio

written by jedi on November 20th, 2007 @ 11:44 PM

Today, I upgraded a loaded server running vBulletin to PHP 5.2.5. That server is running OpenBSD-current, PHP-fastcgi from ports with Suhosin enabled as in default configuration and Lighttpd.

That setup has been very stable for 2 years.

But the upgrade to PHP 5.2.5 was a complete failure. PHP crashed with a segmentation fault after almost every request, during memory deallocation.

Even after disabling Xcache, libpuzzle and almost every module, it kept crashing over and over again.

Reverting to 5.2.4 immediately fixed the issue.

I haven't investigated that bug yet, and it doesn't happen with every script. Anyway, unless you absolutely need the fixes that were applied between PHP 5.2.4 and 5.2.5, it may be better to wait for the next release. Or at least, if you want to upgrade, prepare a rollback procedure first.

Swiftiply: boost your framework-driven web applications

written by jedi on September 29th, 2007 @ 10:07 PM

It's not a new project, but if you never heard about it, have a look at Swiftiply :

"Scaling your web applications should be easy. Start small, then when you need more capacity, just add it. Another process. Another machine. More capacity, instantly. Without additional configuration or software restarts.

That is what you get with Swiftiply.

Swiftiply is a backend agnostic clustering proxy for web applications that is specifically designed to support HTTP traffic from web frameworks. It is a very fast, narrowly targetted clustering proxy. In back to back comparisons of Swiftiply to HAProxy, Swiftiply reliably outperforms HAProxy (tested using IOWA, Rails, and Ramaze backend processes) and, depending on your web framework, you may not even need to put a traditional web server into your architecture at all.

Swiftiply is a clustering proxy server for web applications. What makes it different from other clustering proxies, however, is that it expects the backend processes to connect to it. That is, the backend processes are clients of the Swiftiply server, as are the browsers out in userland. The advantage of this is that it permits the back ends to maintain a persistent connection with the proxy server, which eliminates socket setup/teardown costs. And even more importantly than that, it permits backend processes to be started up or shut down without requiring any notification or configuration of the proxy. So, if more capacity is needed, all one needs to do is start the processes. It will immediately be available and will begin to be utilized."

I finally tested it with Rails and it works as advertised. Performance is immediately doubled, and it's a breeze to install. Swiftiply rocks.

PHP : notes about integers

written by jedi on September 18th, 2007 @ 11:06 PM

Here's a classical scenario. You get an identifier as $_POST['id'] and you need to check that the value is actually a PHP integer value, that has just been converted into a string because everything becomes a string in the $_POST[] array.

is_int() obviously doesn't work, as $_POST['id'] is a string.

is_numeric() is also plain wrong. It is not designed to check whether the string contains only digits, nor is it designed to check whether the value would fit into a PHP integer.

PHP's is_numeric() relies on Zend Engine's is_numeric_string() function. A great deal of PHP odd behaviors depend on that internal function, like those described in that article.

Here's what is_numeric() actually does :

  • it skips leading spaces, tabs and \r, \n, \t, \v and \f characters.
  • it then skips any leading + or -, but it then bugs out if the first characters after the spaces are "0x" or "0X".
  • if there's no + or -, but "0x" or "0X", it understands that the rest should be hexadecimal digits.
  • it then skips leading zeros.
  • if it's not in hex mode, it looks for '.', 'e', 'E' and '+' or '-' after the 'e' and 'E'. If a '.' is found, it understands that it is in a floating-point number context.
  • by default, it is in "integer mode". But depending on the compiler, if more than 10 or 19 digits are found, it compares substrings in order to possibly switch to the floating-point mode.

Don't rely on is_numeric() if what you actually want is to check whether a string contains something like "8928", i.e. a pure PHP integer cast to a string. is_numeric() is designed to return TRUE if the string looks like a numeric constant, regardless of the base and the type.


is_numeric("4E2") = TRUE
is_numeric("\r\n\r\n\t\f0X0") = TRUE
is_numeric("     0xDeadBeef") = TRUE
is_numeric(str_repeat("9", 9999)) = TRUE     (way out of bounds for a PHP integer object)

If you want to check that a string contains an integer that was cast to a string, here's a way to do it:


if ($v === (string) (int) $v) { ... }
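As a sketch, the round-trip test can be wrapped in a small helper and tried against a few awkward inputs. The helper name is made up for the example. filter_var() with FILTER_VALIDATE_INT (PHP 5.2 and later) is shown as an alternative that validates and converts in one step.

```php
<?php
// Hypothetical helper: true only for strings that are the canonical
// decimal form of a PHP integer ("8928", "-12", "0").
function is_int_string($v)
{
    return is_string($v) && $v === (string) (int) $v;
}

$samples = array("8928", "-12", "0", "007", " 42", "4E2", "12.0", "", "abc");
foreach ($samples as $s) {
    printf("%-8s => %s\n", var_export($s, true),
           is_int_string($s) ? "integer" : "rejected");
}

// filter_var() returns the integer on success and false otherwise.
var_dump(filter_var("8928", FILTER_VALIDATE_INT));
var_dump(filter_var("4E2", FILTER_VALIDATE_INT));
```

Only the first three samples pass. Everything is_numeric() would wave through because of leading whitespace, exponents or decimals is rejected.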

Also, don't forget that integers have minimum and maximum values in PHP. The limits are those of your compiler's "signed long" type. Unlike Ruby, which automatically switches to arbitrary-precision numbers, PHP silently turns the result of an overflowing integer operation into a float, and casting an out-of-range value back to an integer gives an undefined result. Integers are always signed in PHP, so there is no unsigned type to fall back on.

Casting a string into an integer can obviously give very different results:

$a = "10293847569";
if (is_numeric($a)) {
  $b = (int) $a;            // cast from a string
  $c = (int) 10293847569;   // cast from a numeric literal
  echo "[$a] != [$b] != [$c]\n";
}

Sample result:


[10293847569] != [2147483647] != [1703912977]

The value you get for $b is the upper limit of an integer value on that platform. If your application mixes types when handling the same attribute, this can be the root of weird bugs.

To find out the upper limit of integer values, PHP provides the PHP_INT_MAX constant (its counterpart for the lower limit, PHP_INT_MIN, only exists since PHP 7.0). So, before multiplying two numbers, you can check whether an overflow would occur that way:


// Only valid for positive $a and $b.
if ($a > 0 && PHP_INT_MAX / $a < $b) {
  throw new Exception("Arithmetic overflow");
}
$c = $a * $b;
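Another way to catch the overflow, sketched here under the assumption that both operands are already integers, is to multiply first and check the type of the result: in current PHP versions, an integer multiplication whose result does not fit comes back as a float. The helper name is made up. OverflowException and InvalidArgumentException are standard SPL classes.

```php
<?php
// Hypothetical helper, not a built-in: multiply two integers and fail
// loudly instead of silently returning a float.
function checked_mul($a, $b)
{
    if (!is_int($a) || !is_int($b)) {
        throw new InvalidArgumentException("Integers expected");
    }
    $product = $a * $b;
    if (!is_int($product)) {
        // PHP promoted the result to float: it did not fit in an integer.
        throw new OverflowException("Arithmetic overflow");
    }
    return $product;
}

var_dump(checked_mul(6, 7));
try {
    checked_mul(PHP_INT_MAX, 2);
} catch (OverflowException $e) {
    echo "Caught: ", $e->getMessage(), "\n";
}
```

Unlike the division test, this also works with negative operands and with zero.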

Why PHP is a mess

written by jedi on August 6th, 2007 @ 09:33 PM

Some pretty good links about the PHP language and how broken it is :

While some of these documents were written years ago, everything is still true, and even worse.

Playing with Ruby 1.8, Ruby 1.9 and PHP

written by jedi on July 8th, 2007 @ 01:05 AM

Today, I wanted to give a try to the latest Ruby 1.9 snapshot.

After disabling the set_thread_priority() call, it compiles and installs fine on OpenBSD. My test host is an Athlon 64 3400, running OpenBSD-current/amd64.

Before all, I wanted to benchmark it against Ruby 1.8.

Here's a simple and stupid test script I wrote, just to have something that iterates over arrays, calls methods, uses a class-scoped counter, and does common tests:

class String
  @@counter = 0

  def testfunc2
    array = self.split
    str = ""
    array.each do |word|
      str << word unless word.empty? or @@counter < 0
      @@counter += 1
    end
  end

  def testfunc!
    self << "abc "
    testfunc2.join("-")
  end
end

str = "initial string"
5000.times { str.testfunc! }

Very simple. It takes a string, adds "abc " to it, splits it into an array of words, iterates over every word and appends each one to a new string (with two useless tests along the way), increments a class-wide counter, returns the array of words, and finally joins all these words with dashes. That happens 5000 times.

Here we go for the bench:

  • Ruby 1.8: 0m23.06s
  • Ruby 1.9: 0m11.65s

That's pretty cool. Ruby 1.9 is more than twice as fast as Ruby 1.8 here!

Out of curiosity, I wanted to translate that simple example to PHP in order to see how it would compare. Unfortunately, adding methods to strings is something PHP is unable to do. PHP doesn't let you extend strings, numbers or functions. So in order to do something similar to the previous test script, we have to reinvent the wheel: we have to invent a "String" class. Woah. And we can't even use that class as a string because, unlike any object-oriented language from the past 20 years, PHP is not even able to overload operators. That's why we have to invent a method ("set_value") just to set the content of the string. Ok, here we go for the PHP version of the above script :

class String {
    var $str;
    protected static $counter = 0;

    public function set_value($str) {
        $this->str = $str;
    }

    public function __construct($str) {
        $this->set_value($str);
    }

    public function __toString() {
        return $this->str;
    }

    public function testfunc2() {
        $array = split(' ', $this);
        $str = "";
        foreach ($array as $word) {
            if (!(empty($word) || self::$counter < 0)) {
                $str .= $word;
            }
            self::$counter++;
        }
        return $array;
    }

    public function testfunc() {
        $this->set_value($this . "abc ");
        implode($this->testfunc2(), "-");
    }
}

$str = new String("initial string");
$i = 5000;
do {
    $str->testfunc();
} while (--$i !== 0);

All those "$this->" and "self::" are boring and useless, they don't bring anything but ugly source code. PHP loves to annoy programmers by forcing them to write symbols like "$", "_", "->" and "::" everywhere. You have to write them over and over again, for everything you need in the current object, or you will get that wonderful error: "syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM".

Okay, let's benchmark the PHP script:

  • Ruby 1.8: 0m23.06s
  • Ruby 1.9: 0m11.65s
  • PHP 5.2.3: 1m36.46s

Yes, that's one minute and 36 seconds. You have the code, try it yourself. The PHP script is not only ugly, it's also dog slow.

Please stop calling PHP a serious object-oriented language and please stop benchmarking languages over a function that computes prime numbers.

MaxKeepAliveRequests: keep it high

written by jedi on May 10th, 2007 @ 09:43 AM

The Apache HTTP server has a configuration directive that everyone knows about since Apache 1.1 : MaxKeepAliveRequests.

It defaults to 100.

People usually keep the default value or sometimes reduce it in order to save memory on small boxes.

Like a lot of other Apache keywords, MaxKeepAliveRequests is confusing. It sounds as if it were the total number of concurrent processes serving multiple HTTP requests on the same TCP connection.

But it's not.

It's actually the maximum number of requests to serve on a TCP connection. If you set it up to 100, clients with keepalive support will be forced to reconnect after downloading 100 items.

Lighty has a similar variable: server.max-keep-alive-requests.

What's the point? The keep-alive mechanism is good, why not just serve as many requests as necessary? In order to work around browser bugs (IE + HTTPS and upload), why not just use the enable/disable keep-alive button?

A few days ago, I had a server with a high system load. It was actually waiting a lot for the disk. That host serves pages with a lot of small images, about 500 images for a single page, plus CSS and JavaScript files. It was running Apache, mostly in its default configuration. I bumped MaxKeepAliveRequests to 1000. Immediately, the system load decreased: there were fewer running processes and less waiting for disk I/O, just because clients could download a full page over a single connection. Bumping that value didn't have any negative impact; it only made everything snappier.

So: set MaxKeepAliveRequests to more than the maximum number of elements you will be serving for a web page. More than 100 is common these days. You can also set it to 0 to have it unlimited.
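As a configuration sketch, assuming pages with several hundred assets as in the anecdote above (1000 is simply the value that worked on that host, not a universal recommendation):

```apache
# httpd.conf
KeepAlive On
MaxKeepAliveRequests 1000
```

With lighttpd, the equivalent line would be server.max-keep-alive-requests = 1000.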

PHP 5.2.2 has been released

written by jedi on May 4th, 2007 @ 11:29 PM

Just in case you missed it, PHP 5.2.2 has just been released.

It fixes some important vulnerabilities disclosed during the month of PHP bugs.

Time to upgrade !

lighttpd 1.4.15 has been released

written by jedi on April 18th, 2007 @ 11:05 PM

A new maintenance release of the stable branch of lighttpd is now available.

Here's the changelog.

If you are still using Apache, please give it a try or, alternatively, try Nginx. You won't look back.

Here you can download the port diff I made for OpenBSD-current.

Works fine so far.

PHP and its horrible hash/array mixup

written by jedi on April 18th, 2007 @ 03:43 PM

Unlike probably every other language on the planet, PHP decided to share a common type for hashes and arrays.

$a = array(4, 6, 9);
$b = array('foo' => 4, 'bar' => 6, 'blah' => 9);

A major issue with that mess is that with functions like array_merge(), PHP will reinvent the indexes if it finds that the current indexes look like numbers, even though you intentionally used strings to get real hash keys.

$a = array('1357' => 10, '9753' => 20);
$b = array('2468' => 30, '8642' => 40);
$c = array_merge($a, $b);
var_dump($c);

Guess what you get with such code?

array(4) {
  [0]=>
  int(10)
  [1]=>
  int(20)
  [2]=>
  int(30)
  [3]=>
  int(40)
}

Hey? 0, 1, 2, 3? Yes, because keys like '1357', although strings, were detected as integers and PHP decided to reinvent the indexes. Wotta mess.

  • Workaround #1: switch to a clean language with no silly surprises, like Ruby:

a = { 1357 => 10, 9753 => 20 }
b = { 2468 => 30, 8642 => 40 }
p a.merge(b)

Result:


{8642=>40, 9753=>20, 2468=>30, 1357=>10}

  • Workaround #2: add a dummy character, such as a space, before the first digit so that PHP doesn't mess up the indexes.

Notice the space before the digit in keys:

$a = array(' 1357' => 10, ' 9753' => 20);
$b = array(' 2468' => 30, ' 8642' => 40);
$c = array_merge($a, $b);
var_dump($c);

Result:

array(4) {
  [" 1357"]=>
  int(10)
  [" 9753"]=>
  int(20)
  [" 2468"]=>
  int(30)
  [" 8642"]=>
  int(40)
}

Woah, wonderful.
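A third option, sketched here on the assumption that the two arrays have no keys in common, is the array union operator (+), which never renumbers anything. Note that PHP still stores keys like '1357' as integers; it just leaves their values alone.

```php
<?php
$a = array('1357' => 10, '9753' => 20);
$b = array('2468' => 30, '8642' => 40);

// Union operator: keys are kept as they are. If a key exists on both
// sides, the left-hand value wins.
$c = $a + $b;
var_dump(array_keys($c));
```

The keys come out as 1357, 9753, 2468 and 8642, without any padding trick.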

But are those keys actually strings? Given the behavior of array_merge(), yes. But with operators, they aren't, they can be used just as if they were integers:

$a = ' 42'; # notice the space
$b = ($a == 42);
$c = $a * 2;
var_dump($b);
var_dump($c);

Result:


bool(true)
int(84)

How logical...

empty() is also a good joke, btw. Why the hell does empty() evaluate to TRUE with 0, 0.0 and "0", but not with "-0" nor "0.0"?
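The behavior is easy to reproduce with a short script. The list of values is only an illustration.

```php
<?php
// Print whether empty() considers each value empty.
$values = array(0, 0.0, "0", "", null, "-0", "0.0", " ");
foreach ($values as $v) {
    printf("%-6s empty: %s\n", var_export($v, true),
           empty($v) ? "true" : "false");
}
```

The first five values are reported as empty. "-0", "0.0" and " " are not, because empty() only special-cases the exact string "0".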
