PHP-FPM might get merged into PHP
If you're running a heavily loaded web site powered by PHP, you must be using php-fpm, aren't you?
While new releases of php-fpm always promptly follow PHP releases, it's still a PITA to patch the PHP source code at every release.
How come php-fpm wasn't merged into PHP in the first place? The main reason is an incompatible license. Or rather... was.
Andrei just announced that the license of php-fpm had been changed. It's now the PHP license, and php-fpm can now technically get officially merged into PHP.
Here's a relevant post from the High-performance PHP group.
PHP is still going to suck, but faster :)
Pinba: a real-time statistics server for PHP
Just saw that one on the highload PHP list:
Pinba is a realtime statistics server for PHP.
It is a daemon gathering information sent by PHP processes over UDP. It is used at Badoo.
It accumulates and processes data sent over UDP by multiple PHP processes and displays statistics in a nice human-readable form of simple "reports", also providing read-only interface to the raw data in order to make possible generation of more sophisticated reports.
With the Pinba extension, users can also measure particular parts of the code using timers with arbitrary tags.
Here's a link to the Pinba manual
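As a rough sketch of what tagged timers look like in practice, the helper below wraps a callable between pinba_timer_start() and pinba_timer_stop(), the timer functions provided by the Pinba extension. The helper name with_pinba_timer() and the tag names are made up for illustration, and the code simply runs the callable untimed when the extension isn't loaded.

```php
<?php
// Hypothetical helper: time a block of code with a Pinba timer.
// Falls back to a plain call when the Pinba extension is not loaded.
function with_pinba_timer(array $tags, callable $fn)
{
    $timer = function_exists('pinba_timer_start')
        ? pinba_timer_start($tags)   // tags are arbitrary key => value strings
        : null;
    try {
        return $fn();
    } finally {
        if ($timer !== null) {
            pinba_timer_stop($timer);
        }
    }
}

// Usage: the tag names and values are assumptions, pick your own.
$rows = with_pinba_timer(['group' => 'db', 'op' => 'select'], function () {
    return [1, 2, 3]; // stand-in for a real query
});
```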
10 Reasons why PHP is Still Better than Ruby
Funny link : 10 reasons why PHP is still better than Ruby.
PHP-FPM 0.5.10 has been released
As always, Andrei did an excellent job keeping the high-performance PHP patch up-to-date.
Libevent was upgraded, the patch applies to both PHP 5.2.8 and PHP 5.3, and it now supports Zend Thread Safety.
PHP 5.2.8 is out
Hopefully that release isn't broken like PHP 5.2.7 was (and thanks to the FreeBSD ports maintainers for committing untested, broken and insecure stuff into the tree, once again).
Since the web site hasn't been updated, here's the download link.
php-fpm will quickly get updated.
Closing the users browser connection whilst keeping your php script running
Here is an old, but still relevant trick for PHP coders: Closing the users browser connection whilst keeping your php script running.
To summarize, because the site often seems to be down:
<?php
ob_end_clean();                   // discard anything buffered so far
header("Connection: close");
ignore_user_abort(true);          // keep running once the client is gone
ob_start();
echo 'Text the user will see';
$size = ob_get_length();
header("Content-Length: $size");  // lets the browser know the response is complete
ob_end_flush();                   // strange behaviour: this alone will not work,
flush();                          // both calls are needed
// Do processing here
sleep(30);
echo 'Text the user will never see';
?>
Lesser-known PHP vulnerabilities
Stefan Esser published his slides from the Zend Conference 2008, covering various common vulnerabilities in PHP applications and in PHP itself.
It's definitely worth a read.
Next Internet Explorer to pass the Acid 2 test ?
According to the IE Blog, Internet Explorer 8 passed the Acid 2 test - more on this.
It will obviously take years before it is released and people actually leave IE 6 and 7 for it, but still, it shows that Microsoft seems to be on the right track.
So rejoice, web developers, maybe some day it will finally be possible to write XHTML/CSS code without ugly tricks to work around IE bugs and limitations. Maybe. There's hope.
Easily embed PHP within C++
PHP is a jerky joke when it comes to writing standalone servers. A language like C++ is way more efficient for that kind of task, even though pure web development may still use PHP.
However, sharing data between both languages means keeping two implementations of the same logic: every change has to be made in both versions to keep them in sync.
Facebook has a nice answer to that: they embed the PHP interpreter in their C++ apps, as you would do with Lua. The C++ app just loads PHP files and then, PHP functions can be called almost as if they were C++ functions.
Facebook recently released a great BSD-licensed library that makes this task really trivial: PHPEmbed.
Bob Jenkins' super fast hash for PHP
PHP's built-in functions like md5() perform really poorly when it comes to using the hash for hash tables or for uniform load balancing.
Paul Hsieh's hash is a very fast and popular hash function for that kind of job.
Last year, Bob Jenkins compared a dozen hash functions and designed his own (actually the third revision of it), which performs as fast as Paul Hsieh's with fewer collisions.
To make a long story short: this is probably the most efficient hash function known yet, as long as it is not used for cryptographic purposes. And it's a pity that PHP doesn't implement it.
That's why I wanted to spend a few minutes creating a PHP extension implementing that hash function.
Download the Jenkins hash extension for PHP and give it a try.
Preliminary benchmarks show that the Jenkins hash, as implemented in the PHP extension, is about 4 times faster than MD5, while still remaining a pretty good unique identifier.
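To illustrate the load-balancing use case without the extension, here is a minimal sketch that maps a key to a bucket. Note that it uses a different and simpler Jenkins function, the "one-at-a-time" hash that PHP's hash() exposes as 'joaat' (PHP 5.4 and later), not the hash implemented by the extension above. The helper name pick_bucket() is made up, and the code assumes a 64-bit PHP build so that the 32-bit hash value fits in an integer.

```php
<?php
// Hypothetical helper: pick one of $buckets slots for a given key.
// Uses the built-in 'joaat' (Jenkins one-at-a-time) algorithm, which
// yields a 32-bit value encoded as 8 hexadecimal digits.
function pick_bucket(string $key, int $buckets): int
{
    $h = hexdec(hash('joaat', $key)); // assumes 64-bit integers
    return $h % $buckets;
}

// Usage: spread session keys across 8 backend servers.
$server = pick_bucket('session:abc123', 8);
```

The same key always lands in the same bucket, which is all that is needed for a hash table or for sticky load balancing.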
Optimizing require_once with xend
Although things are getting better with PHP 5.2, file inclusion (require_once) has always been slow with PHP.
Libraries like Zend Framework, AdoDB or Smarty are designed to have their source code divided into multiple files, in a very clean way. The drawback is that in order to use them, PHP has to include files that include other files, that include other files... and it introduces a noticeable startup delay. This happens even with accelerators like Xcache.
An obvious workaround is to concatenate every required file into a single file. Several people noticed that the Zend Framework was from 30% to 3 times faster that way.
There's an interesting project that automates that task : the Xend PHP extension.
Xend finds require_once() statements, inserts the content where it should be and then it automatically saves the one-file version into a new PHP file.
Just look at the benchmarks.
The Xend extension seems to be an easy way to speed up a lot of PHP applications without any code change. Unfortunately it's still at an early stage (at least on OpenBSD, it's as stable as nitroglycerine).
By the way, another tiny project that may be of interest is Dgx's PHP shrinker. Shrinking the source code reduces the time the accelerator (Xcache, APC...) needs to build the opcode cache for the file.
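The concatenation idea behind Xend can be pictured with a naive sketch like the one below. This is not how Xend is implemented: inline_requires() is a made-up helper that only understands the literal form require_once 'relative/path.php'; on a line of its own, with paths relative to the including file, and it ignores autoloading, conditional includes and computed paths.

```php
<?php
// Hypothetical helper: return the body of $file with every literal
// require_once replaced by the body of the file it points to.
function inline_requires(string $file, array &$seen = []): string
{
    $real = realpath($file);
    if ($real === false || isset($seen[$real])) {
        return ''; // missing file, or already inlined once
    }
    $seen[$real] = true;

    $code = file_get_contents($real);
    // Strip the opening tag and any trailing closing tag so that
    // bodies can be concatenated into a single PHP file.
    $code = preg_replace('/^\s*<\?php\s*/', '', $code);
    $code = preg_replace('/\?>\s*$/', '', $code);

    $dir = dirname($real);
    return preg_replace_callback(
        '/^\s*require_once\s*\(?\s*[\'"]([^\'"]+)[\'"]\s*\)?\s*;\s*$/m',
        function (array $m) use ($dir, &$seen) {
            return inline_requires($dir . '/' . $m[1], $seen);
        },
        $code
    );
}
```

Writing "<?php\n" followed by the returned string to a new file gives a single-file version that can be loaded with one include.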
PHP 5.2.5 + Suhosin + FastCGI = unstable trio
Today, I upgraded a loaded server running vBulletin to PHP 5.2.5. That server runs OpenBSD-current, Lighttpd and PHP-fastcgi from ports, with Suhosin enabled in its default configuration.
That setup has been very stable for 2 years.
But the upgrade to PHP 5.2.5 was a complete failure. PHP crashed with segmentation faults after almost every request, during memory deallocation.
Even after disabling Xcache, libpuzzle and almost every other module, it kept crashing over and over again.
Reverting to 5.2.4 immediately fixed the issue.
I haven't investigated that bug yet, and it doesn't happen with every script. Anyway, unless you absolutely need the fixes that were applied between PHP 5.2.4 and 5.2.5, it may be better to wait for the next release. Or at least, if you want to upgrade, prepare a rollback procedure first.
Swiftiply: boost your framework-driven web applications
It's not a new project, but if you have never heard of it, have a look at Swiftiply :
"Scaling your web applications should be easy. Start small, then when you need more capacity, just add it. Another process. Another machine. More capacity, instantly. Without additional configuration or software restarts.
That is what you get with Swiftiply.
Swiftiply is a backend agnostic clustering proxy for web applications that is specifically designed to support HTTP traffic from web frameworks. It is a very fast, narrowly targetted clustering proxy. In back to back comparisons of Swiftiply to HAProxy, Swiftiply reliably outperforms HAProxy (tested using IOWA, Rails, and Ramaze backend processes) and, depending on your web framework, you may not even need to put a traditional web server into your architecture at all.
Swiftiply is a clustering proxy server for web applications. What makes it different from other clustering proxies, however, is that it expects the backend processes to connect to it. That is, the backend processes are clients of the Swiftiply server, as are the browsers out in userland. The advantage of this is that it permits the back ends to maintain a persistent connection with the proxy server, which eliminates socket setup/teardown costs. And even more importantly than that, it permits backend processes to be started up or shut down without requiring any notification or configuration of the proxy. So, if more capacity is needed, all one needs to do is start the processes. It will immediately be available and will begin to be utilized."
I finally tested it with Rails and it works as advertised. Performance is immediately doubled, and it's a breeze to install. Swiftiply rocks.
PHP : notes about integers
Here's a classical scenario. You get an identifier as $_POST['id'] and you need to check that the value is actually a PHP integer value, that has just been converted into a string because everything becomes a string in the $_POST[] array.
is_int() obviously doesn't work, as $_POST['id'] is a string.
is_numeric() is just as wrong. is_numeric() is not designed to check whether the string contains only digits, nor is it designed to check whether the value would fit into a PHP integer.
PHP's is_numeric() relies on Zend Engine's is_numeric_string() function. A great deal of PHP odd behaviors depend on that internal function, like those described in that article.
Here's what is_numeric() actually does :
- it skips leading spaces, tabs and \r, \n, \t, \v and \f characters.
- it then skips any leading + or -, but it then bugs out if the first characters after the spaces are "0x" or "0X".
- if there's no + or -, but "0x" or "0X", it understands that the rest should be hexadecimal digits.
- it then skips leading zeros.
- if it's not in hex mode, it looks for '.', 'e', 'E' and '+' or '-' after the 'e' and 'E'. If a '.' is found, it understands that it is in a floating-point number context.
- by default, it is in "integer mode". But depending on the compiler, if more than 10 or 19 digits are found, it compares substrings in order to eventually switch to the floating-point mode.
Don't rely on is_numeric() if what you actually want is to check whether a string contains something like "8928", i.e. a pure PHP integer, cast to a string. is_numeric() is designed to return TRUE if the string looks like a numeric constant, regardless of the base and the type.
is_numeric("4E2") = TRUE
is_numeric("\r\n\r\n\t\f0X0") = TRUE
is_numeric(" 0xDeadBeef") = TRUE
is_numeric(str_repeat("9", 9999)) = TRUE (way out of bounds for a PHP integer object)
If you want to check that a string contains an integer that was cast to a string, here's a way to do it:
if ($v === (string) (int) $v) { ... }
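Wrapped in a function, the round-trip check above can be sketched as follows. The name is_int_string() is made up for illustration. The string is accepted only if casting it to an integer and back yields exactly the same string, which rejects leading zeros, whitespace, exponents, hex notation and out-of-range values.

```php
<?php
// Hypothetical helper: TRUE only for the canonical string form of a PHP integer.
function is_int_string($v): bool
{
    return is_string($v) && $v === (string) (int) $v;
}

// Usage with form input (the 'id' key is only an example).
$id = isset($_POST['id']) ? $_POST['id'] : '';
if (!is_int_string($id)) {
    // reject the request
}
```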
Also, don't forget that integers have minimal and maximal values in PHP. The limits are the same as those of the "signed long" type of your compiler. Unlike Ruby, which automatically switches to arbitrary-precision numbers, PHP silently turns the result of an overflowing arithmetic operation into a float, and casting an out-of-range value back to an integer gives an undefined result. Since integers are always signed within PHP, you can't count on a well-defined wraparound either.
Casting a string into an integer can obviously give very different results:
$a = "10293847569";
if (is_numeric($a)) {
    $b = (int) $a;           // string to integer
    $c = (int) 10293847569;  // same value as a numeric literal, i.e. a float to integer
    echo "[$a] != [$b] != [$c]\n";
}
Sample result (on a 32-bit platform):
[10293847569] != [2147483647] != [1703912977]
The value you get for $b is the upper limit of an integer value, while $c is yet another value. If your application mixes types to reach a single attribute, this can be the root of weird bugs.
To know the upper limit of integer values, PHP provides the PHP_INT_MAX constant (the matching lower limit, PHP_INT_MIN, only exists since PHP 7.0). So, before multiplying two numbers, you can check whether an overflow would occur that way:
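// Guard against division by zero and compare magnitudes so that negative operands work too
if ($a != 0 && abs($b) > PHP_INT_MAX / abs($a)) {
    throw new Exception("Arithmetic overflow");
}
$c = $a * $b;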
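Another way to catch the overflow is to perform the multiplication and look at the type of the result: when a product of two integers doesn't fit, PHP returns a float instead of an integer. The sketch below relies on that behaviour. The function name checked_mul() is made up, and the code uses PHP 7 type declarations.

```php
<?php
// Hypothetical helper: multiply two integers, or throw if the result
// does not fit into a PHP integer.
function checked_mul(int $a, int $b): int
{
    $c = $a * $b;
    if (!is_int($c)) {
        // The product overflowed and was promoted to a float.
        throw new OverflowException("Arithmetic overflow: $a * $b");
    }
    return $c;
}

// Usage
$area = checked_mul(640, 480);
```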
Why PHP is a mess
Some pretty good links about the PHP language and how badly designed it is :
- PHP in contrast to Perl
- What I don't like about PHP
- I'm sorry, but PHP sucks
- Experiences of using PHP in large websites
- Is Perl a good career move ?
While some of these documents were written years ago, everything in them is still true, and some of it has gotten even worse.