Frank DENIS random thoughts.

Tokyo Cabinet, Mikio did it again

Mikio Hirabayashi is the guy behind QDBM (a kick-ass embedded database library) and Hyper Estraier (nothing but the best open source search engine ever made).

Although his name may not be familiar, Mikio Hirabayashi is probably one of the best programmers in the world when it comes to storing and indexing tons of data.

Mikio’s new baby is called Tokyo Cabinet. It defines itself as “a modern implementation of DBM”.
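For readers who have never touched a DBM, here is a rough sketch of what a DBM-style store looks like in practice: a file on disk that maps byte-string keys to byte-string values. It uses Python's standard `dbm` module, not Tokyo Cabinet itself, and the function name and file name are made up for the illustration.

```python
import dbm
import os
import tempfile


def dbm_roundtrip():
    """Store and fetch records through a DBM-style key/value file."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.db")  # hypothetical file name

        # "c" opens the database read-write, creating it if needed.
        with dbm.open(path, "c") as db:
            db[b"author"] = b"Mikio Hirabayashi"
            db[b"project"] = b"Tokyo Cabinet"

        # Reopen read-only to show that the records were persisted.
        with dbm.open(path, "r") as db:
            return db[b"author"], db.get(b"missing")


if __name__ == "__main__":
    print(dbm_roundtrip())
```

Tokyo Cabinet exposes the same basic idea (open, put, get, close) through its own C API, with the improvements listed below.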

Tokyo Cabinet was developed as the successor to QDBM with the following goals. They have been achieved, and Tokyo Cabinet replaces QDBM.

  • improves space efficiency : smaller size of database file.
  • improves time efficiency : faster processing speed.
  • improves parallelism : higher performance in multi-thread environment.
  • improves usability : simplified API.
  • improves robustness : database file is not corrupted even under catastrophic situation.
  • supports 64-bit architecture : enormous memory space and database file are available.

As with QDBM, three restrictions of traditional DBM are cleared: a process can handle only one database, the size of a key and a value is bounded, and a database file is sparse. Moreover, three restrictions of QDBM are cleared: the size of a database file is limited to 2GB, environments with different byte orders cannot share a database file, and only one thread can search a database at a time.

Tokyo Cabinet runs very fast. For example, elapsed time to store 1 million records is 1.5 seconds for hash database, and 2.2 seconds for B+ tree database. Moreover, the size of database of Tokyo Cabinet is very small. For example, overhead for a record is 16 bytes for hash database, and 5 bytes for B+ tree database. Furthermore, scalability of Tokyo Cabinet is great. The database size can be up to 8EB (9.22e18 bytes).
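To put those figures in perspective, here is a quick back-of-the-envelope calculation. It only reworks the numbers quoted above and assumes that "8EB" means 8 exbibytes (2^63 bytes), which is consistent with the 9.22e18 figure. The helper names are made up for the illustration.

```python
RECORDS = 1_000_000  # record count used in the quoted figures


def records_per_second(elapsed_seconds):
    """Average insert throughput for the quoted 1 million record run."""
    return RECORDS / elapsed_seconds


def overhead_bytes(per_record_overhead):
    """Total bookkeeping overhead for 1 million records."""
    return RECORDS * per_record_overhead


# Assumption: "8EB" is 8 exbibytes, i.e. 2**63 bytes.
MAX_DB_BYTES = 2**63

if __name__ == "__main__":
    print(f"hash:    {records_per_second(1.5):,.0f} records/s")
    print(f"B+ tree: {records_per_second(2.2):,.0f} records/s")
    print(f"hash overhead:    {overhead_bytes(16):,} bytes")
    print(f"B+ tree overhead: {overhead_bytes(5):,} bytes")
    print(f"max database size: {MAX_DB_BYTES:.2e} bytes")
```

In other words, that is roughly 670,000 inserts per second for the hash database and 450,000 for the B+ tree, with 16 MB and 5 MB of total overhead respectively for a million records.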

Tokyo Cabinet is coded in a modern way, featuring mmap()ed memory and Varnish-like tricks in order to fully take advantage of today’s operating systems.
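If mmap() is new to you, the idea is to map a file into the process address space and let the operating system handle paging and write-back, instead of issuing explicit read() and write() calls. The sketch below shows the general technique with Python's standard `mmap` module. It is not taken from Tokyo Cabinet's source, and the file name and function name are hypothetical.

```python
import mmap
import os
import tempfile


def mmap_demo():
    """Write to a file through a memory mapping, then read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "records.bin")  # hypothetical file name

        # A mapping needs a non-empty file, so preallocate one page.
        with open(path, "wb") as f:
            f.write(b"\x00" * 4096)

        with open(path, "r+b") as f:
            # Length 0 maps the whole file.
            with mmap.mmap(f.fileno(), 0) as mem:
                mem[0:5] = b"tokyo"  # plain memory write, no write() call
                mem.flush()          # ask the OS to push changes to the file

        # Read through the regular file API to confirm the data landed.
        with open(path, "rb") as f:
            return f.read(5)


if __name__ == "__main__":
    print(mmap_demo())
```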

Here you can read some benchmarks, although they compare apples with oranges (CDB, for instance, can’t insert new records once the database has been created, and other competitors have other restrictions that QDBM cleared).

And if you ever want to test Tokyo Cabinet within MySQL, Brian Aker even started a MySQL engine for it: Tokyo Engine. This MySQL engine is not yet in a usable state, though.