Big life secret: how I manage my money
Here's a very personal secret: how I manage my money.
Let's face it: I have never been able to manage my money, or even to know how much I have (or don't have) without some help. Since I joined Skyrock, my income has been fairly steady. But before that, it was a complete mess, since I was juggling part-time jobs, short-term positions and freelance work.
Some people seriously track what happens to their bank account. Some people need no help to always know what's going on. Some people know how to save. I have never been able to do anything like that. Never.
Software like GnuCash is probably a great way to manage one's money. I tried it multiple times, along with tons of other similar software, and it was a complete failure every time. Way too complicated, way too tedious to keep up to date. Even software advertised as "for dummies" felt tedious and too complicated for a dummy like me who just has an income and pays the usual bills. It looks like this kind of software is made for people with plenty of money and plenty of accounts.
About 20 years ago, I wrote a very basic money tracker, in GfA-Basic. 20 years later, I'm still using something similar and I'm still unable to use anything else.
It downloads the account statement from the bank (through Minitel back then, through the Internet now), stores it in a database and displays bars. Hovering over a bar shows the details.
And that's all. Fucking all. And it's fucking enough to know:
- at any time, whether I can afford to buy something expensive or to take a break: a quick look at the shape of the curve and at the width of the bar for the previous month is enough.
- when and why large expenses happened, and how they compare to other expenses, just by looking at the relative widths of the bars.
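The tool itself isn't published, but the idea fits in a few lines. Below is a hypothetical Python sketch, not the original GfA-Basic program: it assumes transactions arrive as (date, amount) pairs and that each bar stands for the balance at the end of a month, with a width proportional to that balance. The bank download, the database and the hover details are left out.

```python
from datetime import date


def monthly_balances(opening, transactions):
    """Return {(year, month): balance at the end of that month}.

    transactions is an iterable of (date, amount) pairs: credits positive,
    debits negative. Months without any transaction are simply absent.
    """
    balances = {}
    balance = opening
    for day, amount in sorted(transactions):
        balance += amount
        balances[(day.year, day.month)] = balance  # last write wins: month end
    return balances


def render_bars(balances, scale=100):
    """Draw one text bar per month, one '#' per `scale` units of balance."""
    lines = []
    for (year, month), balance in sorted(balances.items()):
        bar = "#" * max(0, int(balance // scale))  # negative balance: empty bar
        lines.append(f"{year}-{month:02d} {bar} {balance:.2f}")
    return lines


if __name__ == "__main__":
    statement = [
        (date(2008, 1, 5), 1500.0),   # salary
        (date(2008, 1, 20), -900.0),  # rent
        (date(2008, 2, 3), -200.0),
        (date(2008, 2, 25), 1500.0),
    ]
    for line in render_bars(monthly_balances(0.0, statement)):
        print(line)
```

A wider bar means more money left at the end of the month, so a shrinking run of bars is the "curve" to keep regular.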
Prices can change, income can change, needs can change, family can change, a financial crisis can happen: no need to be a finance wizard, the tool always gives an accurate idea of whether there's room for buying cool stuff, or whether every penny that goes out must have a reason. I just have to try to keep the overall shape of the curve regular.
Am I an immature fucking idiot? Probably. Do I care? No.
Excel as an exchange format is a nightmare for developers
In my daily job, every time someone wants to send a list of things to a developer, they always send an Excel file.
A list of names? An Excel file. A list of questions and answers? An Excel file. A list of translations for gettext? An Excel file. Mathematical expressions? An Excel file, embedding Word objects with its mathematical extension.
Of course, users are efficient with the tools they work with all the time. And with Excel, they can easily create a clean layout for their data. When Excel is used to create documents to be read by other people, the tool is wonderful. To be fair, I love Excel: it's the one Microsoft product I love, the thing that made me install Windows 3.11 a while ago. It's not flawless, but it works pretty well, it's easy to use, it's powerful and anyone can create standard but clean and nice-looking documents with it.
But Excel is not a tool designed to format data for developers. Developers need normalized, easy-to-parse data, not spreadsheet documents.
Sometimes, exporting the XLS file to CSV is enough. It's enough for very simple lists, like email lists, though you still have to check for typos.
Sometimes it's way more work than people could imagine.
A while back, I had to make a simple online quiz system. A WYSIWYG web-based administration area was built. But most people preferred to use Excel so that they could cut and paste parts of Word, Visio, PowerPoint and Excel documents. Everyone used a different layout. When I received the first Excel documents, I asked people to at least use a simple layout: first column with the question, next columns with the answers, last column with pictures, etc. Some did their best to follow that convention, but there was still no consistency. For instance, a lot of people filled the ends of lines with spaces in order to start a new line instead of just pressing "Return". Diagrams were a mix of background pictures with text over them (still using spaces to get the characters in the right positions, so reading the document without the right fonts gave unreadable results). There were typos and inconsistencies everywhere. The people who worked on those documents did their best, and the printed documents were perfectly usable. However, importing them into a MySQL database was a nightmare. Exporting these documents to CSV would have been pointless because of the pictures. That's why I tried to export them as MS-HTML documents. Parsing these documents was very difficult, especially since every user used a different layout. The MS-HTML documents themselves were totally bogus: the text that was over background pictures was misplaced and the space-based paragraphs didn't work in a web browser. I gave up. There was no way to write a script that would reliably understand these documents. I printed the Excel documents and manually typed everything from scratch into the web-based administration area. It took me 3 weeks, including nights and weekends. And people didn't understand why it took so long. They thought it was because the server was running Linux, and that shitty operating system was unable to read Excel files.
A few months later, the project leader asked me for an export of everything from that quiz system, as Excel files. That was easy. But a few weeks later, he sent me back the Excel files "with some minor corrections and some new questions. could you please merge them ASAP?". Ouch! An Excel sheet is designed to be human-readable; it's not a raw database dump. I had to write a "diff"-like application to compare Excel sheets, in order to find similar rows and changes. Then the database (lots of tables with foreign keys, nothing like the single-table layout of an Excel sheet) had to be updated, although there was no usable identifier in the sheets. Yet another nightmare. This is probably the most complex application I have ever written. Just because the changes were made through Excel sheets instead of the web interface.
Excel is designed to create printable documents. It might not be obvious to non-developers, but please, please, please understand that Excel is not a tool to edit application data. In databases, data is not stored as one big table with columns containing Word-like text. Computers don't "read" documents; computers don't understand a layout designed to be printed.
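Without identifiers, the only way to pair rows between two versions of a sheet is by similarity. Here is a minimal Python sketch of that idea using the standard library's difflib; it is not the application described above, the 0.6 threshold and the "unchanged/modified/added" labels are arbitrary assumptions, and deleted rows and the database update are not handled.

```python
import difflib


def row_text(row):
    """Join a row's cells so whole rows can be compared as strings."""
    return " | ".join(cell.strip() for cell in row)


def best_match(row, candidates, threshold=0.6):
    """Return (index, ratio) of the most similar candidate, or (None, ratio)."""
    text = row_text(row)
    best_index, best_ratio = None, 0.0
    for i, candidate in enumerate(candidates):
        ratio = difflib.SequenceMatcher(None, text, row_text(candidate)).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    if best_ratio < threshold:
        return None, best_ratio
    return best_index, best_ratio


def diff_sheets(old_rows, new_rows, threshold=0.6):
    """Label each new row 'unchanged', 'modified' or 'added' against old_rows."""
    report = []
    for row in new_rows:
        index, ratio = best_match(row, old_rows, threshold)
        if index is None:
            report.append(("added", None, row))
        elif ratio == 1.0:
            report.append(("unchanged", index, row))
        else:
            report.append(("modified", index, row))
    return report


if __name__ == "__main__":
    old = [["What is 2+2?", "4"], ["Capital of France?", "Paris"]]
    new = [["What is 2+2?", "4"], ["Capital of France ?", "Paris"],
           ["Largest planet in the solar system?", "Jupiter"]]
    for kind, index, row in diff_sheets(old, new):
        print(kind, index, row)
```

Comparing every new row with every old row is quadratic, which is fine for a quiz but slow for large sheets.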
For simple lists, developers like text. Raw text, made with text editors like WordPad, Notepad++, BBEdit or ConTEXT. Instead of rows, just use lines. It might be a bit less handy than Excel, but it can save developers who have to use the data a lot of time.
A few days ago, I was asked to merge translations of country names into a web form. An English-speaking user would see "Germany", while a French-speaking user would see "Allemagne" in the list. Easy, very easy, especially since every country name was already handled by a gettext interface. The relevant entries in the .po files just had to be translated. A colleague had made a web-based interface to edit the .po files, although that interface is not very handy for mass translations like a country list.
That's why I was sent a .rar file with Excel sheets. One column held a cut and paste of the original list as seen in Internet Explorer (probably, since there were HTML attributes and unrelated entries), another column held the translations, and sometimes a third column held a comment. Something very simple to parse. But it actually took me hours.
Why?
Out of 4 Excel files, 3 were saved as Office 2007 XML documents.
Gnumeric was unable to open those files. Google Spreadsheets was unable to open those files. OpenOffice (even version 2.1) was unable to open those files. Back home, I tried with Excel on my Mac. Yes, I bought the Microsoft Office suite for OS X. It was able to open the files. Wow. But the CSV exports were... very odd. Lines ended with a single \r, not \n. A small Perl script changed that. But there was still something wrong with the charset. Non-ASCII characters weren't properly rendered. It was not ISO-8859-15. It was not UTF-8; iconv refused it. The file command wasn't able to figure out what it was. Finally, I used Excel to save these documents as Excel 97 files, then I downloaded and installed OpenOffice, which loaded the Excel 97 files and exported them as CSV files that could easily be parsed and turned into .po files by yet another small Perl script. A big waste of time, just because the original files were Excel documents. The content of these documents was text. No text formatting, no meaningful cell decorations, nothing but plain text, even written in a way that could be easily parsed. But it took hours just because it hadn't been saved as text.
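The cleanup steps above (fixing line endings, guessing the charset, turning rows into .po entries) fit in one short script. This is a hypothetical Python sketch, not the Perl scripts from the story. It assumes the original string is in the first column, the translation in the second and an optional comment in the third; it falls back to Mac Roman when the bytes aren't valid UTF-8, which is only a plausible guess for CSV exported by Excel on a Mac of that era, since the actual charset was never identified.

```python
import csv
import io


def decode_export(raw, fallback="mac_roman"):
    """Decode strictly as UTF-8, else with `fallback` (an assumed charset)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


def read_export(raw, fallback="mac_roman"):
    """Parse CSV bytes whose lines may end with CR, CRLF or LF."""
    text = decode_export(raw, fallback)
    # Normalize classic-Mac CR and Windows CRLF line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return list(csv.reader(io.StringIO(text)))


def po_escape(text):
    """Escape backslashes, quotes and control characters for a .po string."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\t", "\\t"))


def rows_to_po(rows):
    """Turn [original, translation, optional comment] rows into .po text."""
    # Header entry declaring the charset of the generated catalog.
    entries = ['msgid ""\nmsgstr "Content-Type: text/plain; charset=UTF-8\\n"']
    for row in rows:
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete rows
        lines = []
        if len(row) > 2 and row[2].strip():
            comment = row[2].strip().replace("\n", " ")
            lines.append(f"# {comment}")  # translator comment
        lines.append(f'msgid "{po_escape(row[0].strip())}"')
        lines.append(f'msgstr "{po_escape(row[1].strip())}"')
        entries.append("\n".join(lines))
    return "\n\n".join(entries) + "\n"
```

Decoding strictly as UTF-8 first means that guess fails loudly instead of producing garbage, but a legacy fallback like Mac Roman accepts almost any byte sequence, so a few non-ASCII rows are worth checking by eye.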
So please, if you send data to developers, don't use Excel. Either send raw text or use the web-based interface (or, if there's none, ask them to write one: with a framework like Ruby on Rails it's damn fast to do, and probably faster than post-processing Excel files). Thank you.
Moderating bulletin boards with text classifiers
Bulletin boards require moderation. Even on serious bulletin boards for professionals, moderation is mandatory. You need to remove spam, defamation, abuse, threats, hateful statements, discussions about illegal activities, and vulgar, obscene, pornographic or indecent language.
Spam filters are text classifiers: they tag messages as spam or ham. Could spam filters also be used to moderate bulletin boards?
This is something I had wanted to try for a long time. Some previous experiments with DSPAM were interesting, but there were many false positives and I didn't go any further.
I recently implemented that idea for real. The bulletin board is based on vBulletin and the text classifier is CRM114.
A daemon scans new messages with an intentional delay, so that people don't simply post the same message over and over again as soon as it disappears. These messages are processed by CRM114. If they are detected as spam, they become invisible; they are actually queued for manual moderation. A manual moderation interface shows unmoderated messages. The moderator can let a message through. In that case, if the message was previously unclassified, CRM114 learns it as non-spam. If the moderator deletes a message that was either a false negative or unclassified, CRM114 learns it as spam. Quite basic, although the actual details are a bit more complex in order to deal with meta-moderation and with users who intentionally remove some messages.
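The moderation loop above boils down to a few rules. Here is a minimal Python sketch of them. ToyClassifier is a hypothetical word-count stand-in for CRM114 (it does not call CRM114 at all), the 300-second delay is an arbitrary value, and meta-moderation is left out.

```python
from dataclasses import dataclass


@dataclass
class Message:
    id: int
    text: str
    posted_at: float               # seconds since some epoch
    verdict: str = "unclassified"  # "unclassified", "spam" or "ham"
    visible: bool = True
    scanned: bool = False


class ToyClassifier:
    """Hypothetical stand-in for CRM114: per-label word counts."""

    def __init__(self):
        self.counts = {"spam": {}, "ham": {}}

    def learn(self, text, label):
        for word in text.lower().split():
            self.counts[label][word] = self.counts[label].get(word, 0) + 1

    def classify(self, text):
        words = text.lower().split()
        spam = sum(self.counts["spam"].get(w, 0) for w in words)
        ham = sum(self.counts["ham"].get(w, 0) for w in words)
        if spam == 0 and ham == 0:
            return "unclassified"  # nothing known about these words yet
        return "spam" if spam > ham else "ham"


def scan(messages, classifier, now, delay=300.0):
    """Classify messages older than `delay` seconds and hide suspected spam.

    The delay keeps filtered messages from vanishing instantly, so posters
    are less tempted to repost them over and over again.
    """
    for msg in messages:
        if msg.scanned or now - msg.posted_at < delay:
            continue
        msg.scanned = True
        msg.verdict = classifier.classify(msg.text)
        if msg.verdict == "spam":
            msg.visible = False  # hidden, i.e. queued for manual moderation


def moderate(msg, classifier, keep):
    """Apply a moderator decision and train the classifier from it."""
    if keep:
        # Per the post, a released unclassified message is learned as ham.
        # Also learning released false positives is an assumption here.
        if msg.verdict in ("unclassified", "spam"):
            classifier.learn(msg.text, "ham")
        msg.verdict, msg.visible = "ham", True
    else:
        # Deleting a false negative or an unclassified message: learn as spam.
        if msg.verdict in ("unclassified", "ham"):
            classifier.learn(msg.text, "spam")
        msg.verdict, msg.visible = "spam", False
```

Only moderator decisions feed the classifier, so it learns from the cases where it was unsure or wrong rather than from its own guesses.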
So, how does this system perform? Amazingly well, I was really impressed. It works and CRM114 learns very fast.
Illegal messages are immediately trapped by CRM114; this is truly amazing. I was surprised to discover that it was quite effective at finding aggressive messages. I was also surprised by the tiny number of false positives. At this point, and for this specific task, the classifier seems to be quite useful.
Of course, this doesn't replace human moderators, because moderators have to move threads into the right categories, help users, understand the real meaning and the potential implications of some posts, etc. But the system catches most illegal messages before manual moderation. It's a very efficient proactive tool, especially when you can't afford a team of moderators working 24 hours a day, 7 days a week.
I was also simply amazed by CRM114 itself, even though I was a die-hard DSPAM lover. CRM114 is way, way, way more flexible, it's fast, it's actively maintained and its .css data files are reliable.
Skyrock Blog launched
A few hours ago, the company I work for officially launched Skyrock Blog, the international version of Skyblog, a popular French blog service. If you've never heard of it, give it a try. Skyrock Blog has no bells and whistles; it focuses on ease of use.
English, German, Dutch and Spanish translations are now available.
The biggest part of the work was (and is... the work isn't really complete yet...) changing the software, hardware and network architecture. In the past, we used to create static HTML files every time the content of a blog changed. But hard disk access times quickly became a showstopper. Now there are no more static HTML files: PHP scripts serve everything, and a farm of memcache servers tries to reduce the database load. Everything works like a breeze so far, but given the insane amount of hardware that was added to the previous infrastructure, the opposite would be hard to believe.
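Serving everything from scripts while memcache absorbs the database load is the classic cache-aside pattern. The sketch below illustrates it in Python rather than PHP. FakeMemcache is a hypothetical in-process stand-in for a memcached client, the database query is a hypothetical function passed in, and the key naming and delete-on-change invalidation are assumptions, not a description of Skyrock's actual code.

```python
class FakeMemcache:
    """Hypothetical in-process stand-in for a memcached client."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)  # None means a cache miss

    def set(self, key, value):
        self._store[key] = value

    def delete(self, key):
        self._store.pop(key, None)


class BlogReader:
    """Cache-aside reads: try the cache first, query the database on a miss."""

    def __init__(self, cache, load_posts):
        self.cache = cache
        self.load_posts = load_posts  # hypothetical database query function
        self.db_queries = 0

    def posts(self, blog_id):
        key = f"blog:{blog_id}:posts"
        value = self.cache.get(key)
        if value is None:
            self.db_queries += 1
            value = self.load_posts(blog_id)
            self.cache.set(key, value)
        return value

    def blog_changed(self, blog_id):
        """Drop the cached entry instead of regenerating a static HTML file."""
        self.cache.delete(f"blog:{blog_id}:posts")
```

Spreading keys over a farm of memcached servers and setting expiry times are left out; the point is only that repeated reads stop hitting the database until the blog changes.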
I didn't work much on the user-visible part and, to be fair, I didn't want to. Recycling old and twisty code is not something I'm fond of. Kudos to the colleagues who did amazing work sorting that out. I worked on statistics tools, moderation, and various boring things that are mandatory but that no user really cares about.
The international version of Skyblog is just a first step. There is more, more, more stuff to come.

RSS feeds are finally there, but only with basic text. Intentionally no encoded content and no pictures. This is very frustrating. I'm currently following about 50 interesting blogs, and without proper RSS feeds I just couldn't do that. Almost every blog system out there has proper RSS feeds. Skyblog had none; that's why some time ago I made a small script to create them. Now Skyrock Blog has RSS feeds, but without any pictures. These feeds are pointless, especially since most articles on Skyrock Blog are picture-centric.
That's why I decided to release the tool into the public domain. Grab the Skyblog RSS / Skyrock Blog RSS generator and you'll get RSS feeds for any Skyrock Blog, with full text and pictures. It's a simple Ruby script that fetches pages like any web browser does, parses the content and packs it into an RSS feed. Ruby makes it so easy to write such tools. You don't even need a web server, thanks to WEBrick. Just start the server:
./skyrock-blog-rss.rb &
And then get your Skyrock Blog RSS feeds from the following URL, replacing username with the blog's user name:
http://127.0.0.1:2000/rss?u=username
Have fun.
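The generator's parsing step depends on Skyrock Blog's HTML, which isn't shown here, but the packing step can be sketched. This hypothetical Python version (the real tool is Ruby) takes already-extracted posts as dictionaries, with field names that are assumptions, and builds an RSS 2.0 feed whose description keeps the full text and prepends the pictures as <img> tags.

```python
import html
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(blog_title, blog_url, posts):
    """Build an RSS 2.0 document from already-extracted posts.

    Each post is a dict with hypothetical keys: title, link, html (the full
    article body), images (list of picture URLs) and published (an aware
    datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = blog_title
    ET.SubElement(channel, "link").text = blog_url
    ET.SubElement(channel, "description").text = f"Full-text feed of {blog_title}"
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
        ET.SubElement(item, "guid").text = post["link"]
        # RSS 2.0 expects RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(post["published"])
        pictures = "".join(
            f'<img src="{html.escape(src, quote=True)}" />' for src in post["images"]
        )
        # Full text plus pictures as HTML; ElementTree escapes it for XML.
        ET.SubElement(item, "description").text = pictures + post["html"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_rss("Demo", "http://example.com/", [{
        "title": "Hello",
        "link": "http://example.com/1",
        "html": "<p>Hi</p>",
        "images": ["http://example.com/a.jpg"],
        "published": datetime(2008, 1, 7, 10, 0, tzinfo=timezone.utc),
    }]))
```

Feed readers unescape the description and render it as HTML, which is how the pictures show up next to the text.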
00F.Net is back

One year ago, the 00F.Net code and content vanished with a lot of other data due to faulty hardware.
But I finally brought that blog back, even if it had to restart from scratch.
Welcome back to 00F.Net!