Frank DENIS random thoughts.

Timing a process out

A process can eat 100% of the CPU because of an endless loop without any rest point. A process can grab memory like mad until the system starts swapping like mad too. A process can open many files without ever closing them. And then it’s a real pain to even ssh into the machine in order to understand what’s going on.

Yes. It can happen. And it does happen. Bugs do exist. If anyone brags about how stable their system is, have them remove the “kill” (and “pkill”, etc.) commands.

Unix-like systems have a nice feature to mitigate the impact of a misbehaving process. Using the “limit”, “limits” or “ulimit” shell commands, you can restrict how much CPU time, stack, open files, etc. a process can use. When a limit is reached, the process is sent a signal (SIGXCPU for the CPU limit), or the offending call fails, the way sbrk() does once the data size limit is hit.
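Those shell commands are front-ends to the setrlimit() system call. As a rough sketch of what sits underneath (the helper name `limit_cpu_seconds` is made up for this example), a program can lower its own soft CPU limit like this:

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: lower the soft CPU-time limit of the calling
 * process to `seconds`. The hard limit is left untouched.
 * Returns 0 on success, -1 on failure (errno is set by the libc).
 */
int limit_cpu_seconds(rlim_t seconds)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CPU, &rl) != 0) {
        return -1;
    }
    /* Soft limit: the kernel sends SIGXCPU once this much CPU time is used. */
    rl.rlim_cur = seconds;

    return setrlimit(RLIMIT_CPU, &rl);
}
```

The limit is inherited by children and kept across exec(), which is why setting it in a shell affects every command started from that shell.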

Unfortunately, there’s one limit that no Unix has, AFAIK. Except MirOS BSD, which merged a kernel patch I wrote for OpenBSD a few years ago to implement that feature.

Unix has CPU limits, but no human-time (wall-clock) limit, i.e. a way to kill a process once it has been running for more than x seconds.

Yes, sometimes you have processes that aren’t burning your CPU, that aren’t taking memory like mad, but that are just stuck waiting for something that never happens. Like a deadlock, or input that never arrives, or a process waiting for a child that is already dead.

Not so long ago, my local OpenBSD CVS mirror was out of date. Quite odd, since it gets updated every day. The reason was that a cvsync process had been sitting there for days, waiting for something, but actually doing nothing. I’ve been using cvsync for a long time without any issue, but one day it just stopped working before completing its task. This delayed other daily cron jobs, the FTP mirrors ended up out of date as well, etc. And the usual Unix limits did nothing about it.

Time to add cvsync to the list of things I start through a tiny tool I wrote a long time ago and that is still quite useful nowadays: alarmer.

alarmer is trivial to use:


alarmer 3600 cvsync -f /etc/cs.conf

will start cvsync and kill it with an ALRM signal if it’s still there after 3600 seconds. Very simple, but always very effective.


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

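int main(const int a, char * const *b)
{
    if (a < 3) {
        puts("Usage: alarmer <timeout (seconds)> <command> [args] [...]");
        return 1;
    }
    b++; /* skip our own name: *b is now the timeout */
    /* Arm the timer. A pending alarm survives exec(), so SIGALRM hits the command. */
    (void) alarm((unsigned int) strtoul(*b, NULL, 10));
    b++; /* *b is now the command, followed by its arguments */
    /* Replace this process with the command. execvp() only returns on failure. */
    (void) execvp(*b, b);
    perror("Unable to spawn the command");

    return 2;
}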