James Stanley

How to interrupt a regex in Perl

Wed 23 March 2016

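Since 5.8.0, Perl's "safe signals" mechanism defers the delivery of a signal to a custom signal handler until the interpreter reaches a safe point, which in practice means between opcodes. A regex match runs as a single opcode, so a handler installed through %SIG will not run until the match has finished. This means you cannot simply use alarm() with a custom handler to interrupt a long-running regex.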

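It is simple enough to create a child process to run the regex match, and use the default SIGALRM handler in the child to allow it to be timed out. The default action is carried out by the operating system rather than by Perl, so it terminates the child immediately, even in the middle of a match. Here is an example function to run a regex match with a timeout: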

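sub match {
    my ($string, $regex, $timeout_secs) = @_;

    my $pid = fork();
    die "can't fork: $!" if !defined $pid;

    if ($pid == 0) {
        # child process: restore the default SIGALRM action, which kills
        # the process as soon as the alarm fires
        $SIG{ALRM} = 'DEFAULT';
        alarm $timeout_secs;

        # report the result through the exit status: 0 = match, 1 = no match
        exit(($string =~ $regex) ? 0 : 1);
    }

    # parent process: wait for the child, which sets $?
    waitpid($pid, 0);

    # the low 7 bits of $? are non-zero if a signal killed the child
    die "regex timed out\n" if $? & 0x7f;
    # the high byte of $? is the child's exit status
    return !($? >> 8);
}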

The child process installs the default SIGALRM handler, starts an alarm, and checks whether the string matches the regex. It exits with status 0 if the string matches, and 1 otherwise.

The parent process waits for the child to exit. $? then holds the child's wait status, which packs several fields into one number. "$? & 0x7f" tells us which signal, if any, the child died from (we just assume it was SIGALRM). "$? >> 8" tells us the process exit status, which tells us whether the regex matched or not.
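To make the layout of $? concrete, here is a minimal sketch that splits a wait status into its parts. The helper name decode_wait_status is made up for illustration and is not part of the function above. It runs a throwaway child with system() that exits with status 3, then decodes the result.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split a wait status,
# i.e. the value of $?, into its three fields.
sub decode_wait_status {
    my ($status) = @_;
    return {
        signal      => $status & 0x7f,            # low 7 bits: fatal signal, 0 if none
        core_dumped => ($status & 0x80) ? 1 : 0,  # next bit: set if a core was dumped
        exit_code   => $status >> 8,              # high byte: value passed to exit()
    };
}

# Run a child perl ($^X is the current interpreter) that exits with status 3.
system($^X, '-e', 'exit 3');
die "could not run child: $!" if $? == -1;

my $info = decode_wait_status($?);
printf "signal=%d core=%d exit=%d\n",
    @{$info}{qw(signal core_dumped exit_code)};
# prints "signal=0 core=0 exit=3"
```

A child killed by a signal would instead show a non-zero signal field and an exit field of 0.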

Given this information, the parent process either dies with "regex timed out\n" or returns a true value if the regex matched and a false value otherwise.
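One possible refinement, as a sketch rather than a definitive version: treat only a death by SIGALRM as a timeout, and report any other fatal signal separately. The name match_with_timeout is hypothetical. This variant also swaps exit() for POSIX::_exit() so that the child does not run END blocks or flush output buffers copied from the parent. The second call uses an embedded code block that never returns as a stand-in for a pathological pattern, because whether a given backtracking-heavy regex is actually slow depends on the Perl version's optimizations.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM _exit);

# Hypothetical variant of match(): only SIGALRM counts as a timeout.
sub match_with_timeout {
    my ($string, $regex, $timeout_secs) = @_;

    my $pid = fork();
    die "can't fork: $!" if !defined $pid;

    if ($pid == 0) {
        # child: the default SIGALRM action kills the process immediately
        $SIG{ALRM} = 'DEFAULT';
        alarm $timeout_secs;
        # _exit skips END blocks and buffer flushing inherited from the parent
        _exit(($string =~ $regex) ? 0 : 1);
    }

    waitpid($pid, 0);
    my $signal = $? & 0x7f;
    die "regex timed out\n" if $signal == SIGALRM;
    die "child killed by signal $signal\n" if $signal;
    return ($? >> 8) == 0 ? 1 : 0;
}

# An ordinary match finishes well within the timeout.
print match_with_timeout("hello world", qr/wor/, 2) ? "match\n" : "no match\n";

# Stand-in for a runaway regex: the code block loops forever, so the
# child is killed by the alarm after one second.
my $ok = eval { match_with_timeout("x", qr/(?{ 1 while 1 })/, 1); 1 };
print "error: $@" if !$ok;
```

Wrapping the call in eval lets the caller decide what a timeout means, for example rejecting the input rather than aborting the whole program.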

