Feeds:
Posts
Comments

Last time, I linked to some example code that forks a bunch of processes and communicates with them via pipes. This is the main feature of the code I’m interested in, but the explanation is the article is kinda sparse so you can consider this to be supplemental.

As usual, the perl doco is pretty good for covering this stuff.

Creating a child process (a kid) involves two pipes, one for the parent to send data to the kid, and one for the kid to send data back to the parent.

One probably obvious thing to note, you can’t directly send a reference down a pipe, (well, not in any reasonable way and that’s a feature, not a bug), so you’ll be interested in serialisation modules. I’ve mentioned them in passing before and I generally use JSON::XS these days.

Another hopefully obvious thing is if the writer is buffered and the reader is waiting for something specific, there will probably be some deadlock in your future. Remember to unbuffer the writer.

I made a couple more J – comments inline:

sub create_kid {
    my $to_kid = IO::Pipe->new;
    my $from_kid = IO::Pipe->new;

    # J - Fork returns undef for failure, 0 for the child, and the
    # J - child PID for the parent

    # J - Handle fork error
    defined (my $kid = fork) or return; # if can't fork, try to make do

    unless ($kid) { # I'm the kid
      # J - The kid reads from $to_kid and writes to $from_kid
      $to_kid->reader;
      $from_kid->writer;

      # J - unbuffer writing to the pipes.  Otherwise may deadlock
      $from_kid->autoflush(1);

      # J - Reset all of the signal handling
      $SIG{$_} = 'DEFAULT' for grep !/^--/, keys %SIG; # very important!
      do_kid($to_kid, $from_kid);
      exit 0; # should not be reached
    }

    # J - parent here...
    # J - The parent reads from $from_kid and writes to $to_kid
    $from_kid->reader;
    $to_kid->writer;

    # J - unbuffer writing to the pipes.  Otherwise may deadlock
    $to_kid->autoflush(1);

    $kids{$kid} = [$to_kid, $from_kid];
    $kid;
}

Parallel Tasks using Fork

Randal Schwartz wrote an example link checker which used forked processes to run tasks in parallel. Each child process created has a read pipe from and a write pipe to the parent (created with IO::Pipe).

The result is an inverted version of my preferred architecture. I like the parent to dump work on a queue and whichever child is ready to pull it off. This is pretty easy to do with threads.

In Randal’s version, the parent figures out which child is available to do work.

My grub menu finally got scrollbars so I decided to look into how to remove old linux kernels.

The easiest instructions I found are here.

I checked which kernels are installed in /boot.

$ ls /boot/vmlinuz*
/boot/vmlinuz-2.6.32-29-generic  /boot/vmlinuz-2.6.32-30-generic

And then removed everything apart from current and one version back.

sudo apt-get remove - --purge 2.6.32-28-*

My question about how windows works without fork got pointers to two useful Windows perl modules in the comments:

Spawning External Programs

(from dagolden) See Win32::Job or just use system(1, $cmd, @args)

Creating Daemons

(from Christian Walde) Use Win32::Detached


What I was really wondering about was how reliable software was written without fork. What do I mean by that?

What is the difference between?

parent
fork()
do_something_independent()
wait_for_childs_response()

and

parent
thread()
do_something_independent()
wait_for_childs_response()

If the child thread misbehaves – exits, dumps core, leaks memory, or whatever, that’s bad for the parent. Child processes can’t wreak quite so much havoc.

Presumably the solution adopted in Window is don’t do stuff in threads that causes them to accidentally exit [the whole process], dump core or leak memory.

How does windows work without fork? So many things start with a call to fork: starting a new process (fork/exec), creating a daemon, creating independent tasks within one program…

The nice thing about all of these things is, if a new process goes bad, it probably won’t take out the parent process. You don’t get the same safety with a thread. Therefore, fork is the basis of a lot of my reliable software.

I’m fairly confident that windows doesn’t have anything similar to fork. Otherwise the perl fork emulation could simply wrap whatever windows has. Sadly it doesn’t quite work that nicely.

While the emulation is designed to be as compatible as possible with the real fork() at the level of the Perl program, there are certain important differences that stem from the fact that all the pseudo child "processes" created this way live in the same real process as far as the operating system is concerned.

If the parent process is killed (either using Perl’s kill() builtin, or using some external means) all the pseudo-processes are killed as well, and the whole process exits.

I have been approaching this problem of reliable windows software from too low a level. Rather than copying what I do on Unix and start with fork (using Perl more or less as my OS wrapper), I now use web servers to manage my processes.

Remote Killing from Emacs

Previously, I’ve talked about integrating little perl "services" into emacs that read from stdin and write to stdout.

Sometimes the processes that make up your emacs application are running on different servers, picking up various pieces of information and it might be necessary to kill them independently or in groups. By default, comint binds C-c C-c to SIGINT, but if you have a moderately complex application, you don’t want to go through each comint buffer and kill the processes individually.

Each of your perl scripts needs to output the PID and the server it is running on.

use Sys::Hostname;
my $hostname = hostname();

print "Started PID $$ on $hostname\n";

Then the emacs application code needs to use a filter to read that information into a couple of variables.

(defvar proc-pid nil)
(defvar proc-host nil)

(defconst *re-match-pid-host*
  "Started PID \\([0-9]+\\) on \\([a-z0-9]+\\)")

(defun proc-filter (output)
  (when (string-match *re-match-pid-host* output)
    (setq proc-pid (match-string 1 output))
    (setq proc-host (match-string 2 output)))
  output)

(add-hook 'comint-preoutput-filter-functions 'proc-filter)

Running commands remotely is easy with ssh. (Incidentally, how does this work in windows-land?)

(defconst *ssh* "ssh -o StrictHostKeyChecking=no")

(defun kill-proc ()
  (interactive)
  (let ((cmd (format "%s %s kill %s" *ssh* proc-host proc-pid)))
    (message cmd)
    (message (shell-command-to-string cmd))))

I’ve demonstrated the make-comint code previously.

(defun start-proc ()
  (let ((buffer (get-buffer "*stdin-proc*")))
    (when buffer (kill-buffer "*stdin-proc*")))
  (apply 'make-comint "stdin-proc" "perl" nil
         (list "./test-stdin.pl")))

And the result is as expected.

Started PID 5008 on saturn

Process stdin-proc terminated

Relative Popularity

It’s hard to objectively measure Perl’s popularity compared to Python or Ruby. Even before nerds-central used job availability as a measurement, the ease of finding a job was a key requirement for me.

Who cares if I love Ruby if I can’t use it for the 10 hours a day I’m working?

Anecdotal evidence

In the last twelve years I’ve had no difficulty finding plenty of Perl work. In the past couple of years, I’ve had a few calls about Python jobs, but probably ten times fewer than the number of Perl calls I get. Hold the front page – Perl is ten times more popular than Python.

Well, no. Of course I get a lot more Perl calls. I’m a Perl programmer, and my résumé mentions Perl all over the place. Anyone calling about Python must really be desperate!

But I notice that a lot of Python programmers working at Python-centric firms apply similar logic. All the folks they work with do Python, clearly no-one does Perl anymore. Thank goodness – Python is finally the winner it always deserved to be.

Why do people care?

Network effects are important. All other things being equal, a language with more users will get more libraries, will have more companies using it, will get more users, will get more libraries, etc. vs a language with fewer users. And, popularity aside, none of the big three scripting languages has a huge advantage over the others.

Except of course that Perl was the first mover.

So, what can you do if you don’t have a real advantage you can point to? That’s right – you trash-talk the leader, hope you can convince enough people there is a real problem and when they jump ship hopefully you’ll be the one left with the positive feedback loop.

Michael Murdock doesn’t believe in anti-Perlism. Of course he has to indulge in some of his own.

“Every developer with whom I have discussed this issue says roughly the same thing: It’s very difficult to pick up a piece of Perl code, even when it’s your own, and be able to quickly understand the gist of it.”

I find it hard to understand French but I don’t go round saying it is a speak-only language.

You couldn’t make it up. Or was the irony intentional?

Replacing sed

I’ve spoken about processing logfiles with perl previously. Occasionally though, I still reach for sed.

Say I have a logfile that looks like this:

[ <timestamp> ] : somefunc()
[ <timestamp> ] : interesting line 1
[ <timestamp> ] : interesting line 2
... 1000s of lines
[ <timestamp> ] : somefunc()
[ <timestamp> ] : interesting line 1
[ <timestamp> ] : interesting line 2
... 1000s of lines

Picking out lines following a pattern is easy with sed – p prints the current match and n takes the next line.

$ < log.txt sed -n '/somefunc()/ {p;n;p;n;p}'
[ <timestamp> ] : somefunc()
[ <timestamp> ] : interesting line 1
[ <timestamp> ] : interesting line 2
[ <timestamp> ] : somefunc()
[ <timestamp> ] : interesting line 1
[ <timestamp> ] : interesting line 2

My first attempt to replace that with perl looks a bit ugly

< log.txt \
perl -ne 'if (/somefunc\(\)/) {print; print scalar(<>) for (1..2)}'

I’m not that happy with the module I came up with to hide the messiness either.

package Logfiles;

require Exporter;
our @ISA = qw(Exporter);
our @EXPORT_OK = qw(process);

use Carp;

sub process (&$;$)
{
    my ($sub, $regex, $lines) = @_;
    $lines ||= 0;
    return unless /$regex/;
    if (! $lines) {
        $sub->($_);
    } else {
        croak "process() arg 3 not ref ARRAY" unless ref($lines) eq 'ARRAY';
        my $line = 0;
        my @lines = @$lines;
        while (1) {
            if ($lines[0] == $line) {
                $sub->($_);
                shift @lines;
            }
            ++$line;
            last if ($line > $lines[0] or (not @lines));
            $_ = <>;
        }
    }
}

1;

But at least my typical spelunking looks a little cleaner now.

< log.txt perl -MLogfiles=process -ne \
    'process { print } qr/somefunc\(\)/, [0..2]'

Any suggestions on how to improve this (without reverting to sed)?

Moose Types

Rants about generalisations notwithstanding, I’m a fan of typeful programming (I’m sure I’d love Ada). For a script that will be moderately complex, I like to sit down and think about the types I’m going to use before I start.

Any library that will enable me to specify my types more precisely and concisely is obviously a win.

And speaking of Moose…

Moose has a bunch of methods to specify your types and a built-in type hierarchy available that you can build off.

Abusing an example I’ve used before:

use Moose;
use Moose::Util::TypeConstraints;
use MooseX::Params::Validate;

subtype 'LegalDrinkingAge'
    => as 'Int'
    => where { $_ >= 18 }
;

coerce 'LegalDrinkingAge'
    => from 'Int'
    => via { 1 }
;

sub can_legally_drink
{
    my $age = pos_validated_list(
        \@_,
        { isa => 'LegalDrinkingAge' },
    );

    return 1;
}

print can_legally_drink(18), "\n";
print can_legally_drink(17), "\n";

Checking for a LegalDrinkingAge type here is obviously the wrong thing to do, but for the purposes of the example it will do.

The resulting error is fine, if a little ugly.

$ perl5.10.1 moose-types.pl
1
Parameter #1 ("17") to main::can_legally_drink did not pass the 'checking type constraint for LegalDrinkingAge' callback
 at /u/packages/lib/perl5/site_perl/5.10.1/MooseX/Params/Validate.pm line 168
        MooseX::Params::Validate::pos_validated_list('ARRAY(0x976ce40)', 'HASH(0x911b6f8)') called at moose-types.pl line 24
        main::can_legally_drink(17) called at moose-types.pl line 33