Zawinski notwithstanding, regular expressions are a hugely useful addition to a programmer’s toolbox. I learnt about them from Jeffrey Friedl’s excellent Mastering Regular Expressions[1].

Why am I thinking about this now?

So, I was asked to review a Perl script and it was pretty good, apart from the fact that the main regex[2] didn’t quite match all the expected patterns.

The line to match was:

<some_tag>quickbrownfox</some_tag>

Other valid values within the tags included quick,brown,fox or quick brown fox.

And the regex in the script was:

m{\<some_tag\>(\w+)\<\/some_tag\>}

As you can see, there were also a few unnecessary backslashes.

I suggested [^<]+ instead of \w+ and gave my reasons.

The developer changed the regex to:

m{\<some_tag\>\s*(\w+)\s*\<\/some_tag\>}

Okay, slightly better, but it still didn’t match all the possibilities.

I pointed out a valid pattern it didn’t match and asked again for [^<]+.

This was the result:

m{\<some_tag\>(\s*[^<]+\s*)\<\/some_tag\>}

Okay, fine. The \s space matchers are redundant, but at least it covers everything.
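
A quick way to check each version against the three valid values is a small script along these lines. It is only a sketch: the names %regex and captured are mine rather than anything from the reviewed script, the unnecessary backslashes are dropped, and the last pattern is the plain [^<]+ suggestion without the redundant \s* matchers.

```perl
use strict;
use warnings;

# The patterns discussed above, without the unnecessary backslashes.
my %regex = (
    original  => qr{<some_tag>(\w+)</some_tag>},
    second    => qr{<some_tag>\s*(\w+)\s*</some_tag>},
    suggested => qr{<some_tag>([^<]+)</some_tag>},
);

my @values = ('quickbrownfox', 'quick,brown,fox', 'quick brown fox');

# Wrap a value in the tags and return what the named regex captures,
# or undef if the line does not match at all.
sub captured {
    my ($name, $value) = @_;
    my $line = "<some_tag>$value</some_tag>";
    return $line =~ $regex{$name} ? $1 : undef;
}

for my $name (qw(original second suggested)) {
    for my $value (@values) {
        my $match = captured($name, $value);
        printf "%-10s %-18s %s\n", $name, "'$value'",
            defined($match) ? "matched '$match'" : 'no match';
    }
}
```

Only the [^<]+ version captures all three values, because \w+ stops at the first comma or space.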

So basically, if you find yourself writing regexes regularly without really understanding them, you could do worse than read Friedl’s book.


1. Not an affiliate link

2. I know… "parsing" XML with regex will unleash Cthulhu or something. It’s not my script.

Living Without Emacs

I have integrated emacs into most parts of my daily workflow. In a typical day I might need to:

Some of these I have aliases for in my shell already.

A number of database GUIs have template functionality built-in or I could probably substitute the sql templates with scripts if I was sufficiently creative with the command line args.

And if I kept my TODO/Notes list in a wiki, it might be generally useful to the rest of my team.

Yes, thinking about it, I could live without Emacs. But life would be much less pleasant.

I wouldn’t get into Google (their interviews don’t select for my brand of genius). But that’s fine as I wouldn’t be able to write Perl there anyway.

I was trying to think of a few reasons why Google would have chosen Python over Perl as their primary scripting language. A number of the ones I came up with are less relevant than they used to be.

Python integrates better with C++

Boost.Python promises seamless interoperability between the two languages. It has been a while since I’ve been tempted to embed a scripting language in my C++ apps so I’m not sure what the counterpart is in Perl if any.

Python "threads"

Back in the day, these were better than the Perl offering. More recently, on Unix at least, with Coro, AnyEvent, Twisted, etc., it’s a wash.

Windows Implementation

A few years ago, the various Perl Windows implementations were not very high quality. Strawberry Perl has changed all that though. It’s awesome.

Jython et al.

Was Jython around when Google was choosing Python? I guess it was.

Perl (5) hasn’t escaped its VM as well as Python and Ruby. Having said that, the Jythons and JRubys of the world are always second class citizens compared to the original C VM as no-one can be bothered to rewrite all the C extensions.

Easier to avoid making a mess as a secondary language?

Experts in either language can write clean, efficient code. But what about folks who are expert in C++ and Java and use either Perl or Python on the side and only occasionally?

I think such usage for a large system will result in unwieldy code in either language unless the system is curated by experts. For small to medium systems in the absence of experts, though, Python may have the edge.

Rabid anti-Perlism

Even smart folks I know seem to think Perl is lacking in some way in comparison to Python. I haven’t managed to get them to enumerate any reasons though, so I figure they are speaking from a position of ignorance. This could have been the case for the folks making the decisions at Google too or, more likely, they may have considered some of the other reasons on my list.

I’m subscribed to Ironman Perl’s full Atom feed in Google Reader. Most posts display nicely, especially if they use embedded CSS like mine (boo – bad practice Jared) 😉

However, it seems as though there is some strange interaction between certain feeds, Ironman and Google Reader. Take a look at this post from Mark Fowler:

Ironman Perl in Google Reader

Everything has been stripped, even the paragraphs and line feeds.

At first I thought it was only feeds burned with feedburner, but it isn’t as simple as that.

Another minor nit: the word perl isn’t matched on word boundaries. Hence, a word like properly, which contains perl, would cause a post to show up in the feed even if it has nothing to do with Perl.
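
I don’t know how the aggregator actually does its matching, so the following is just a sketch of the difference a word boundary makes. The function names are made up for illustration.

```perl
use strict;
use warnings;

# Naive check: finds "perl" anywhere, including inside "properly".
sub mentions_perl_naive {
    my $text = shift;
    return $text =~ /perl/i ? 1 : 0;
}

# Word-boundary check: \b requires the match to start and end at the
# edge of a word, so "properly" no longer counts.
sub mentions_perl {
    my $text = shift;
    return $text =~ /\bperl\b/i ? 1 : 0;
}

for my $text ('Set it up properly', 'I wrote it in Perl.') {
    printf "%-22s naive=%d boundary=%d\n",
        $text, mentions_perl_naive($text), mentions_perl($text);
}
```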

More on REPLs

Most people who think they want a REPL don’t actually want a REPL. What they want is what a REPL can give them: a Perl (or a Python or a Ruby) machine to which you can add functionality a bit at a time.

For an emacser like me, a sufficiently flexible command-line REPL can give me that incrementally adaptable machine. For most people, a graphical interface to that machine would probably be better.

And thanks, folks, for the suggestions of REPLs to take a look at. No doubt I’ll get around to evaluating a few of them at some point.

I figured out what the problem is with Devel::REPL and the command line REPLs provided by Python and Ruby – evaluation is not a separate step. When I press enter, I’m forced to evaluate the current line.

Yes, you say, that is what REPL means – Read, Evaluate, Print, Loop. Evaluate comes after Read.

In reality, usable REPLs, such as Emacs[1], let you control when the read evaluate print sequence happens. I can craft the most beautiful function I can think of. Better still, if I change my mind, I can easily modify the function and redefine it with a keystroke. Well, a key chord at least.

In contrast, with Devel::REPL, once I have pressed enter, changing my mind is painful. Integrating it with Emacs comint will probably alleviate a lot of that pain.

Or better yet, as Anonymous recommends, I should take a look at Sepia or PDE which already have emacs integration. Having said that, basic integration is, what, 20 lines of emacs-lisp?


1. Yes, Emacs is a REPL. Kinda.

I’m somewhat amused at one of the more recent comments here – Nathan L. Walls defends his "choice" of Ruby with some very woolly justifications (emphasis mine):

"Ruby’s community feels more vibrant. No, not something you can measure. It is a feeling."

"Yes, there are equivalents in Perl, but they are far rougher. Again, not really measurable, but a feeling."

Of course, his day job is still writing Perl. Moving swiftly on…

Devel::REPL

The main thing I got out of the comment, apart from a chuckle, was that it motivated me to look at Devel::REPL.

One of the other main tools in my toolbox is emacs and when writing emacs lisp, I make full use of the REPL. But I’ve never even wanted an equivalent in Perl.

One cpanm invocation later and I’m ready.

Wait, no I’m not. I copied Caleb’s repl.rc config to make it more usable. I added MultiLine::PPI which resulted in a bunch of errors at start-up. It turns out I need to add File::Next and B::Keywords separately.

$ cpanm File::Next
$ cpanm B::Keywords

Okay, now I’m good to go.

First REPL session

jared@localhost $ re.pl
$ sub f
$ {
> say 'h';
> say 'hello';
> }
h
hello
1 $ f();
Runtime error: Undefined subroutine &Devel::REPL::Plugin::Packages::DefaultScratchpad::f called at (eval 290) line 5.
$ sub f {
> say 'h';
> say 'hello';
> }
$ f();
h
hello
1 $

It still isn’t quite perfect. But to be honest, I find it (and the Python and irb REPLs) pretty useless. I probably need to look into integrating it with emacs comint.

Why Would I Ditch Perl?

A Perl programmer asks "why ditch Perl?" on the Perl Reddit. I briefly considered alternatives in my Is Perl My Perfect Compromise post. 18 months later there is a clearer answer:

Of the “big three general purpose scripting languages”, Perl is improving most quickly, has the best libraries and the most jobs available. There is no compelling reason to switch to Python or Ruby and plenty of good reasons to stay with Perl.

At my job Perl is one of the sanctioned languages and it makes work fun. Over the past 18 months, improvements have been coming thick and fast (go go Moose and regular release cycles). For Windows development Strawberry Perl is getting better and better.

Outside of web development, CPAN is better than the Python and Ruby equivalents, and Python has inferior scoping anyway.

None of the minority languages mentioned in my previous post such as Clojure, Scheme and Haskell have gained any traction so there is no need to discuss them further here.

Revisiting Autovivification

Last time I spoke about wrapping hash access I got a bit more than I bargained for. It’s still something I’m tempted to do from time to time.

Autovivification by default is very sensible (or perhaps Perl just suits me). When I set a parameter within a structure, I generally want all the ancestors to be created first. That’s why I have the following aliases to mkdir.

jared@localhost $ alias | grep mkdir
alias failingmkdir='/bin/mkdir'
alias mkdir='mkdir -p'

Autovivification on data retrieval though, can be a bit confusing.

use strict;

my $data = {};
print '1:', exists($data->{'key1'}), "\n";
if (! exists($data->{'key1'}{'key2'})) {
    print '2:', exists($data->{'key1'}), "\n";
}
jared@localhost $ perl5.10 t.pl
1:
2:1
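
If all I want is to test for a nested key without creating its parents, one option is to walk the path one level at a time. This is a minimal sketch; exists_path is a hypothetical helper, not something from core Perl or CPAN.

```perl
use strict;
use warnings;

# Return true only if every key along the path exists. Each level is
# checked with exists before it is dereferenced, so nothing is created
# on the way down.
sub exists_path {
    my ($data, @keys) = @_;
    for my $key (@keys) {
        return 0 unless ref($data) eq 'HASH' && exists $data->{$key};
        $data = $data->{$key};
    }
    return 1;
}

my $data = {};
print '1:', exists_path($data, 'key1', 'key2'), "\n";
print '2:', (exists($data->{'key1'}) ? 1 : 0), "\n";
```

Here key1 still doesn’t exist after the lookup, unlike in the script above.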

The CPAN Autovivification Module

Fortunately, it’s easy enough to disable it with the autovivification module.

no autovivification qw(strict fetch exists delete);
perl t.pl
1:
Reference vivification forbidden at t.pl line 7.

Promote Uninitialized Warnings to Fatal

Zoul mentions a way to avoid typical autovivification errors[1] on Stack Overflow.

use warnings NONFATAL => 'all', FATAL => 'uninitialized';

Alternatively, unlock_keys and lock_keys_plus from Hash::Util, mentioned by Chas. Owens in the comments, might be closer to what is needed in some circumstances.
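
As a rough sketch of the Hash::Util approach, locking the keys of the top-level hash turns an accidental vivification into a runtime error. The data and the dies helper are made up for illustration, and note that the lock only applies to the hash you lock, not to nested hashes.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys unlock_keys);

my $data = { name => 'fox' };
lock_keys(%$data);    # only the key 'name' is allowed from now on

# Run a code ref and report whether it died.
sub dies {
    my $code = shift;
    return eval { $code->(); 1 } ? 0 : 1;
}

# Looking through the missing key1 would normally create it. With the
# keys locked, the attempt dies instead.
my $lookup_died  = dies(sub { my $found = exists $data->{'key1'}{'key2'} });
my $key1_created = exists $data->{'key1'} ? 1 : 0;
print "nested lookup died: $lookup_died\n";
print "key1 created: $key1_created\n";

# Unlock when new keys really are wanted.
unlock_keys(%$data);
$data->{'key1'}{'key2'} = 1;
```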


1. True, it doesn’t only apply to autovivification, but the other effects are useful too.

AnyEvent Notifier Consumer

For the producer demonstrated last time, it is easy to make a consumer using AnyEvent. Not only that, I can borrow most of the code from my list_processes script.

The utility functions are the same as in the unix pipe producer/consumer.

use AnyEvent;
use AnyEvent::Handle;

sub consumer
{
    my ($host, $port, $cb) = @_;

    my $cv = AE::cv();

    my $handle; $handle = AnyEvent::Handle->new(
        connect  => [$host => $port],
        on_error => sub {
            print("Connection error: $!\n");
            $handle->destroy();
            # Wake up recv() so the caller doesn't block forever
            $cv->send();
        },
        on_eof => sub {
            print "Connection closed\n";
            $handle->destroy();
            $cv->send();
        }
    );

    # We need to consume the first line which contains the PID message
    $handle->push_read(line => sub {});

    # Pass each complete line to the callback. A partial line stays in
    # the read buffer until the rest of it arrives.
    $handle->on_read(sub {
        my $handle = shift;
        while ($handle->{rbuf} =~ s/^([^\n]*)\n//) {
            my $line = $1;
            $line =~ tr/\r//d;
            $cb->($line);
        }
    });

    return $cv;
}

The callback function checks the subject is correct and then calls process_file(...). It should be fairly easy to see how to extend this for much more complex producers and consumers.

sub process_file
{
    my $file = shift;
    my_log "Processing file [$file]";
    # File processing logic here ...
}

sub handle_line
{
    my $line = shift;

    my ($subject, $action) = split /\s+/, $line;
    if ($subject =~ m{^/producer/file-creator/new-file}) {
        process_file($action);
    } else {
        my_log "FROM PRODUCER [$line]";
    }
}

my $cv = consumer('localhost', 12345, \&handle_line);
$cv->recv();
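
One way to extend handle_line to more message types is a dispatch table keyed on the subject. The following is only a sketch: the deleted-file subject and forget_file are invented for illustration, my_log is a stand-in for the real utility function, and unlike the prefix match above it requires the subject to match exactly.

```perl
use strict;
use warnings;

# Stand-in for the real logging utility: remembers and prints each message.
my @log;
sub my_log {
    my $message = shift;
    push @log, $message;
    print "$message\n";
}

sub process_file { my_log("Processing file [$_[0]]") }
sub forget_file  { my_log("Forgetting file [$_[0]]") }    # hypothetical

# Map each subject to the code that handles it.
my %handler_for = (
    '/producer/file-creator/new-file'     => \&process_file,
    '/producer/file-creator/deleted-file' => \&forget_file,    # hypothetical
);

sub handle_line {
    my $line = shift;

    my ($subject, $action) = split /\s+/, $line;
    my $handler = defined($subject) ? $handler_for{$subject} : undef;
    if ($handler) {
        $handler->($action);
    } else {
        my_log("FROM PRODUCER [$line]");
    }
}

handle_line('/producer/file-creator/new-file /tmp/example.txt');
handle_line('something else entirely');
```

Adding a new message type is then a matter of adding one entry to %handler_for.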