  • Optimization

    For my wind visualization I have a bunch of 2-3 megabyte text files containing a lot of METARs. I parse them to get the wind data, then compute some statistics on it to produce a rollup.

    Doing all this work in one swoop takes 5.3s for a smallish file (1.5 megs, 24000 lines).

    Parsing the file and writing it to CSV takes 4.6s. Reading the CSV in and doing the rollups takes 1.4s. So I’ve added about 0.7s overhead in splitting the two steps thanks to the intermediate file. OTOH I should really only need to generate that CSV once.

    Once again I learned the hard way that iterating line by line through gzip.open() is really slow; it’s way faster to read the whole file into RAM, wrapped in a cStringIO if necessary. Terrible buffering in Python’s library.
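
    A minimal sketch of the read-it-all-into-RAM approach, written for Python 3, where io.StringIO stands in for cStringIO. The helper names are made up for illustration, and it assumes the decompressed file fits comfortably in memory:

```python
import gzip
import io


def open_gzip_in_memory(path, encoding="ascii"):
    """Decompress a whole gzip file at once and return a file-like object.

    Hypothetical helper: trades memory for speed by avoiding
    line-by-line reads through the gzip stream.
    """
    with open(path, "rb") as f:
        raw = f.read()  # compressed bytes, fetched in a single read
    text = gzip.decompress(raw).decode(encoding, errors="replace")
    return io.StringIO(text)  # Python 3 stand-in for cStringIO


def count_lines(path):
    """Example consumer: iterate lines from the in-memory buffer."""
    return sum(1 for _ in open_gzip_in_memory(path))
```

    Whether this is still a big win on a modern Python is something to measure rather than assume; the slowness described above was on Python 2.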

    I’m beginning to regret not just loading all this crap into a relational database.

     

  • Headless SVG rendering part 2

    I revisited headless SVG rendering and decided to stick with PhantomJS since I already have it working. It’s rendering text poorly and path strokes not at all, probably fixable with some upgrades that are inconvenient in Debian. But I can work around that for now.

    Scripting PhantomJS turns out to be pretty easy. Here’s the Javascript program I use to drive it:

    if (phantom.state.length == 0) {
        phantom.state = "rasterize";
        phantom.viewportSize = { width: 64, height: 64 };
        phantom.open("vis-phantom.html");
    } else {
        phantom.render("out.png");
        phantom.exit();
    }
    

    Unfortunately the input and output filenames are hard-coded. Phantom has command line argument support but I can’t figure out a way to get that all the way into my HTML file (to load a specific script), so I just went with a hardcoded input file named data.js.

    Here’s my shell script to render a PNG, crop it, and optipng crush it. The crop shouldn’t be necessary, but for some reason I’m ending up with 3 pixels below my SVG element.

    set -e
    # Render a wind rose image for the given datafile
    input="$1"
    output="${1%.js}.png"
    
    # remove old crap
    rm -f data.js out.png cropped.png "$output"
    # Symlink the data to the expected place
    ln -s "$input" data.js
    
    # Render, crop, and crush
    phantomjs/bin/phantomjs render.js
    convert out.png -crop 64x64+0+0 cropped.png
    optipng -force -quiet -preserve -out "$output" cropped.png
    
    # clean up
    rm -f data.js out.png cropped.png
    

    It takes about 300ms to render a single image for my simple SVG based graphics.
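
    At roughly 300ms each, 3000 images works out to about 15 minutes (3000 × 0.3s = 900s), so a plain sequential loop is good enough. Here is a rough sketch of a batch driver. It assumes the shell script above is saved as render.sh (a made-up name) and that the data files live in their own directory, away from data.js and render.js:

```python
import subprocess
from pathlib import Path


def render_all(data_dir, script="./render.sh", run=subprocess.run):
    """Run the render script once per .js data file in data_dir.

    `render.sh` is a hypothetical name for the shell script above;
    `run` is injectable so the loop can be exercised without PhantomJS.
    Returns the PNG paths the script is expected to produce.
    """
    outputs = []
    for datafile in sorted(Path(data_dir).glob("*.js")):
        # The script symlinks each input to a fixed data.js,
        # so runs have to happen one at a time.
        run([script, str(datafile)], check=True)
        outputs.append(datafile.with_suffix(".png"))
    return outputs
```

    Because every run goes through the same data.js symlink, running several copies in parallel from one directory would clobber each other.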

  • Samba: get rid of stupid executable files

    Editing files on Windows has an annoying habit of making them show up +x on Unix. Windows doesn’t have an idea of an execute bit, so some ambiguity is expected, but still. Turns out the real culprit is Samba’s map archive configuration option. It’s on by default, which means the Archive bit in Windows is mapped 1:1 to the owner execute (u+x) bit on Unix. That’s really stupid; Windows apps set the Archive bit willy-nilly (hello, Notepad++) and it gets annoying, particularly with git tracking file modes. So configure Samba with “map archive = no” for your fileshare and life gets better.

     

  • forking a branch on github

    I’m trying to submit patches to https://github.com/mbostock/d3. Only what I want to patch is in the gh-pages branch, a special magic branch on github that is published as a website. Forking and modifying a branch is confusing if you don’t understand git. Here’s what works in my shell, after forking the repository:

    # Get a copy of the forked repo
    $ git clone git@github.com:NelsonMinar/d3.git
    Cloning into d3...
    
    # Link in the upstream repo you forked
    $ cd d3
    $ git remote add upstream https://github.com/mbostock/d3.git
    $ git fetch upstream
    From https://github.com/mbostock/d3
     * [new branch]      gh-pages   -> upstream/gh-pages
     * [new branch]      master     -> upstream/master
     * [new branch]      mbostock-master -> upstream/mbostock-master
    
    # Create a local branch that tracks the gh-pages branch on your fork (origin)
    $ git checkout -b gh-pages origin/gh-pages
    Branch gh-pages set up to track remote branch gh-pages from origin.
    Switched to a new branch 'gh-pages'
    
    # Make your changes and commit them locally
    $ ed README.txt
    $ git commit -a -m testing
    
    # Push to your fork on github
    $ git push
    To git@github.com:NelsonMinar/d3.git
       0d1679a..d351228  gh-pages -> gh-pages
    

    The magic step is that checkout line. Other similar things like git branch gh-pages; git checkout origin/gh-pages don’t work and you can end up in weird states like “You are in ‘detached HEAD’ state.” I don’t really understand everything the working command is doing (I’ve got Matt to thank for the code to copypasta) but I’m hopeful that after reading the git docs on branches and remotes it’ll make more sense.

  • Screenshots

    I’m about to refactor all my wind plotting code, so I decided to take some screenshots before I broke and changed everything.

  • Headless SVG rendering

    I’m doing a D3/SVG visualization project and want an easy way to generate 3000 static images with a script. I need a headless SVG renderer. Mike suggested a couple of options: Rhino + Env.js + Batik, or maybe using node.js instead of Rhino. Apparently the Protovis community does this; I need to delve into their mailing list for examples.

    PhantomJS happened to float across today, so I tried it out. It’s WebKit based instead of Mozilla based. It was pretty easy to get going. I had to install libqt4-dev libqt4-webkit on my Debian system to build it. I also learned it’s not really headless; it requires an X server, so I installed xvfb and ran it via “Xvfb :1”. With all that in place the basic PhantomJS works, including rasterize.js, which loads an arbitrary URL and renders it.

    To run it:

    $ Xvfb :0 2> /dev/null &
    $ export DISPLAY=:0
    $ phantomjs rasterize.js vis-phantom.html out.png
    

    My visualization didn’t work. First I had to remove the Ajax calls from it and statically load the data. (May not be strictly necessary, not sure). Second I had to set the background to white; the default background in qt4-webkit is transparent, which is actually pretty useful. With those changes my rendering sort of worked, but the font rendering is ugly and none of my stroked lines are visible. I’m using libQtWebkit-4.6.3; apparently 4.7 may fix the stroke problem.

    I think I could massage this into something usable, but I’ll probably investigate the Rhino/Batik solution or something next.

  • WingIDE 4.0 problem with Python 2.7.1 on Windows

    My Python IDE of choice, Wing, has a capability to start the debugger when an exception is raised, but only if it “appears unhandled”. It’s great but a little delicate. It broke entirely on me recently: all exceptions appeared unhandled. I narrowed the problem down to running in Python 2.7.1. Python 2.6 works fine. The workaround Wing recommended was setting the debugger to break when an exception is printed.

  • Javascript dictionary key types

    In Javascript, the dictionary type coerces its keys to Strings:

    z = { 3: "three", 4: "four" }
    for (var x in z) { console.log(typeof x); }
    
    string
    string
    

    This sort of makes sense because dictionaries are really just Javascript objects, and object properties are keyed by strings. I.e., z.foo is the same thing as z["foo"]. Worth noting that z.3 doesn’t work, though (it’s a syntax error, while z[3] and z["3"] both work). It gets spookier when you use null:

    z[null] = "foo"
    
    for (var x in z){ console.log(x, typeof x); }
    3 string
    4 string
    null string
    

    Maybe it wasn’t wise to use integers and null everywhere as keys in my JSON dictionaries. In fact the JSON spec says keys have to be strings, and the Python JSON library was already serializing my keys as strings (with None mapped to “null”). Now I know.
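
    The Python side of this is easy to see with the standard json module. A small sketch with made-up data mirroring the Javascript example above:

```python
import json

# Integer and None keys, like the ones I was using in my data
data = {3: "three", 4: "four", None: "foo"}

# json.dumps coerces every key to a string; None becomes "null"
encoded = json.dumps(data)

# Round-tripping does not bring the original key types back
decoded = json.loads(encoded)
key_types = {type(k).__name__ for k in decoded}

print(encoded)
print(sorted(decoded), key_types)
```

    After the round trip every key is a str, so code that looks up decoded[3] or decoded[None] raises KeyError even though data[3] worked fine before serializing.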

  • D3 and IE9

    IE9 is out, and Microsoft finally supports SVG. Early testing with D3, Mike’s new Javascript library good for doing visualizations, is encouraging. All his examples work. My own code worked once I put <!DOCTYPE html> at the start to convince IE not to use quirks mode.