• gzip encoding, Chrome dev tools bug

    Apache 2 on Debian does not gzip-encode responses by default. Enabling it is as simple as running a2enmod deflate, then restarting Apache. The default configuration is fairly conservative; if you just want to compress everything, modify deflate.conf to say “SetOutputFilter DEFLATE” instead of enabling compression for a few MIME types.

    Google Chrome has bug 40502, where dev tools can show files as uncompressed even when they are compressed. Serving JSON files, dev tools showed Transfer much smaller than Size for files under about 190k, indicating compression, but bigger files looked uncompressed. Checking with Wireshark showed they actually were compressed; dev tools just lied. Apparently Apache 2 decided to use chunked encoding for the larger files, and Chrome has a bug with that.
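A lighter-weight check than Wireshark is to request the file yourself with Accept-Encoding: gzip and look at the response headers. Here's a minimal Python 3 sketch using only the standard library; the URL in the usage comment is hypothetical. urllib does not decompress the body for you, so its length is the compressed entity size (not counting HTTP headers or chunk framing), which is roughly what dev tools calls Transfer.

```python
import gzip
import urllib.request


def fetch_raw(url):
    """Fetch url advertising gzip support; return (headers, body as sent)."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        # urllib leaves Content-Encoding alone, so a gzipped body stays
        # gzipped. http.client has already removed any chunked
        # Transfer-Encoding framing by the time read() returns.
        return resp.headers, resp.read()


def compression_report(headers, body):
    """Compare bytes on the wire with the decompressed size."""
    encoding = headers.get("Content-Encoding", "identity")
    transfer = headers.get("Transfer-Encoding", "none")
    if encoding == "gzip":
        size = len(gzip.decompress(body))
    else:
        size = len(body)
    return {
        "content_encoding": encoding,
        "transfer_encoding": transfer,
        "transferred": len(body),
        "size": size,
    }

# Usage (hypothetical URL):
#   print(compression_report(*fetch_raw("http://localhost/big.json")))
```

If a large file reports Content-Encoding: gzip and Transfer-Encoding: chunked, the server really is compressing it, whatever dev tools says.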

     

  • using git on Windows via SMB

    I keep my source code on a Linux machine, but I develop on Windows. The cross-platform thing is a nuisance. I mostly prefer to work in the Unix way, and make Windows cope.

    msysgit seems to be the Windows git client of choice (Cygwin is too foreign an environment). To make it play nicely with Unix systems:

    • core.autocrlf=false
      Don’t ever modify line endings, just accept what’s there. Every Windows editor I use can handle Unix line endings just fine.
    • core.filemode=false
      Don’t mess with file modes. Works around the problem that Samba doesn’t really handle Unix file modes right, particularly +x.  Avoids the problem of “old mode 100755 new mode 100644”.  This probably means you can’t meaningfully set file modes from Windows.
    • gui.encoding=utf-8
      Not strictly a Windows thing, but a good idea.

    These can be set via the msysgit bash shell with “git config core.filemode false”. You can see your settings with “git config -l”. There’s something confusing I don’t understand about system/global/local settings that I’m ignoring.

  • Very simple python logging

    I do a lot of quick and dirty Python scripts. One thing that bites me is trying to print useful data on exceptions, where the useful data contains non-ASCII and I’m trying to write it to sys.stderr and I get an exception in my exception handler, killing my program:

    import sys
    
    try:
        process(inputString)
    except Exception:
        sys.stderr.write("Error with input %s\n" % inputString)
    
    UnicodeEncodeError: 'ascii' codec can't encode character u'\x81' in position 0: ordinal not in range(128)
    

    Very irritating. One workaround is to use Python’s logging package. It’s a hugely complex configurable beast, but the default config is useful: print all warnings and above to stderr. And it handles Unicode sensibly.

    >>> logging.warning('u"\x81"')
    WARNING:root:u"▒"
    >>> logging.warning('u"\u2022"')
    WARNING:root:u"\u2022"
    

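    At first I wasn’t sure what it was doing; maybe trying Latin-1 and escaping anything else? Part of the confusion is that those test strings are Python 2 byte strings, not unicode objects: '\x81' is a raw byte passed straight to the terminal, and \u isn’t an escape inside a byte string, so the second one prints literally. The part that really matters is that logging handlers catch errors raised while writing a record and report them through handleError() instead of raising. I don’t care if the output is mangled as long as it doesn’t kill my program.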
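For comparison, here's a minimal Python 3 sketch of the same error handler using logging (an illustration, not the original Python 2 script). logger.exception() records the traceback, and the %a format (Python 3's ascii()) escapes any non-ASCII input, so the message can be written even to an ASCII-only stream. process() is a hypothetical stand-in that always fails, and the "demo" logger name is arbitrary.

```python
import io
import logging


def process(s):
    """Hypothetical stand-in for the real work; always fails."""
    raise ValueError("could not process input")


def run(input_string, stream):
    log = logging.getLogger("demo")
    log.propagate = False  # keep records away from any root handlers
    handler = logging.StreamHandler(stream)
    log.addHandler(handler)
    try:
        process(input_string)
    except Exception:
        # %a formats the argument with ascii(), so the message is pure
        # ASCII however odd the input; exception() appends the traceback
        log.exception("Error with input %a", input_string)
    finally:
        log.removeHandler(handler)
    return "still running"

# Usage: a stream that can only hold ASCII, like a badly configured stderr
#   run("\x81 \u2022", io.TextIOWrapper(io.BytesIO(), encoding="ascii"))
```

Even without %a, a write that fails inside a handler goes to handleError(), which prints a diagnostic to stderr (or stays silent if logging.raiseExceptions is false) rather than raising.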

  • Python zipfiles: don’t read a line at a time

    readline() inside a Python zipfile is slow. It’s much much faster to read the whole thing into memory once, then scan.

    Compare:

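    import re
    import zipfile
    
    # Iterate a line at a time
    fp = zipfile.ZipFile("foo.zip").open("data.txt")
    for l in fp:
        if l.startswith("FOO"): print l.rstrip("\n")
    
    # Read the whole file into memory, grep
    fp = zipfile.ZipFile("foo.zip").open("data.txt")  # reopen; the loop above consumed fp
    regexp = re.compile(r'^(FOO.*)$', re.MULTILINE)
    data = fp.read()
    for m in regexp.finditer(data):
        print m.group(1)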
    

    I’m not surprised that the second style is faster, but I’m surprised how much. Some 25 times faster both on my test case (0.5s vs 12s) and over 3.6 gigs of real data (8 minutes vs 200 minutes). Obviously reading the whole file first takes more RAM, but in my case the files are only 500k each.

    A quick profile suggests zipfile spends a lot of time calling next(); maybe it’s doing work a character at a time? Issue 7216 suggests there’s a 100 byte buffer at work; yuck! Fixed in 2.7 or 3.x or something.
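Here's a self-contained Python 3 version of the comparison, building a hypothetical in-memory zip with a data.txt member so it runs anywhere. Python 3's zipfile reads members through buffered streams, so the gap may well be much smaller than the 25x I saw on Python 2; the timings printed are just whatever your machine produces.

```python
import io
import re
import time
import zipfile


def make_zip(n_lines=50000):
    """Build an in-memory zip whose data.txt has a FOO line every 100 lines."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        text = "".join(
            ("FOO %d\n" if i % 100 == 0 else "bar %d\n") % i for i in range(n_lines)
        )
        zf.writestr("data.txt", text)
    buf.seek(0)
    return buf


def by_line(zf):
    """Iterate the member a line at a time."""
    with zf.open("data.txt") as fp:
        return [line.rstrip(b"\n") for line in fp if line.startswith(b"FOO")]


FOO_RE = re.compile(rb"^(FOO.*)$", re.MULTILINE)


def whole_file(zf):
    """Read the whole member into memory, then scan it with one regex."""
    with zf.open("data.txt") as fp:
        data = fp.read()
    return FOO_RE.findall(data)


if __name__ == "__main__":
    zf = zipfile.ZipFile(make_zip())
    for fn in (by_line, whole_file):
        start = time.perf_counter()
        hits = fn(zf)
        print(fn.__name__, len(hits), "%.3fs" % (time.perf_counter() - start))
```

Both functions return the same list of matching lines; only the I/O pattern differs.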

  • What’s new since Python 2.2?

    I learned Python when 2.2 was the new hotness. Since then I’ve picked up a lot of newer Python features but never really sat down to think of all the new language and library capabilities the system had. Here’s a list of some highlights in the 2.x releases I picked up off the what’s new pages. 2.6 (3.0) means the feature is available in 2.6 as a backport from 3.0. 2.6f (3.0) means the feature is only in 2.6 as a __future__.

    Language

    • 2.3, 2.4, 2.5: Generators. Generator expressions. send(), throw(), close().
    • 2.4: Function and method decorators. 2.6 (3.0): Class decorators
    • 2.5: Conditional expressions (“foo” if x == 3 else “bar”)
    • 2.5: Absolute imports (default in 3.x)
    • 2.5: try/except/finally unified
    • 2.5, 2.6: with statement (this one is important; need to use it more)
    • 2.6 (3.0): except ExceptionClass as variableName:
    • 2.6f (3.0): unicode string literals. 2.6 (3.0): bytes type (an alias for str in 2.6) and b'' literals
    • 2.7 (3.1): dictionary and set comprehensions

    Libraries

    • ???: codecs.open() as a way to be explicit about character encoding
    • 2.3, 2.4: Sets. 2.7 (3.1): set literals
    • 2.3: Universal newline mode for files
    • 2.3: logging
    • 2.3: CSV files
    • 2.5: functools.partial() for closure-like partial function application
    • 2.5: ctypes, quick access to shared libraries
    • 2.5: ElementTree (xml.etree)
    • 2.5: sqlite3
    • 2.6 (3.0): str.format(). 2.7: autonumbering
    • 2.6f (3.0): print() as a function
    • 2.6 (3.0): abstract base classes, particularly for collections and numbers
    • 2.6: json (via simplejson)
    • 2.7: collections.OrderedDict
    • 2.7: argparse
    • 2.7 (3.1): dictionary views
    • 2.7: collections.Counter (replaces my accumulator.py)
    • 2.7: unittest improvements, backported to 2.4 as unittest2

    Actually using some of these features is harder: Google AppEngine is stuck at 2.5 and Debian is stuck at 2.6. And of course 3.2 is the new hotness now, but largely not usable for ordinary user code.
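A quick Python 3 sketch that exercises a handful of the features above in one place: partial, except ... as, with, Counter, OrderedDict, dict and set comprehensions, set literals, conditional expressions, and autonumbered str.format(). The functions are made up for illustration.

```python
import collections
import functools
import io

# 2.5: functools.partial pre-binds arguments; here int() with base=2
parse_binary = functools.partial(int, base=2)


def parse_or_none(text):
    try:
        return parse_binary(text)
    except ValueError as exc:  # 2.6 (3.0): "except ... as name"
        print("skipping {!r}: {}".format(text, exc))
        return None


def first_line(text):
    # 2.5/2.6: the with statement closes f even if readline() raises
    with io.StringIO(text) as f:
        return f.readline().rstrip("\n")


def word_report(words):
    counts = collections.Counter(words)                         # 2.7: Counter
    lengths = {w: len(w) for w in counts}                       # 2.7: dict comprehension
    vowels = {"a", "e", "i", "o", "u"}                          # 2.7: set literal
    vowel_words = {w for w in counts if w[0] in vowels}         # 2.7: set comprehension
    ordered = collections.OrderedDict(sorted(counts.items()))  # 2.7: OrderedDict
    label = "many" if len(counts) > 2 else "few"                # 2.5: conditional expression
    summary = "{} distinct words ({})".format(len(counts), label)  # 2.7: autonumbered {}
    return counts, lengths, vowel_words, ordered, summary
```

For example, word_report(["bee", "apple", "bee", "cat"]) counts "bee" twice and summarizes it as "3 distinct words (many)".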

  • Calculating the difference between two GPS tracks

    When comparing my Bad Elf GPS track to my AMOD track, I wanted some mathematical calculation of the difference. I don’t even really know what that’d mean, but matt_c pointed me to some ideas:

    • Hausdorff Distance (PostGIS, GEOS, explanation). My quick read suggests this will calculate the maximum divergence between the two tracks, not the sum of many little errors.
    • Fréchet distance. “The Frechet distance is a measure that takes the continuity of shapes into account and, hence, is better suited than the Hausdorff distance for curve or surface matching” (source).

    This sounds complicated.
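To get a feel for the difference, here's a minimal sketch of discrete versions of both measures: Hausdorff distance over the track points, the discrete Fréchet distance via the usual dynamic-programming recurrence over point pairs (an approximation of the continuous Fréchet distance), and a mean nearest-point distance as a rough "sum of many little errors" measure. It assumes both tracks are lists of (x, y) points already projected into meters; raw lat/lon would need projecting first.

```python
import math


def dist(p, q):
    """Planar distance; assumes points are (x, y) in meters."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def hausdorff(a, b):
    """Largest distance from any point to the nearest point of the other track."""
    def directed(x, y):
        return max(min(dist(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))


def discrete_frechet(a, b):
    """Discrete Frechet distance: both tracks are walked forward in order."""
    n, m = len(a), len(b)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(a[i], b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def mean_nearest(a, b):
    """Average distance from each point of a to the nearest point of b."""
    return sum(min(dist(p, q) for q in b) for p in a) / len(a)
```

For example, a track compared with the same points in reverse order has a Hausdorff distance of 0, since every point has an exact match somewhere, but a nonzero Fréchet distance, because Fréchet respects the order in which each track visits its points.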

     

  • Bad Elf / MotionX technical notes

    I got a Bad Elf GPS for my iPhone. Some notes on using it with MotionX to record tracks:

    • It seems to work better in airplane mode. First time I tried it, I never got < 50m accuracy, maybe it was still using wifi or 3G positioning? Airplane mode guarantees the only location fix is the Bad Elf GPS.
    • MotionX seems to be the best GPS tracking software. The UI is awfully non-Apple, but it seems to track well.
    • I consumed about 60% of my iPhone 3GS battery creating a 2.5 hour long track. GPS had a good view of half the sky (on the glareshield of the plane), airplane mode, doing nothing but running MotionX. I put it manually in Flying mode.
    • The 2.5 hour track was 260kb of GPX, 132kb of KML.
    • The MotionX overview of the track says it’s 191.8nm, 2:29:13 elapsed time, avg speed 77kts, max speed 152kts. My (known buggy) calculation on the AMOD log was 196.9nm, 2.3h elapsed, 1.9h in motion.
    • The Bad Elf + MotionX track looks right. I overlapped it with the AMOD log track and they were mostly on top of each other. Tracks diverged by as much as 16m. Spot check on one specific divergence and the AMOD said it has 10 satellites and HDOP of 1.1, a very accurate fix. Hmm.
    • GPX file contains a list of fixes: (lon, lat, ele, time). Once in motion it’s one fix every 2 or 3 seconds. The timing discrepancy doesn’t seem to just be a rounding error; I’ve got a sequence of 20:07:13 20:07:15 20:07:18 20:07:20 20:07:22 20:07:24 20:07:27.
    • KMZ archive contains a “doc.kml” whose main contents are a LineString with (lon, lat, ele). No timestamps. The KML is rendered without altitude by default in Google Earth but you can change the properties to show an extruded track, elevation profile, etc. It appears the elevation is clamped to 0′. It’s not DEM corrected, just a hard-coded 0?
    • KMZ archive also contains an extra file “raw.xml” which seems to contain raw data, not relevant to Google Earth, but with lots of extra data from MotionX. The XML has two sections. The first, “locations”, is a list of (time, activeTime, distance, stopped, lat, lon, alt). Time looks accurate to 1ms. activeTime and distance are just deltas from the previous location (it has both 2s and 3s activeTimes). Stopped must be MotionX’s guess of whether we’re moving? The second, “recordingSnapshots”, looks to be 1-minute rollups of distance and altitude.
    • The email also contained a Google Maps link, which incorporates the same KMZ archive as was mailed to me (including raw.xml) served for 6 months from share.gps.motionxlive.com.
    • MotionX does not seem to record any measure of GPS fix accuracy. Too bad, that data is available in the Apple API. It may be smart enough to automatically filter out fixes with bad data.

    I’ve been meaning to measure Bad Elf + GPS when the phone isn’t moving. My first attempt recorded no track, I think because it was clever enough to not record when not moving. Need to investigate Settings / Global: Accelerometrics Assisted GPS, Accuracy Filter.
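A minimal sketch for poking at the GPX export: pull out each trkpt's lat, lon and time, sum the great-circle legs, and list the gaps between fixes. It assumes whole-second UTC timestamps ending in Z (MotionX's exact format may differ) and a spherical Earth of radius 3440.065 nm, so expect small differences from MotionX's own totals.

```python
import math
import xml.etree.ElementTree as ET
from datetime import datetime

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles (spherical model)


def local_name(tag):
    """Strip an XML namespace: '{http://...}trkpt' -> 'trkpt'."""
    return tag.rsplit("}", 1)[-1]


def read_fixes(gpx_text):
    """Return a list of (lat, lon, time) tuples, one per trkpt."""
    root = ET.fromstring(gpx_text)
    fixes = []
    for el in root.iter():
        if local_name(el.tag) != "trkpt":
            continue
        when = None
        for child in el:
            if local_name(child.tag) == "time":
                # Assumes stamps like 2011-02-20T20:07:13Z (no fractional seconds)
                when = datetime.strptime(child.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
        fixes.append((float(el.get("lat")), float(el.get("lon")), when))
    return fixes


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def summarize(fixes):
    """Total track length in nm and the seconds between consecutive fixes."""
    pairs = list(zip(fixes, fixes[1:]))
    distance_nm = sum(haversine_nm(a[0], a[1], b[0], b[1]) for a, b in pairs)
    gaps_s = [(b[2] - a[2]).total_seconds() for a, b in pairs]
    return distance_nm, gaps_s
```

Run over the full track, the gaps list would show the mix of 2s and 3s intervals directly, and the distance gives another number to compare against MotionX's 191.8nm.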

  • Coordinate reference systems and projections

    I get confused as to the difference between coordinate reference systems (CRS) and projections. This overview clued me in some, but I still might be wrong. Here’s my current understanding.

     

    CRS are all about spherical coordinates, naming points on the sphere. When I say “San Francisco is at 37.75, -122.45”, what I’m saying is it’s 37.75 degrees north of the equator of the sphere and 122.45 degrees west of the prime meridian through Greenwich. Points are defined in spherical geometric relation to absolute, fixed locations on the earth.

    Only it’s not that simple. First, the earth isn’t a sphere; it’s squashed, and CRS are defined relative to a specific datum that has some oblate spheroid shape with complex geometry. Second, the prime meridian we most commonly use doesn’t go through Greenwich anymore; it runs about 100m east.

    The most common CRS in use is WGS84, the basis of GPS navigation. WGS84 itself changes as measurements get more accurate, so the specific thing we all use is WGS84 (G1150). (I’m unclear on how measurement changes are managed; the earth itself is moving, too, making absolute location a slippery concept. See also ITRF2000) The other CRS I’m familiar with from NACO aviation charts is NAD83 (CORS96). It agrees with WGS84 to within two meters and I believe NACO advises treating them as interchangeable. In practice I don’t see any useful reason to do work in anything other than WGS84, unless importing data from some other source that used a different CRS.

     

    Projections are all about planar geometry: warping a spherical map onto a flat surface. There’s a large variety of map projections in common use. Mercator is the most popular for casual maps and shows up frequently online because it’s what Google Maps (and all slippy maps?) use. Lambert Conformal Conic shows up in NACO sectional charts and USGS sectionals; it projects onto a cone rather than a cylinder like Mercator, and the projection is parameterized to have the least distortion between two chosen latitudes. Lambert Conformal also has the nice property that a straight line drawn on the map is very close to a great circle route (shortest distance), which is handy for airplane navigation.

    In maps showing a relatively small area of the Earth, the differences between projections are relatively small. I think I’m inadvertently benefiting from this in my approach plate georeferencing experiments: I’m just treating the plate as if it were drawn in a Mercator projection, and it looks about right.
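For the Mercator case, here's a sketch of the spherical ("Web") Mercator formulas used by slippy maps, assuming the WGS84 semi-major axis as the sphere radius. The local scale factor 1/cos(latitude) hints at why treating a small area as Mercator works: across one degree of latitude near San Francisco it changes by only about 1.4%.

```python
import math

R = 6378137.0  # WGS84 semi-major axis in meters, used as the sphere radius


def web_mercator(lat_deg, lon_deg):
    """Project latitude/longitude in degrees to spherical Mercator x, y in meters."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def mercator_scale(lat_deg):
    """Local scale factor of the projection: 1 at the equator, growing poleward."""
    return 1.0 / math.cos(math.radians(lat_deg))

# Usage: web_mercator(37.75, -122.45) projects the San Francisco example point
```

The same formulas show what Mercator gives up at large scales: the scale factor doubles by 60° latitude, which is why areas near the poles look so inflated.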

  • Extracting images from SWF files

    I wanted to try to get at an image hidden inside an SWF file, a Flash game on Armor Games. Google searches for “SWF extract” and the like offer a lot of unsavoury software, but a quick search of Debian turned up SWFTools, a GPL toolchain for making and unmaking SWF files. Compiling on Linux was as easy as ./configure; make

    To actually extract the images, you have to find their IDs first, then extract them one by one. The compiled SWF file doesn’t seem to keep the source filename, so you have to hunt by size or the like.

    $ swfdump foo.swf | grep image 
    [024]     14446 DEFINEBITSLOSSLESS2 defines id 1725 image 660x561 (32 bpp)
    $ swfextract -p 1725 -o 1725.png foo.swf
    $ ls -lg 1725.png
    -rw-r--r-- 1 nelson nelson 14877 Feb 19 14:04 1725.png
    

    That example is for a 660×561 lossless image, 14446 bytes long, with ID 1725. I didn’t see any quick way to get swfextract to just extract all the images, although of course it’d be easy enough to write a shell script to do it.
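A minimal Python sketch of that script, assuming swfdump prints one "DEFINEBITSLOSSLESS... defines id N image" line per lossless image as in the output above, and that swfextract's -p/-o options work for every such ID. JPEG images (DEFINEBITSJPEG tags) would presumably need a different swfextract option, so this only handles the lossless ones.

```python
import re
import subprocess
import sys

# Matches swfdump lines like the one above:
# "[024]     14446 DEFINEBITSLOSSLESS2 defines id 1725 image 660x561 (32 bpp)"
IMAGE_LINE = re.compile(r"DEFINEBITSLOSSLESS2? defines id (\d+) image")


def lossless_image_ids(dump_text):
    """Pull the image IDs out of swfdump output."""
    return [int(m.group(1)) for m in IMAGE_LINE.finditer(dump_text)]


def extract_all(swf_path):
    """Run swfdump once, then swfextract -p once per lossless image ID."""
    dump = subprocess.run(
        ["swfdump", swf_path], capture_output=True, text=True, check=True
    ).stdout
    for image_id in lossless_image_ids(dump):
        out = "%d.png" % image_id
        subprocess.run(["swfextract", "-p", str(image_id), "-o", out, swf_path], check=True)
        print("wrote", out)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        extract_all(path)
```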

  • Javascript closures gotcha with loop variables

    The way Javascript scopes “var” in loops and how it interacts with closures trips me up every time I go back to writing Javascript code. Here’s a fleshed out example that shows the confusion.

    // Demonstrates using closures to capture the value to do what you meant. See also
    //   http://calculist.blogspot.com/2005/12/gotcha-gotcha.html
    //   http://robertnyman.com/2008/10/09/explaining-javascript-scope-and-closures/
    
    var data = ["a", "b", "c"];
    var wrongFunctions = [];
    var rightFunctions = [];
    
    function load() {
      console.log("Creating functions");
      for (var i = 0; i < 3; i++) {
        // X will be set to "a", then "b", then "c". But at what scope?
        var x = data[i];
    
        // The naive way to construct a function that returns X
        // Will not work: X is scoped to the function load() and
        // is being modified in the loop
        var wf = function() { return x };
    
        // A more complicated way to make a function that returns X.
        // The function execution creates a closure
        // that captures the value during the loop
        var rf = function() { 
          var t = x; 
          return function () { return t };
        }();
    
        // While the loop is executing and we're building functions,
        // the output of wf() and rf() will match data[i]. No problem!
        console.log ("  ", x, wf(), rf());
        // output is "a a a", "b b b", "c c c". Everything looks great!
    
        // Let's store both functions away, what could go wrong?
        wrongFunctions.push(wf);
        rightFunctions.push(rf);
      }
    }
    
    function play() {
      console.log("Running stored functions");
      // Let's loop through and call our stored away functions
      // and see what happens
      for (var i = 0; i < 3; i++) {
        console.log("  ", data[i], 
            wrongFunctions[i](), rightFunctions[i]());
      }
      // output is "a c a", "b c b", "c c c".
      // wrongFunctions were indeed wrong.
    }
    
    load();
    play();
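Python closures late-bind the same way, so here's the same gotcha as a Python sketch, with the common default-argument workaround standing in for the extra function call. (In modern JavaScript, declaring x with let instead of var gives each loop iteration its own binding, which fixes the original example too.)

```python
data = ["a", "b", "c"]


def load():
    wrong_functions, right_functions = [], []
    for x in data:
        # Late binding: this lambda looks x up when it's called,
        # and by then the loop has left x == "c"
        wrong_functions.append(lambda: x)
        # Default values are evaluated when the lambda is created,
        # so each function keeps the x from its own iteration
        right_functions.append(lambda x=x: x)
    return wrong_functions, right_functions


wrong_functions, right_functions = load()
for expected, wf, rf in zip(data, wrong_functions, right_functions):
    print(expected, wf(), rf())  # a c a / b c b / c c c
```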