Locklin on science

Lush: my favorite small programming language

Posted in Lush, tools by Scott Locklin on November 19, 2024

I meant to write about this when I started my blog in 2009. Eventually Lush kind of faded out of my consciousness, as it was a lot easier to get work doing stuff in R or Matlab or whatever. The guy who was maintaining the code moved on to other things. The guys who wrote most of the code were getting famous because of the German Traffic Sign results. I moved on to other things. I had a thought bubble the other day that I'd try to compile it and see what happened. The binutils guys have been busy for the last decade and changed all manner of things: I couldn't even find documentation for the old binutils the last version of Lush2 was linked against. Then, poking around the old sourceforge site, I noticed that Leon Bottou had done some recent check-ins fixing (more effectively than me) the same problems in the Lush1 branch. I stuck the subversion repo (with history) up on github so you can marvel at it. I may try to revive a few of the demos I remember as being cool.

I call it a small language; compared to contemporary Python or R it is quite small, and it had a small number of developers. The developers were basically Yann LeCun and Leon Bottou and some of their students (there are other names in the source, like Yoshua Bengio). This tool is where they developed what became Deep Learning (LeNet-5); the first version of Torch was in here too (as I recall it was more oriented to HMMs at the time). Since it's a Lisp, it's easy to add macros and such to make it do your bidding and fit your needs. Unlike anything else I've ever used, Lush is a real ergonomic fit for the programmer. It has a self-documenting feature which is incredibly useful: sort of like what R does, it takes comments in code and makes them into documentation. Unlike R documentation, there is a way of viewing it in a nice gui and linking it to other documentation, so you have a nice manual for the system and whatever you built in it, almost automagically. Remember "literate programming?" It was always a sort of aspiration: this is a real implementation of it, and it's so easy to use you'd have to be actively malicious or in a pretty big hurry not to do it. Here's a screen I made for myself so I could remember how to use some code I built 15 years ago (it still works, BTW). You can update it at the CLI, just like everything else in a Lisp.

Since it's a Lisp, you have access to macros, which allow you to do magic things that make Paul Graham happy. I am smooth brain: I only wrote a couple of them. I've written considerably more C macros than Lisp macros and plan on keeping it that way. The Lush authors also don't use them very often; mostly in the compiler, which is how it should be. "A word to the wise: don't get carried away with macros," as Peter Norvig told us in PAIP. There is a nice object system, and a very useful set of GUI tooling. Not just the help gizmo; there's a full-fledged GUI toolkit (ogre). Imagine that: something to develop old-fashioned graphical user interfaces without importing two gigabytes of Electron and javascript baloney. The helptool uses this; it is not an HTML browser. The documentation format looks a bit like markdown with a few quirks; I never had to look at a manual to write the stuff. Essentially it looks like the standard two-sentence comments you put in to remind yourself what a complicated function does. The object system the GUI is written in is, I assume, something like CLOS: whatever it is, there are no surprises, and anyone who knows about namespaces and objects can use it. I found it particularly useful for its encapsulation of raw FFI pointers and other tooling which is best trapped in a namespace where it can't hurt anything.

Since it is oriented around developing 80s-90s era cutting-edge machine learning, one of the core types is the array. The arrays are real APL-style arrays: rank 0 to rank 4, which is probably a rank or two higher than most sane people use (most people use rank 2, aka matrices). It looks like it had up to rank-7 at one point: I have no idea what you'd do with that. APLs such as J often have rank-whatever, so someone somewhere has probably done something with such structures. Lush2 had an interesting APL-like sublanguage for operating on the arrays, which looked pretty handy, but which I never quite got into (most of my work was in Lush1).

All this is cool, but I suppose other small programming languages promise things like this. The really cool thing about it is the layers. You get a high-level interpreted Lisp. You also get a compilable subset of Lisp, mostly oriented around numerics, just as one would expect in a domain-specific language for developing early convolutional net/deep learning algorithms. Even better: if you want to call some C, including calling libraries, you can enclose your C in a Lisp macro and compile it right into the interpreter. Most of the interesting and useful code in the world still sits behind a C API. With a tool like this, suddenly you have a useful interpreter where you can vacuum in all the DLLs you want, and they'll be available at the command line.

Most interpreters have some FFI facility for doing this; none to my knowledge are this easy to use or this powerfully agglomerative. The memory management happens for free, more or less. In, say, R's repl, you can do something called dyn.load on libraries with R-compatible types. If it's more complex than that you might have to write significant wrapper code, and this is a hack: it might just leak memory all over the place. You have to work pretty hard to encapsulate C libraries in a proper R package, compiling against the R sources. J, same story; you can use the 15!:0 foreign to load a dll and wrap up J structures to send, with some tooling to deallocate or copy memory locations (very carefully). In Lush, you call the C functions directly, in C, on C's terms (or C++). You can write a couple of lines of C wrapper, or a couple of pages; whatever: it's all part of the Lush source. If you look at examples of well-wrapped dlls in R on CRAN, you'll see they're festooned with all manner of ugly R structure casts, mysterious R #defines, and all kinds of badness and quasi-memory management; you'd have to read a 300-page manual to make sense of what's going on. Having done this a few times, I'm exaggerating a tiny bit, but it is tedious and fiddly and takes a fair amount of work; a couple of days if you've never done it before, versus a couple of minutes. In Lush you just stick a dollar sign in front of the variables you allocated in Lush in your C function calls, and after the file has been compiled into the interpreter (which happens when you "libload" it), you call the functions and the variables appear where they're supposed to. No memory leaks. It usually doesn't take down the interpreter when something goes wrong, though of course if you send something weird to a raw pointer it will probably segfault and die. Here's an image grab of a simple method for instantiating a KD-tree using LibANN (a bleeding-edge nearest-neighbor library circa 2009):

The first lines are the documentation; inside the defmethod we try to make a new kdtree; the stuff between #{ and #} is normal C++. You can see the $ in front of $out: this tells the Lush compiler to pull the result back into the interpreter. This method gets compiled and loaded and accessed like any other method in Lush. idx2 is a matrix type; the other stuff does what you think it does.
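Since you can't read code out of a screenshot, here is roughly the flavor of C++ that sits between those escapes. To be clear: this is my own reconstruction against LibANN's documented API, not the code in the image, and the function name and argument plumbing are made up for illustration; in the real method, the data, npts, and dim values arrive as $-spliced Lush variables read off the idx2:

#include <ANN/ANN.h>

// Hypothetical reconstruction, not the screenshot: roughly the C++ one
// would put between the #{ ... #} escapes in such a method.
ANNkd_tree *build_tree(double *data, int npts, int dim) {
    ANNpointArray pts = annAllocPts(npts, dim);   // ANN's own point storage
    for (int i = 0; i < npts; i++)
        for (int j = 0; j < dim; j++)
            pts[i][j] = data[i * dim + j];        // copy rows of the matrix in
    return new ANNkd_tree(pts, npts, dim);        // this pointer goes back via $out
}

The pointer that comes back is exactly the sort of raw FFI handle you then trap in a Lush object where it can't hurt anything.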

Lush dates from 1987: I don't even remember what kind of computers people used back then. I assume something like a 68020 Sun workstation or a VAX. Even when I was using it in 2009, a "multicore" system might have two cores, so it wasn't really designed with that sort of thing in mind either (though you could link to a multithreaded BLAS, which covers most numerics cases, and it has tooling for using it on a cluster). Some of the intestines of the thing probably reflect this. I'm pretty sure Lush1 is not completely 64-bit clean: when I was using it in 2009 it was 32-bit binaries only, which was fine, as nobody had 256G of RAM back then. Other stuff will seem unfamiliar to contemporary people: it's for talking to local libraries. There is no provision for a package manager over the interbutts, or much other network stuff that I noticed beyond sockets. No JSON (it didn't exist; s-exprs are better anyway), no SQL interfaces (that was exotic pay-for technology), and none of the stuff modern code sloppers are used to having. It was mostly a tool for developing more Lush code which links to locally installed libraries: this is what R&D on machine learning algorithms had to be back then. As a tool for building your own little universe of new numerics-oriented algorithms it is almost incomparably cozy and nice. You get the high-level stuff to move bits around in style, you get the typedef'ed sublanguage to compile hot chunks to the metal, and you get C/C++ APIs for adding new functions written in C/C++ as a natural part of the system. Extremely cozy system to use. While it's not the Lisp machine enthusiasts like Stas are always telling us about, it's probably about as close as you're going to get to that experience using a contemporary operating system and hardware. Yes, you have to deal with the C API: I'm sorry about that, but it's just current year reality. Nobody is going to rewrite BLAS in Haskell or CMU-CL to make you happy. Purity is folly.

As a tool, if I had to fault it for anything, it's a few small things which I could probably fix. For example, in Kubuntu anyway, you can't copy/paste examples from the helptool. This could probably be repaired if I dug down into whatever X library the ogre package calls for this. It's no big deal; it's not a very wordy language anyway, and I should be reading the docs and typing code I'm about to run in emacs rather than copypasta. Another slightly annoying thing is the lack of a built-in pretty-printer for results. Many languages have this problem: in Lush it's easy to write one, and I have one around somewhere. Some of the packages aren't well documented and some don't work because of various forms of bitrot: this is to be expected in something this old. Other than that, no faults. Very cozy programming language. The coziest.

The C insides are fairly understandable, modulo the glowing crystal dldbfd.c gizmo at the center that does the binutils incantations that make the dynamic linking magic happen. Even that looks like it could be understood if you were familiar with binutils. In Lush1 there are a number of odd pieces that were planned to be sawed off, which you can sort of infer from their absence in Lush2, which had a redesigned VM. However, Lush1 compiles and runs the old code, and Lush2 doesn't.

While this programming language could (and really should) be revived, even in its present state it can be marveled at, both for its historical importance in developing machine learning algorithms and for its wonderful "programmer first" utility. I don't know what exigencies caused them to move the Torch neural net library to Lua; probably whiny wimps who were intimidated by parentheses. I can guess why it ended up in Python (the Visual Basic of current year). It's one of those things where, had things worked out a little differently, machine learning people would be typing lots of parentheses in vastly more futuristic Lush instead of drearily plodding along with spaghetti in Jupyter. It represents a very clear vision of how software development should work. No bureaucracies or committees were involved in its design: just people who needed a good tool to invent the future. I suspect the committees and social pressures involved in larger programming languages are why they're often so awful. Lush is all designed and built by makers, not bureaucrats and "product managers." It feels purposeful. It also feels incomplete, which is as it should be, as these guys were too talented to maintain programming languages. Like an unfinished da Vinci painting, you can see the grandeur of the artist's vision.

I’ve always been a fan of these guys; as I pointed out in my article on DjVu, there is much to admire beyond their good taste in algorithms and dogged determination to continue working on them at a time when only eccentrics were interested in neural nets. All the cool kids of the era were doing SVMs … because … researchers are mostly trend-following rather than thinking. Hopefully I don’t cheese them off too much by bringing it up, though as an American it is arguably a sovereign duty to piss off the French. For myself, I have a shitload of work to do in coming months. I sort of hope I can find an excuse to fiddle around with it some more, or maybe even use it in production in some small way. If I do, I’ll write about it. I encourage others to give it a try and ponder how cool 2024 would have been if we had used this tool instead of the trashfire Python slop you’re all doomed to use in your day jobs.

DjVu and its connection to Deep Learning

Posted in information theory, Lush by Scott Locklin on May 31, 2023

DjVu is a vastly superior file format to original PDF for books, mathematical papers, and just about anything else you can think of (current year PDF adopted some of its innovations, but they're only used to break into your ipotato afaik). PDF is mostly postscript with a bunch of weird metadata and layers. This is fine if the PDF is generated by LaTeX and is designed to be something that comes out of a printer. But most of the world's useful text is still on pieces of paper that have to be scanned to be on the interbutts. DjVu is good at sharing compressed book scans, and PDF is not. It shows its superiority when someone makes a big image scan in PDF, which is just a bunch of photographic images in jpeg (which is absolute shit at representing text, in part because of how the DCT works) or tiff. DjVu assumes that the data is some kind of mix of text and images, and as such most of the data can be safely thrown away. This is a good assumption; usually I just want the text and plots, and DjVu captures those well. PDF generally clownishly captures everything in a scan more or less as a bitmap, or using jpeg's silly rastered cosine transform.

Why jpeg sucks on text

Yann LeCun, Léon Bottou and Yoshua Bengio were creators of DjVu, along with some other guys you're less likely to have heard of (Patrick Haffner, Bill Riemers, etc.). All three are also fathers of Deep Learning (along with Geoffrey Hinton, before he developed his peculiar fear of ballpoint pens). Leon and Yann are also creators of my favorite little programming language, Lush (the Lisp Universal Shell), where they did much of their pioneering work back in the 90s. While I know enough about programming languages to understand why current year Torch migrated from Lush to Lua to Python, it will always remain one of my all-time favorite designs; as interesting in its own way as the K family, and since it never had to do crap like maintain order books … one of the comfiest languages I've ever used. When I retire I'll probably revive it and use it to power superior robot vacuum cleaners or something. It's really that good. There's so much cool shit hanging around in it from their R&D days as well; just mind-boggling stuff, like looking at Leonardo's notebooks. I ain't even talking about the neural stuff; all of it, from the Ogre UI to the codebook editor, is genius.


Since deep learning models are all the bugman talks about anymore, the older work product of their creators should interest people, at least for historical perspective. It was an important problem: in 1998 the internet was still pretty new, and stuff like PDFs didn't quite work right. We mostly downloaded LZ77/Huffman-coded postscript files when we wanted to use the internet for its original purpose of sharing scientific papers. Those were awful: not because you had to unzip the file before you could look at it (though you did), but because they were quite large (maybe 4x what PDFs delivered from compiled LaTeX), and the internet in those days was very slow. It would take minutes to download a couple of shitty jpeg files with boobs in them, let alone the 40 megs of javascript that websites now make you download so they can track you.

At the time, DjVu solved an important problem: it delivered very good compression ratios and even allowed scanned stuff to be efficiently shared online, potentially making the internet into a super library including all printed books as well as generated net.content. The problem was most operating systems didn't come with a DjVu reader, but Adobe made sure everyone had a PDF reader. Finding and installing a DjVu reader was a pain in the ass. Browsers in those days mostly couldn't display either PDFs or DjVu, so that wasn't even an option.

One of the cool things about DjVu is that it internally uses an image format very similar to JPEG2000 for image backgrounds (called IW44). You have probably never seen a JPEG2000 image (unless in a DjVu file), but it's a fantastic idea: with wavelet compression, if you only get the first quarter of the file you can still decode a pretty nice low-resolution image. It provides a natural way of doing lossy compression: just drop the higher-order wavelets. It also compresses better than regular JPEG. The wavelets are further compressed with arithmetic coding, which is also a mighty cool idea.
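Here's the principle in toy form. This is a bare one-dimensional Haar transform, not IW44's actual filter bank (IW44 uses fancier wavelets and arithmetic-codes the coefficients); the point is only that coarse information sorts to the front, so a truncated stream loses detail rather than the whole picture:

#include <cstdio>

// One level of a 1-D Haar transform: averages land in the front half
// of the array, details in the back. Dropping details is the lossy step.
void haar_step(double *x, int n) {
    double tmp[8];  // big enough for this toy
    for (int i = 0; i < n / 2; i++) {
        tmp[i]       = (x[2*i] + x[2*i+1]) / 2.0;  // low-pass: average
        tmp[n/2 + i] = (x[2*i] - x[2*i+1]) / 2.0;  // high-pass: detail
    }
    for (int i = 0; i < n; i++) x[i] = tmp[i];
}

int main(void) {
    double row[8] = {10, 12, 11, 13, 80, 82, 81, 83};  // a toy scanline
    haar_step(row, 8);   // 4 averages + 4 details
    haar_step(row, 4);   // transform the averages again: 2 + 2 + 4
    // A decoder holding only the first quarter of the coefficients
    // (row[0] and row[1]) still reconstructs a blurry version of the
    // scanline; each further chunk of details sharpens it.
    for (int i = 0; i < 8; i++) printf("%g ", row[i]);
    printf("\n");
    return 0;
}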

There is another format it uses for foreground text (it looks for text) called JB2, which is related to the JBIG2 format in PDF that was buffer-overflowed on the ipotato by Pegasus. You have to be careful with your document formats; I strongly suspect PDF has more holes like this in it, just because it has so much going on inside. JB2 is cool because it's a sort of clustering algorithm: it looks for bitmaps which are around the size of characters, then looks for things which are geometrically similar to them, effectively doing a quick and dirty map of pixel clusters into symbols (not necessarily text symbols: the idea is textually agnostic). Then the document is compressed with arithmetic coding over the symbols.
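A crude illustration of that clustering step, with everything made up for the sketch (the glyph size, the XOR pixel-count distance, the fuzz threshold; real JB2 is much smarter about alignment and shape):

#include <cstdint>
#include <cstdio>

// Crude illustration of the JB2 clustering idea, not DjVuLibre's code:
// a blob of pixels matches a known symbol when few pixels differ, in
// which case you store just the symbol index and its position.
const int H = 16;  // toy glyphs: 16x16, one uint16_t per scanline

int glyph_distance(const uint16_t a[], const uint16_t b[]) {
    int d = 0;
    for (int y = 0; y < H; y++) {
        uint16_t diff = a[y] ^ b[y];               // pixels that disagree
        while (diff) { d += diff & 1; diff >>= 1; }
    }
    return d;
}

bool same_symbol(const uint16_t a[], const uint16_t b[]) {
    return glyph_distance(a, b) <= 5;              // made-up fuzz threshold
}

int main(void) {
    uint16_t e1[H] = {0}, e2[H] = {0};
    e1[3] = e2[3] = 0x7FE0;       // a horizontal bar of ink
    e2[3] ^= 0x0020;              // one pixel of scanner noise
    printf("%s\n", same_symbol(e1, e2) ? "reuse symbol" : "new symbol");
    return 0;
}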

The arithmetic coding system used is also innovative; it's called the ZP-coder. It's similar to other simple run-length coding systems in its use of probability tables, but oriented towards decoding performance. It is a shame the ZP-coder isn't a universal coder; if it were, it might make convincing fake documents based on the corpus in the document (aka do generative prediction the way openai does with neural nets, using a considerably cheaper algorithm). It's pretty cool that it works well on both the wavelets and the text, though.

It’s a shame it didn’t catch on better, and there is probably an HBS case study to be written on why the objectively superior tool failed in the market. It even failed at the Internet Archive, which it was also well suited for. DjVu still has utility for scanned documents and for reading them. The main problem with it is the problem it had of old: lack of support. Black and white e-book readers like the Kindle and the Kobo don’t support it natively, despite it being just about the perfect format for scanned documents on a limited-processor greyscale e-book reader. I personally use a Kobo Forma rooted with the excellent KOReader to get access to the many useful DjVu files I have (basically all my textbooks, available on the road). It’s ridiculous that I have to hack a device to get access to physically portable DjVu files, but I suppose scanned books don’t make anybody money.

I’ve long held that most of the knowledge developed since the advent of the internauts is basically anti-knowledge, meaning those scanned books in DjVu are potentially more valuable than all the PDFs in the universe. It would be nice to see it used by more mainstream publishers, but the lack of a DjVu target for things like LaTeX means it probably won’t be. I guess in the meanwhile DjVu is the most punk rock document format.

 

https://en.wikipedia.org/wiki/DjVu

 

Ruins of forgotten empires: APL languages

Posted in Design, J, Lush by Scott Locklin on July 28, 2013

One of the problems with modern computer technology: programmers don’t learn from the great masters. There is such a thing as a Beethoven or Mozart of software design. Modern programmers seem more familiar with Lady Gaga. It’s not just a matter of taste and an appreciation for genius. It’s a matter of forgetting important things.

talk to the hand that made APL

There is a reason I use “old” languages like J or Lush. It’s not a retro affectation; I save that for my suits. These languages are designed better than modern ones. There is some survivor bias here; nobody slings PL/1 or Cobol willingly, but modern language and package designers don’t seem to learn much from the masters. Modern code monkeys don’t even recognize mastery; mastery is measured in dollars or number of users, which is a poor substitute for distinguishing between what is good and what is dumb.  Lady Gaga made more money than Beethoven, but, like, so what?

Comparing, say, Kx Systems’ Q/KDB (80s technology which still sells for upwards of $100k a CPU, and is worth every penny) to Hive or Redis is an exercise in high comedy. Q does what Hive does. It does what Redis does. It does both, plus several other impressive things modern “big data” types haven’t thought of yet, and it does them better, using only a few pages of tight C code and a few more pages of tight K code.


This man’s software is superior to yours

APL languages were developed a long time ago, when memory was tiny compared to the modern day, and disks much slower. They use memory wisely. Arrays are the basic data type, and most APL language primitives are designed to deal with arrays. Unlike the situation in many languages, APL arrays are just a tiny header specifying their rank and shape, and a big pool of memory. Figuring out what to do with the array happens when the verb/function reads the first couple of bytes of the header. No mess, no fuss, and no mucking about with pointless loops.
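To make that concrete, here is the header-plus-pool idea in a few lines of C; this is my own illustration, not any particular APL interpreter's actual structs:

#include <cstdio>

// A tiny header over one flat pool of memory.
struct Array {
    int     rank;      // 0 = scalar, 1 = vector, 2 = matrix, ...
    long    shape[8];  // extent along each axis
    double *data;      // the pool
};

// A "verb" reads the header, then walks the pool. The same few lines
// serve scalars, vectors, matrices, and anything of higher rank.
double sum(const Array *a) {
    long n = 1;
    for (int i = 0; i < a->rank; i++) n *= a->shape[i];
    double s = 0.0;
    for (long i = 0; i < n; i++) s += a->data[i];
    return s;
}

int main(void) {
    double pool[6] = {1, 2, 3, 4, 5, 6};
    Array m = {2, {2, 3}, pool};  // rank-2, shape 2x3, over the pool
    printf("%g\n", sum(&m));      // 21: no matrix-specific loop anywhere
    return 0;
}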

Code can be confusing if you don’t drink the APL kool-aid, but the concept of rank makes it very reusable. It also relegates idiotic looping constructs to the wastebin of history. How many more for() loops do you want to write in your lifetime? I, personally, would prefer to never write another one. Apply() is the right way for grown-assed men to do things. Bonus: if you can write an apply(), you can often parallelize things. With for(), you have to make too many assumptions.


Roger Hui, also constructed of awesomeness

One of the great tricks of the APL languages: using mmap instead of scanf. Imagine you have some big chunk of data. The dreary way most languages do things, you vacuum the data in with scanf, grab what is useful, and if you’re smart, throw away the useless bits. If you’re dealing with data which is bigger than core, you have to do some complex conga dance, splitting it up into manageable chunks, processing, writing it out somewhere, then vacuuming the result back in again. With mmap, you just point to the data you want. If it’s bigger than memory …. so what? You can get at it as quickly as the file system gets it to you. If it’s an array, you can run regressions on big data without changing any code. That’s how the bigmemory package in R works. Why wasn’t this built into native R from the start? Because programmers don’t learn from the masters. Thanks a lot, Bell Labs!
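Here is the trick in miniature, assuming a file of raw doubles; my own sketch, not bigmemory's code or anybody's production database:

#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Sum a file of raw doubles without scanf, buffers, or chunking: mmap
// the file and point at it. Bigger than RAM? The kernel pages it in
// as you touch it, and the code doesn't change.
int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    double *x = (double *)mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    long n = st.st_size / sizeof(double);
    double s = 0.0;
    for (long i = 0; i < n; i++) s += x[i];   // the "read" is just a page fault
    printf("%f\n", s);
    munmap(x, st.st_size);
    close(fd);
    return 0;
}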


Fred Brooks, Larry Breed, Joey Tuttle, Arthur Whitney, Eugene McDonnell, Paul Berry: none of these men can be held responsible for inflicting the horrors of S+ on the world

This also makes timeseries databases simple. Mmap each column to a file; selects and joins are done along pointed indexes. Use a file for each column to save memory when you read the columns; usually you only need one or a couple of them. Most databases force you to read all the columns. When you get your data and close the files, the data image is still there. Fast, simple and with a little bit of socket work, infinitely scalable.  Sure, it’s not concurrent, and it’s not an RDBMS (though both can be added relatively simply). So what? Big data problems are almost all inherently columnar and non-concurrent; RDBMS and concurrency should be an afterthought when dealing with data which is actually big, and, frankly, in general. “Advanced” databases such as Amazon’s Redshift (which is pretty good shit for something which came out a few months ago) are only catching onto these 80s era ideas now.
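Continuing that sketch with made-up file names: each column of the table lives in its own file of raw doubles, and a query touching two columns maps exactly two files:

#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical layout: "price.col" and "size.col" hold one column each
// as raw doubles. The other fifty columns of the table cost nothing.
static double *map_col(const char *path, long *n) {
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    *n = (long)(st.st_size / sizeof(double));
    double *p = (double *)mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);   // the mapping outlives the descriptor
    return p;
}

int main(void) {
    long n;
    double *price = map_col("price.col", &n);
    double *size  = map_col("size.col",  &n);
    // "select sum(price * size) where size > 100": one pass, one thread
    double notional = 0.0;
    for (long i = 0; i < n; i++)
        if (size[i] > 100.0) notional += price[i] * size[i];
    printf("%f\n", notional);
    return 0;
}

When you're done, the data image is still sitting there on disk, and the pages you touched are probably still warm for the next query.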

Crap like Hive spends half its time reading the damn data in, using some godforsaken text format that is not a mmaped file. Hive wastefully writes intermediate files, and doesn’t use a column approach, forcing giant unnecessary disk reads. Hive also spends its time dealing with multithreaded locking horse shit. APL uses one thread per CPU, which is how sane people do things. Why have multiple threads tripping all over each other when a query is inherently one job? If you’re querying 1, 10 or 100 terabytes, do you really want to load new data into the schema while you’re doing this? No, you don’t. If you have new data streaming in, save it somewhere else, and do that save in its own CPU and process if it is important. Upload to the main store later, when you’re not querying the data. The way Q does it.

The APL family also has a near-perfect level of abstraction for data science. Function composition is trivial, and powerful paradigms and function modifications via adverbs are available to make code terse. You can afflict yourself with for loops if that makes you feel better, but the terse code will run faster. APL languages are also interactive and interpreted: mandatory for dealing with data. Because APL languages are designed to fit data problems, and because they were among the first interpreters, there is little overhead to slow them down. As a result, J or Q code is not only interactive: it’s also really damn fast.

It seems bizarre that all of this has been forgotten, except for a few old guys, deep pocketed quants, and historical spelunkers such as myself. People painfully recreate the past, and occasionally, agonizingly, come to solutions established 40 years ago. I suppose one of the reasons things might have happened this way is the old masters didn’t leave behind obvious clues, beyond, “here’s my code.” They left behind technical papers and software, but people often don’t understand the whys of the software until they run into similar problems.


Some of these guys are still around. You can actually have a conversation with mighty pioneers like Roger Hui, Allen Rose or Rohan J (maybe in the comments) if you are so inclined. They’re nice people, and they’re willing to show you the way. Data science types and programmers wanting to improve their craft and increase the power of their creations should examine the works of these masters. You’re going to learn more from studying a language such as J than you will studying the latest Hadoop design atrocity. I’m not the only one who thinks so; Wes McKinney of Pandas fame is studying J and Q for guidance on his latest invention. If you know J or Q, he might hire you. He’s not the only one. If “big data” lives up to its promise, you’re going to have a real edge knowing about the masters.

Start here for more information on the wonders of J.

http://conceptualorigami.blogspot.com/2010/12/vector-processing-languages-future-of.html

Only fast languages are interesting

Posted in Clojure, Lush, tools by Scott Locklin on November 30, 2011

If this isn’t a Zawinski quote, it should be.

I have avoided the JVM my entire life. I am presently confronted with problems which fit in the JVM: JVM libraries, concurrency, giant data, all that good stuff. Rather than doing something insane like learning Java, I figured I’d learn me some Clojure. Why not? It’s got everything I need: JVM guts, lispy goodness; what is not to love?

Well, as it turns out, one enormous, gaping lacuna is Clojure’s numerics performance. Let’s say you want to do something simple, like sum up 3 million numbers in a vector. I do shit like this all the time. My entire life is summing up a million numbers in a vector. Usually, my life is like this:

(let* ((tmp (rand (idx-ones 3000000))))
  (cputime (idx-sum tmp)))

0.02

20 milliseconds to sum 3 million random numbers enclosed in a nice tight vector datatype I can’t get into too much trouble with. This is how life should be. Hell, let me show off a little:

(let* ((tmp (rand (idx-ones 30000000))))
  (cputime (idx-sum tmp)))

0.18

180 milliseconds to sum up 30 million numbers. Not bad. 60 times worse than I’d like it to be (my computer runs at 2GHz), but I can live with something like that.

Now, let’s try it in Clojure:

(def rands (repeatedly rand))
(def tmp (take 3000000 rands))
(time (reduce + tmp))

Java heap space
[Thrown class java.lang.OutOfMemoryError]

Restarts:
0: [QUIT] Quit to the SLIME top level

Backtrace:
0: clojure.lang.RT.cons(RT.java:552)
(blah blah blah java saying fuck you java blah)

Oh. Shit. Adding 3 million numbers makes Clojure puke. OK. How well does it do at adding, erm, 1/10 of that, using my piddley little default JVM with apparently not enough heap space (~130MB)?

(time (reduce + tmp)) "Elapsed time: 861.283 msecs"

Um, holy shit. Well, there is this hotspot thing I keep hearing about…

 

user> (def ^doubles tmp (take 300000 rands))
user> (time (reduce + tmp))
  "Elapsed time: 371.451 msecs" 149958.38785575028 

user> (time (reduce + tmp))
  "Elapsed time: 107.619 msecs" 149958.38785575028 

user> (time (reduce + tmp))
  "Elapsed time: 46.096 msecs" 149958.38785575028 

user> (time (reduce + tmp))
  "Elapsed time: 43.776 msecs"

Great; now I’m only a factor of 20 away from Lush speed … assuming I run the same code multiple times, which has a probability close to zero. Otherwise, with a typedef, I’m a factor of 200 away.

Maybe I should try using Incanter? I mean, they’re using parallel Colt guts in that. Maybe it’s better? Them particle physicists at CERN are pretty smart, right?

user> (def tmp (sample-uniform 300000 :mean 0))
#'user/tmp
user> (time (sum tmp))
"Elapsed time: 97.398 msecs" 150158.83021894982
user> (def tmp (sample-uniform 3000000 :mean 0))
#'user/tmp
user> (time (sum tmp))
java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:0)
user>

A bit of hope, then …. Yaaargh!

Let’s look into that heap issue: firing up jconsole and jacking into fresh swank and clojure repl processes, I see … this:

I can’t really tell what’s going on here. I don’t really want to know. But it seems pretty weird to me that an idle Clojure process is sitting around filling up the heap, then garbage collecting. Presumably this has something to do with lein swank (it doesn’t do it so much with lein repl). Either way, this isn’t the kind of thing I like seeing.

Now, I’m not being real fair to Clojure here. If I define my random vector as a list in Lush (which isn’t really fair to Lush), and do an apply + on it, the stack will blow up also. The point is, Lush has datatypes for fast numerics: it’s designed to do fast numerics. Clojure doesn’t have such datatypes, and as a result, its numeric abilities are limited.

Clojure is neat, lein is very neat, and I’ve learned a lot about Java guts from playing with these tools. Maybe I can use it for glue code somewhere. I’m not going to be using it for numerics. Yeah, I probably should have listened to Mischa, but then if I had, I’d be writing things in numeric Perl.

 

Edit Add:

Thanks to Rob and Mike for showing me the way, and thanks everyone else for demonstrating my n00bness and 4am retardation

(let [ds (double-array 30000000)]
  (dotimes [i 30000000] (aset ds i (Math/random)))
  (time (areduce ds i res 0.0 (+ res (aget ds i)))))

"Elapsed time: 65.018392 msecs"

 

I daresay this makes Clojure “interesting,” or at least more interesting than it was a few hours ago. It would be nice if someone had already written some package which makes taking the sum of 3 million numbers a bit less of a chore (a la idx-sum). I mean, what’s going to happen when I have to multiply two matrices together?