The first AI bubble
The first investment bubble in “AI” happened in the 1980s. As I mentioned before, one of the things which kicked it off was Japanese investment in the fifth generation computing project. Go look at Blade Runner for an idea of how people thought of Japan back then: everyone figured they were the country of the future. It was probably true in a sense, but nobody was looking at demographic collapse back then: they just figured the Japanese were so successful they’d continue their successes, making AI brain in a can computard as easily as they dominated cars, semiconductors and everything else, ultimately owning everything. We even had dipshits back in the 1980s talking about how AI was going to end the job market and bring about some kind of singularity where everything would be different and white collar workers would need to look for something else to do.

just like 1984
Back then, the special sauce was mostly expert systems shells, as opposed to LLMs. The same basic idea drove the hype; both technologies worked with a sort of verbal user interface. Expert systems shells were considerably less automated, but still gave nice verbal answers which made you think you were talking to a canned brain. An expert systems shell could also tell you things like why it came to certain conclusions: they were more deterministic. They were of course trained on much more limited data sets, which were at least relevant to problems people in businesses had. There was other stuff going on too, from autonomous vehicles to voice recognition, and there were also a few projects designed around neural systems. But it was mostly the expert system wordcel which drove the craze; same reason for the current hype: people anthropomorphize and like talking to a brain in a can, even when it’s a glorified if statement.
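The “glorified if statement” is easy to demonstrate. Here is a toy forward-chaining shell in Python (the diagnostic rules are invented for illustration) showing the two properties mentioned above: deterministic conclusions, plus the ability to tell you why it reached them.

```python
# Toy expert-system shell: forward chaining over if-then rules, keeping a
# trace so it can "explain" its conclusions. The rules are made-up examples.
RULES = [
    ({"engine_cranks", "no_spark"}, "ignition_fault"),
    ({"ignition_fault", "old_plugs"}, "replace_plugs"),
]

def infer(facts):
    """Fire rules until nothing new can be concluded; record why each fired."""
    facts = set(facts)
    why = {}  # conclusion -> the conditions that produced it
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                why[conclusion] = conditions
                changed = True
    return facts, why

facts, why = infer({"engine_cranks", "no_spark", "old_plugs"})
```

Asking `why["replace_plugs"]` hands back the conditions that fired the rule: the deterministic, inspectable behavior that made these shells feel smarter than they were.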
Companies like MCC, Intellicorp, IIM, Syntelligence, Verbex, GO Corporation, Teknowledge, Alvey, Gold Hill Computers (remember Golden Common Lisp? Me neither), Envos, Applied Expert Systems, Kendall Square Research, BBN, Brattle Research, AICorp, Neuron Data, Carnegie Group, Prologia, Votan, Nestor, Mind Path, Palladian, Logicware, Airus, IntelliGenetics, Aion, Apex, Silogic, Lucid, Symbolics, Cognitive Systems, Thinking Machines, LMI, Inference Corporation: most of them don’t even have wiki pages and may as well have never existed other than employing notable historical personalities, and appearing in the history books on the era. Amusingly if you search on some of the names, such as Thinking Machines or Verbex (there are more), even the goddamned names have been recycled for the current year bubble. These companies had thousands of employees; tens of thousands total in a time when software engineering was a much smaller field.
The hype of the era was as insane as it is now. Celebrity scientists like Feynman got jobs at these Potemkin companies. Ed Feigenbaum kind of invented the expert systems shell, and played a role somewhat like current year Yann LeCun. Mountebank CEOs of these companies were treated like movie stars; one guy flying around in his personal B-25 bomber, cavorting with models and so on to the adoration of the media. Everyone was on the AI bandwagon or had a plan to get on it. There are even funny stories about google-like benefits for the engineers in these places. Catered food (from Legal Seafoods no less), massage therapists, trips to Disneyworld, foosball, stock options, limousines (limos were a big deal in the 80s); all the excesses of the first dot com era happened back in the 1980s at these companies and probably originated with them (assuming they weren’t taken from Wall Street). The business press was full of hype about huge investments in “AI,” along with the same lack of curiosity about actual use cases which generate economic benefits for the customers of these tools. Amusingly, many of these companies had confidentiality agreements with their customers which didn’t allow any comment on the efficacy or utility of their projects.

Special purpose hardware was also created in this bubble; Lisp chips were made by TI (the NVIDIA of its time), Xerox and Symbolics among others. Apple, largely considered a has-been company being run by Soda Pop guy, got considerable nice PR by shipping a Mac II with a TI Lisp chip in it: such innovation. NPC Soda Pop CEO = do what everyone else is doing. There were pilot projects galore: everyone from Campbell Soup to Arthur D. Little to Travelers Insurance to General Motors invested in various initiatives designed to automate away important jobs using the magic talking “brain in a can.”
Just like today, there were plenty of large established businesses going long this stuff. Texas Instruments, Xerox, Fujitsu, Toshiba: they all ultimately did OK because they had positive cash flows from other lines of business. Though you could make the argument that TI for example would have been better off developing RISC chips or even 32 bit processors like the 68000 rather than dead-end lisp machines. All of these companies promised the same nebulous crap as current year AI hype babies: brain in a can. Using expert system shells rather than neural nets. Basically because everyone else was doing it.
There are essentially only two survivors of the first AI bubble: Wolfram and Maplesoft. They wrote computer algebra systems which solved a real, albeit small problem, and were founded towards the end of the run. Both companies used expert systems shells for solving mathematical problems. Essentially they each built a programming language which had internal expert system shells for solving differential equations, integrals and intricate mathematical relations. Neither company to my knowledge ever claimed to have a general AI solution: just a nice tool for solving mathematical problems, saving students and scientists a lot of paperwork attempting to solve their math homework. At the time, there were large classes of problems they couldn’t solve (Green’s functions for example: stuff involving the calculus of residues, basically), but I would imagine they’re pretty decent at this by now. There are a couple of other less hyped companies who survived more or less by continuing to service DARPA contracts; I assume Cyc fits the description, BBN certainly does.

There are differences between the bubble of the past and the bubble of the now. Back then, the computer industry was much smaller as a fraction of the economy. This bubble was also greatly subsidized by government R&D projects in the defense spending extravaganza under the Reagan Administration. There were more things to invest in back then. Lots of stuff in tech which wasn’t AI was worth dumping money into: databases, microcomputers, microchips, operating systems, disk drives, middleware, CAD, text processing, accounting software, video games, networking technology, financial data software, anti-virus software. There was also plenty of novelty in the industrial economy which looked like a good bet: cable TV, budget airlines, consumer electronics, financial and agricultural conglomerates, oil infrastructure, banks, payment cards, drug companies. Since technological progress has slowed since the 1980s, there are fewer likely lads out there to put your money to work in. Nowadays the alternatives are things like online gambling or raising the rent on poor people. Worse though, at present there is a lot of capital looking to be put to work. The structural problems of boomer retirement, offshoring, ridiculous deficit spending and outsourcing mean the money the US government keeps printing to keep the circus turning over must go some place. Finally, it was a lot easier to float an IPO in the 1980s than it is now: quite a few of these companies had public listings. IPOs are healthy: they make companies do stuff like earn revenues and file honest financial reports. The present reverse SPAC situation only enriches underwriters while deferring responsibility. Public companies going bankrupt is good for dispelling woo.
If we use our trusty AR(1) model for figuring out how the present bubble might shake out, the outcome is pretty straightforward. Once things start exploding (OpenAI is most obviously over the cliff) most of this horse shit will go away. Established companies which went long this crap will suffer, but persist. The actual hyped technology, LLMs, will find a couple of niche uses helping schoolkids do their homework (expert systems shells are still used beyond stuff like Maple; insurance companies use them to understand their own policies, for example). Also probably as a front end to a search engine, the way Brave does it, by including source material. Pieces of the technology that fed the bubble may also be useful: Sun workstations and database engines came out of the AI bubble as a sort of byproduct. Maybe people will find GPU chips, the giant databases that feed the LLMs and vector databases economically useful for something else. Also the other technologies developed along with LLM horse shit will find their use cases: lots of cool machine learning stuff from the 80s is pretty normal now: kernel regression, hidden Markov models, 3-layer neural nets with SGD: that sort of thing. Maybe emacs will sprout something as useful as M-x doctor which uses neural nets.
Fun source material:
https://dspace.mit.edu/bitstream/handle/1721.1/80558/43557450-MIT.pdf;sequence=2
Outsourcing your thoughts
Contemporary people, particularly educated people seem to have a peculiar terror of rational thought. I witness this in large and small ways. Take for example, quantum computards. I meet people in my private life who will make some statement about how important dem quantum computards are and how they’re coming real soon now and how exciting it is that RSA won’t work soon. They always look like a toddler that shit their diaper when they recite the wisdom of “experts,” expecting teacher to pat them on the head. Then the guy who has actually read the science papers, been to the Gordon conference and has even fooled around with KDP crystals (making genuine <tm> entangled photons), says, “well no” and crushes their tiny little schoolboy spirits. The confused cognitive dissonance that ensues used to be fun to look at; sort of like an old timey science fiction robot shorting out. Now it just annoys me. It doesn’t take a lot of reflection to know something about this subject stinks. Remembering where we were in the late 90s ought to suffice. Nothing has changed since then, despite legions of “experts” working on it. The only real difference is the marketing. Yet even the people who work in physics think it’s important: they read “experts” telling them so -in popular news and magazines. This is, of course, birdbrained: just as it was to ascribe importance to noodle theory because one of those grifters got a TV show.
This happened a lot through ronadeath: ludicrous Bubonic plague tier IFRs, people wearing masks in the restaurant, right data censoring on the IFR, retarded epidemiological models, ventilators, fake news about overloaded hospitals, putting sick people in nursing homes where they infected and killed all the olds, the fact that it was aerosolized (and washing your hands or your mail or whatever was useless), “vaccine” failure, “vaccines” causing cardiovascular problems, vaccinating previously infected people, taking kids out of school, closing businesses down, disregarding treatments that work but weren’t as profitable as the bullshit they pushed on people. This is what people got outsourcing their thoughts to “experts” -a preposterous disaster. I managed to figure most of this stuff out myself. It required only a general knowledge of how infections work and grade school tier math involving ratios and the ability to exercise independent rational thought. You did have to read actual facts and figures rather than expert spins on them. You needed, say, to look at Pfizer’s data rather than listening to what some dunderhead told you about it. Pretty much everybody bought the militarized propaganda bullshit, nobody bothered thinking about this stuff despite being locked in their houses for two years with nothing better to do. They all outsourced their thoughts to the boob tube, the “experts” (who were all idiots, fools or malicious misinformation merchants) and the authorities, who look like this. Yes, you could have dismissed the whole thing as a pack of nonsense using physiognomy.
When I confronted intelligent, educated people about such issues: people whose entire self conception is wrapped around having a big brain, they’d caterwaul something like “how do you find the time for this.” Well, being locked up at home gave me a decent amount of time to think about such things. Regardless of how long it takes to figure things out in the middle of an apocalypse of hysteria (not very long): what good is being intelligent if you refuse to use your brain for anything beyond memorizing propaganda from the media? Remember 5 PR people per reporter: it’s worse now. It was much worse in ronatime; probably 50 or 100 PR people per reporter, accompanied by a mass hysteria.
I suspect some of these people don’t have internal monologues or if they do it’s a CNN/Marvel Comics joint broadcast, but it is of some importance we create filters for them. Better if we can find a way to activate their critical faculties which doesn’t involve hours of Socratic dialog. I don’t consider meat puppets reciting propaganda at me to be fully formed members of the human race with a soul. Only people who think count as members of the human race, even if they come to inaccurate conclusions. We have robots now who can fulfill the role of the ultimate redditor. Don’t need any more of those.
This kind of outsourcing your brain to experts is foreign to me. In the post-qualifiers days of grad school, we all rejoiced in having provably functioning brains by figuring things out for ourselves. Believe it or not, in the 1990s you couldn’t just google the answer to a question: you either had to walk to the library, look up an INSPEC reference and spend an hour digging in the stacks, or you could figure it out yourself. Mostly we chose the latter: it was quicker and more fun. Not just on physics matters: on everything. Nobody looked at youtube videos to figure out how to fix a wonky toilet, or determine the industrial and military potential of a country: you can figure such things out yourself with basic inputs. Most things in the world do submit to a small amount of rational thought. This is why we have nice things like stainless steel and Reed-Solomon codes.
It’s unfortunate there is no Voight-Kampff type of test for sorting out rational humans from hylics who google or ask ChatGPT the answer to fill the vacuole between their ears. Tech interviews used to attempt this by asking how many golf balls you could fit in a 747; make the candidate think on his feet. Unfortunately there are only a couple of classes of obvious mathy problems like this, so at this point, the askers and askees both get the problems from the same books. A reasonable approximation for this might be taking candidates and asking them to explain some societal sacred cow which is indisputably false, though this approach is probably illegal. Working out a problem as a team is probably still the best way, even if it screws over people with interview anxiety.
“I read it somewhere” is no excuse. The fact that you read it somewhere and uncritically shoved it into your noggin as if it were an F=ma or “this is how we calculate an eigenvalue” tier fact makes you a moron, whatever your level of education and faculty for rote learning. Your ability to put things in your brain and remember them isn’t useful if you put stupid things in your brain, which in current year means pretty much all contemporary things. That’s why people who are interested in the Future read old books. The old books contain overlooked ideas, forgotten ideas, and where there is nonsense it is more easily dismissed than the latest propaganda all-spectrum broadcast from the PR people. Antiquarianism is more futuristic than the current empire of lies and hysteria. An intelligent person who refuses to think is like a healthy person that doesn’t exercise; it’s a manifestation of laziness, cowardice or worse. Those who emote rather than think are basically reduced to the level of animals; Aristotle said they are nature’s slaves.
I’m a scientist, I don’t believe in anything
In my early lab career I needed to fabricate this square yard plate of inch thick 304 steel into something useful by drilling lots of holes in it. It was a bit beyond my skill: one of the holes was 2 feet in diameter, and 304 is bloody awful. After blowing up too many $20 milling cutters, we called in a real machinist. This was the early 90s, so Pitt still had some cool mustache dudes who knew how to do useful things like this. I don’t remember what his name was, or even how he accomplished all the big holes (boring bar with the big one), but I do remember he used to fuck with me. Just like in the good old days when I was an auto mechanic. His way of fucking with me was to tell me things which were obviously not true about setup of machine tools, flying saucers, women, eating hot dogs: whatever came to mind. Nothing dangerous; just the kind of thing guaranteed to blow up a milling cutter, get mustard on my nose, or have a waitress slap me upside the head or what not.
Working class people do stuff like this all the time; little pranks -it breaks up the day. But the cool thing about this guy is he told me that most over educated people have this cognitive flaw: we’re credulous. We believe anything we’re told if it seems plausible, especially if we’re unsure of ourselves, which we mostly are. Nerds are learning machines; that’s why we get paid. Nerds don’t like thinking about stuff; they mostly shove things in their brains like fat people stuff themselves with doughnuts and pizza. Machinists are a bunch of working class dudes who don’t believe anything, thanks to this constant ribbing by their colleagues, and the fact that some of their colleagues are actual retards (note to academic/tech bros: some of your colleagues are also actual retards: many such cases). They think and ask questions before they believe. “OK, if I start with the cutter there, what will happen?” This is the essence of science and engineering. Believing the guy telling you to put the cutter there just because he knows more shit than you is setting you up for a minor prank.
The implications of this are immense; most nerds are insecure even within their intellectual realms. Tell them something technical, like the two years of “muh covid” baloney, and they spin up their intellectual powers trying to understand the information put in front of them. No critical thinking is employed; they’re too busy trying to master the difficult material put in front of them, just like they did when they were a larval interlectual in skrewl. It’s the same with any old marketing baloney. Many people believe the M2 architecture is faster than, say, my 2011 vintage Sandybridge stinkpad. Apple spent all that money on it, after all; it must have been a genius move for consumers who want computational horsepower, right? The nerd happily spreads his intellectual buttcheeks and spends all this time reading about the wonders of the M2 as if it were actual information rather than marketing baloney (you know, disinformation). And of course every smart person knows about Moore’s law, so obviously it must be exponentially 13 years better than an i7 of 2011 vintage. M2 might be faster if you’re doing video codec crap, but it certainly isn’t for everyday tasks. You know, like generic matrix multiply. That will make M2 weenies real mad; they are the smarties and Locklin is the dumbass, right?

I blame Kebab if M2 results are actually faster here
I mean, feel free to check it yourself. There’s lots of anti-knowledge like this out there; propaganda or in the apparatchik parlance disinformation masquerading as information. People think Apple invested in this architecture for performance reasons because the propaganda tells them what a triumph it is. The reality is probably something like it’s easier for their developers to build to one architecture; ipotato and ipotatobook, and Apple gets to keep more value by not dealing with Intel, even though Intel’s chips are obviously better.
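Checking it takes a few minutes. Here is a rough sketch of the kind of generic matrix multiply benchmark I mean, using numpy (the matrix size and repeat count are arbitrary choices of mine; the installed BLAS and thread count dominate the number, so only compare machines running comparable numpy builds):

```python
# Rough GEMM benchmark: time repeated n x n matrix multiplies and report
# GFLOP/s. Run the same script on both machines and compare.
import time
import numpy as np

n, reps = 1024, 10
a = np.random.rand(n, n)
b = np.random.rand(n, n)

a @ b                                   # warm-up: let BLAS threads spin up
t0 = time.perf_counter()
for _ in range(reps):
    c = a @ b
elapsed = time.perf_counter() - t0

# one n x n matmul is roughly 2*n^3 floating point operations
gflops = reps * 2 * n**3 / elapsed / 1e9
print(f"{gflops:.1f} GFLOP/s")
```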
I often find myself completely alone in denouncing the most obvious and egregious impostures; nanotech, openAI, FTX, noodle theory, autonomous vehicles, quantum computing BS. When I do this, nerds are furiously googling up wikipedia articles trying to refute me because their self-conception involves being the smarty pants know it all who was a good little boy and studied hard qubit to make teacher happy. When the market mean reverts and the line converges to reality rather than someone’s marketing to dorks, these nerds dutifully forget how wrong they were and move on to the next thing they’re going to be wrong about.
Putting aside the specific cases where I was right and all the sputtering “well actchooally” nerds reading wikipedia articles were wrong: this dynamic of nerds falling for marketing hype and various crazes has dominated the directions of the last 60 years of research and engineering development. During the first decade of that timespan we got moon landings and integrated circuits, but compared to the period between 1900 (pre Wright brothers) and 1960 (post Sputnik), that ain’t shit. Worse, all the real progress stopped after the first decade: 1970 to present has been absolute shit. It has been shit in part because of this sort of credulity. Credulity begets crazes and makes such people the pawns of marketers who get free amplification from NPC nerds doing their science and technology ghost dance. We don’t live in a golden age of technology: we live in the age of propaganda and marketing. Virtually all visible technological development of the last 20 years is for the dissemination of propaganda and surveillance.
If a field has marketing associated with it: drug manufacture, tech, auto manufacture -the things you think you know are probably bullshit. We’ve established that the humble and relatively unimportant subject of machine learning contains pervasive and amateurish marketing bullshit. You can probably find interesting things going on in subjects like Geology, petrochemicals or Archaeology with shields mostly down. For those who have to work in marketing infested areas, you need to be extremely bloody minded. Ask yourself how you could know if the claims are true, independent of someone making the claim. You absolutely must not listen to any experts without a complete “cui bono” analysis: you need to think through how the information in question got in front of your nose. There are no more “scientific americans” to do the vetting for you, and everything now published there and in other such legacy places is probably a lie. “Experts” are almost universally frauds.
The upside to all this is history has restarted and everything is up for grabs. The mass of bugmen LARPing as scientists, government welfare queens, bureaucrats, snitches, nerds and bums are weak: they’ve allied themselves with the other floundering Western “authorities” who are all-in on censorship: the precise opposite of the scientific process. Times like this are when things start to move and change. It’s not going to happen in existing large institutions, but it is going to happen.
The Birthday paradox as first lecture
The birthday paradox is one of those things that should be taught in grade school to banish superstition, bad statistics and mountebanks. Of course there are people who understand the birthday paradox and still consult astrologers, but knowledge of this fundamental idea of probability theory at least gives people a fighting chance. It’s dirt simple; if you ask people what the probability of there being a shared birthday in a group of n people is, they’ll probably estimate much too low.
The probability of n people not having the same birthday is a lot like calculating the probabilities of hands of cards. You end up with an equation like the following:

P(no shared birthday) = d! / ((d − n)! · dⁿ)

n is number of people in the room, d is number of days in the year. Note that this generalizes to any random quality possibly shared by people. The probability of a group of people sharing a birthday is:

P(shared birthday) = 1 − d! / ((d − n)! · dⁿ)

You can try calculating this with your calculator, but 365! is a big number, and we can use a little calculus to make an approximation:

P(shared birthday) ≈ 1 − e^(−n(n−1)/(2d))
From here, anybody should be able to plug 365 in for d, then set the probability to 50%, and get a solution of around 23 people; a counterintuitive solution. Probability with replacement works like that; coincidences like this are much more likely than our naive intuition implies. The naive intuition is you need 180 people in a room to have a 50% chance of a shared birthday. I guess most people are narcissists and forget other people can have matching birthdays too. Probability theory is filled with counterintuitive stuff like this; even skilled card players are often terrible at calculating real world odds involving such coincidences. Aka, if it’s a joint probability, you’re probably calculating it wrong. If it’s something in the real world, it is de facto a joint probability, even if you don’t think of it that way.
For the mathematically illiterate, n! = n × (n−1) × (n−2) × ⋯ × 2 × 1. Thinking about what this (the factorial) is doing: the first guy has to compare himself to n−1 people, the second to n−2, the third to n−3 and so on. Another way to think about it: there’s a 1/365 chance of you sharing a birthday with any given person, so a 364/365 chance of not sharing. You should be able to hand wave your way around that. You can take my word for the calculus approximation (google Stirling’s formula if you want to know more). If you stop to think about it a bit, the paradox is worse if you have an uneven distribution of probabilities for birth dates (aka more people are born in October or something), and of course is much worse if there are fewer possibilities (aka most things in life have fewer than 365 possibilities). Really they’re the same thing: uneven distributions are like removing possibilities.
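If you don’t trust the algebra, the exact product and the exponential approximation are a few lines of Python (d defaults to 365 days):

```python
# Exact birthday-collision probability, plus the exponential approximation.
from math import exp, prod

def p_shared(n, d=365):
    """Exact: 1 minus the product of (d-i)/d for i = 0..n-1."""
    return 1 - prod((d - i) / d for i in range(n))

def p_shared_approx(n, d=365):
    """Approximation: 1 - exp(-n(n-1)/(2d))."""
    return 1 - exp(-n * (n - 1) / (2 * d))

print(p_shared(23))    # just over 50% with only 23 people in the room
```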
The human mind is designed to match patterns; to ascribe meaning to seeming regularity. “Coincidences” such as the birthday paradox are like crack cocaine to the brain’s pattern matcher. The ancients had their books of portents, actions of birds and animals, liver condition of sacrifices to the Gods. The reality is, the bird livers had a very limited number of states; vastly more limited in distribution than 1 in 365. So of course you’ll see a lot of seemingly convincing coincidences. You’ll also forget about it when liver-divination doesn’t work just as you would with astrology or tarot cards. The ancients weren’t stupid, even if they never invented probability theory, and supernatural explanations seemed natural enough at the time, so all of this was convincing.
Very intelligent people, even scientists, are just as subject to this sort of thing as anyone else. There are a couple of books out there about the correspondence of Wolfgang Pauli and Carl Jung about what they called “synchronicity.” This is a two dollar word for noticing coincidences and ascribing meaning to them. Mind you, Pauli invented large parts of quantum mechanics and was one of the most intelligent and famously bloody minded men of his time (Jung was more of a nutty artist type), yet he still fell for what amounts to a version of the birthday paradox, combined with an overactive imagination. Pauli was considered the conscience of physics; less charitably, he was called the “wrath of God;” he’d regularly chimp out at physics which was even mildly off. You can sort of understand where he was coming from: physics represented stability to him in a crazy time of Nazis and Communists. He even had to put up with his mother killing herself: something he dealt with by taking up with a chorus girl and recreational alcoholism: Pauli was the most punk rock of the early quantum mechanics. He made up some vague hand wavey bullshit about quantum entanglement, which is possibly also a mystical bullshit concept in itself, because many of the early quantum mechanics found themselves in similar circumstances. I know I’m more prey to mystical bullshit when hungover or otherwise in a psychologically fragile state. Mind you, this is a guy who would chimp at other physicists for leaving a constant out of an equation.

This bullshit got me lurid romantic encounters with countless goth girls and strippers while in grad school: thanks science bros
There is something used by confidence tricksters and stage magicians practicing mentalism related to this; in a group of people, getting an impressive cold read on one of them is pretty trivial. Fortune tellers, astrologers, occultists and quasi-religious entrepreneurs of all kinds use this technique. You use likely coincidences to build rapport until the mark is cooperating with you in playing psychic man, basically giving you the answers with body language in response to carefully constructed questions. People have no conception of how probabilities work, so they practically hypnotize themselves when one of the mentalist’s “I see a woman in your life, an older woman…” patters strikes home. There are plenty of numskulls who believe in such nonsense without overt mountebanks misleading them: turns out people who demonstrably suck at probabilistic reasoning are likely to believe all kinds of stupid nonsense.
If you work in statistics or machine learning, this sort of thing is overfitting. All statistical models and machine learning algorithms are subject to this. For a concrete example, imagine you made multiple hypothesis tests on a piece of data (machine learning is essentially this, but let’s stick with the example). The p-value is defined as the probability of your result being an accidental coincidence, assuming you did one hypothesis test. You see where I’m going here, right? If you do many hypothesis tests, just like if you do many comparisons between all the people in the room, the p-values of any of them are not estimated properly. You are underestimating the false discovery rate, just as you are underestimating the birthday group probability: the combinatorics makes coincidences happen more often.
The very existence of this problem escaped statisticians for almost a century. I think this happened because statistical calculations were so difficult with Marchant calculators and human computers when Fisher and people like him were inventing statistics as a distinct branch of mathematics that they’d usually only do one estimate, which is what the p-value was good for. Later on, when computers became commonplace, statisticians were so busy doing boatloads of questionable statistics in service of the “managerial elite,” they forgot to notice p-values are underestimated when you’re doing boatloads of questionable statistics. Which is one of the reasons why we have things like the reproducibility crisis and a pharmacopoeia that doesn’t confer any benefit on anyone but shareholders. Now at least we are aware of the problem and have various lousy ad-hoc ways of dealing with it in the Bonferroni correction (basically you multiply p-values by the number of tests -not always possible to count, and not great, but better than nothing), and the Benjamini-Hochberg procedure (a fancier version of Bonferroni). There are other ideas for fixing this: q-values and e-values most prominent among them; most haven’t really escaped the laboratory yet, and none have made it into mainstream research in ways which push the needle, even assuming they got it right. The important takeaway here is very smart people, including those whose job it is to deal with problems like this, don’t understand the group of ideas around the birthday paradox.
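A simulation makes the point concrete. Below, a thousand z-tests on a fair coin, so every null hypothesis is true, still produce dozens of “significant” results at p < 0.05; Bonferroni and a hand-rolled Benjamini-Hochberg then mostly kill them off. All parameters (1000 tests, 100 flips, alpha = 0.05) are arbitrary choices for the demonstration.

```python
# Multiple-testing demo: many tests on pure noise yield false "discoveries."
import math
import random

random.seed(42)

def z_test_p(successes, trials, p0=0.5):
    """Two-sided z-test p-value for a binomial proportion against p0."""
    z = (successes / trials - p0) / math.sqrt(p0 * (1 - p0) / trials)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 1000 "experiments" of 100 fair coin flips each: every null is true
pvals = [z_test_p(sum(random.random() < 0.5 for _ in range(100)), 100)
         for _ in range(1000)]

alpha = 0.05
raw_hits = sum(p < alpha for p in pvals)              # false "discoveries"
bonf_hits = sum(p < alpha / len(pvals) for p in pvals)  # Bonferroni correction

def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the BH step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    return set(order[:k])

bh_hits = len(benjamini_hochberg(pvals, alpha))
print(raw_hits, bonf_hits, bh_hits)   # raw count is large; corrected are near 0
```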
People in the sciences have called for the publication of negative results. The idea here is if we knew all the different things people looked at with negative results, we could weight the positive results with something like Bonferroni corrections (also that people who pissed up a rope with a null experiment get credit for it). Of course my parenthetical “it’s not always possible to count” thing comes into play here: imagine everyone who ever ran a psychology experiment or observational study published null results: which ones do you count as relevant towards the one you’re calculating p-values for? What if 10,000 other people ran the experiment, got a null and forgot to mention it? Yep, you’re fucked as far as counting goes. Worse than all this, of course, the nature of modern academia is such that fraud is actively encouraged: as I have said, I used to listen to people from the UCB Psychology department plotting p-mining fraud in writing their ridiculous papers on why you’re racist or why cutting your son’s balls off is good for him or whatever.
Trading algorithms are the most obvious business case where this comes into play, and there are tools to deal with the problem. One of the most famous is White’s Reality Check, which uses a sort of bootstrap algorithm to test whether your seemingly successful in-sample trading algorithm could be attributed to random chance. There are various other versions of this; Hansen’s SPA, Monte-Carlo approaches. None are completely satisfying, for precisely the same reason writing down all the negative science experiments isn’t quite possible. If you brute forced a technical trading algorithm, what about all the filters you didn’t try? What do you do if you used Tabu search or a machine learning algorithm? Combinatorics will mess you up every time with this effect if you let it. White’s Reality Check wasn’t written down until 2000; various systematic trading strategies had been around for at least 100 years before that and are explicitly subject to this problem. It’s definitely a non-obvious problem if it took trader types that long to figure out some kind of solution, but it is also definitely a problem.
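A toy version of the idea (not White’s exact algorithm; a simplified centered bootstrap over made-up noise “strategies”) looks like this: fifty random trading rules with no real signal are scored on simulated returns, and the bootstrap asks whether the best in-sample performer beats what best-of-fifty luck would produce anyway.

```python
# Simplified Reality Check-style bootstrap on pure-noise "trading rules."
import random
import statistics

random.seed(7)
T, N, B = 500, 50, 200          # days, candidate rules, bootstrap draws

market = [random.gauss(0, 0.01) for _ in range(T)]
# each rule is a random daily long/short position: pure data-snooping bait
rules = [[random.choice([-1, 1]) for _ in range(T)] for _ in range(N)]
perf = [[pos * r for pos, r in zip(rule, market)] for rule in rules]
best_mean = max(statistics.fmean(p) for p in perf)

# bootstrap distribution of the best *recentered* mean under resampling
exceed = 0
for _ in range(B):
    idx = [random.randrange(T) for _ in range(T)]
    boot_best = max(
        statistics.fmean([p[i] for i in idx]) - statistics.fmean(p)
        for p in perf)
    if boot_best >= best_mean:
        exceed += 1
p_value = exceed / B            # a big p-value means the "edge" looks like luck
```

The recentering is the important trick: each rule’s bootstrap mean is compared against its own sample mean, so you are testing whether the best observed performance exceeds what the best of N lucky draws would give you.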

The six degrees of Kevin Bacon effect is the same thing, though “network science” numskulls make a lot of noise about graph topology: it doesn’t really matter as long as the graph is somewhat connected (yes, I can prove this). Birthday paradox attacks on cryptographic protocols are also common. Probably the concept is best known today because of birthday attacks on hashing functions.
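The hashing version is easy to demonstrate yourself by truncating a real hash to something weak (SHA-256 cut to 32 bits here; the message format is an arbitrary choice):

```python
# Birthday attack on a deliberately weakened hash. With a 32 bit digest,
# a collision turns up after roughly sqrt(2^32) ~ 65k messages, instead of
# the ~2^32 attempts a brute-force second-preimage search would need.
import hashlib

def weak_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()[:4]    # keep only 32 bits

seen = {}
collision = None
for i in range(500_000):
    msg = f"message-{i}".encode()
    h = weak_hash(msg)
    if h in seen:
        collision = (seen[h], msg)
        break
    seen[h] = msg
```

This square-root scaling is why hash output sizes are twice the targeted security level: n bits of collision resistance needs a 2n bit digest.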
It seems like humans should have evolved better probability estimators which aren’t confused by the birthday paradox. People who estimate probabilities accurately obviously have advantages over those who don’t. Someone wrote a well regarded book (which ironically flunked reproducibility) on this: Thinking, Fast and Slow. The problem is that Kahneman’s “System 2” probability estimator (the one associated with conscious thought) is generally just as bad as the “System 1” (instinctive) estimator it derives from. The brain is an overgrown motion control system, so there is no reason the System 2 probability estimator is going to be any good, even if it involves a lot of self reflection. System 2 thinking, after all, is what got us Roman guys looking at livers to predict the future (or Kahneman’s irreproducible results). System 2 is just overgrown System 1, and the System 1 gizmo in your noggin is extremely good at keeping human beings alive using its motion control function, so it’s difficult to overcome its biases. You don’t need to know about the birthday paradox to avoid being eaten by lions or falling off a cliff. But you definitely need to know about it for more complicated pattern matching.


