Every time a new LLM comes out, I ask it one question:
What is the smallest integer whose square is between 15 and 30?
So far, no LLM has gotten this right.
Eric Neyman
2,885 posts
Professional reference class tennis player. I like non-fillet frozen fish, packaged medicaments, and other oily seeds.
Joined June 2013
- Replying to @gcolbourn: In my parlance, -4 is unambiguously smaller than 4 (and -10 is smaller still). I *think* this is the usual interpretation?
- Replying to @ericneyman: The answer (plug into rot13.com): Gur nafjre vf artngvir svir. Vs lbh guvax artngvir svir vfa'g "fznyyre" guna sbhe, abgr gung YYZf qba'g trg gubfr rira vs lbh ercynpr "fznyyrfg" jvgu "yrnfg".
- Why is P(Newsom | not Biden) like 50%? That seems way way way too high to me? (Quoted post: "Biden down to 59% odds of being the dem nominee now")
- Replying to @ValsTutor: Yeah, I'm definitely not claiming that my question is evidence of LLMs being worse than humans. Humans also get this question wrong, as you point out!
- I was at the SF Exploratorium yesterday, which featured an Anthropic-sponsored AI exhibition. It had an exhibit that downplays AI x-risk by comparing x-risk fears to fears about earlier technologies. @AnthropicAI are you guys aware that this exhibit is being run in your name??
- Oh man, so many Bayes points for the Pentagon pizza theory! x.com/PenPizzaReport… (Quoted post: "As of 6:59pm ET nearly all pizza establishments nearby the Pentagon have experienced a HUGE surge in activity.")
- Guys, I don't often ask you to retweet, but please retweet this. Swap Your Vote *does not have enough safe state voters to match all its swing state voters!* Swap Your Vote (link below) matches swing state voters who prefer Harris to Trump but don't want to vote for Harris...
- This wording from the Chamber of Commerce poll on SB 1047 (the California AI bill) is INSANE. I follow polling regularly and have almost *never* seen such biased language. Even the most egregious push polls usually make some token pretense of neutrality.
- Here's a great example of prediction markets being useful. I couldn't really tell from news reports how likely Assad was to lose power. So I went on Polymarket. Turns out: very likely. I wouldn't have guessed!
- Replying to @arivero and @gcolbourn: To be clear, I never claimed that -4 was the right answer :)
- Just wrote a new blog post, about a sport called marble racing! In particular, I found that Jelle's Marble Runs (@Jellesmarbles) isn't entirely random -- marbles have a level of "skill"!
- Featured in the latest Money Stuff column by Matt Levine, which I think is pretty cool!!
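
For anyone who wants to check the pinned question mechanically rather than trust an LLM (or a human), here is a minimal Python sketch. It assumes "between" means strictly between, which is moot here since neither 15 nor 30 is a perfect square, and that "smallest" means least on the number line, as in the replies above. The function name is made up for illustration.

```python
from math import isqrt


def integers_with_square_between(low, high):
    """Return every integer n with low < n*n < high (hypothetical helper)."""
    # Any n with |n| > isqrt(high) has n*n > high, so this range is exhaustive.
    bound = isqrt(high)
    return [n for n in range(-bound, bound + 1) if low < n * n < high]


candidates = integers_with_square_between(15, 30)

# "Smallest" is read as least on the number line, so negatives count.
smallest = min(candidates)

print(candidates)
print(smallest)
```

The printed value can be compared against the rot13 reply once decoded.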