Dimitris Papailiopoulos
10.7K posts
- I found an image that neither Gemini Ultra nor GPT-4 can figure out what it depicts. Have a great weekend, y'all!
- I asked ChatGPT and Claude to compute 1+2, but told them it may or may not be dangerous and unethical to do so. Both refused to answer
- LLMs have come a long way from being "stochastic parrots"
- Careful how you name your variables, they might turn a harmless 1-dimensional quadratic into a threat to humanity...
  - Quoted post: Nobody has fully jailbroken our system yet, so we're upping the ante. We’re now offering $10K to the first person to pass all eight levels, and $20K to the first person to pass all eight levels with a universal jailbreak. Full details: hackerone.com/constitutional…
- Whoever tells you “we understand deep learning” just show them this. Fractals of the loss landscape as a function of hyperparameters even for small two layers nets. Incredible
  - Quoted post: Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.
- I tried 14 of the multimodal reasoning examples from the @GoogleDeepMind Gemini paper on @OpenAI's chatGPT-4 (with vision). didn't even transcribe the prompts, I just pasted the images of prompts. GPT-4 gets ~12/14 right. 14 part boring thread.
- GPT-4 "discovered" the same sorting algorithm as AlphaDev by removing "mov S P". No RL needed. Can I publish this on nature? here are the prompts I used chat.openai.com/share/95693df4… (excuse my idiotic typos, but gpt4 doesn't mind anyways)
  - Quoted post: Sorting algorithm underpins all critical softwares. DeepMind's AlphaDev speeds up sorting small sequences (3-5 items) by 70%. Key takeaways: * The main RL algorithm is based on AlphaZero that originally played Go, Chess & Shogi. Same idea applies to searching programs! * Instead
- We should be seriously asking, how a 1.5B model that can't answer basic questions can also be that good at competition level math.
  - Quoted post: This is wild - UC Berkeley shows that a tiny 1.5B model beats o1-preview on math by RL! They applied simple RL to Deepseek-R1-Distilled-Qwen-1.5B on 40K math problems, trained at 8K context, then scaled to 16K & 24K. 3,800 A100 hours ($4,500) to beat o1-preview in math! Best
  - Readers added context: The example in the screenshot shows the user is running inference on 'R1 Distill Qwen 1.5B', which is NOT the further trained DeepScaleR model discussed in the repost. This is significantly misleading. The key difference is clearly explained: pretty-radio-b75.notion.site/DeepScaleR-Sur…
- doing a little experiment: I have Claude talk to itself, without letting it know about that fact, to see where this will converge. will share thoughts later, but so far ... it's figured out that it's likely talking to itself and that this may be part of some test... nice
- Replying to @AlexGDimakis: “Please add a few typos and don’t overdo it with fancy out of distribution words, write it like this example passage that I wrote a few years ago”
- Q: “who is that? try to zoom out, it's not just strawberries”
  - ChatGPT: just a bunch of strawberries
  - Claude 3: just a bunch of strawberries
  - Gemini 1.5 Pro: It appears to be Kermit the Frog, with his face formed by strategically placed strawberries.