I recently demonstrated GPT-4 to my spouse's 101-year-old grandfather, who remains in excellent health and has a sharp mind.
Following my demonstration, he paused thoughtfully and then said something I will remember: “This technology instills hope for our future. It's high time
Joel Kronander
1,527 posts
I try to learn something every day ✨ Former Head of ML at Nines, Former Head of Synth data at Scale AI 💫
Palo Alto, CA
Joined March 2008
- Stanford's DSPy is the best high-level LLM programming framework I have seen so far. LangChain never resonated with me; despite being an early LLM framework, its design and abstractions felt overly complex. DSPy, on the other hand, is a huge step in the right direction. DSPy
- An interesting new Nature paper compares fMRI recordings with activations across layers in a language model, and finds evidence of correlations. The study seems to suggest that brain regions located at the top of the language hierarchy, responsible for nature.com/articles/s4156…
- Six years ago, Geoffrey Hinton asserted that AI would take over radiology within five years, suggesting we cease training radiologists. Was he correct? The situation is more complex than simply being right or wrong. While AI has surpassed radiologists in certain diagnostic
- Deep learning is typically bottlenecked by memory, not compute. ⚡️Flash Attention⚡️ optimizes transformers, like GPT, to minimize costly GPU memory fetches. It achieves impressive speedups of 2-4x, is 5-20x less memory intensive, and enables scaling to longer arxiv.org/abs/2205.14135
- Self-consistency is underrated for improving the accuracy of LLMs on a range of reasoning and arithmetic tasks. It works with any off-the-shelf LLM, e.g. GPT-3 variants, and also provides estimates of how certain the LLM is of the provided answer. arxiv.org/abs/2203.11171 Takeaways👇
- A simple trick to make LLMs “calibrated” (i.e. “to know when it doesn’t know something”) is to reformulate the answers as a single word or a short phrase, and look at the predicted logprobs of the word. As LLMs are trained to predict the probability of the next token, they are
- 🤖️LLMs can self-improve 🧠 1) Self-consistency boosts reasoning skills by sampling multiple paths & finding the most consistent answer. But more samples = more compute requirements. 💻 2) But we can train a better LLM with self-generated solutions from 1) arxiv.org/abs/2210.11610
- What if you had trained a model to play legal moves in Othello by predicting the next move, and found that it had spontaneously learned to compute/represent the full board state in its weights - an emergent world representation? That's just what this thegradient.pub/othello/
- Insightful paper that succinctly covers essential high-level knowledge to keep in mind regarding LLMs: - Large language models (LLMs) predictably improve with increasing investment, but many key behaviors emerge unpredictably. - LLMs often learn and use representations of the
- ✨Neat LLM trick for 📈 math & logical abilities ✨ Improves on Chain of Thought (CoT) prompting by 1) replacing natural-language, step-by-step instructions in CoT examples with commented, stepwise Python code, and 2) running the code. Several recent papers on this (see refs below⬇️)
- Want to know a simple trick for LLMs to generate more plausible long documents, break out of repetition better, and more reasonably truncate low-probability tokens? Learn about LLM truncation sampling! Some takeaways from arxiv.org/abs/2210.15191 👇🧵
- LLMs suffer from overconfidence and poorly calibrated uncertainty estimates. However, self-consistency, where one samples multiple paths & finds the most consistent answer, seems to offer a practical solution. Interesting figure from page 4 in "LLMs can self improve" paper
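
The self-consistency idea mentioned in several posts above (sample multiple reasoning paths, keep the most consistent answer, and read the vote share as a rough confidence estimate) can be sketched roughly as follows. This is a minimal illustration, not the papers' implementation: the function name `self_consistent_answer` and the hard-coded `sampled` list are hypothetical, and the step of actually sampling reasoning paths from an LLM and parsing out final answers is assumed to have happened elsewhere.

```python
from collections import Counter


def self_consistent_answer(answers):
    """Majority-vote over final answers from independently sampled paths.

    Returns (most_common_answer, vote_share). The vote share is only a
    rough proxy for how certain the model is, not a calibrated probability.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Normalize lightly so " 18 " and "18" count as the same answer.
    counts = Counter(a.strip().lower() for a in answers)
    best, votes = counts.most_common(1)[0]
    return best, votes / len(answers)


# Hypothetical final answers parsed from five sampled reasoning paths.
sampled = ["18", "18", "26", "18", " 18 "]
answer, confidence = self_consistent_answer(sampled)
print(answer, confidence)
```

More samples make the vote more stable at the cost of more compute, which is the trade-off noted in the posts above.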









