Introduced basics of: P & T in ChatGPT, very deep learning, meta learning, neural distillation, GANs, etc. Co-authored most-cited AI paper of 20th century
Congrats to @nvidia, the first public $4T company! Today, compute is 100000x cheaper, and $NVDA 4000x more valuable than in the 1990s when we worked on unleashing the true potential of neural networks. Thanks to Jensen Huang (see image) for generously funding our research 🚀
The #NobelPrizeinPhysics2024 for Hopfield & Hinton rewards plagiarism and incorrect attribution in computer science. It's mostly about Amari's "Hopfield network" and the "Boltzmann Machine."
1. The Lenz-Ising recurrent architecture with neuron-like elements was published in
DeepSeek [1] uses elements of the 2015 reinforcement learning prompt engineer [2] and its 2018 refinement [3] which collapses the RL machine and world model of [2] into a single net through the neural net distillation procedure of 1991 [4]: a distilled chain of thought system.
The #NobelPrize in Physics 2024 for Hopfield & Hinton turns out to be a Nobel Prize for plagiarism. They republished methodologies developed in #Ukraine and #Japan by Ivakhnenko and Amari in the 1960s & 1970s, as well as other techniques, without citing the original inventors.
The GOAT of tennis @DjokerNole said: "35 is the new 25." I say: "60 is the new 35." AI research has kept me strong and healthy. AI could work wonders for you, too!
Who invented convolutional neural networks (CNNs)?
1969: Fukushima had CNN-relevant ReLUs [2].
1979: Fukushima had the basic CNN architecture with convolution layers and downsampling layers [1]. Compute was 100x more costly than in 1989, and a billion x more costly than
LeCun's "5 best ideas 2012-22" are mostly from my lab, and older: 1 Self-supervised 1991 RNN stack; 2 ResNet = open-gated 2015 Highway Net; 3&4 Key/Value-based fast weights 1991; 5 Transformers with linearized self-attention 1991. (Also GAN 1990.) Details: people.idsia.ch/~juergen/lecun…
Quarter-century anniversary: 25 years ago we received a message from N(eur)IPS 1995 informing us that our submission on LSTM got rejected. (Don’t worry about rejections. They mean little.) #NeurIPS2020 people.idsia.ch/~juergen/deep-…
Re: The (true) story of the "attention" operator ... that introduced the Transformer ... by @karpathy. Not quite! The nomenclature has changed, but in 1991, there was already what is now called an unnormalized linear Transformer with "linearized self-attention" [TR5-6]. See (Eq.
The (true) story of development and inspiration behind the "attention" operator, the one in "Attention is All you Need" that introduced the Transformer. From personal email correspondence with the author @DBahdanau ~2 years ago, published here and now (with permission) following
In 2020, we will celebrate that many of the basic ideas behind the Deep Learning Revolution were published three decades ago within fewer than 12 months in our "Annus Mirabilis" 1990-1991:
Q*? 2015: reinforcement learning prompt engineer in Sec. 5.3 of “Learning to Think...” arxiv.org/abs/1511.09249. A controller neural network C learns to send prompt sequences into a world model M (e.g., a foundation model) trained on, say, videos of actors. C also learns to
30 years ago in a journal: "distilling" a recurrent neural network (RNN) into another RNN. I called it "collapsing" in Neural Computation 4(2):234-242 (1992), Sec. 4. Greatly facilitated deep learning with 20+ virtual layers. The concept has become popular
people.idsia.ch/~juergen/deep-…
Meta used my 1991 ideas to train LLaMA 2, but made it insinuate that I “have been involved in harmful activities” and have not made “positive contributions to society, such as pioneers in their field.” @Meta & LLaMA promoter @ylecun should correct this ASAP. See