Vaishaal Shankar (@Vaishaal) / X

530 posts

Berkeley, CA

Joined August 2008

Vaishaal Shankar
@Vaishaal
Jun 24, 2019
Finally get to try the new bart trains! I've been literally waiting months for this. Will live tweet the experience.
Vaishaal Shankar
@Vaishaal
Jul 18, 2024
We have released our DCLM models on Hugging Face! To our knowledge, these are by far the best-performing truly open-source models (open data, open-weight models, open training code). 1/5
Vaishaal Shankar
@Vaishaal
Jun 18, 2024
I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x
Vaishaal Shankar
@Vaishaal
Jun 24, 2019
Replying to @Vaishaal
Looks like the good ol' reboot trick didn't work. Now we are going to exit the train! In the tunnel! This is the most exciting commute I've had.
Vaishaal Shankar
@Vaishaal
Jun 24, 2019
Replying to @Vaishaal
They are trying to fix the train by turning the computer off and on again. I am sure this will work.
Vaishaal Shankar
@Vaishaal
Jun 24, 2019
Replying to @Vaishaal
The bart technician says this is all because "all this technology is too new" [points to train].
Vaishaal Shankar
@Vaishaal
Jun 24, 2019
Replying to @Vaishaal
Made it outside. Thanks @SFBART workers for handling the situation and getting everybody out safely!
Vaishaal Shankar
@Vaishaal
Oct 6, 2023
I had an argument with @PreetumNakkiran about MLPs 4 years ago. He said with enough data + compute the MLP/ConvNet gap would go to 0. I was convolution-pilled and convinced this wasn't possible. He was right:
arxiv.org
Scaling MLPs: A Tale of Inductive Bias
In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into...
Vaishaal Shankar
@Vaishaal
Dec 6, 2023
Replying to @hankgreen
It's a multiple choice exam that covers ~57 subjects. It's generally a good benchmark for capabilities of a model. 90% just means the model got 90% of these questions right. The paper is not a terrible read:
arxiv.org
Measuring Massive Multitask Language Understanding
We propose a new test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. To attain high accuracy...
Vaishaal Shankar
@Vaishaal
Jun 24, 2019
Replying to @Vaishaal
These trains are so much quieter, I can hear the sound of my own thoughts.
Vaishaal Shankar
@Vaishaal
Mar 6, 2020
Neural Kernels Without Tangents: arxiv.org/abs/2003.02237. Joint work with Alex Fang, @WSguo, Sara Fridovich-Keil, @lschmidt3, @jrk, and @beenwrekt. Taking inspiration from convolutional networks, we construct high-performance kernel functions for image classification. (1/6)
Vaishaal Shankar
@Vaishaal
Jun 24, 2019
Replying to @Vaishaal
Well, we evacuated the train successfully! Turns out we were only a few hundred yards from 12th Street BART.
Vaishaal Shankar
@Vaishaal
Jun 24, 2019
Replying to @Vaishaal
Welp so much for that honeymoon phase. Looks like the train stopped abruptly in the tunnel before 12th street.
Vaishaal Shankar
@Vaishaal
Jun 24, 2019
Replying to @Vaishaal
They have successfully dimmed the lights and turned off the AC.