Vahid Kazemi
@VahidK
PhD in machine learning. KTH 14. Ex @xAI, @OpenAI, @Apple, @Google.
San Francisco Bay Area, CA
Joined April 2008
  • Pinned
    Finally finished editing my video. Episode 02: Build your own language model: youtube.com/watch?v=7ny8Sr….
  • In my opinion, we have already achieved AGI, and it’s even clearer with o1. We have not achieved “better than any human at any task” but what we have is “better than most humans at most tasks”. Some say LLMs only know how to follow a recipe. Firstly, no one can really explain
  • Effective PyTorch. github.com/vahidk/Effecti… The first 6 lessons are committed. More to come.
  • I was optimizing some PyTorch code today and was amazed at how fast PyTorch ran fairly unoptimized code. So I wrote some test cases to compare it with TensorFlow. PyTorch handily beat TensorFlow on both vectorized and non-vectorized code in my test cases. Here's one:
  • Since I left Alphabet, I've come to realize that computational resources can be limited! Gone are the days of using hundreds of TPUs without anyone raising an eyebrow. Now I spend a lot of time optimizing neural nets to train and run faster. The experience has been very rewarding.
  • We barely understand how a 3-layer MLP with ReLU, optimized with SGD, works, let alone a 200B-parameter Transformer model optimized on the entirety of the internet. I take any expert opinion on what an LLM can or cannot do with a grain of salt.
  • I've been looking for literature on efficient data collection for ML models and realized that for every 1000 papers about NN architecture, you can maybe find one paper about data. In practice, data engineering is equally or more important, but it's completely neglected by academia.
  • ML engineer productivity tip: Don't stare at TensorBoard.
  • I wonder how many millions of hours of engineering time would have been saved if C++ had a built-in standard linear algebra library like Eigen.
  • A practical lesson I learned from doing deep learning research is to spend a considerable amount of time at the beginning of the process optimizing data loading and common operations, making sure 100% of my GPU resources are utilized. It pays off massively in the long run.
  • I made a small package that allows reading TFRecord files in PyTorch with no TensorFlow dependency:
  • Python is so inefficient that Python coders think twice before implementing any new algorithm; they prefer a ready-made library (usually written in C++). Paradoxically, this has made Python coders much more productive. C++ coders are still writing their own string classes in 2019.
  • Executing each op in TensorFlow has massive overhead. One way to optimize your code is to use as few ops as possible.
  • My paper on real-time face landmark estimation (used by Snapchat and several other companies) just passed 1000 citations according to Google Scholar. Quite a milestone! scholar.google.com/scholar?cluste…
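The data-loading tip above (optimize loading early so the GPU is never starved) usually comes down to overlapping I/O with compute. Here is a minimal, hypothetical sketch of that idea in plain Python, assuming an I/O-bound loader: a background thread fills a small bounded buffer while the consumer works on the current batch. The names `prefetch` and `slow_batches` are illustrative, not from any library, and `time.sleep` stands in for real I/O and a real training step.

```python
import queue
import threading
import time

_DONE = object()  # sentinel marking the end of the stream


def prefetch(batches, buffer_size=2):
    """Yield items from `batches` while a background thread loads the next ones.

    While the consumer works on batch i, the producer thread is already
    fetching up to `buffer_size` batches ahead.
    """
    q = queue.Queue(maxsize=buffer_size)  # bounded, so loading can't run away
    errors = []

    def producer():
        try:
            for batch in batches:
                q.put(batch)  # blocks while the buffer is full
        except Exception as exc:  # hand loader errors over to the consumer
            errors.append(exc)
        finally:
            q.put(_DONE)

    # Daemon thread: if the consumer stops early, it won't keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is _DONE:
            break
        yield batch
    if errors:
        raise errors[0]


def slow_batches(n, load_s=0.02):
    # Hypothetical loader: the sleep stands in for disk or network I/O.
    for i in range(n):
        time.sleep(load_s)
        yield i


if __name__ == "__main__":
    start = time.perf_counter()
    for batch in prefetch(slow_batches(10)):
        time.sleep(0.02)  # stand-in for a training step on `batch`
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Run sequentially, the loading and "training" sleeps would add up to roughly 0.4 s; with prefetching they overlap, so the loop should finish in closer to 0.2 s. Threads only help here because the loader is I/O-bound; for CPU-heavy decoding in PyTorch, `torch.utils.data.DataLoader` with `num_workers > 0` (and `pin_memory=True` for GPU transfers) does this overlap with worker processes instead.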
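The TFRecord post above notes that reading these files needs no TensorFlow dependency. That works because TFRecord is a simple length-prefixed container: each record is a little-endian `uint64` length, a `uint32` masked CRC-32C of the length bytes, the payload, and a `uint32` masked CRC-32C of the payload. The sketch below parses that framing in pure Python. It is not the API of the package mentioned in the post; `read_tfrecords`, `write_tfrecords`, `crc32c`, and `masked_crc32c` are hypothetical names. It yields raw payload bytes only. Payloads are typically serialized `tf.train.Example` protocol buffers, and decoding those would need a protobuf parser, which this sketch does not include.

```python
import struct


def _make_crc32c_table():
    # CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, then add a constant.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def read_tfrecords(path, verify=True):
    """Yield the raw payload bytes of each record in a TFRecord file.

    Layout per record (little-endian):
      uint64 length | uint32 masked_crc(length) | data[length] | uint32 masked_crc(data)
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(12)
            if not header:
                return  # clean end of file
            if len(header) != 12:
                raise IOError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            (length_crc,) = struct.unpack("<I", header[8:])
            # Check the length before trusting it to size a read.
            if verify and masked_crc32c(header[:8]) != length_crc:
                raise IOError("length CRC mismatch")
            data = f.read(length)
            footer = f.read(4)
            if len(data) != length or len(footer) != 4:
                raise IOError("truncated record body")
            if verify and masked_crc32c(data) != struct.unpack("<I", footer)[0]:
                raise IOError("data CRC mismatch")
            yield data


def write_tfrecords(path, payloads):
    """Write byte strings with the same framing (useful for testing the reader)."""
    with open(path, "wb") as f:
        for data in payloads:
            length_bytes = struct.pack("<Q", len(data))
            f.write(length_bytes)
            f.write(struct.pack("<I", masked_crc32c(length_bytes)))
            f.write(data)
            f.write(struct.pack("<I", masked_crc32c(data)))
```

A pure-Python CRC loop is slow on large files, which is why `verify=False` is offered to skip checksum validation when throughput matters more than corruption detection.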
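The post above about per-op overhead in TensorFlow can be made concrete with a toy cost model. The sketch does not use TensorFlow: `CountingRuntime` is a hypothetical stand-in for an eager runtime that charges an assumed fixed dispatch cost every time an op is called, and the 10 microsecond figure is illustrative, not a measurement. It compares adding two vectors one element at a time (one op per element) with adding them in a single batched op.

```python
class CountingRuntime:
    """Hypothetical runtime that charges a fixed dispatch cost per op call."""

    def __init__(self, overhead_us):
        self.overhead_us = overhead_us  # assumed cost per op, in microseconds
        self.num_ops = 0

    def add(self, a, b):
        """Element-wise add of two equal-length lists; counts as ONE op."""
        self.num_ops += 1
        return [x + y for x, y in zip(a, b)]

    def total_overhead_us(self):
        return self.num_ops * self.overhead_us


def add_per_element(rt, a, b):
    # One op per element: len(a) dispatches.
    return [rt.add([x], [y])[0] for x, y in zip(a, b)]


def add_batched(rt, a, b):
    # The whole vector in one op: a single dispatch.
    return rt.add(a, b)


if __name__ == "__main__":
    a, b = list(range(1000)), [1] * 1000
    loop_rt, batched_rt = CountingRuntime(10), CountingRuntime(10)
    assert add_per_element(loop_rt, a, b) == add_batched(batched_rt, a, b)
    print(loop_rt.num_ops, loop_rt.total_overhead_us())        # 1000 10000
    print(batched_rt.num_ops, batched_rt.total_overhead_us())  # 1 10
```

Both versions compute the same result, but the per-element version dispatches 1000 ops against 1 for the batched call, so its modeled overhead is 1000 times larger. The model deliberately ignores compute time and any graph-level optimization such as op fusion; it only isolates the fixed per-op cost the post is warning about.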