muhtasham (@Muhtasham9) / X

muhtasham

2,251 posts

@Muhtasham9

evals, evals, evals

Latent Space

muhtasham.github.io/blog/

Joined March 2020

Pinned
muhtasham
@Muhtasham9
Jul 4, 2023
w boss
11K
muhtasham
@Muhtasham9
Mar 26, 2024
A short thread about changes in the transformer architecture since 2017. Reading articles about LLMs, you can see phrases like “we use a standard transformer architecture.” But what does "standard" mean, and have there been changes since the original article? (1/6)
muhtasham
@Muhtasham9
Aug 14, 2022
Interestingly, despite 5 years(!) of hyper-growth in the NLP space, the vanilla Transformer is holding to the Lindy effect: the idea that the older something is, the longer it's likely to be around in the future.
135K
muhtasham
@Muhtasham9
Jan 15, 2024
Evaluating abstractive summarization remains an open area for further improvement. If you've ever dealt with large-scale summarization evaluation, you know how tedious it is. Inspired by @eugeneyan's post on this topic, I hacked something together over the weekend to streamline this
43K
muhtasham
@Muhtasham9
Feb 4, 2023
Excited to announce the most up-to-date and CPU-friendly BERT, trained on the most recent snapshot of the internet. Took a day and 8x A100s to train. 🤗 The model is open-source and I hope the community can benefit from it. It was created… lnkd.in/edQhXf3q lnkd.in/eM6nW38a
35K
muhtasham
@Muhtasham9
Oct 23, 2022
Meta: Multi-tasking while reading about Multi-task NLP models
muhtasham
@Muhtasham9
Mar 3, 2024
StarCoder2 running on M2 8GB
9.8K
muhtasham
@Muhtasham9
Mar 12, 2024
DeepMind folks can now steal weights behind APIs: “We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix.” Who wants to do the same for GPT-4? arxiv.org/abs/2403.06634
6.9K
muhtasham
@Muhtasham9
May 25, 2022
Replying to @_jasonwei and @arankomatsuzaki
Might contain a lot of subtle issues; see the Clever Hans effect, which is always hard to debug. The law of leaky abstractions in action, as my supervisor says.
NLP's Clever Hans Moment has Arrived
From thegradient.pub
muhtasham
@Muhtasham9
May 15, 2023
Replying to @Mascobot
muhtasham
@Muhtasham9
May 15, 2023
🇺🇸 US: Innovate, then try to regulate
🇪🇺 EU: Regulate, then try to innovate
18K
muhtasham
@Muhtasham9
Mar 3, 2024
The 🤗 MLX community is amazing
Quantized StarCoder2 model variants available here: huggingface.co/mlx-community
Small guide on running and training StarCoder2 locally:
pip install -U mlx-lm
To run inference on quantized model:
python -m mlx_lm.generate --model
BigCode
@BigCodeProject
Feb 28, 2024
Introducing: StarCoder2 and The Stack v2 ⭐️ StarCoder2 is trained with a 16k token context and repo-level information for 4T+ tokens. All built on The Stack v2 - the largest code dataset with 900B+ tokens. All code, data and models are fully open! hf.co/bigcode/starco…
mlx-community (MLX Community)
From huggingface.co
7.9K
muhtasham
@Muhtasham9
Feb 7, 2024
Happy to show Pod-Helper:
⚡️ Lightning-speed transcription with Whisper
🔧 Built-in audio repair with good old Roberta
🧊 Checks your content's vibe effortlessly
See demo below running on TensorRT-LLM #GenAIonRTX #DevContest #GTC24 @NVIDIAAIDev
15K
muhtasham
@Muhtasham9
Nov 3, 2022
Replying to @tszzl
Here is the PDF by @amasad: amasad.me/red
muhtasham
@Muhtasham9
Apr 30, 2023
If you missed out on the @full_stack_dl LLM bootcamp, don't worry! I've written a blog post about it. I hope you find my post informative and enjoyable to read, just as I enjoyed attending the bootcamp.
muhtasham.github.io
A Deep Dive into the LLM Bootcamp Experience: Revolutionizing AI-Powered Applications – Koding...
3.3K