muhtasham
evals, evals, evals
- A short thread about changes in the transformer architecture since 2017. Reading articles about LLMs, you can see phrases like “we use a standard transformer architecture.” But what does "standard" mean, and have there been changes since the original article? (1/6)
  Interestingly, despite five years(!) of hyper-growth in the NLP space, the vanilla Transformer is holding up under the Lindy effect: the idea that the older something is, the longer it is likely to be around in the future.
- Evaluating abstractive summarization remains an open area for further improvement. If you ever dealt with large-scale summarisation evaluation you know how tedious it is. Inspired by @eugeneyan's post on this topic, I hacked something together over the weekend to streamline this
- Excited to announce the most up-to-date and CPU-friendly BERT, trained on the most recent snapshot of the internet. Took a day and 8x A100s to train. 🤗 The model is open-source and I hope the community can benefit from it. It was created… lnkd.in/edQhXf3q lnkd.in/eM6nW38a
- DeepMind folks can now steal weights behind APIs “We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix.” who wants to do same for gpt4? arxiv.org/abs/2403.06634
- Replying to @_jasonwei and @arankomatsuzaki: Might contain a lot of subtle issues; see the Clever Hans effect, which is always hard to debug. The law of leaky abstractions in action, as my supervisor says.
- Replying to @Mascobot: 🇺🇸 US: Innovate, then try to regulate. 🇪🇺 EU: Regulate, then try to innovate.
- The 🤗 MLX community is amazing. Quantized StarCoder2 model variants are available here: huggingface.co/mlx-community
  Small guide on running and training StarCoder2 locally:
  pip install -U mlx-lm
  To run inference on a quantized model:
  python -m mlx_lm.generate --model
  Introducing: StarCoder2 and The Stack v2 ⭐️ StarCoder2 is trained with a 16k token context and repo-level information for 4T+ tokens. All built on The Stack v2, the largest code dataset with 900B+ tokens. All code, data and models are fully open! hf.co/bigcode/starco…
- Happy to show Pod-Helper: ⚡️ Lightning-speed transcription with Whisper 🔧 Built-in audio repair with good old Roberta 🧊 Checks your content's vibe effortlessly See demo below running on TensorRT-LLM #GenAIonRTX #DevContest #GTC24 @NVIDIAAIDev
- If you missed out on the @full_stack_dl LLM bootcamp, don't worry! I've written a blog post about it. I hope you find my post informative and enjoyable to read, just as I enjoyed attending the bootcamp.
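
On the weight-stealing post above (arxiv.org/abs/2403.06634): the "recover the exact hidden dimension size" part rests on a simple observation. Final logits are a linear projection of a hidden state, so a stack of logit vectors from more queries than the hidden size has rank equal to the hidden size. Below is a toy sketch of that idea only, not the paper's code. The names `toy_logits`, `numerical_rank`, and `recover_hidden_dim` and all the sizes are made up for illustration. It also assumes full, noise-free logits (which a real API does not hand out directly) and swaps in plain Gaussian elimination where an SVD would normally be used.

```python
import random


def toy_logits(hidden_states, W):
    """Logits for each hidden state h, computed as W @ h (W is vocab x hidden)."""
    return [
        [sum(w_row[k] * h[k] for k in range(len(h))) for w_row in W]
        for h in hidden_states
    ]


def numerical_rank(rows, tol=1e-8):
    """Rank via Gaussian elimination with partial pivoting.

    Assumption: stands in for the SVD-based rank estimate; fine for a
    small noise-free toy matrix, not for real noisy API outputs.
    """
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # column is numerically zero below the current rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def recover_hidden_dim(vocab=40, hidden=5, n_queries=12, seed=0):
    """Build a hypothetical toy model and estimate its hidden size from logits."""
    rng = random.Random(seed)
    # Hypothetical output projection and per-query final hidden states.
    W = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(vocab)]
    states = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(n_queries)]
    return numerical_rank(toy_logits(states, W))


if __name__ == "__main__":
    print("estimated hidden size:", recover_hidden_dim())
```

As long as `n_queries` exceeds `hidden`, the estimated rank is expected to match the `hidden` value the toy model was built with. Only the logits are inspected, never `W` itself.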







