I spent the weekend working on a high-performance inference system that will allow us to serve a few flavors of entropix via API on a limited basis, primarily to facilitate evals and research.
I needed an inference system that allowed for continuous batching, arbitrary
Culmination of months of hard work by the Akash+Brev teams. Big shoutout to the man himself @tmonty_12 for leading the charge.
Come get yourself some H100s :)
Today, Brev.dev is officially launching support for Akash GPUs.
Brev is on a mission to create the easiest way to develop with the leading open-source AI models and deploy on GPUs in the cloud. Purpose-built for developers who prioritize speed and simplicity.
We're excited to announce NVIDIA Dynamo v0.1.1 🚀
New in v0.1.1:
- H100 disaggregation benchmarks with vLLM
- TensorRT-LLM KV-Aware Routing + Disaggregation
- Unified Rust + Python logging configuration
- ManyLinux + Ubuntu pip wheels/crates
More details and a sneak peek at
Wanted to share some work I’ve been doing on Forky: A git-style approach to LLM conversations.
The problem: You’re deep in a chat and suddenly need to explore a tangent. You could open up a new chat and explore…but now you have 2 separate chats
How do you combine them? 🧵
Exciting news...
🚨 Brev.dev has been acquired by NVIDIA! ❤️🤙
We started Brev with the goal of building the best damn developer experience possible.
We’ve been working closely with NVIDIA since August, and we couldn't be more excited to team up.
I'm
I've spent the last year building a startup focused on democratizing access to GPUs.
Here are my thoughts on the new @BG2Pod with Jensen Huang, from someone who's lived the shift from AI infra startup to NVIDIA 🧵
Time really does fly when you're having fun...
After founding @tamublockchain last summer, today is my final day as president of the organization.
Here are a couple of things I've learned throughout the process 🧵
NVIDIA Dynamo v0.2.0 is live 🚀
New in this release:
- GB200 support with ARM builds
- Improved K8s deployment support
And the first version of Planner - a custom engine designed to intelligently manage and scale an LLM inference deployment...🧵
Had an incredible time chatting with @Baxate_carter on the podcast.
We chat about everything from grad school to distributed inference to approaching difficult decisions early in life :)
Give it a listen and let me know what you think!
How do you go from non-technical (econ degree) to NVIDIA Machine Learning Engineer in 2 years?
Also, get acquired twice, 'cause why not.
I had the chance to sit down with @0xishand and ask exactly this.
Link below