• NVIDIA GTC 2026: Keynote Announcements

    There are product launches. And then there’s what NVIDIA just did at GTC 2026. Jensen Huang didn’t walk on stage to announce one thing. He announced everything: silicon, software, and about half the enterprise software industry along with it. If you blinked, you missed something significant. So let me save you the scroll and share my favorite announcements.

    Before we dive into the announcements, I do want to emphasize what an amazing storyteller Jensen Huang is. The build-up in the keynote was, yet again, amazing. If you work in tech and you’ve never attended an in-person GTC, it should be on your bucket list. The pace of the keynote is bizarre, with announcement after announcement and demo after demo. One cool demo he showed was DLSS 5, where AI is used to enhance video images. Especially fun since it featured the Dutch National Football team.

    Now, let’s dive into the announcements :)

    New Hardware: Seven Chips, Full Production, No Waiting

    The NVIDIA Vera Rubin platform is the headline. Seven new chips, five rack configurations, all in full production today: Rubin GPU, Vera CPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and a newly integrated Groq 3 LPU. Together they’re designed to operate as a single coherent pod-scale AI supercomputer.

    The Vera Rubin NVL72 delivers up to 10x higher inference throughput per watt at one-tenth the cost per token compared to Blackwell, and trains large MoE models with one-fourth the number of GPUs. The Vera CPU packs 88 custom Olympus cores with 1.2 TB/s of memory bandwidth: twice the bandwidth at half the power of a general-purpose CPU. A full rack holds 256 liquid-cooled CPUs sustaining 22,500+ concurrent environments. Alibaba, Meta, Oracle, and CoreWeave are already deploying it.

    The Groq 3 LPX rack targets trillion-parameter, low-latency agentic workloads — up to 35x higher inference throughput per megawatt paired with Vera Rubin. The BlueField-4 STX adds a dedicated KV cache storage tier boosting inference throughput by up to 5x. And the Spectrum-6 SPX brings co-packaged optics with 5x greater optical power efficiency.

    As a “one more thing,” Jensen also showed Rubin Ultra, along with the new NVLink switch he announced. Long story short: you can now link 144 GPUs together. Mind-blowing.

    Software: The Stack Beneath the Models

    NVIDIA Dynamo 1.0 is now generally available and open source: the inference operating system of the AI factory. It orchestrates GPU and memory resources across the cluster, routes requests intelligently, and boosts Blackwell inference performance by up to 7x. Already running at AWS, Azure, Google Cloud, Perplexity, Pinterest, and PayPal.

    The NVIDIA Agent Toolkit bundles Nemotron open models, the AI-Q Blueprint for agentic search (tops DeepResearch Bench, cuts query costs 50%+), and the new OpenShell, an open-source runtime enforcing policy-based security and privacy guardrails for autonomous agents. Adobe, SAP, Salesforce, ServiceNow, Siemens, CrowdStrike and others are already building on it.

    NemoClaw brings Agent Toolkit to the OpenClaw community in a single install command: Nemotron models and OpenShell on your RTX PC, DGX Station, or DGX Spark. Always-on, private, locally controlled agents. Jensen compared OpenClaw to Mac and Windows for the personal AI era. Big claim. Watching the adoption curve, it doesn’t feel like a stretch.

    More about the NemoClaw solution can be found in the other blog post I released today. Please find it here: NemoClaw and DGX: Powering the Era of Agentic Scaling

    Ecosystem: Half the Industry Just Got On the Train

    The Nemotron Coalition (Mistral AI, Black Forest Labs, Cursor, LangChain, Perplexity, Reflection AI, Sarvam, and Thinking Machines Lab) is co-developing open frontier models on DGX Cloud. The first model is co-built with Mistral AI and underpins the upcoming Nemotron 4 family.

    Adobe and NVIDIA announced a deep partnership: next-gen Firefly models on CUDA-X and NeMo, agentic creative workflows using OpenShell and Nemotron, and a cloud-native 3D digital twin solution for marketing built on Omniverse. Worth watching closely if you work in creative or marketing tech.

    On the industrial side, Cadence, Dassault Systèmes, Siemens, and Synopsys are shipping NVIDIA-powered AI agents for chip design and EDA. Honda runs aerodynamic simulations 34x faster on Grace Blackwell. Siemens’ Digital Twin Composer is already used by Foxconn, HD Hyundai, PepsiCo, and KION. The Vera Rubin DSX reference design ties it all together: a co-designed framework spanning compute, power, cooling, and grid integration.

    Because with $300B+ in equipment backlogs and 200+ gigawatts waiting in U.S. interconnection queues, energy is quietly the biggest bottleneck in AI infrastructure right now.

    So, What Does It All Add Up To?

    The inference layer has shipped. The agent runtime is open source. The training data pipeline is blueprinted. The hardware is in production. And an ecosystem of cloud providers, enterprise software platforms, and model builders is actively building on top of it all.

    NVIDIA isn’t selling GPUs. It’s selling the operating model for AI factories, from the silicon to the agent runtime to the power grid keeping it all running. And not just for the extremely large enterprises or the Fortune 500, but for everyone.

    If you weren’t paying close attention this week, now you are.

    👉 Curious about a specific announcement? Feel free to reach out, happy to go deeper on any of these.

  • NemoClaw and DGX: Powering the Era of Agentic Scaling

    At GTC 2026, NVIDIA just dropped a bombshell: the NemoClaw stack for the OpenClaw community. If you thought AI was just about LLMs answering prompts, you’re looking at the past. We are officially entering the era of Agentic AI—systems that don’t just talk; they reason, use tools, and take autonomous action.

    Jensen Huang described this shift as the “fourth scaling law”: agentic scaling. We’ve already seen compute, data, and inference scale, but now we are scaling agency. This is about AI-to-AI workflows where the system understands the mission, plans the steps, and executes using the right tools. It’s the move from a passive chatbot to an active digital employee.

    For the architects and engineers reading this, the real breakthrough is the NemoClaw stack. It is an integrated environment that allows you to deploy NVIDIA Nemotron models and the newly announced NVIDIA OpenShell runtime with a single command. That’s it. One command to spin up a full, production-grade agentic platform.

    Whether you’re developing locally on an NVIDIA RTX PC or scaling out on DGX Station and the DGX Spark, NemoClaw handles the heavy lifting of the infrastructure. It introduces the privacy and security guardrails that have been the missing link for enterprise adoption. These “claws” (autonomous AI agents) are self-evolving, meaning they learn and adapt within your specific environment without ever needing to send sensitive data to a public cloud.

    NVIDIA is positioning OpenClaw as the operating system for personal AI. This is a fundamental shift in the software stack. We are moving away from centralized black boxes toward secure, always-on assistants running on our own metal. It’s about taking full control of the AI factory, from the silicon up to the agent layer.

    My take on the NemoClaw announcement?

    I’ve been holding back on running OpenClaw in my lab for security and privacy reasons, but this new development has convinced me to reverse that decision. Some of the aspects I really like:

    • One-Command Deployment: NemoClaw is an open-source stack that allows users to install NVIDIA Nemotron models and the NVIDIA OpenShell runtime with a single command
    • Integrated Security: It installs the NVIDIA OpenShell runtime, which provides an isolated sandbox and defines how agents access data and use tools within strict policy boundaries
    • Flexible Model Support: The stack can tap into open models like NVIDIA Nemotron running locally, or use a privacy router to access frontier models in the cloud
    • Optimized Performance: NemoClaw uses the NVIDIA Agent Toolkit to optimize OpenClaw, providing the necessary infrastructure for agents to be productive while maintaining security and privacy guardrails
    • Always-On Agency: The stack enables proactive, “always-on” AI assistants to run around the clock for tasks such as writing code, analyzing data, and simulating outcomes

    I’ve been working on an idea for an app where I use Gemini as my product manager/planner, and Claude Code as my developer/tester. This works pretty well, but hitting usage limits is a regular issue. Moving to NemoClaw as my developer is one of the first things I will do after GTC ;)

    Be sure to keep following me, because I’ll be creating some in-depth content about NemoClaw in the coming days/weeks!

    Be the first to hear about these articles by signing up for my LinkedIn newsletter here: Subscribe on LinkedIn

    Cheers,

    Johan

  • Open Source AI at NVIDIA GTC (with Rhys Oxenham and Sanjeet Singh from SUSE)

    Open source is quietly powering most of the AI world.

    From Linux to Kubernetes, much of the infrastructure behind modern AI systems is built on open technologies. But when people talk about open source AI, they often mean very different things.

    In this episode of The Private AI Lab, I sat down with Rhys Oxenham and Sanjeet Singh from SUSE to explore what open source AI really means — and why enterprises are increasingly turning to private AI platforms.

    The conversation also previews some of the themes we expect to see at NVIDIA GTC, where open source AI has become a major topic.

    What Is Open Source AI?

    When people hear the term open source AI, they often think about models. But the reality is broader. Open source AI typically includes several layers:

    • Infrastructure (Linux, Kubernetes, GPU drivers)
    • AI frameworks and runtimes
    • Model serving platforms
    • Open-weight models
    • Data pipelines and MLOps tools
    • Vector databases and retrieval systems

    Much of today’s AI ecosystem is built on these components. Companies like SUSE focus on taking these open technologies and making them enterprise-ready — providing security, validation, lifecycle management, and support.

    Why Enterprises Want Private AI

    One of the biggest themes in enterprise AI today is data sovereignty. Many organizations are hesitant to send sensitive data to externally hosted AI models. Banks, governments, and healthcare providers often need to keep their data:

    • On-premises
    • Within specific jurisdictions
    • Under strict security controls

    Open-weight models allow organizations to deploy AI inside their own infrastructure, instead of relying entirely on external APIs. This approach enables organizations to:

    • Fine-tune models on internal data
    • Maintain full control over sensitive information
    • Avoid leaking patterns or proprietary insights

    This is one of the main drivers behind the rise of private AI platforms.

    Open Source Innovation Is Accelerating

    There’s a common misconception that the most advanced innovation in AI only comes from large frontier-model companies. But the open source ecosystem is moving incredibly fast. A recent example discussed in the episode is DeepSeek, which surprised the industry by releasing highly capable open models that dramatically reduced cost and compute requirements. Open source communities are also experimenting with:

    • highly specialized models
    • optimized inference engines
    • more efficient architectures
    • smaller models with strong reasoning capabilities

    In many cases, these specialized models can outperform larger frontier models for specific tasks.

    Efficiency vs Scale

    Another interesting trend is how open source projects are approaching AI development differently. Frontier models often pursue maximum scale. But open source developers are increasingly focused on efficiency. Instead of building one model that does everything, many teams are developing smaller models designed for specific tasks such as:

    • coding
    • reasoning
    • mathematics
    • customer support
    • document analysis

    These models can often deliver comparable performance while requiring significantly less compute. For enterprises running AI workloads on their own infrastructure, this efficiency can make a major difference.

    Open Source AI at NVIDIA GTC

    Open source AI is also becoming a major theme at NVIDIA GTC. NVIDIA has long supported open ecosystems — from CUDA integrations with Linux distributions to collaborations with open source AI frameworks. Many of the discussions at GTC now revolve around:

    • private AI infrastructure
    • sovereign AI deployments
    • enterprise AI platforms
    • agentic AI
    • physical AI and robotics

    The open ecosystem plays an increasingly important role in enabling these technologies.

    Building AI Platforms for the Enterprise

    The core goal of platforms like SUSE AI is to make it easier for organizations to deploy and operate AI workloads. That includes capabilities such as:

    • model lifecycle management
    • security and supply-chain validation
    • observability and monitoring
    • guardrails and policy controls
    • scalable GPU orchestration

    These capabilities allow enterprises to run AI workloads safely and reliably inside their own environments.

    AI Beyond Chatbots

    When many people think about AI, they still think about chatbots. But the real opportunity is much broader. AI platforms today support:

    • generative AI applications
    • machine learning pipelines
    • fine-tuning and training workflows
    • robotics and physical AI
    • intelligent automation

    In the episode, we even discuss how AI platforms can be used to power robotics applications and gesture recognition systems. The possibilities go far beyond simple text generation.

  • Getting excited for NVIDIA GTC (with Dirk Glücker)

    Next week the AI world gathers in San Jose for NVIDIA GTC, and if you’ve ever attended the conference, you know it’s unlike any other tech event. GTC is not just another vendor conference. It’s a place where researchers, platform engineers, AI practitioners, and industry leaders meet to explore what’s next in accelerated computing.

    In this pre-show episode of The Private AI Lab, I sat down with Dirk Glücker to talk about what we expect from this year’s event — from the keynote announcements to the most interesting sessions and practical tips for attendees.

    You can listen to the full conversation below:

    What We Expect from the 2026 Keynote

    The Jensen Huang keynote is always the centerpiece of GTC. Unlike many tech conference keynotes that feel heavily scripted, Jensen’s presentations tend to move fast and dive deep into the technology.

    Based on current rumors and trends, some of the areas we discussed that may appear in the keynote include:

    • updates to NVIDIA’s AI factory architecture

    • developments in distributed inference infrastructure

    • advancements in CUDA and the CUDA-X ecosystem

    • updates around AI infrastructure platforms

    • potential hardware announcements

    Sessions worth watching

    Below you can find a list of the sessions we are looking forward to:

    The State of Open Source AI [S81791] (in-person)

    Open source AI is evolving at record speed, reshaping how we train, refine, and deploy models across industries. Join leading builders as we explore the current state of open models and tools, the breakthroughs driving their momentum, and the new capabilities needed as AI becomes more specialized, multi-modal, and agent-driven.

    AI Factories in Europe: Building the Foundations for Scalable Intelligence [S81899] (hybrid)

    As Europe accelerates its AI ambitions, the concept of “AI factories” has become central to scaling innovation and production. This panel examines the three critical components required for successful AI factories: a reliable and sustainable energy supply, advanced data center infrastructure, and strong customer use cases that demonstrate measurable business value. Experts will discuss how Europe can integrate these elements to establish efficient, secure, and sovereign AI production systems. The conversation will highlight pathways to strengthen Europe’s competitiveness in the global AI landscape while aligning with sustainability goals and economic resilience.

    Accelerate AI Through Open-Source Inference [S81902] (Hybrid)

    The open AI ecosystem is thriving—powered by a new wave of high-performance inference frameworks and community-driven model development. In this session, we hear from ecosystem leaders as they discuss the critical role of open models in progressing AI innovation. We’ll dive deep into the collaborative infrastructure enabling state-of-the-art generative systems—from tokenizer to transformer to GPU kernel. Whether you’re building your own language model, optimizing inference pipelines, or contributing to open AI research, this is your chance to learn about optimization breakthroughs, interoperability standards, and the future of deploying open models at scale.

    vLLM in 2026: Architectural Challenges and Performance Optimizations [S82059] (In-person)

    As LLMs grow in size, context length, and architectural complexity, vLLM must evolve to meet new performance and scalability challenges. This talk presents key improvements in vLLM’s core architecture, including a GPU-first design for zero CPU overheads and an architecture for cluster-scale serving deployment. It also highlights major optimizations in KV cache management and GPU kernels. Gain a detailed technical view of how vLLM is advancing to deliver next-level performance on NVIDIA GPUs at scale.

    Inside the NVIDIA Inference Platform and Ecosystem [S81911] (In-person)

    A deeper dive into key announcements from the keynote with a focus on enabling technologies, predicted effects, and opportunities across industries.

    Join GTC today!

    If you’d like to join NVIDIA GTC, all you need to do is register. Unfortunately, due to high interest, in-person registration is no longer possible, but attending virtually still is!

    Use the following registration link to attend all the AI fun: https://nvda.ws/4qXGFjm

  • 8.5 Tips to Get the Absolute Most Out of NVIDIA GTC

    There are conferences you attend. And then there’s NVIDIA GTC, the kind of event where you land in San Jose curious… and leave questioning whether you’re moving fast enough. If you’re heading to GTC (physically or virtually), here are 8.5 slightly battle-tested, mildly opinionated tips to make sure you don’t just attend, but actually get the most out of it.

    1: Accept That You Can’t See It All (in-person)

    GTC is huge. You’ll scroll through the session catalog thinking, “Oh this looks good… and this… and this…” until your agenda resembles an overcommitted Kubernetes cluster. You cannot attend everything. And that’s okay. Pick a theme. Maybe it’s generative AI. Maybe it’s infrastructure. Maybe it’s robotics. Go deep instead of wide. You’ll enjoy it more, and remember it better. Bonus tip: most sessions are recorded and can be watched virtually as well!

    2: The Keynote Is Your North Star

    There’s always a moment during the keynote when the audience collectively leans forward. That’s when you realize: this isn’t a roadmap update. This is a glimpse of where things are heading. Watch it live if you can. In person, the energy is electric. Remote, it’s still powerful, just resist the temptation to answer email at the same time. Future-you will thank present-you.

    3: Block Real Time for the Show Floor (It’s Humongous)

    You may think, “I’ll just swing by the expo hall between sessions.” You will not. The show floor is enormous. It’s packed with startups, partners, demos, robots, simulations, and things that make you say, “Wait… that’s real?”

    Block time specifically to explore it. Wander. Ask questions. Watch demos you don’t fully understand. That’s where the unexpected inspiration happens. Also: yes, you will encounter robots. Be polite. They are taking notes.

    4: Wear Comfy Shoes. Not Your Sales Shoes.

    Let’s be clear. This is not the conference to debut your stylish-but-structurally-unsound footwear. You will walk. A lot.

    Please bring comfy shoes instead of your sales shoes. Your feet will be thankful. Your step counter will quietly judge you. And you’ll still look plenty professional while not limping into your afternoon session.

    5: Schedule One “Outside Your Comfort Zone” Session

    Pick one session that feels slightly intimidating. The one where you think, “I’m not sure I’ll fully understand this.” Go anyway.

    Last year I went to a session about quantum computing; this year it’s session S81479: Physical AI for the Real World: A Vision From NVIDIA Robotics Research.

    It’s the kind of session that sparks a new idea, a new direction, or at least a new set of questions I didn’t know to ask. Growth rarely happens in the session titled “Things You Already Know.”

    6: Talk to Humans (Yes, Even in the Coffee Line)

    The real magic of GTC isn’t just on stage. It’s in hallway conversations. The spontaneous whiteboard sketches. The “what are you building?” chats with someone who casually mentions a solution to your exact problem. The Silicon Valley investors, and even stars from Shark Tank :)

    Everyone is there because they care about building what’s next. Lean into that.

    7: Make Time for a Silicon Valley Reality Check

    If you’re flying all the way to California, carve out a few hours to visit places like the Computer History Museum, or some of the campuses in the area. The museum especially: walking through the evolution of computing, from room-sized machines to today’s intelligent systems, gives GTC a whole new perspective.

    You’ll see how fast innovation moves. And how today’s “cutting-edge” becomes tomorrow’s museum piece. It’s humbling. And motivating.

    8: If You’re Attending Remotely, Be Intentional

    Can’t make it to San Jose? You’re still very much part of GTC. Many sessions are available online. But here’s the trick: don’t treat it as background noise. Block time in your calendar. Close your inbox. Watch the keynote live. Take notes. Maybe even host a mini watch party with your team. In the Netherlands the NVIDIA team always hosts a viewing party, which is a ton of fun!

    You might miss the 15,000 steps per day, though. But you won’t miss the ideas.

    8.5: Leave Space to Be Surprised

    This is the half tip. Don’t over-optimize your schedule so tightly that there’s no room for discovery. Leave a gap.

    Walk into a random session. Stop at a booth you hadn’t planned to visit. Sit down for a conversation that wasn’t on your agenda.

    Some of the best GTC moments aren’t planned.

    They just… happen.

    Ready to Experience It?

    Whether you’re going all-in in San Jose or joining from your favorite chair at home, GTC is one of those events worth experiencing with intention. If you haven’t registered yet, here’s a link to make it official:

    👉 https://nvda.ws/4qXGFjm

    Bring curiosity. Bring energy. Bring comfortable shoes. And if a robot asks what you’re working on, answer confidently. See you at GTC. 😉

  • Vibe Coding: Productivity Hack or Production Nightmare? (with Andrew Morgan)

    Johan sits down with seasoned developer Andrew Morgan to explore the emerging world of Vibe Coding. As organizations race to adopt AI, the conversation dives into the balance between rapid Vibe Coding and the essential discipline of Vibe Learning.

    Key Takeaways

    • Vibe Coding accelerates product delivery but can introduce hidden security and scalability issues.
    • Vibe Learning emphasizes mastering software architecture, DevOps, and security while using AI tools.
    • On‑prem AI (private AI) offers control over data and compliance, and reduces exposure to external threats.
    • Effective guardrails, code reviews, and prompt engineering are critical for safe AI deployments.

    Understanding Vibe Coding vs Vibe Learning

    Vibe Coding refers to using AI to generate code quickly, often bypassing deep technical understanding. While it boosts speed, it can produce fragile, insecure applications. Vibe Learning, on the other hand, combines AI assistance with continuous education in software patterns, architecture, and security, ensuring that developers grow alongside the technology.

    Security Risks of Rapid Vibe Coding

    The conversation highlights numerous incidents where startups released SaaS products built with AI‑generated code only to be compromised—exposed API keys, insecure databases, and even malicious payloads. Without a solid grasp of security fundamentals, private AI deployments can become easy targets for hackers. Implementing strict input validation, secret management, and regular security audits mitigates these risks.
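    Two of the mitigations just mentioned, secret management and strict input validation, can be sketched in a few lines of Python. This is a minimal illustration rather than a production pattern, and the environment variable name and username rule are hypothetical:

```python
import os
import re


def get_api_key() -> str:
    """Read the API key from the environment instead of hardcoding it.

    Hardcoded keys end up in version control, which is exactly how several
    AI-generated SaaS products leaked credentials.
    (MY_SERVICE_API_KEY is a hypothetical variable name.)
    """
    key = os.environ.get("MY_SERVICE_API_KEY")
    if not key:
        raise RuntimeError("MY_SERVICE_API_KEY is not set")
    return key


def validate_username(value: str) -> str:
    """Allow-list validation: reject anything outside a known-safe pattern."""
    if not re.fullmatch(r"[A-Za-z0-9_-]{3,32}", value):
        raise ValueError(f"invalid username: {value!r}")
    return value
```

    The allow-list approach (accept only what you know is safe) is generally more robust than trying to enumerate every dangerous input.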

    On‑Prem AI Deployments and Private Vector Stores

    Private AI solutions, such as running LLMs on‑prem or in a controlled cloud environment, give organizations full control over data and model behavior. The discussion mentions tools like Qdrant Vector Store for secure semantic search, emphasizing that keeping models and data in‑house reduces reliance on external APIs and aligns with compliance requirements.
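    To make the idea concrete, here is a toy in-memory sketch of what a private vector store does at its core: semantic search over embeddings that never leave your own process. Qdrant and similar tools do this at scale with real embedding models; the two-dimensional vectors below are purely illustrative:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


class TinyVectorStore:
    """In-memory stand-in for a private vector store: data stays local."""

    def __init__(self):
        self._items = []  # list of (id, vector, payload)

    def upsert(self, item_id, vector, payload):
        self._items.append((item_id, vector, payload))

    def search(self, query, limit=3):
        """Return the ids and payloads of the closest vectors to the query."""
        ranked = sorted(self._items, key=lambda it: cosine(query, it[1]), reverse=True)
        return [(item_id, payload) for item_id, _, payload in ranked[:limit]]
```

    Swap the toy vectors for real embeddings and the class for a proper store like Qdrant, and you have the retrieval half of a private RAG pipeline.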

    Best Practices and Guardrails for Production AI

    To safely bring AI‑enhanced applications to production, teams should establish:

    • Code review processes that involve human oversight.
    • Automated testing pipelines that include security scans.
    • Prompt engineering standards to prevent model jailbreaks.
    • Documentation of architectural decisions and AI integration points.
      These practices ensure that private AI systems remain reliable, maintainable, and secure.
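    As a toy illustration of the prompt-engineering bullet, a naive jailbreak screen might look like the sketch below. Real guardrail runtimes use classifiers and policy engines rather than substring matching, and the deny-list patterns here are hypothetical:

```python
# Hypothetical deny-list; production guardrails use trained classifiers instead.
SUSPICIOUS_PATTERNS = [
    "ignore previous instructions",
    "disregard your system prompt",
    "reveal your system prompt",
]


def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt passes this naive jailbreak screen."""
    lowered = prompt.lower()
    return not any(pattern in lowered for pattern in SUSPICIOUS_PATTERNS)
```

    Even a screen this simple shows why the check belongs in the pipeline, before the model, rather than being left to the model itself.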

    Conclusion

    Private AI is poised to become a cornerstone of modern software development, but its success hinges on disciplined Vibe Learning, robust security practices, and thoughtful on‑prem deployments. By adopting these strategies, organizations can reap AI benefits while safeguarding their assets.

    Enjoyed the insights? Subscribe to the johan.ml newsletter, follow Johan on LinkedIn, and stay tuned for more deep dives into private AI and emerging technologies.