Getting A Proprietary-Bus GPU Onto PCIe Enables Cheaper Local LLMs, For Now

If you’ve been thinking of getting into self-hosting generative AI, but don’t have a big budget for hardware, you might want to check out [Hardware Haven]’s latest video on an unusually cheap GPU option — but you’ll have to do so quickly, before the market realizes the chance for arbitrage and prices rise accordingly.

He’s gotten hold of a 16 GB Nvidia V100 card for only about a hundred bucks, mostly because it’s not easy to plug in, sitting on an SXM2 socket rather than the PCIe bus. SXM is a server architecture, and not something you’re likely to find on your motherboard. Another hundred got him an adapter board to fit this enterprise GPU to a consumer motherboard. That’s still a lot less than the PCIe version of the same card, which will likely set you back a thousand dollars or more unless you get very lucky on eBay.

It’s not the newest card, dating back to 2017, but that doesn’t mean it can’t run the latest open models. After 3D printing a fan shroud for the thing so it didn’t cook itself, adding very slightly to the build cost, [Hardware Haven] set to work seeing what it could do. Going head-to-head against an RTX 3060 12 GB, the older V100 delivered more tokens per second at slightly higher efficiency, but with much higher idle power.
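If you want to make the same comparison on your own hardware, the arithmetic is simple: divide tokens generated by wall time for throughput, and by energy drawn for efficiency. A minimal sketch, where all the benchmark numbers are hypothetical placeholders rather than figures from the video:

```python
# Back-of-the-envelope LLM inference comparison. The run figures below
# are made-up placeholders -- substitute your own measured values.

def tokens_per_joule(tokens: int, seconds: float, watts: float) -> float:
    """Tokens generated per joule of energy drawn during the run."""
    return tokens / (seconds * watts)

# Hypothetical runs: (tokens generated, wall time in s, average draw in W)
runs = {
    "V100":     (512, 16.0, 250.0),  # placeholder figures
    "RTX 3060": (512, 20.0, 170.0),  # placeholder figures
}

for name, (tok, t, w) in runs.items():
    print(f"{name}: {tok / t:.1f} tok/s, "
          f"{tokens_per_joule(tok, t, w) * 1000:.1f} tok/kJ")
```

Note that this only captures power under load; a card that idles at tens of watts, as the V100 reportedly does here, will still cost you on the electricity bill between prompts.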

Still, it’s nice to see a cheap way to get into local AI, even if it might not still be cheap by the time you read this. Once you have the hardware, you might want some easy software options so you don’t have to spend all day on setup. Of course, you only need a hefty GPU to run the larger models: you can get into hosting your own AI on a Raspberry Pi, if you’re patient.


A GPU card with a home-made fan assembly

3D-printed Fan Mount Keeps Server GPU Cool In Desktop Case

Most readers of Hackaday will be well aware of the current shortages of semiconductors and especially GPUs. Whether you’re planning to build a state-of-the-art gaming PC, a mining rig to convert your kilowatt-hours into cryptocoins, or are simply experimenting with machine-learning AI, you should be prepared to shell out quite a bit more money for a proper GPU than in the good old days.

Bargains are still to be had in the second-hand market though. [Devon Bray] chanced upon a pair of Nvidia Tesla K80 cards, which are not suitable for gaming and no longer cost-effective for mining crypto, but ideal for [Devon]’s machine-learning calculations. However, he had to make a modification to enable proper thermal management, as these cards were not designed to be used in regular desktop PCs.

The reason for this is that many professional-grade GPU accelerators are installed in rack-mounted server cases, and are therefore equipped with heat sinks but no fans: the case is meant to provide a forced air flow to carry away the card’s heat. Simply installing the cards into a desktop PC case would cause them to overheat, as passive cooling will not get rid of the 300 W that each card pumps out on full load.

[Devon] decided to make a proper thermal solution by 3D printing a mount that carries three fans along with an air duct that snaps onto the GPU card. In order to prevent unnecessary fan noise, he added a thermal control system consisting of a Raspberry Pi Pico, a handful of MOSFETs, and a thermistor to sense the GPU’s temperature, so the fans are only driven when the card is getting hot. The Pi Pico is of course way more powerful than needed for such a simple task, but allowed [Devon] to program it in MicroPython, using more advanced programming techniques than would be possible on, say, an Arduino.
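The control loop for such a setup is straightforward: read the thermistor, convert the reading to a temperature, and switch the fans with some hysteresis so they don’t chatter near the threshold. Here’s a minimal sketch of that logic in plain Python; the component values (10 kΩ NTC, beta of 3950, 10 kΩ divider resistor) and the temperature thresholds are assumptions for illustration, not taken from [Devon]’s actual build, and on real hardware the reads and writes would go through MicroPython’s `machine` module:

```python
import math

# Assumed component values -- adjust for your actual thermistor/divider.
BETA = 3950.0         # NTC beta coefficient (assumed)
R_NOMINAL = 10_000.0  # thermistor resistance at 25 degC (assumed)
T_NOMINAL = 298.15    # 25 degC in kelvin
R_SERIES = 10_000.0   # fixed divider resistor (assumed)
ADC_MAX = 65535       # MicroPython's ADC.read_u16() scales to 16 bits

FAN_ON_C = 60.0       # spin fans up above this (assumed threshold)
FAN_OFF_C = 50.0      # spin them back down below this (assumed)

def adc_to_celsius(raw: int) -> float:
    """Convert a raw divider reading (NTC on the low side) to degrees
    Celsius using the simplified beta equation."""
    resistance = R_SERIES * raw / (ADC_MAX - raw)
    inv_t = 1.0 / T_NOMINAL + math.log(resistance / R_NOMINAL) / BETA
    return 1.0 / inv_t - 273.15

def update_fans(temp_c: float, fans_on: bool) -> bool:
    """Hysteresis: only change fan state outside the dead band,
    so the fans don't rapidly toggle around one threshold."""
    if temp_c >= FAN_ON_C:
        return True
    if temp_c <= FAN_OFF_C:
        return False
    return fans_on

# On the Pico itself, the loop body would read machine.ADC(pin) and
# drive the MOSFET gate from a GPIO or PWM pin; here we only model
# the decision logic.
```

The hysteresis band is the part that keeps the rig pleasant to sit next to: without it, a card hovering right at the trigger temperature would cycle the fans on and off continuously.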

We love the elegant design of the fan duct, which enables two of these huge cards to fit onto a motherboard side-by-side. We’ve seen people working on the opposite problem of fitting large fans into small cases, as well as designs that discard the whole idea of using fans for cooling.
