Questions and Answers from Running a Local LLM

I had a few random questions from my Running a Local LLM on Your Laptop session at the Houston AI-lytics 2026 event last week, so this post looks at a few of those questions and my answers.

Note: This stuff is changing rapidly, and there aren’t a lot of factual answers. A lot of what you should look for is guidance and rational reasons for leaning in some direction.

Questions below:

  • Do we need an NPU? (Or what do I think of NPUs)
  • How do we audit or Test an AI LLM and know what is happening?
  • In which situations would you run a local model?
  • Which Model is Best?

Do we need an NPU? (Or what do I think of NPUs)

You don’t need an NPU to run a local LLM, but one helps with efficiency. An NPU is a Neural Processing Unit, a specialized processor (or part of a chip) designed to run AI-type workloads more efficiently than a general-purpose CPU. That could mean training a model or running LLM inference.

I think an NPU is a great idea for efficiency. We already know AI applications use a lot of compute and power. Just look at all the concerns over power/water and investments being made in new data centers for AI. Being more efficient helps.

Just like a GPU helps with graphics and makes your laptop more efficient, an NPU will help, but it’s not required.

How do we audit or Test an AI LLM and know what is happening?

First, LLMs aren’t deterministic, so they may not return the same output every time. That’s hard to test, because testing usually means asserting that if a is passed in, b is returned. If I pass in a and sometimes get b, sometimes c, and rarely f, that’s hard to assert against.

I have no idea how to fully test a model for behavior in this case. What you can do is run experiments: if the results are useful more often than not, keep using the model. If it gets you to useful results faster, it’s a better model. If it’s slower, more expensive, or less useful, it’s a worse one.
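One practical workaround for the non-determinism described above is to test properties of the output rather than exact strings, and to assert that the property holds in most runs rather than every run. The sketch below is a minimal, hypothetical harness along those lines; the `stub_model` stands in for a real LLM call, which you would swap for your own:

```python
import json
from itertools import cycle

def pass_rate(model, prompt, check, runs=20):
    """Call a non-deterministic model repeatedly and return the
    fraction of responses that satisfy the given property check."""
    passes = sum(1 for _ in range(runs) if check(model(prompt)))
    return passes / runs

# Stub model: a stand-in for a real LLM call. It usually returns valid
# JSON but occasionally returns garbage, mimicking the "sometimes b,
# sometimes c, rarely f" behavior described above.
_responses = cycle(['{"answer": 42}'] * 9 + ["oops, not JSON"])

def stub_model(prompt):
    return next(_responses)

def is_valid_json(text):
    """Property check: did the model return parseable JSON?"""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

rate = pass_rate(stub_model, "Return your answer as JSON.", is_valid_json)
print(rate)  # 0.9 with this stub: 18 of 20 responses parse
```

A test like `assert rate >= 0.8` then encodes “useful more often than not” as a threshold you can tune, rather than an exact-match assertion that non-determinism will break.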

Auditing is looking at what happened, which means reaching into the processing of these GPT-type tools. There are some tools to help (AuditLLM), but I can’t speak to whether these are a) worth the effort, b) effective, or c) junk. I’m still learning here, too.

In which situations would you run a local model?

This is a hard one, but there are a few situations in which I’d seriously consider a local model (where “local” includes a model I control in Amazon Bedrock, Azure AI, or Google Vertex).

First, when I’m worried about costs and want to control them. The vendors give you some limits and throttles, but usage can still get expensive. If I want a known, controllable spend, I might run a local model in some service, because I can allocate capacity and know what is available, what it will cost, and who will be using it. Perhaps the cloud vendors will give us more controls and guarantee we aren’t on “shared” systems, but any efficiencies from sharing hardware will be for their benefit, not mine.

Second, when I’m really concerned about data security. Most companies promise they won’t use your data and will delete sessions, but they might not, and they might make mistakes. Would they accidentally use my data, or hand it over in response to some sort of legal subpoena? If I were outside the US, or really worried, I’d run local models.

Third, if I want to ensure that I have complete control over the training of the model or the prompts, I might use a local model where I know there aren’t any system prompts being injected into my context.

Which Model is Best?

Yes.

There’s no good answer here. Look at the list of models on Hugging Face, for example: there are lots and lots of them, and none of us has time to test more than a small fraction. I think you have to depend on the community to help you decide whether any of these models is better for your situation.

Think about what you want a model to do, what things are important to your problem space, and then look for a model that people think works well and does the type of things you want to do. Similar to how you interview a person for certain types of work, think about that for a model.

The nice thing outside of the large LLMs is that you can use smaller models to fill certain niches if you find your organization needs a capability often. Interpreting code and linting it against best practices, for example, could use a smaller model that consumes less compute and that you’ve fine-tuned for your particular situation (and saves you money).


Houston AI-Lytics 2026–Powerpoint Slides

Thanks to everyone for attending my session on running a Local LLM.

If you have any questions, please feel free to reach out. The slides, with links embedded, are below:

Slides: Running a Local LLM.PPTX

I’ll blog next week on a few of the questions people asked, so if you have anything you want answered, please reach out.


The Book of Redgate: Do the Right Things

I do believe that Redgate has been very customer focused since its inception. I’ve worked with the company in some capacity since 2002, and I’ve felt this along the way:


The next page has this statement:

We believe that if we do what is right for our customers then we will thrive.

I think that’s been true when we keep this in mind. The (relatively) few times we’ve started to do things for ourselves rather than thinking about customers, things haven’t worked out as well.

I think this sentiment guides a lot of my life: certainly inside Redgate, but also in the rest of it. If I do what is best for another person, or for the world, that often works out well. It doesn’t mean I’m as efficient, profitable, or stress-free as I could be.

But I’m happier and I thrive.

I have a copy of the Book of Redgate from 2010. This was a book we produced internally about the company after 10 years in existence. At that time, I’d been there for about 3 years, and it was interesting to learn a few things about the company. This series of posts looks back at the Book of Redgate 15 years later.


Local Agents

Recently I saw an interesting article saying that someone could build a general-purpose coding agent in 131 lines of Python code. That’s a neat idea, though I’m not sure this is better than just using Claude Code, especially as the agent still uses the online Claude model from Anthropic to generate code or perform other tasks. There’s a video in the article showing how this code can be used to perform some quick tasks on a computer.

However, the code isn’t specific to Anthropic. It can be used with any LLM, and I started doing just that with a copy of the code from the article, modified to use a local LLM running under Ollama. You can see my repo; feel free to download and play with it. It expects a local LLM listening on port 11434 (Ollama’s default).
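If you want to try something similar, the core of talking to a local model under Ollama is a single HTTP call to its `/api/generate` endpoint on port 11434. Here’s a minimal standard-library sketch; the model name `llama3.2` is an assumption, so substitute whatever model you’ve pulled locally:

```python
import json
import urllib.request

# Ollama listens on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="llama3.2"):
    # "llama3.2" is an assumption; use any model you've pulled with `ollama pull`.
    # stream=False asks Ollama for one complete JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_llm(prompt, model="llama3.2"):
    """Send a prompt to a local Ollama server and return the response text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server with the model pulled):
#   print(ask_local_llm("Say hello in five words."))
```

Because the request is plain JSON over HTTP, swapping the agent between a cloud model and a local one mostly comes down to changing the URL and the payload shape.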

I’m a big fan of local agents for a variety of reasons, but mostly because I know humans tend to do dumb things. Especially with new technology, and maybe even more especially in development areas.

That includes me.

I’ll take shortcuts. I’ll give an agent sysadmin on a dev database to try things. I want to be able to experiment, learn, and see what works. I want to learn how to use tools and fail using them. That’s how I get better. That’s how I get better in sports, in music, and in technology.

And that’s not a project I can take time to work on. I don’t get to dedicate time to just learn and then go back to work. Work never ends. It’s a grinding, constant, continuous treadmill of things I need to deliver to others. I have to learn to experiment around those deliverables when I can find spare moments.

With AI, that means we’ll do things that get InfoSec teams to cringe. I get the concerns over data transiting networks and going to who-knows-where to be used who-knows-how-by-others. I appreciate business subscriptions that guarantee that data won’t be used, but I also want extra safeguards at times. That means local models. Not necessarily on my laptop, but in my data center.

Plus, that way I (or my org) can control the costs and manage expectations.

I hope local models and local agents catch on, and I hope more vendors support them and more organizations are willing to run them, even in something like AWS Bedrock, Azure OpenAI, or Vertex AI. Then I can rent the latest and greatest hardware but have more control over how my organization uses it.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.
