This article about running LLMs on a laptop inspired me to try out LM Studio again on Windows. This is on a computer that I first built 8 years ago and updated in 2020. Ryzen 3700X CPU with 32GB RAM, and an old GeForce 1080 with 8GB of VRAM.
Bottom line: I can run Q4 models up to 8B parameters pretty fast, 20-25 tokens/s. It feels about as fast as basic ChatGPT. Big models like 27B run at one tenth the speed, 2-3 tokens/s. The big difference is that the smaller models fit entirely in my 8GB of VRAM. I think there could be some headroom for models bigger than 8B, maybe as much as 12B? But in practice the 12B models run a lot slower, so maybe not.
You can run inference usefully entirely on a CPU, no GPU at all, as long as you aren’t in a hurry. LM Studio is smart about offloading parts of a model to use as much of the GPU as it can, but it seems much faster if it can run the whole thing on the GPU. Most of these models are 4-bit quantized, so a 12B model needs roughly 6GB of RAM to load (12 billion weights at half a byte each). Plus overhead, which is significant, 20%? I can’t make the numbers add up in testing, so I must not understand something.
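The back-of-envelope arithmetic in that paragraph can be written out as a short sketch. The function name, the 20% default overhead, and treating 1GB as 10^9 bytes are assumptions for illustration; none of this is something LM Studio reports.

```python
def estimate_load_gb(params_billion: float,
                     bits_per_weight: float = 4.0,
                     overhead: float = 0.20) -> float:
    """Rough memory needed to load a quantized model, in GB (10^9 bytes)."""
    # billions of weights * bits per weight / 8 bits per byte = GB of weights
    weights_gb = params_billion * bits_per_weight / 8
    # overhead is a guessed fraction on top of the weights (assumption)
    return weights_gb * (1 + overhead)


VRAM_GB = 8  # the GeForce 1080 described in this post

for size in (8, 12, 27):
    need = estimate_load_gb(size)
    verdict = "fits" if need <= VRAM_GB else "does not fit"
    print(f"{size}B at a nominal 4 bits: ~{need:.1f}GB, {verdict} in {VRAM_GB}GB")
```

By this estimate a 12B model should just squeeze into 8GB, which does not match the observed slowdown. The sketch ignores the context (KV) cache and assumes exactly 4 bits per weight, so either of those could be part of the gap.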
Update: I can get about 20 tokens/s on fast small models, and 2 tok/s on big slow models. FWIW ChatGPT says the consumer web products are working at about 50 tok/s.
Anyway, some quick tests of recent models. The prompt was something like “write me a python program to reverse the words in a sentence. No fair using [::-1] to slice it backwards, that’s cheating. Just write the reverse function and show it to me, don’t discuss your answer or show usage and tests”.
I didn’t really inspect the quality of the output, just testing what I can run. Unfortunately I can’t exactly measure how much RAM and VRAM these things are loading: LM Studio does not accurately report it. So instead I’m recording the model download size: RAM usage seems to be 10-20% more.
Fast
deepseek/deepseek-r1-0528-qwen3-8b. 20-25 tokens/sec. 5.03GB. Note deepseek has an extensive “thinking… process” first so the useful output takes longer. Most of those tokens at 20 tokens/sec are not useful to me! It spun for several minutes on my programming questions, thinking to itself. I’ve got no idea what’s going on here but it was not very useful.
google/gemma-2-9b. 20 tok/s. 5.1GB
google/gemma-3n-e4b (6.9B). 25 tok/s. 4.2GB
gemma-3-270m (0.3B). 112 tok/s, but while the output is coherent it is very wrong
Slow
google/gemma-3-12b. 7.5 tok/s. 8.2GB. LM Studio says “Full GPU Offload possible” but I have my doubts based on observed behavior.
google/gemma-3-27b. 2.7 tokens/sec. 16.4GB
mistralai/mistral-small-3.2 (24B). 3.6 tokens/sec. 15.2GB
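One possible reason the "4 bits means half a byte per weight" arithmetic does not add up: the download sizes listed above imply more than 4 bits per weight. The sketch below divides each download size by the parameter count. The 8B and 9B counts are taken from the model names, and 1GB is treated as 10^9 bytes; both are assumptions, and a download may also contain data that is not weights.

```python
# (model, nominal parameters in billions, download size in GB) from the lists above
MODELS = [
    ("deepseek-r1-0528-qwen3-8b", 8.0, 5.03),
    ("gemma-2-9b", 9.0, 5.1),
    ("gemma-3n-e4b", 6.9, 4.2),
    ("gemma-3-12b", 12.0, 8.2),
    ("gemma-3-27b", 27.0, 16.4),
    ("mistral-small-3.2", 24.0, 15.2),
]


def implied_bits_per_weight(params_billion: float, size_gb: float) -> float:
    """Average bits stored per weight if the whole download were weights."""
    return size_gb * 8 / params_billion


for name, params, size in MODELS:
    bits = implied_bits_per_weight(params, size)
    print(f"{name}: {bits:.2f} bits/weight")
```

Every model here comes out between roughly 4.5 and 5.5 bits per weight, so the nominal 4-bit estimate undercounts before any runtime overhead is added.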
Prompts (Aug 1)
I forgot to document the prompts I’m using for testing. They’re very simple. (typos and all!)
- who am i talking to?
- write my a python program to reverse the words in a sentence
- write my a python program to reverse the words in a sentence. No fair using [::-1] to slice it backwards, that’s cheating… Just write the reverse function and show it to me, don’t discuss our answer… or show usage and tests
- summarize the literary career of Edmund White in about 500 words.
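For reference, here is a minimal sketch of an answer that would satisfy the third prompt, reading "reverse the words" as reversing word order. It is not output from any of the models tested, and the function name is just an illustrative choice.

```python
def reverse_words(sentence: str) -> str:
    """Return the sentence with its words in reverse order, without [::-1]."""
    reversed_words = []
    for word in sentence.split():
        # Put each new word at the front so the last word ends up first
        reversed_words.insert(0, word)
    return " ".join(reversed_words)
```

Splitting on whitespace collapses runs of spaces, which seems acceptable for a prompt this informal.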
Update (Linux server, Aug 1)
I’m doing some new experiments using Ollama on my main server, a 2022 machine with an i5-12600K and 64GB of RAM. It has no GPU at all. I’m giving the LXC 40GB of RAM and 12 CPUs (the machine has 6 performance cores with 2 threads each.)
google/gemma-3n-e4b (6.9B). 14 tok/s. Ollama has 8GB resident.
gemma3/12b (12B). 5.6 tok/s. Ollama has 10GB resident.
gemma3/27b (27B). 2.7 tok/s. Ollama has 19GB resident.
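The post does not say how these tok/s figures were measured. One way to reproduce them, sketched below as an assumption rather than the method used here, is to read the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama's `/api/generate` endpoint returns. The URL assumes a default local install.

```python
import json
import urllib.request

# Assumption: Ollama listening on its default local port
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from a non-streaming /api/generate response."""
    # eval_count is generated tokens; eval_duration is in nanoseconds
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one prompt to a running Ollama server and return the JSON reply."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

With a server running, something like `tokens_per_second(generate("gemma3:12b", "who am i talking to?"))` would give a comparable number. The model tag is written in Ollama's own `name:tag` form and has to match whatever is pulled locally.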
Summary: even without a GPU, my little server can run useful models fairly quickly. 14 tok/s on the server CPU feels reasonably fast and is a little over half the speed of the same model running on my Windows machine with an old GPU. The other big conclusion here is that tokens/second is inversely related to the number of weights in a model. Twice the weights, half the speed. (It’s not exactly inversely proportional, but close.) I guess that’s obvious but I never understood it before.
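The "twice the weights, half the speed" rule can be checked against the three CPU-only runs with a small log-log fit. This is a sketch on only three data points, so treat the fitted exponent as a rough indication. An exponent of -1 would mean speed is exactly inversely proportional to model size.

```python
import math

# (parameters in billions, tokens/sec) from the CPU-only Ollama runs above
RUNS = [(6.9, 14.0), (12.0, 5.6), (27.0, 2.7)]


def loglog_slope(points):
    """Least-squares slope of log(speed) against log(size)."""
    xs = [math.log(size) for size, _ in points]
    ys = [math.log(speed) for _, speed in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


for size, speed in RUNS:
    # If speed were exactly proportional to 1/size, this product would be constant
    print(f"{size}B: {speed} tok/s, size x speed = {size * speed:.1f}")
print(f"fitted exponent: {loglog_slope(RUNS):.2f}")
```

The fitted exponent lands a little steeper than -1, which is consistent with "not exactly inversely proportional, but close".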
Update (Hosting, Aug 1)
I don’t know that self-hosting an LLM makes sense. ChatGPT claims it would cost a few hundred dollars a month to rent a server with a GPU big enough to run gemma3/27b. Alternatively you can pay companies to run gemma3/27b for you and access it for about $0.20 / 1M tokens. OpenAI and Claude are significantly more expensive, on the order of $10-$70 / 1M tokens. But then they are much higher quality systems with a lot more going on than just the LLM.
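A rough break-even calculation makes the comparison concrete. The $300 figure is an assumed stand-in for "a few hundred dollars a month", and the 2.7 tok/s rate is borrowed from the CPU-only server run purely for scale; a rented GPU would presumably be faster.

```python
MONTHLY_SERVER_USD = 300.0     # assumption: "a few hundred dollars a month"
HOSTED_USD_PER_MILLION = 0.20  # hosted gemma3/27b price quoted above


def break_even_tokens(monthly_usd: float, usd_per_million: float) -> float:
    """Tokens per month at which renting a server matches per-token pricing."""
    return monthly_usd / usd_per_million * 1_000_000


def tokens_per_month(tokens_per_sec: float, days: int = 30) -> float:
    """Tokens produced by generating nonstop at a fixed rate."""
    return tokens_per_sec * 60 * 60 * 24 * days


needed = break_even_tokens(MONTHLY_SERVER_USD, HOSTED_USD_PER_MILLION)
produced = tokens_per_month(2.7)
print(f"break-even: {needed / 1e9:.1f} billion tokens/month")
print(f"nonstop at 2.7 tok/s: {produced / 1e6:.1f} million tokens/month")
```

Under these assumptions the rented server only pays off above about 1.5 billion tokens a month, while a machine generating nonstop at 2.7 tok/s produces about 7 million.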
Outputs
I said I wasn’t evaluating output quality but here are the Edmund White transcripts. They are all competent, but the gemma3n/e4b model’s answer feels a little off: it talks about some lesser books and the tone is odd in places (like referring to “a close friend”). The gemma3/12b model is just plain wrong in places, hallucinating the title “A Boy’s Own Apollo”. gemma3/27b also makes some factual mistakes: My Lives wasn’t published in 1981 but in 2005. Overall these answers seem good but have some real flaws.
gemma3n/e4b
Edmund White, born in 1940, is a towering figure in American literary
history, renowned for his unflinching explorations of gay male life,
sexuality, and the complexities of identity. His career, spanning over
four decades, has consistently pushed boundaries, blending memoir,
fiction, and literary criticism into a distinctive and highly acclaimed
body of work.
White’s early writing emerged during the burgeoning gay rights movement of
the 1970s. His first novel, Desire (1976), established his signature
style: a meticulously detailed, often self-deprecating narrative that
prioritized psychological realism over plot-driven storytelling. It
explored the anxieties and desires of a young, closeted gay man in New
York City, setting the tone for his future explorations of intimacy and
self-discovery. This period saw him become a prominent voice in the gay
literary scene, offering nuanced portrayals that challenged prevailing
stereotypes.
Throughout the 1980s and 90s, White’s work gained wider recognition. He
published several novels, including A Boy’s Own Story (1984), a
semi-autobiographical novel that delved into his childhood and
adolescence, and The Beautiful Room is Empty (1988), a poignant
exploration of grief and loss following the death of a close friend. These
works cemented his reputation as a master of psychological realism,
capable of capturing the subtle nuances of human emotion and experience.
He also began to publish more explicitly autobiographical works, blurring
the lines between fiction and memoir.
A significant turning point in White’s career came with the publication of
My Usual Profession (1997), a collection of essays that offered a candid
and often humorous account of his life as a writer. This collection
revealed a sharp wit and a willingness to expose his own vulnerabilities,
further solidifying his reputation as a literary intellectual. He
continued to publish both fiction and memoir, including The Farewell, My
Love (1998), a deeply moving exploration of love and loss, and States of
Desire (2000), a collection of essays that examined the complexities of
desire and identity.
In later years, White’s work has become increasingly introspective and
philosophical. He has explored themes of aging, mortality, and the legacy
of his own life. His novels, such as The Humbling (2008), a
fictionalized account of a mentally fragile painter, and Life is
Elsewhere (2011), continue to grapple with the challenges of human
connection and the search for meaning. He has also written extensively
about other writers, offering insightful and often affectionate portraits
of literary giants like Marcel Proust and Henry James.
Edmund White’s literary career is characterized by his unwavering
commitment to honesty, his masterful prose, and his profound understanding
of the human condition. He has consistently challenged conventions and
pushed the boundaries of literary form, creating a body of work that is
both deeply personal and universally relevant. He remains a vital and
influential voice in American literature, celebrated for his ability to
illuminate the complexities of gay life and the enduring power of love and
loss.
gemma3/12b
Edmund White’s literary career is a fascinating and evolving chronicle of
a life lived openly and honestly, reflecting shifts in American culture
and the ongoing struggle for LGBTQ+ visibility. Spanning over four
decades, his work has traversed genres, from memoir and biography to
novels and essays, consistently marked by a keen intellect, elegant prose,
and a persistent exploration of identity, desire, and the complexities of
human connection.
White’s early work, heavily influenced by Proust and Fitzgerald,
established him as a significant voice in American literature. His first
novel, A Boy’s Own Apollo (1985), launched his career and became a
landmark in gay literature. The trilogy, comprising A Boy’s Own Apollo,
The Farewell Symphony, and The Plotting of Mrs. Lang, boldly explored
the coming-of-age and interior life of Thomas Scholes, a gay man grappling
with societal expectations, artistic ambition, and the AIDS crisis. These
novels, initially criticized for their perceived elitism, resonated deeply
with readers seeking authentic portrayals of gay experience, and
solidified White’s reputation for stylish and introspective prose.
The AIDS crisis profoundly shaped White’s subsequent work. States of the
Union (1988), a semi-autobiographical novel, offered a poignant and
searing account of a young man’s experience with the disease, interwoven
with political commentary. This marked a shift towards greater social
engagement in his writing. Following the deaths of friends and loved ones,
White turned to biography, finding solace and intellectual stimulation in
exploring the lives of others. His biographies of Guillaume Apollinaire
(Guillaume Apollinaire: A Life) and Caravaggio (Caravaggio)
demonstrated his talent for bringing historical figures to life, revealing
their complexities and struggles within their respective contexts.
The turn of the millennium saw White further evolve his style and themes.
The Caliph’s House (2002), a memoir recounting his experiences with a
devastating and secretive relationship, was a raw and intensely personal
exploration of love, loss, and deception. It showcased a willingness to
confront painful truths and challenge conventional notions of memoir. This
work, and subsequent memoirs like Inside a Vicarious Mood: Re-reading a
Life (2006) and How Odd to be Titular: A Collection of Photographs
(2009), revealed a sophisticated understanding of self-reflection and a
willingness to examine his own complicity in difficult situations.
In recent years, White’s work has become increasingly focused on the
complexities of relationships and the ongoing search for connection. Man
in Profile: Essays (2011) and Let’s Not Talk About Cesars (2014)
offered meditations on art, literature, and the evolution of his personal
life. The Loveliest Home That Lady Ever Built (2018), a novel inspired
by the life of Charlotte Perkins Gilman, showcased his continued ability
to inhabit the perspectives of diverse characters and to grapple with
issues of gender, class, and societal expectations.
Throughout his career, Edmund White has remained a vital and influential
voice in American literature. His unflinching honesty, lyrical prose, and
ongoing exploration of identity and desire have consistently challenged
readers and broadened the landscape of literary representation. He has not
only chronicled his own journey but has also helped to illuminate the
experiences of countless others.
gemma3/27b
Edmund White is arguably one of the most important American novelists and
essayists to emerge in the late 20th and early 21st centuries, largely
credited with pioneering a new frankness in gay literature. His career is
marked by a persistent exploration of desire, memory, and the complexities
of identity, often through a distinctly autobiographical lens.
White began publishing in the 1970s, a period when openly gay fiction was
still relatively rare. His early novels, like A Boy’s Own Story (1982),
were groundbreaking for their unflinching portrayal of a young boy coming
to terms with his homosexuality. Told through the voice of a young
protagonist navigating childhood and adolescence, it broke ground by
presenting gay desire not as a pathology or tragedy, but as a natural, if
sometimes complicated, part of life. This novel established several key
themes that would recur throughout his work: the exploration of desire as
a driving force, the power of memory, and the often-painful process of
self-discovery.
Following A Boy’s Own Story, White published The Beautiful Room Is Empty
(1985), a sequel that continued to chart the protagonist’s journey
into adulthood and the search for love and acceptance. This novel deepened
his exploration of the emotional and psychological landscape of gay life,
tackling themes of isolation, longing, and the challenges of finding
lasting connection.
His work took a different turn with The Farewell Tour (1993), a
fictionalized account of a closeted film star, and the subsequent novels
Bonfire (1998) and The Married Man (2000). These novels moved away
from the coming-of-age narrative and explored the complexities of love,
desire, and commitment within established relationships, often involving
characters grappling with the tension between public persona and private
truth.
Beyond fiction, White has also made significant contributions as a
memoirist and essayist. His memoir, My Lives (1981), was a critical and
commercial success, offering a candid and introspective account of his
upbringing, sexual awakening, and literary ambitions. It’s known for its
unflinching honesty and elegant prose. He’s continued to publish memoirs
and essays, reflecting on his life, work, and the evolution of gay
culture.
A key aspect of White’s career is his willingness to challenge
conventional literary boundaries. He often blends fiction and
autobiography, blurring the lines between personal experience and
imaginative creation. He’s been criticized for this perceived
self-indulgence, but his defenders argue that this blending is essential
to his artistic vision and allows him to explore universal themes of
identity, desire, and memory with exceptional depth and nuance.
Throughout his career, Edmund White has remained a vital and influential
voice in contemporary literature. His work has been praised for its
emotional honesty, stylistic elegance, and its unwavering commitment to
exploring the complexities of the human heart and the enduring search for
self-discovery. He paved the way for a more open and honest portrayal of
gay lives in literature, and his influence can be seen in the work of many
contemporary writers.