The On-Device LLM Revolution


The AI world is experiencing a fundamental shift. After years of cloud-centric inference dominated by massive data center GPUs, we're witnessing an accelerating migration of language models to edge devices. These are not the trillion-parameter behemoths that require server farms, but the "Goldilocks zone" models: 3B to 30B parameters — large enough to deliver genuinely useful AI capabilities,... » read more

Outlier-aware Quantization Framework Co-designed With Heterogeneous NVM For SLM Deployment on Edge Platforms (UCSD et al.)


  A new technical paper titled "QMC: Efficient SLM Edge Inference via Outlier-Aware Quantization and Emergent Memories Co-Design" was published by researchers at University of California San Diego and San Diego State University. Abstract "Deploying Small Language Models (SLMs) on edge platforms is critical for real-time, privacy-sensitive generative AI, yet constrained by memory, ... » read more
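
The paper's QMC co-design is behind the link, but the core idea named in the title — outlier-aware quantization — can be illustrated generically: a small fraction of large-magnitude weights ("outliers") is kept in floating point while the remaining inliers are quantized to a narrow integer format. A minimal NumPy sketch (illustrative only, not the paper's algorithm; `outlier_frac` and the symmetric INT4 scheme are assumptions):

```python
import numpy as np

def quantize_outlier_aware(w, outlier_frac=0.01, bits=4):
    """Split weights into a small FP outlier set and a quantized inlier set.
    Generic illustration of outlier-aware quantization, not the QMC method."""
    flat = np.abs(w).ravel()
    k = max(1, int(outlier_frac * flat.size))
    thresh = np.partition(flat, -k)[-k]        # magnitude cutoff for top-k outliers
    outlier_mask = np.abs(w) >= thresh
    inliers = np.where(outlier_mask, 0.0, w)
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for INT4 symmetric
    scale = np.abs(inliers).max() / qmax or 1.0
    q = np.clip(np.round(inliers / scale), -qmax, qmax).astype(np.int8)
    return q, scale, np.where(outlier_mask, w, 0.0)

def dequantize(q, scale, outliers):
    # Inliers come back through the scale; outliers are restored exactly.
    return q.astype(np.float32) * scale + outliers
```

Because the few outliers no longer inflate the quantization scale, the reconstruction error on the bulk of the weights stays small even at 4 bits.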

Overflowing Zoo: The Power Of Compilers


The term “model zoo” first gained prominence in the world of Artificial Intelligence/Machine Learning (AI/ML) beginning in the 2016-2017 timeframe. Originally used to describe open-source public repositories of working AI models — the most prominent of which today is Hugging Face — the term has since been adopted by nearly all vendors of AI chips and licensable Neural Processor Units (... » read more

KAN Acceleration: Algorithm Hardware Co-Design Approach (Georgia Tech, National Tsing Hua Univ., TSMC)


A new technical paper titled "Hardware Acceleration of Kolmogorov-Arnold Network (KAN) in Large-Scale Systems" was published by researchers at Georgia Institute of Technology, National Tsing Hua University and TSMC. Abstract "Recent developments have introduced Kolmogorov-Arnold Networks (KAN), an innovative architectural paradigm capable of replicating conventional deep neural network (DNN... » read more
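
The architectural twist that distinguishes a KAN from a conventional DNN layer is that each *edge* carries a learnable 1-D function rather than a scalar weight. A toy NumPy layer sketches the idea — here the edge functions are parameterized with Gaussian RBF bases, an illustrative choice rather than the B-spline parameterization used in the KAN literature, and all names (`KANLayerSketch`, `n_basis`) are assumptions:

```python
import numpy as np

class KANLayerSketch:
    """Toy Kolmogorov-Arnold layer: output_j = sum_i phi_ij(x_i), where each
    edge function phi_ij is a learnable combination of shared 1-D bases."""
    def __init__(self, n_in, n_out, n_basis=8, rng=None):
        rng = rng or np.random.default_rng(0)
        self.centers = np.linspace(-2, 2, n_basis)        # shared RBF grid
        self.coef = rng.normal(scale=0.1, size=(n_in, n_out, n_basis))

    def __call__(self, x):                                # x: (batch, n_in)
        # Evaluate every basis at every input: (batch, n_in, n_basis)
        b = np.exp(-((x[:, :, None] - self.centers) ** 2))
        # phi_ij(x_i) = sum_k coef[i,j,k] * b_k(x_i), then sum over inputs i
        return np.einsum('aib,ijb->aj', b, self.coef)
```

From a hardware-acceleration standpoint, the interesting consequence is that the inner loop becomes basis-function evaluation plus a tensor contraction, rather than a plain matrix multiply.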

SpiNNaker2 Neuromorphic Platform: HW-Aware Fine-Tuning of Spiking Q-Networks (TU Dresden Et Al.)


A new technical paper titled "Hardware-Aware Fine-Tuning of Spiking Q-Networks on the SpiNNaker2 Neuromorphic Platform" was published by researchers at TU Dresden, ScaDS.AI and Centre for Tactile Internet with Human-in-the-Loop (CeTI). Excerpt "Spiking Neural Networks (SNNs) promise orders-of-magnitude lower power consumption and low-latency inference on neuromorphic hardware for a wide ran... » read more
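
The power and latency advantages cited in the excerpt come from the event-driven dynamics SNNs run on neuromorphic hardware. A generic leaky integrate-and-fire (LIF) update — a sketch of those dynamics, not the SpiNNaker2 implementation from the paper; `tau`, `v_th`, and the reset-to-zero rule are assumed defaults:

```python
import numpy as np

def lif_step(v, i_in, tau=20.0, v_th=1.0, dt=1.0):
    """One Euler step of a leaky integrate-and-fire neuron population.
    Returns the updated membrane potentials and a boolean spike vector."""
    v = v + (dt / tau) * (-v + i_in)   # leaky integration toward the input
    spikes = v >= v_th                 # fire wherever the threshold is crossed
    v = np.where(spikes, 0.0, v)       # reset fired neurons
    return v, spikes
```

Because neurons only communicate when `spikes` is true, compute and traffic scale with activity rather than with layer width — the property hardware-aware fine-tuning tries to preserve after quantizing the network for the chip.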

LLMs On The Edge


Nearly all the data input for AI so far has been text, but that's about to change. In the future, that input likely will include video and voice, as well as other types of data, causing a massive increase in the amount of data that needs to be modeled and the compute resources necessary to make it all work. This is hard enough in hyperscale data centers, which are sprouting up everywhere to handle... » read more

Prevent AI Hardware Obsolescence And Optimize Efficiency With eFPGA Adaptability


Large Language Models (LLMs) and Generative AI are driving up memory requirements, presenting a significant challenge. Modern LLMs can have billions of parameters, demanding many gigabytes of memory. To address this issue, AI architects have devised clever solutions that dramatically reduce memory needs. Evolving techniques like lossless weight compression, structured sparsity, and new numer... » read more
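
Of the memory-reduction techniques listed, structured sparsity is easy to make concrete. A minimal NumPy sketch of the common 2:4 pattern (keep the 2 largest-magnitude weights in every group of 4, zero the rest) — an illustration of the general technique, not any vendor's implementation:

```python
import numpy as np

def prune_2_of_4(w):
    """Apply 2:4 structured sparsity: in each group of 4 consecutive weights,
    zero the 2 smallest magnitudes. Total element count must divide by 4."""
    g = w.reshape(-1, 4)
    # Indices of the two smallest-magnitude entries per group
    drop = np.argsort(np.abs(g), axis=1)[:, :2]
    out = g.copy()
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(w.shape)
```

The fixed 2-of-4 pattern is what makes the sparsity hardware-friendly: storage halves and the decoder needs only a 2-bit index per kept weight, which is exactly the kind of evolving format an adaptable eFPGA fabric can track without a respin.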

On-Device Speaker Identification For Digital Television (DTV)


In recent years, the way we interact with our TVs has changed. Multiple button presses to navigate an on-screen keyboard have been replaced with direct interaction through our voices. While this has resulted in significant improvements to the Digital Television (DTV) user experience, more can be done to provide immersive and engaging experiences. Imagine you say, “recommend me a film” or... » read more

High-Level Synthesis Propels Next-Gen AI Accelerators


Everything around you is getting smarter. Artificial intelligence is not just a data center application but will be deployed in all kinds of embedded systems that we interact with daily. We expect to talk to and gesture at them. We expect them to recognize and understand us. And we expect them to operate with just a little bit of common sense. This intelligence is making these systems not just ... » read more

Embrace The New!


The ResNet family of machine learning algorithms was introduced to the AI world in 2015. A slew of variations was rapidly discovered that at the time pushed the accuracy of ResNets close to the 80% threshold (78.57% Top-1 accuracy for ResNet-152 on ImageNet). This then-state-of-the-art performance, coupled with the rather simple operator structure that was readily amenable to hardware ac... » read more
