Why More CPUs Are Needed For Agentic AI


The shift from generative AI to agentic AI will significantly increase the amount of compute power needed in data centers. Rather than a single request from a live person, agents will simultaneously search for and analyze data from multiple sources without human intervention. Jeff Defilippi, senior director of product management at Arm, talks about the impact of r... » read more
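A minimal sketch of why agentic workloads multiply compute demand: where a chat session issues one backend call per user turn, an agent can fan out many retrieval and analysis calls concurrently. The asyncio structure, source names, and latencies below are hypothetical, purely for illustration.

```python
import asyncio

async def query_source(source: str, question: str) -> str:
    # Placeholder for a retrieval/analysis call (vector DB, web search, SQL, ...).
    # Each call consumes server-side compute, regardless of who issued it.
    await asyncio.sleep(0.1)  # stand-in for network + inference latency
    return f"{source}: partial answer to '{question}'"

async def agent_answer(question: str) -> list[str]:
    # An agent decomposes one user request into many simultaneous queries,
    # so one human turn can trigger N backend requests instead of one.
    sources = ["web_search", "internal_docs", "sales_db", "ticket_history"]
    return await asyncio.gather(*(query_source(s, question) for s in sources))

if __name__ == "__main__":
    results = asyncio.run(agent_answer("Why did Q3 churn increase?"))
    print(f"1 user request -> {len(results)} backend queries")
```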

Heterogeneous NPU Data Movement: What The Execution Flow Shows


Heterogeneous NPU designs bring together multiple specialized compute engines to support the range of operators required by modern AI models. This approach enables coverage across diverse workloads, but it also introduces a structural consequence: intermediate data must move between those engines. That movement consumes power, adds latency, and requires additional silicon resources, with effect... » read more
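To make the cost of inter-engine data movement concrete, here is a rough back-of-envelope estimate of the time and energy spent shipping one intermediate activation tensor between engines over a shared fabric. Every constant (tensor shape, element width, fabric bandwidth, energy per byte) is an illustrative assumption, not a measurement from any particular NPU.

```python
# Back-of-envelope cost of moving an intermediate tensor between two NPU engines.
# All constants below are assumed, illustrative values.

ELEMENTS = 1 * 56 * 56 * 256        # one activation tensor (N*H*W*C)
BYTES_PER_ELEMENT = 1               # int8 activations
FABRIC_BW_GBPS = 128                # GB/s of the shared interconnect
PJ_PER_BYTE_MOVED = 10              # energy to move one byte across the fabric

tensor_bytes = ELEMENTS * BYTES_PER_ELEMENT
transfer_us = tensor_bytes / (FABRIC_BW_GBPS * 1e9) * 1e6
energy_uj = tensor_bytes * PJ_PER_BYTE_MOVED * 1e-6

print(f"tensor size : {tensor_bytes / 1024:.0f} KiB")
print(f"transfer    : {transfer_us:.1f} us over a {FABRIC_BW_GBPS} GB/s fabric")
print(f"energy      : {energy_uj:.1f} uJ at {PJ_PER_BYTE_MOVED} pJ/byte")
# Multiply this by every engine-to-engine handoff in the graph and the overhead
# becomes a first-order term in the latency and power budget.
```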

The Coming Breakup Between AI And The Cloud


For a decade, cloud AI has felt inevitable. It powers our voice assistants, photo libraries, recommendation engines, and a growing list of “smart” features we barely notice anymore. Yet beneath the convenience is a fragile dependency: if your connection stutters, your intelligence does too. We rarely question this arrangement, but we should. As models grow larger and expectations grow... » read more

AI Accelerators Usher In New Era For IC Test


Key Takeaways
- The parallelism in AI accelerators enables low latency but complicates failure isolation.
- HBM can account for 50% of package cost, so known-good stack assurance is critical.
- DFT and test cooperate to solve final test, singulated die test, SLT, and in-system test for data centers.

AI accelerators are used for everything from training large language models to mak... » read more
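The known-good-stack point rests on simple arithmetic: if HBM dominates package cost, any stack found bad only after assembly scraps the whole package. The sketch below works through that expected-cost math with made-up yields and costs; none of the figures come from the article.

```python
# Why known-good HBM stacks matter: expected cost per good package.
# All costs and yields below are illustrative assumptions.

HBM_STACKS = 8
HBM_STACK_COST = 120.0      # $ per stack -> HBM is ~50% of package cost here
OTHER_PACKAGE_COST = 960.0  # $ for logic die + interposer + assembly
STACK_YIELD = 0.98          # probability a given stack is actually good

package_cost = HBM_STACKS * HBM_STACK_COST + OTHER_PACKAGE_COST

# Without known-good-stack screening, one bad stack kills the whole package.
p_all_good = STACK_YIELD ** HBM_STACKS
cost_per_good_pkg = package_cost / p_all_good

print(f"package cost          : ${package_cost:,.0f}")
print(f"P(all {HBM_STACKS} stacks good) : {p_all_good:.3f}")
print(f"cost per good package : ${cost_per_good_pkg:,.0f}")
# Screening stacks before assembly moves the yield loss back to the much
# cheaper per-stack level instead of scrapping finished packages.
```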

The On-Device LLM Revolution


The AI world is experiencing a fundamental shift. After years of cloud-centric inference dominated by massive data center GPUs, we're witnessing an accelerating migration of language models to edge devices. These are not the trillion-parameter behemoths that require server farms, but the "Goldilocks zone" models: 3B to 30B parameters — large enough to deliver genuinely useful AI capabilities,... » read more
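A quick way to see why the 3B to 30B range is the "Goldilocks zone" for edge devices is to compute the weight footprint at common quantization widths. The sketch below does that arithmetic; the parameter counts and bit widths are generic assumptions, not claims about any specific model, and KV cache and activation memory are ignored.

```python
# Approximate weight memory for edge-sized LLMs at different quantization widths.
# Purely illustrative; ignores KV cache and activations.

def weight_gib(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

for params in (3, 8, 30):
    for bits in (16, 8, 4):
        print(f"{params:>2}B params @ {bits:>2}-bit: "
              f"{weight_gib(params, bits):5.1f} GiB of weights")
# A 3B model at 4-bit fits comfortably in a phone's memory budget, while a
# 30B model at 4-bit is still plausible on a well-equipped laptop or edge box.
```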

Voice Is The New UI


Recent years have seen a paradigm shift in the user interface (UI) of our computers and client devices, and it is gaining momentum. Advancements in large language models (LLMs), small language models (SLMs), energy-efficient systems on chip (SoCs), and on-device AI processing are making voice input the new “keyboard.”

Fig. 1: Voice Processing Pipeline On-De... » read more
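As a structural sketch of the on-device voice pipeline the figure refers to, the stages below chain wake-word detection, speech-to-text, a local language model, and text-to-speech. The function bodies are stubs standing in for real on-device models; the stage names and interfaces are assumptions for illustration only.

```python
# Skeleton of an on-device voice pipeline. Each stage is a stub standing in
# for a real model (wake word, ASR, SLM/LLM, TTS). Illustrative only.

def detect_wake_word(audio_frame: bytes) -> bool:
    return True  # stub: an always-on, low-power detector would run here

def speech_to_text(audio: bytes) -> str:
    return "what's on my calendar today"  # stub for an on-device ASR model

def run_language_model(prompt: str) -> str:
    return "You have two meetings this afternoon."  # stub for a local SLM

def text_to_speech(text: str) -> bytes:
    return text.encode()  # stub for an on-device TTS engine

def handle_utterance(audio: bytes) -> bytes:
    # The whole loop stays on the device: no audio or text leaves it.
    if not detect_wake_word(audio):
        return b""
    transcript = speech_to_text(audio)
    reply = run_language_model(transcript)
    return text_to_speech(reply)

if __name__ == "__main__":
    print(handle_utterance(b"\x00" * 320))
```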

Balancing Training, Quantization, And Hardware Integration In NPUs


Experts At The Table: AI/ML is driving a steep ramp in neural processing unit (NPU) design activity for everything from data centers to edge devices such as PCs and smartphones. Semiconductor Engineering sat down to discuss this with Jason Lawley, director of product marketing, AI IP at Cadence; Sharad Chole, chief scientist and co-founder at Expedera; Steve Roddy, chief marketing officer at Qu... » read more

Balancing Workloads In AI Processor Designs


A growing number of AI processors are being designed around specific workloads rather than standardized benchmarks, optimizing performance and power efficiency, but often with enough flexibility to adapt to future changes. While the fundamentals of matrix multiplication and software optimization still apply, those alone are no longer sufficient. Designs need to address specific data types, w... » read more
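One concrete example of "addressing specific data types": running the same matrix multiply in int8 with int32 accumulation instead of fp32 trades a small quantization error for much smaller operands, which is exactly the kind of workload-specific choice these designs weigh. The NumPy sketch below is illustrative only; the sizes and per-tensor scaling scheme are arbitrary assumptions.

```python
import numpy as np

# Same matmul in fp32 and in int8-with-int32-accumulation, to show the kind of
# data-type trade-off workload-specific designs make. Sizes and scales arbitrary.

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 128)).astype(np.float32)
b = rng.standard_normal((128, 32)).astype(np.float32)

ref = a @ b  # fp32 reference result

# Symmetric per-tensor quantization to int8.
sa = np.abs(a).max() / 127.0
sb = np.abs(b).max() / 127.0
qa = np.clip(np.round(a / sa), -127, 127).astype(np.int8)
qb = np.clip(np.round(b / sb), -127, 127).astype(np.int8)

# Accumulate in int32 (as int8 MAC arrays do), then rescale back to float.
acc = qa.astype(np.int32) @ qb.astype(np.int32)
approx = acc.astype(np.float32) * (sa * sb)

rel_err = np.abs(approx - ref).mean() / np.abs(ref).mean()
print(f"operand bytes: fp32={a.nbytes + b.nbytes}, int8={qa.nbytes + qb.nbytes}")
print(f"mean relative error from int8 quantization: {rel_err:.3%}")
```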

Workload-Specific Hardware Accelerators


Workload-specific hardware accelerators are becoming essential in large data centers for two reasons. One is that general-purpose processing elements cannot keep up with workload demands or latency requirements. The second is that these accelerators need to be extremely efficient, given the limited electricity available from the grid and the high cost of cooling these devices. Sharad Chole, chief scientist and co-foun... » read more

Chiplet-Based NPUs to Accelerate Vehicular AI Perception Workloads


A new technical paper titled "Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception" was published by researchers at UC Irvine.

Abstract:
"We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings. The motivation stems from how chiplets technology i... » read more
