Wi-Fi 7 Moves To The IoT


Wi-Fi 7 has been a staple in high-end applications such as notebook computers and AR/VR glasses for the past couple of years, where high-speed connectivity and low latency are essential. Known alternatively as IEEE 802.11be and Extremely High Throughput (EHT) Wi-Fi, it is starting to migrate downstream into IoT devices such as smart door locks, thermostats, and robotic vacuum cleaners. But the reason i... » read more

PCIe 8.0: Preparing For The Next Doubling


By Monica Olvera and Gustavo Pimentel Every few years, the industry confronts the same challenge: can general-purpose I/O double again without blowing power budgets, exceeding signal-integrity limits, or fragmenting the ecosystem? With PCIe 8.0, the answer appears to be yes—if the entire stack continues to advance together. Public PCI-SIG information outlines an objective of 256.0 ... » read more
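The headline doubling is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes the flit-mode encoding used since PCIe 6.0, where the raw transfer rate in GT/s maps roughly 1:1 to Gb/s per lane per direction; flit and protocol overhead are ignored, so these are raw figures, not delivered throughput.

```python
# Rough per-direction raw bandwidth for a PCIe link, ignoring flit and
# protocol overhead (illustrative arithmetic, not spec-exact numbers).

def raw_bandwidth_gbps(gt_per_s: float, lanes: int) -> float:
    """Raw per-direction bandwidth in GB/s: GT/s ~ Gb/s per lane, / 8 bits."""
    return gt_per_s * lanes / 8

for gen, rate in [("PCIe 5.0", 32), ("PCIe 6.0", 64),
                  ("PCIe 7.0", 128), ("PCIe 8.0", 256)]:
    print(f"{gen}: x16 ~ {raw_bandwidth_gbps(rate, 16):.0f} GB/s per direction")
```

At 256 GT/s, an x16 link works out to roughly 512 GB/s in each direction, which is the doubling-per-generation cadence the article describes.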

Critical Factors For Storing Data In DRAM


DRAM is becoming more complicated to develop, and more difficult to manage inside AI data centers. In the past, latency, bandwidth, and capacity were the primary considerations. But as the amount of data that needs to be processed, moved, and stored continues to rise, a whole new set of factors is emerging. Steven Woo, fellow and distinguished inventor at Rambus, talks about latency under load,... » read more

Cost-Effective, Orthogonal Approach to Resilient Memory Design (Univ. of Central Florida, UT San Antonio, Rochester)


A new technical paper titled "SCREME: A Scalable Framework for Resilient Memory Design" was published by researchers at University of Central Florida, University of Texas at San Antonio and University of Rochester. Abstract "The continuing advancement of memory technology has not only fueled a surge in performance, but also substantially exacerbated reliability challenges. Traditional soluti... » read more

A Fundamental Rethinking Of Memory Hierarchy Design (Stanford University)


A new technical paper titled "The Future of Memory: Limits and Opportunities" was published by researchers at Stanford University and an independent researcher. Abstract "Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memories shared among large ... » read more

Dynamic KV Cache Scheduling in Heterogeneous Memory Systems for LLM Inference (Rensselaer Polytechnic Institute, IBM)


A new technical paper titled "Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System" was published by researchers at Rensselaer Polytechnic Institute and IBM. Abstract "Large Language Model (LLM) inference is increasingly constrained by memory bandwidth, with frequent access to the key-value (KV) cache dominating data movement. While attention sparsity red... » read more
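The core idea of dynamic placement can be illustrated with a toy policy. This is a simplified sketch, not the paper's algorithm: hot KV-cache blocks are ranked by recent access counts and kept in fast memory (e.g., HBM), with the remainder spilled to slower, larger memory (e.g., DDR). The function name and data layout here are hypothetical.

```python
# Illustrative KV-cache placement sketch (not the paper's method):
# rank blocks by access frequency, keep the hottest in fast memory.

def place_kv_blocks(access_counts: dict, fast_capacity: int):
    """Return (fast, slow) sets of block ids, hottest-first into fast memory."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return set(ranked[:fast_capacity]), set(ranked[fast_capacity:])

# Four blocks, room for two in fast memory.
fast, slow = place_kv_blocks({"b0": 90, "b1": 5, "b2": 40, "b3": 2},
                             fast_capacity=2)
```

A real scheduler would also weigh migration cost and re-rank as attention patterns shift during decoding, which is where the dynamic aspect of the paper's approach comes in.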

Often Overlooked, PHYs Are Essential To High-Speed Data Movement


Over the past couple of decades, the semiconductor industry has evolved from a supporting role for traditional verticals like mobile, automotive, and PCs to a foundational role in those markets, as well as in AI factories and hyperscale data centers. Underlying this transformation is the physical layer (PHY), which has emerged as a critical enabler for data transfer and communications. The P... » read more

Expanding Server Memory Capabilities With Multiplexed Rank DIMM (MRDIMM) Technology


The scaling of computational power within a single, packaged semiconductor component continues to rise following a Moore's Law-type curve, enabling new and more capable applications including machine learning (ML), generative artificial intelligence (AI), and the training and deployment of large language models (LLMs). On-demand lifestyle applications like language translation, direction finding, a... » read more

Hardware-Oriented Analysis of Multi-Head Latent Attention (MLA) in DeepSeek-V3 (KU Leuven)


A new technical paper titled "Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention" was published by researchers at KU Leuven. Abstract "Multi-Head Latent Attention (MLA), introduced in DeepSeek-V2, improves the efficiency of large language models by projecting query, key, and value tensors into a compact latent space. This architectural change reduces the KV-cache size and s... » read more
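The latent-projection idea in the abstract can be sketched numerically. This toy example uses made-up dimensions, not DeepSeek's actual configuration: rather than caching full keys and values per token, only a compact latent vector is cached and expanded into K and V at attention time, trading extra compute for a smaller cache footprint.

```python
import numpy as np

# Toy sketch of latent-space KV compression (dimensions are illustrative).
d_model, d_latent, seq_len = 512, 64, 1024
rng = np.random.default_rng(0)

W_down = rng.standard_normal((d_model, d_latent))  # hidden -> latent (cached)
W_up_k = rng.standard_normal((d_latent, d_model))  # latent -> keys
W_up_v = rng.standard_normal((d_latent, d_model))  # latent -> values

h = rng.standard_normal((seq_len, d_model))        # token hidden states
latent = h @ W_down                                # only this is cached

# At attention time, K and V are reconstructed from the latent cache.
K = latent @ W_up_k
V = latent @ W_up_v

full_cache = 2 * seq_len * d_model   # naive K + V cache entries
mla_cache = seq_len * d_latent       # latent cache entries
print(f"cache reduction: {full_cache / mla_cache:.0f}x")  # 16x with these dims
```

The hardware-centric question the paper examines is what this trade does to arithmetic intensity and memory traffic, since the up-projections add compute on every decode step.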

Speeding Up Die-To-Die Interconnectivity


Disaggregating SoCs, coupled with the need to process more data faster, is forcing engineering teams to rethink the electronic plumbing in a system. Wires don't shrink, and just cramming more or thicker wires into a package is not a viable solution. Kevin Donnelly, vice president of strategic marketing at Eliyan, talks about how to speed up data movement between chiplets with bi-direction... » read more
