Memory Wall Gets Higher


Key Takeaways:
- An increasing percentage of chip area is consumed by the same amount of SRAM with each node shrink.
- The problem is not limited to leading-edge AI; it will eventually impact even small MCUs and MPUs.
- Architectural changes may be required.
- Stacking SRAM chiplets on logic is possible but expensive.

SRAM is a vital piece of all computing systems, but its fail... » read more

Solving Real-World AI Bottlenecks


The race to build smarter and faster AI chips continues to accelerate. This is especially true in autonomous vehicles that must interpret the world in milliseconds, edge accelerators that push trillions of operations per second, hyperscale data-center processors that drive massive workloads, and next-generation consumer devices that demand ever-higher intelligence. As modern system-on-chip (SoC) architec... » read more

A Quantum Leap in Architecture Design of Chiplet Cache Systems


CacheStudio is a rapid architecture and design platform for chiplet-based, cache-coherent systems. Cache hierarchies and parameters are specified in minutes with an abstract Python front end, and accurate workload-driven simulations are performed at millions of instructions per second to uncover stateful performance indicators that require long runtimes. The results are presented interacti... » read more

Memory Fundamentals For Engineers


Memory is one of the few electronic components essential to any electronic system. Modern electronics perform extraordinarily complex duties that would be impossible without memory. Your computer obviously contains memory, but so does your car, your smartphone, your doorbell camera, your entertainment system, and any other gadget benefiting from digital electronics. This eBook prov... » read more

Turbocharging Cost-Conscious SoCs With Cache


Some design teams creating system-on-chip (SoC) devices are fortunate to work with the latest and greatest technology nodes coupled with a largely unconstrained budget for acquiring intellectual property (IP) blocks from trusted third-party vendors. However, many engineers are not so privileged. For every “spare no expense” project, there are a thousand “do the best you can with a limited... » read more

MPAM-Style Cache Partitioning With ATP-Engine And gem5


Arm's Memory System Resource Partitioning and Monitoring (MPAM) architecture supplement allows memory resources (MPAM MSCs) to be partitioned using PARTID identifiers. This lets privileged software, such as OSes and hypervisors, partition caches, memory controllers, and interconnects at the hardware level, so that bandwidth and latency controls can be defined and enforced for memory requestors. ... » read more
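The way-based style of cache partitioning that MPAM enables can be sketched as a toy model. All names here (the class, the way-mask encoding, the LRU policy) are illustrative assumptions for this sketch, not Arm's architecture or the gem5/ATP-Engine implementation:

```python
# Toy model of PARTID-based way partitioning in a set-associative cache.
# Each PARTID may only allocate (fill) into its assigned ways, but hits
# are serviced from any way -- loosely mirroring MPAM-style partitioning.

class PartitionedCache:
    def __init__(self, num_sets, num_ways, way_masks):
        self.num_sets = num_sets
        self.way_masks = way_masks  # PARTID -> list of allowed way indices
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.lru = [[0] * num_ways for _ in range(num_sets)]
        self.tick = 0

    def access(self, addr, partid):
        """Return True on a hit; on a miss, fill only within the PARTID's ways."""
        self.tick += 1
        s, tag = addr % self.num_sets, addr // self.num_sets
        row = self.tags[s]
        if tag in row:                       # hit: any way, regardless of PARTID
            self.lru[s][row.index(tag)] = self.tick
            return True
        allowed = self.way_masks[partid]     # miss: evict LRU within partition
        victim = min(allowed, key=lambda w: self.lru[s][w])
        row[victim] = tag
        self.lru[s][victim] = self.tick
        return False
```

With PARTID 0 confined to ways 0-1 and PARTID 1 to ways 2-3, a thrashing requestor under PARTID 1 cannot evict PARTID 0's lines, which is the isolation property the partitioning provides.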

Performance Boost In Powerful Real-Time Cortex-R Processor Using Data Prefetch Control


High-performance processors employ hardware data prefetching to reduce the performance impact of long main-memory latencies. An effective prefetching mechanism can significantly improve the cache hit rate. Data prefetching boosts execution performance by fetching data before it is needed. While prefetching improves performance substantially on many programs, it can significantly red... » read more
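Why prefetching helps sequential access patterns can be shown with a toy model: a direct-mapped cache with a next-line prefetcher. This is an illustrative sketch only, not the Cortex-R prefetch mechanism, and real prefetchers must also avoid evicting useful lines and wasting bandwidth:

```python
# Toy direct-mapped cache with an optional next-line prefetcher.
# Illustrative only: real hardware prefetchers are far more sophisticated,
# and aggressive prefetching can also hurt by polluting the cache.

def run(addresses, num_lines, prefetch=False):
    """Return the number of demand hits for a trace of cache-line numbers."""
    cache = [None] * num_lines
    hits = 0
    for line in addresses:
        if cache[line % num_lines] == line:
            hits += 1
        else:
            cache[line % num_lines] = line   # demand fill on a miss
        if prefetch:                         # fetch the next line ahead of use
            nxt = line + 1
            cache[nxt % num_lines] = nxt
    return hits

stream = list(range(64))                     # purely sequential line accesses
# Without prefetching every access is a cold miss; with next-line
# prefetching, every access after the first one hits.
```

On this sequential stream, `run(stream, 8)` yields 0 hits while `run(stream, 8, prefetch=True)` yields 63, which is the best case; irregular access patterns would see far less benefit.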

Interconnects: Exploring Semi-Metals (Penn State, IBM, Rice University)


A technical paper titled "Exploring Topological Semi-Metals for Interconnects" was published by researchers at Penn State, IBM, and Rice University, with funding by Semiconductor Research Corporation (SRC). Abstract "The size of transistors has drastically reduced over the years. Interconnects have likewise also been scaled down. Today, conventional copper (Cu)-based interconnects face a ... » read more

Side-Channel Attacks Via Cache On the RISC-V Processor Configuration


A technical paper titled "A cross-process Spectre attack via cache on RISC-V processor with trusted execution environment" was published by researchers at the University of Electro-Communications, Academy of Cryptography Techniques, Technology Research Association of Secure IoT Edge Application based on RISC-V Open Architecture (TRASIO), and AIST. "This work proposed a cross-process exploitation ... » read more

Every Walk’s A Hit: Making Page Walks Single-Access Cache Hits


As memory capacity has outstripped TLB coverage, large data applications suffer from frequent page table walks. We investigate two complementary techniques for addressing this cost: reducing the number of accesses required and reducing the latency of each access. The first approach is accomplished by opportunistically "flattening" the page table: merging two levels of traditional 4 KB p... » read more
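The index arithmetic behind "flattening" can be sketched for a conventional x86-64-style radix page table, where each 4 KB node holds 512 entries (9 index bits): merging two adjacent 9-bit levels yields one 18-bit level and removes one memory access from the walk. The function and variable names below are illustrative, not the paper's code:

```python
# Sketch of page-table index extraction for conventional vs. flattened walks.
# Assumes 4 KB pages and 9 index bits per conventional level (512 entries).

PAGE_SHIFT = 12   # 4 KB pages

def walk_indices(vaddr, levels):
    """Split a virtual address into per-level table indices, top level first.
    `levels` lists each level's index width, e.g. [9, 9, 9, 9]."""
    out, shift = [], PAGE_SHIFT + sum(levels)
    for bits in levels:
        shift -= bits
        out.append((vaddr >> shift) & ((1 << bits) - 1))
    return out

vaddr = 0x7F1234567000
four = walk_indices(vaddr, [9, 9, 9, 9])   # conventional 4-level walk
flat = walk_indices(vaddr, [9, 18, 9])     # middle two levels flattened
# The flattened 18-bit index is the two 9-bit indices concatenated,
# so the merged level resolves in one access what took two before.
assert flat[1] == (four[1] << 9) | four[2]
```

The flattened middle level needs a 2 MB contiguous node (512 * 512 entries * 8 bytes) instead of a 4 KB one, which is the allocation cost the "opportunistic" flattening in the paper trades against fewer walk accesses.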
