Intelligence Per Watt: Measuring Local Inference Viability, Studying 20+ Models, 8 HW Accelerators (Stanford Univ.)


A new technical paper titled "Intelligence per Watt: Measuring Intelligence Efficiency of Local AI" was published by researchers at Stanford University and Together AI. Abstract: "Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Rapidly growing demand strains this paradigm, and cloud providers struggle to scale infrastruc... » read more

Identifying Divergences in HW Designs For High Performance Computing Workloads (LBNL et al.)


A new technical paper titled "Towards An Approach to Identify Divergences in Hardware Designs for HPC Workloads" was published by Lawrence Berkeley National Lab (LBNL), Foundation for Research and Technology - Hellas, and University of Houston Clear Lake. Abstract: "Developing efficient hardware accelerators for mathematical kernels used in scientific applications and machine learning has tra... » read more

Workload-Specific Hardware Accelerators


Workload-specific hardware accelerators are becoming essential in large data centers for two reasons. The first is that general-purpose processing elements cannot keep up with workload demands or latency requirements. The second is that the hardware must be extremely efficient, because electricity from the grid is limited and cooling these devices is costly. Sharad Chole, chief scientist and co-foun... » read more

Preventing End-to-End Slowdowns In Accelerated Chip Multi-Processors (Cornell University, Intel Labs)


A new technical paper titled "RACER: Avoiding End-to-End Slowdowns in Accelerated Chip Multi-Processors" was published by researchers at Cornell University and Intel Labs. Abstract: "Recent chip multiprocessors incorporate several on-chip accelerators, marking the beginning of the Accelerated Chip Multi-Processor (XMP) era in datacenters. Despite the close proximity of accelerators and gener... » read more

HW-based Heterogeneous Memory Management for LLM Inferencing (KAIST, Stanford University)


A new technical paper titled "Hardware-based Heterogeneous Memory Management for Large Language Model Inference" was published by researchers at KAIST and Stanford University. Abstract: "A large language model (LLM) is one of the most important emerging machine learning applications nowadays. However, due to its huge model size and runtime increase of the memory footprint, LLM inferences suf... » read more

SW/HW Codesign For CXL Memory Disaggregation In Billion-Scale Nearest Neighbor Search (KAIST)


A technical paper titled “Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor Search” was published by researchers at the Korea Advanced Institute of Science and Technology (KAIST) and Panmnesia. Abstract: "We propose CXL-ANNS, a software-hardware collaborative approach to enable scalable approximate nearest neighbor search (ANNS) services. To this e... » read more

Formally Verifying Data-Oblivious Behavior In HW Using Standard Property Checking Techniques


A technical paper titled “A Scalable Formal Verification Methodology for Data-Oblivious Hardware” was published by researchers at RPTU Kaiserslautern-Landau and Stanford University. Abstract: "The importance of preventing microarchitectural timing side channels in security-critical applications has surged in recent years. Constant-time programming has emerged as a best-practice technique... » read more

An Open-Source Hardware Design And Specification Language To Improve Productivity And Verification 


A technical paper titled “PEak: A Single Source of Truth for Hardware Design and Verification” was published by researchers at Stanford University. Abstract: "Domain-specific languages for hardware can significantly enhance designer productivity, but sometimes at the cost of ease of verification. On the other hand, ISA specification languages are too static to be used during early stage d... » read more

Programmable HW Accelerators For BGV Fully Homomorphic Encryption In The Cloud


A technical paper titled “BASALISC: Programmable Hardware Accelerator for BGV Fully Homomorphic Encryption” was published by researchers at COSIC KU Leuven, Galois Inc., and Niobium Microsystems. Abstract: "Fully Homomorphic Encryption (FHE) allows for secure computation on encrypted data. Unfortunately, huge memory size, computational cost and bandwidth requirements limit its practic... » read more

Heterogeneous Multi-Core HW Architectures With Fine-Grained Scheduling of Layer-Fused DNNs


A technical paper titled "Towards Heterogeneous Multi-core Accelerators Exploiting Fine-grained Scheduling of Layer-Fused Deep Neural Networks" was published by researchers at KU Leuven and TU Munich. Abstract: "To keep up with the ever-growing performance demand of neural networks, specialized hardware (HW) accelerators are shifting towards multi-core and chiplet architectures. So far, thes... » read more
