Papers

Name	Finish Date
DeepSpeed: Extreme-scale model training for everyone	2026-03-05
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning	2026-02-17
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks	2026-02-10
In-Datacenter Performance Analysis of a Tensor Processing Unit	2026-01-27
TABLA: A unified template-based framework for accelerating statistical machine learning	2026-01-23
Efficiently compiling efficient query plans for modern hardware	2025-12-08
Encapsulation of parallelism in the Volcano query processing system	2025-12-08
Parallel Database Systems: The Future of High Performance Database Processing	2025-12-08
The Case for Learned Index Structures	2025-12-08
C-store: a column-oriented DBMS	2025-12-08
Vectorwise: Beyond Column Stores	2025-12-07
R-trees: a dynamic index structure for spatial searching	2025-12-07
The Bw-Tree: A B-tree for new hardware platforms	2025-12-04
The Snowflake Elastic Data Warehouse	2025-11-19
A comparison of approaches to large-scale data analysis	2025-11-17
SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference	2025-11-12
Flame: Simplifying Topology Extension in Federated Learning	2025-11-11
Bigtable: A Distributed Storage System for Structured Data	2025-11-11
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models	2025-11-09
DεpS: Delayed ε-Shrinking for Faster Once-For-All Training	2025-11-09
Debunking the CUDA Myth Towards GPU-based AI Systems: Evaluation of the Performance and Programmability of Intel's Gaudi NPU for AI Model Serving	2025-11-07
SqueezeLLM: Dense-and-Sparse Quantization	2025-11-05
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks	2025-11-04
Fast Inference from Transformers via Speculative Decoding	2025-10-28
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness	2025-10-28
Cartridges: Lightweight and general-purpose long context representations via self-study	2025-10-26
UGPU: Dynamically Constructing Unbalanced GPUs for Enhanced Resource Efficiency	2025-10-25
Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations	2025-10-24
A Berkeley View of Systems Challenges for AI	2025-10-14
Hidden Technical Debt in Machine Learning Systems	2025-10-14
DeepSeek-V3 Technical Report	2025-10-12
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving	2025-10-06
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve	2025-10-05
Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes	2025-10-04
Orca: A Distributed Serving System for Transformer-Based Generative Models	2025-10-03
Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling	2025-10-03
Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads	2025-09-29
TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters	2025-09-28
MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters	2025-09-28
An Empirical Evaluation of Columnar Storage Formats	2025-09-27
A variable warp size architecture	2025-09-27
Gandiva: Introspective Cluster Scheduling for Deep Learning	2025-09-23
SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads	2025-09-21
INFaaS: Automated Model-less Inference Serving	2025-09-21
Scalable GPU graph traversal	2025-09-21
InferLine: ML Prediction Pipeline Provisioning and Management for Tight Latency Objectives	2025-09-16
Clipper: A Low-Latency Online Prediction Serving System	2025-09-16
Varuna: Scalable, Low-cost Training of Massive Deep Learning Models	2025-09-14
Accelerating Large Graph Algorithms on the GPU Using CUDA	2025-09-14
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning	2025-09-14
ModServe: Scalable and Resource-Efficient Large Multimodal Model Serving	2025-09-12
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM	2025-09-09
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models	2025-09-09
PyTorch Distributed: Experiences on Accelerating Data Parallel Training	2025-09-07
Scaling Laws for Neural Language Models	2025-09-07
Optimization Techniques for GPU Programming	2025-09-06
PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation	2025-09-01
Triton: an intermediate language and compiler for tiled neural network computations	2025-08-31
How to Read a Computer Science Research Paper	2025-08-30
How to Read a Paper	2025-08-30
Analyzing Modern NVIDIA GPU cores	2025-08-29
PyTorch: An Imperative Style, High-Performance Deep Learning Library	2025-08-25
TensorFlow: A system for large-scale machine learning	2025-08-24
A Few Useful Things to Know About Machine Learning	2025-08-22
What Goes Around Comes Around… And Around…	2025-08-20
Kafka: a Distributed Messaging System for Log Processing	2023-02-26
Blockstack: A Global Naming and Storage System Secured by Blockchains	2022-08-13
Bitcoin: A Peer-to-Peer Electronic Cash System	2022-08-10
Secure Untrusted Data Repository (SUNDR)	2022-08-05
Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS	2022-08-04
Scaling Memcache at Facebook	2022-07-17
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing	2022-07-16
No compromises: distributed transactions with consistency, availability, and performance	2022-07-14
Spanner: Google’s Globally-Distributed Database	2022-07-11
Frangipani: A Scalable Distributed File System	2022-07-10
Chain Replication for Supporting High Throughput and Availability	2022-06-29
ZooKeeper: Wait-free coordination for Internet-scale systems	2022-06-27
In Search of an Understandable Consensus Algorithm (Extended Version)	2022-06-19
The Go Programming Language and Environment	2022-06-06
The Design of a Practical System for Fault-Tolerant Virtual Machines	2022-06-05
The Google File System	2022-06-03
MapReduce: Simplified Data Processing on Large Clusters	2022-05-28
The Evolution of the Unix Time-sharing System	2022-05-25
The UNIX Time-Sharing System	2022-05-24
RCU Usage In the Linux Kernel: One Decade Later	2022-05-12
Meltdown: Reading Kernel Memory from User Space	2022-05-11
Eliminating Receive Livelock in an Interrupt-driven Kernel	2022-05-08
The benefits and costs of writing a POSIX kernel in a high-level language	2022-05-07
Dune: Safe User-level Access to Privileged CPU Features	2022-05-03
The Performance of micro-Kernel-Based Systems	2022-05-02
Virtual Memory Primitives for User Programs	2022-04-30
Journaling the Linux ext2fs Filesystem	2022-04-24