Sorosliu1029/Papers

# Papers

| Name | Finish Date |
| --- | --- |
| DeepSpeed: Extreme-scale model training for everyone | 2026-03-05 |
| TVM: An Automated End-to-End Optimizing Compiler for Deep Learning | 2026-02-17 |
| SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks | 2026-02-10 |
| In-Datacenter Performance Analysis of a Tensor Processing Unit | 2026-01-27 |
| TABLA: A unified template-based framework for accelerating statistical machine learning | 2026-01-23 |
| Efficiently compiling efficient query plans for modern hardware | 2025-12-08 |
| Encapsulation of parallelism in the Volcano query processing system | 2025-12-08 |
| Parallel Database Systems: The Future of High Performance Database Processing | 2025-12-08 |
| The Case for Learned Index Structures | 2025-12-08 |
| C-store: a column-oriented DBMS | 2025-12-08 |
| Vectorwise: Beyond Column Stores | 2025-12-07 |
| R-trees: a dynamic index structure for spatial searching | 2025-12-07 |
| The Bw-Tree: A B-tree for new hardware platforms | 2025-12-04 |
| The Snowflake Elastic Data Warehouse | 2025-11-19 |
| A comparison of approaches to large-scale data analysis | 2025-11-17 |
| SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference | 2025-11-12 |
| Flame: Simplifying Topology Extension in Federated Learning | 2025-11-11 |
| Bigtable: A Distributed Storage System for Structured Data | 2025-11-11 |
| BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models | 2025-11-09 |
| DεpS: Delayed ε-Shrinking for Faster Once-For-All Training | 2025-11-09 |
| Debunking the CUDA Myth Towards GPU-based AI Systems: Evaluation of the Performance and Programmability of Intel's Gaudi NPU for AI Model Serving | 2025-11-07 |
| SqueezeLLM: Dense-and-Sparse Quantization | 2025-11-05 |
| The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks | 2025-11-04 |
| Fast Inference from Transformers via Speculative Decoding | 2025-10-28 |
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | 2025-10-28 |
| Cartridges: Lightweight and general-purpose long context representations via self-study | 2025-10-26 |
| UGPU: Dynamically Constructing Unbalanced GPUs for Enhanced Resource Efficiency | 2025-10-25 |
| Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations | 2025-10-24 |
| A Berkeley View of Systems Challenges for AI | 2025-10-14 |
| Hidden Technical Debt in Machine Learning Systems | 2025-10-14 |
| DeepSeek-V3 Technical Report | 2025-10-12 |
| DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving | 2025-10-06 |
| Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | 2025-10-05 |
| Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes | 2025-10-04 |
| Orca: A Distributed Serving System for Transformer-Based Generative Models | 2025-10-03 |
| Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling | 2025-10-03 |
| Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads | 2025-09-29 |
| TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters | 2025-09-28 |
| MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters | 2025-09-28 |
| An Empirical Evaluation of Columnar Storage Formats | 2025-09-27 |
| A variable warp size architecture | 2025-09-27 |
| Gandiva: Introspective Cluster Scheduling for Deep Learning | 2025-09-23 |
| SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads | 2025-09-21 |
| INFaaS: Automated Model-less Inference Serving | 2025-09-21 |
| Scalable GPU graph traversal | 2025-09-21 |
| InferLine: ML Prediction Pipeline Provisioning and Management for Tight Latency Objectives | 2025-09-16 |
| Clipper: A Low-Latency Online Prediction Serving System | 2025-09-16 |
| Varuna: Scalable, Low-cost Training of Massive Deep Learning Models | 2025-09-14 |
| Accelerating Large Graph Algorithms on the GPU Using CUDA | 2025-09-14 |
| Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | 2025-09-14 |
| ModServe: Scalable and Resource-Efficient Large Multimodal Model Serving | 2025-09-12 |
| Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | 2025-09-09 |
| ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | 2025-09-09 |
| PyTorch Distributed: Experiences on Accelerating Data Parallel Training | 2025-09-07 |
| Scaling Laws for Neural Language Models | 2025-09-07 |
| Optimization Techniques for GPU Programming | 2025-09-06 |
| PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation | 2025-09-01 |
| Triton: an intermediate language and compiler for tiled neural network computations | 2025-08-31 |
| How to Read a Computer Science Research Paper | 2025-08-30 |
| How to Read a Paper | 2025-08-30 |
| Analyzing Modern NVIDIA GPU cores | 2025-08-29 |
| PyTorch: An Imperative Style, High-Performance Deep Learning Library | 2025-08-25 |
| TensorFlow: A system for large-scale machine learning | 2025-08-24 |
| A Few Useful Things to Know About Machine Learning | 2025-08-22 |
| What Goes Around Comes Around… And Around… | 2025-08-20 |
| Kafka: a Distributed Messaging System for Log Processing | 2023-02-26 |
| Blockstack: A Global Naming and Storage System Secured by Blockchains | 2022-08-13 |
| Bitcoin: A Peer-to-Peer Electronic Cash System | 2022-08-10 |
| Secure Untrusted Data Repository (SUNDR) | 2022-08-05 |
| Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS | 2022-08-04 |
| Scaling Memcache at Facebook | 2022-07-17 |
| Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing | 2022-07-16 |
| No compromises: distributed transactions with consistency, availability, and performance | 2022-07-14 |
| Spanner: Google’s Globally-Distributed Database | 2022-07-11 |
| Frangipani: A Scalable Distributed File System | 2022-07-10 |
| Chain Replication for Supporting High Throughput and Availability | 2022-06-29 |
| ZooKeeper: Wait-free coordination for Internet-scale systems | 2022-06-27 |
| In Search of an Understandable Consensus Algorithm (Extended Version) | 2022-06-19 |
| The Go Programming Language and Environment | 2022-06-06 |
| The Design of a Practical System for Fault-Tolerant Virtual Machines | 2022-06-05 |
| The Google File System | 2022-06-03 |
| MapReduce: Simplified Data Processing on Large Clusters | 2022-05-28 |
| The Evolution of the Unix Time-sharing System | 2022-05-25 |
| The UNIX Time-Sharing System | 2022-05-24 |
| RCU Usage In the Linux Kernel: One Decade Later | 2022-05-12 |
| Meltdown: Reading Kernel Memory from User Space | 2022-05-11 |
| Eliminating Receive Livelock in an Interrupt-driven Kernel | 2022-05-08 |
| The benefits and costs of writing a POSIX kernel in a high-level language | 2022-05-07 |
| Dune: Safe User-level Access to Privileged CPU Features | 2022-05-03 |
| The Performance of micro-Kernel-Based Systems | 2022-05-02 |
| Virtual Memory Primitives for User Programs | 2022-04-30 |
| Journaling the Linux ext2fs Filesystem | 2022-04-24 |