saifmb0

Saif saifmb0

Efficient LLMs

Al Ain University
Abu Dhabi
saifmb.com

saifmb0/README.md

Saif here 👋

I'm a third-year undergraduate focusing on efficient LLMs and ML systems. Research Assistant @ Al Ain University.

Reach out at contact@saifmb.com, or see my resume.

arXiv

S. Abdellatif and A. Almasri, 2026. Dispatch-Aware Ragged Attention for Pruned Vision Transformers.
S. Abdellatif, 2026. Acceptance Dynamics in Speculative Decoding Across Cognitive Domains.

Under review

S. Abdellatif, 2026. QAttention: Tree-Sparse Attention and Acceptance Decay in Speculative Decoding.

Pinned

QAttention Public

Query-guided O(N*D) linear tree attention. Up to 8x and 17x faster than sglang's FlashInfer and DeFT, respectively (680 nodes). Under review.

Python

pytorch/pytorch Public

Tensors and dynamic neural networks in Python with strong GPU acceleration

Python 101k 28k

sparse-vits Public

Tuned FlashAttention-2 kernel for short-sequence pruned Vision Transformers. Up to 2.12x faster than FA2; plug-and-play for a 12% throughput gain at small batch sizes (1-4) and high sparsity (50%+).

TeX

reelwise Public

GraphQL-based SEO for Instagram Reels. Equipped with a Turing-optimized inference pipeline, including custom FusedAttention, TensorRT exports, quantization, and VAD gating.

Python