yiakwy-xpu-ml-framework-team

💭

I may be slow to respond.

Yiakwy yiakwy-xpu-ml-framework-team

Hi, I am LEI WANG, an AI / LLM architect who previously worked on the Graphcore IPU compiler team.

39 followers · 79 following

independent contributor @ HPC Users Alliance
United States
13:19 (UTC -12:00)
https://yiakwy.github.io/
in/lei-wang-1722a28a
@yiakwy2023
https://mp.weixin.qq.com/s/AVujFosiC15ZmSRvByYcRQ
https://mp.weixin.qq.com/s/13NKhY3GccjU9Emz-cRSHQ

yiakwy-xpu-ml-framework-team/README.md

👋 Hi, I’m @yiakwy-xpu-ml-framework-team
👀 I’m interested in accelerating the world through algorithms, chips and intelligence (compilers/transpilers, C++ op development and optimization on the performance-critical path, and Python bindings for HPC applications).
🌱 I’m currently working on core framework infrastructure and AI compiler technologies.
📫 Please drop me a message through yiak.wy@gmail.com

Popular repositories

flash-float-jit-kernels (Public)
Cuda · 13 stars · 3 forks

HPC-2025 (Public)
9 stars · 5 forks

NV_grouped_gemm (Public)
Forked from fanshiqing/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM for MoE.
Cuda · 7 stars

Toolkit-remote-pdb-for-pytorch-distributed (Public)
Debugging torch distributed program
Python · 7 stars

AMD-CDNA3-ASM (Public)
C++ · 3 stars

GC-OXFORD-CVPR2021-gbp-poplar (Public)
Forked from joeaortiz/gbp-poplar
Poplar implementation of "Bundle Adjustment on a Graph Processor" (CVPR 2020)
C++ · 2 stars