Transformer in C (from scratch)
Finally took the time to complete this, the penultimate step of the project. Writing it was easier than I expected; I added AddNorm and FFN layers as well.
A low-hanging fruit would be to parallelize the attention mechanism (model parallelism).
astle dsa
4,810 posts
Living in complexity, formalism, mathematics and computer science
Joined March 2022
- From-scratch projects are underrated. Here's a list of things that I want to build from scratch (also, making lists is fun):
  - Operating System/Kernel
  - Compiler/Interpreter
  - JS Framework
  - Database
  - Vector Database
  - Deep Learning Framework
  - Git
  - LLM (can't run it ofc)
- Nearly done with my deep learning framework in C. I have got:
  - A matrix library
  - Autograd engine
  - Batch, Layer and Model abstractions (?blocks)
  - Parallel/Concurrent Model Training (Data Parallelism)

  After cleaning up, I'll be getting close to my final goal with this project.
- Implemented a numpy + autograd engine in C, and trained a simple MLP which learned the inverse function of a matrix. Fun, but much more left to do (which includes cleaning up).
- Replying to @izs: I had read somewhere that Tolkien first created a language itself, and then followed it up with the novels?
- While talking to an embedded systems engineer about C being an unsafe language, he mentioned that the way they avoid memory leaks is by simply not using malloc/alloc, which means their memory usage is entirely determined at compile time. A reason why C cannot be replaced in embedded sys
- Got it down to 1.536 kB. Turns out I was allocating around 2 million % more memory than needed. Had to really hack around to get from ~300 MB -> ~7 MB -> 1536 B. The total memory overhead was finally 0% (3M% -> 400K% -> 0%). Tracking memory was fun tbh.
- Replying to @ludwigABAP: The problem is that it only extends to the very basic symbols. A little more abstraction and it's pandemonium.
- Replying to @zmkzmkz: I did something like this (I think?) but mathematically:
- Replying to @mu_chrinovic: Due to ML frameworks, it's steered away from the "math intensive" path and requires only the very basics to understand. Here I'm talking about surface-level understanding ofc. Also, LLMs just output good, well-known models zero-shot, so 🤷
- Learned a lesson in memory management. My previous code for training a single MLP in my autograd engine in C was taking around ~300 MB of memory. Had to optimise a lot and finally managed to bring it down to ~6 MB. Figuring out how to bring it down further.
- Replying to @JoshuaLelon: But that's a very GenZ thing, I feel (maybe millennial), due to the advent of mobiles and FaceTime. Otherwise I think all the defining moments in previous generations, at least where I live, happened in broad daylight.
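Coming back to the posts above on skipping malloc and on tracking memory: a minimal sketch of how both ideas can be combined is a fixed-size bump arena that also records its own usage. Everything here is hypothetical (the `Arena` type, `arena_alloc`, `arena_reset`, and the `ARENA_BYTES` budget, picked only to echo the 1536 B figure) and it assumes a C11 compiler; it is not the allocator used in the framework.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical budget, fixed at compile time. */
#define ARENA_BYTES 1536

typedef struct {
    /* Aligned so any object type can live at offset 0 (C11). */
    _Alignas(max_align_t) unsigned char buf[ARENA_BYTES];
    size_t used; /* bytes handed out so far, including padding */
    size_t peak; /* high-water mark of `used`, for tracking */
} Arena;

/* Bump-allocate `size` bytes from the arena.
   Returns NULL instead of growing when the budget is exhausted. */
void *arena_alloc(Arena *a, size_t size) {
    assert(a != NULL);
    const size_t align = _Alignof(max_align_t);
    /* Round the current offset up to the next aligned position. */
    const size_t start = (a->used + align - 1) / align * align;
    if (start > ARENA_BYTES || size > ARENA_BYTES - start)
        return NULL;
    a->used = start + size;
    if (a->used > a->peak)
        a->peak = a->used;
    return a->buf + start;
}

/* Release everything at once; the peak is kept for reporting. */
void arena_reset(Arena *a) {
    assert(a != NULL);
    a->used = 0;
}
```

Declaring the arena with static storage (for example `static Arena arena;`) means the memory footprint is known before the program runs, and comparing `peak` against the bytes the model actually needs gives a rough overhead number per training step.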









