Calc Consulting
12.7K posts
Calculation Consulting is a boutique consultancy that specializes in machine learning, AI, and data science.
- Where did I come up with the idea for weightwatcher? I used to be a quant at BlackRock. And as a quant, my job was to find signal in the noise.
- Replying to @RealCandaceO: The NYC hospital system is on the verge of collapse. This is NOT THE FLU.
- I was not expecting this from DeepSeek R1. The first 128 layers with weightwatcher: wow... many overfit layers! More to come...
- Strongly recommended: "Statistical physics, Bayesian inference and neural information processing"
- Statistical Mechanics tells us far more about Neural Networks than Statistical Learning Theory. In this blog post, I describe some work I am doing on a new approach to an old problem: how to derive the generalization capacity of a DNN. calculatedcontent.com/2019/12/03/tow…
- The Muon optimizer moves in exactly the opposite direction from the weightwatcher approach. Looking at Moonlight-16B-A3b, half the layers are 'underfit' according to the weightwatcher HTSR alpha layer quality metric. Of course, this is what they are trying to do, as they are trying to create
- Llama3-70b screams "compress me!" Why are so many large-scale, high-performing models filled with undertrained layers? Turns out, smaller models require more epochs, and it's cheaper and faster to train a really big model for a few epochs than to train a smaller one for more.
- Looking to train your own LLM from scratch? Look no further. ChuXin is a 1.6B LLM trained on 2T tokens! And they have open-sourced everything: the model weights, architecture, training data, training process, etc. Technical report: arxiv.org/abs/2405.04828
- WeightWatcher is an open-source diagnostic tool for analyzing deep neural networks. This past year, it was featured in Nature Communications, one of the top scientific journals in the world.
- Early stopping? How about early freezing? When training a DNN, the weightwatcher tool can identify which layers are well trained and can be frozen, and which ones you should keep training. Stay tuned for this master's thesis, which shows this off.
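
Several posts above refer to a per-layer alpha metric and to freezing layers that are already well trained. The following is a minimal sketch of that idea in plain Python, not the weightwatcher implementation or its API. It assumes alpha is the exponent of a power-law tail fitted to a layer's eigenvalue spectrum, estimated here with the standard continuous maximum-likelihood (Hill) formula for a given cutoff `xmin`. The function names, the `xmin` argument, and the thresholds 2 and 6 used to label layers are illustrative assumptions, as are the example alpha values.

```python
import math

def hill_alpha(eigenvalues, xmin):
    """Estimate the power-law exponent alpha of the tail (values >= xmin).

    Uses the continuous maximum-likelihood estimator
    alpha = 1 + n / sum(ln(x_i / xmin)) for a density proportional to x**(-alpha).
    """
    tail = [x for x in eigenvalues if x >= xmin]
    if len(tail) < 2:
        raise ValueError("need at least two eigenvalues at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("tail is degenerate: all values equal xmin")
    return 1.0 + len(tail) / log_sum

def layer_status(alpha, low=2.0, high=6.0):
    """Label a layer from its alpha. The thresholds are assumptions."""
    if alpha < low:
        return "overfit"
    if alpha > high:
        return "underfit"
    return "well trained"

def layers_to_freeze(alphas, low=2.0, high=6.0):
    """Return names of layers whose alpha falls inside [low, high]."""
    return [name for name, alpha in alphas.items()
            if layer_status(alpha, low, high) == "well trained"]

# Hypothetical per-layer alphas, for illustration only.
example_alphas = {"layer_0": 3.1, "layer_1": 1.6, "layer_2": 7.4}
frozen = layers_to_freeze(example_alphas)  # ["layer_0"]
```

In a real training loop, the frozen layers would have their gradients disabled while the remaining layers keep training. How often to re-estimate alpha, and how to choose `xmin`, are left open here.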











