anshuman
10.3K posts
maximizing shareholder value; mostly ml here;
Joined February 2020
- You're in an ML Engineer interview at Perplexity, and the interviewer asks: "Your RAG system is hallucinating in production. How do you diagnose what's broken - the retriever or the generator?" Here's how you can answer:
- You're in an ML inference engineer interview at Anthropic, and the interviewer asks: "Can you explain speculative decoding and why we'd want to use it?" Here's how you can answer:
- She dumped me last night. Not because I don't listen. Not because I'm always on my phone. Not even because I forgot our anniversary (twice). But because, in her exact words: "You only pay attention to the parts of what I say that you think are important." I stared at her
- "Just use OpenAI API" Until you need: - Custom fine-tuned models - <50ms p99 latency - $0.001/1K tokens (not $1.25/1K input) Then you build your own inference platform. Here's how to do that:
- I rejected a job offer yesterday. Not because of the salary. Not because of the tech stack. Not even because of the long hours they warned me about. But because, when I asked how they evaluate their AI systems, the hiring manager said: "We just ask it some questions and
- You're in an ML Engineer interview at Groq, and the interviewer asks: "How do you measure LLM inference performance? What metrics matter most for production systems?" Here's how you can answer
- career update: joined Zomato as Machine Learning Engineer 2
- Techniques I’d master if I wanted to make LLMs faster + cheaper. 1. Quantization 2. KV-Cache Quantization 3. Flash Attention 4. Speculative Decoding 5. LoRA 6. Pruning 7. Knowledge Distillation 8. Weight Sharing 9. Sparse Attention 10. Batching & Dynamic Batching 11. Model
- You’re in an AI Engineer interview at Microsoft, and the interviewer asks: ‘Our team needs to build RAG over 10M documents. Which vector database and why?’ Here’s how you answer:
- Software engineers have a runway of 5 years left
- You're in an ML Engineer interview at Anthropic, and the interviewer asks: "Your LLM inference is running out of GPU memory with long conversations. How do you fix this?" Here's how you answer:
- ML concepts every data scientist should know for interviews: Bookmark this. 1. Bias-Variance Tradeoff 2. Cross-Validation Strategies 3. Regularization (L1, L2, Elastic Net) 4. Class Imbalance & Sampling Techniques 5. Feature Engineering & Selection 6. Overfitting vs



