So, neural networks are REALLY robust to errors, as it turns out. Like, I just discovered a bug in my kernel caching logic, where I was computing matmul with completely wrong strides, and it STILL LEARNED.
Like... WHAT WAS COMPUTED HERE IS NOT A DERIVATIVE BY ANY MEANS