Consider the vectorized and non-vectorized versions of the same hierarchical regression model. The main differences are that the vectorized version:
- uses `iarange` instead of `irange`
- uses a single `observe` from a `1 x batch_size` dimensional distribution instead of `batch_size` separate observes from 1-dimensional distributions
- optimizes `1 x k` dimensional params for a single `intercepts` sample site instead of k separate 1-dimensional params for k sample sites
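As a sanity check on the expectation that both versions should give the same loss, the sketch below uses plain NumPy (not Pyro) to compare the summed log-likelihood of one `1 x batch_size` Normal observation with the sum of `batch_size` separate 1-dimensional Normal log-likelihoods. The data, `sigma`, and the helper `normal_logpdf` are made up for illustration. The same argument applies to the guide: a `1 x k` mean-field Normal should contribute the same log density as k independent 1-dimensional Normals, as long as the per-element terms are summed and not scaled differently.

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    # Elementwise log density of Normal(mu, sigma); broadcasts over arrays.
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

rng = np.random.default_rng(0)
batch_size = 5
sigma = 1.3
mu = rng.normal(size=(1, batch_size))   # hypothetical per-datapoint means, 1 x batch_size
obs = rng.normal(size=(1, batch_size))  # hypothetical observations, 1 x batch_size

# Vectorized: a single observe over a 1 x batch_size distribution;
# its log-likelihood is the sum over every element of the batch.
vectorized_ll = normal_logpdf(obs, mu, sigma).sum()

# Non-vectorized: batch_size separate observes from 1-dimensional distributions.
looped_ll = sum(normal_logpdf(obs[0, i], mu[0, i], sigma) for i in range(batch_size))

print(np.isclose(vectorized_ll, looped_ll))
```

If the two numbers agree here but the ELBO losses in the real models differ, the discrepancy has to come from somewhere else, such as an extra scale factor or a different reduction over the batch dimension.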
Intuitively, I'd expect these to converge to the same loss; instead, the vectorized version converges to ~1900 and the non-vectorized version to ~600. The same model written in webppl also converges to ~600, so this might indicate a scaling issue in the vectorized version?
Incidentally, the mean-field guide sigmas also converge to different values in the vectorized version, although the `mu` point estimates are the same. The unvectorized version matches webppl, but the vectorized version has much higher certainty (smaller sigmas).
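One hypothetical way to get matching means but tighter sigmas is for the likelihood to be weighted more heavily than intended. The sketch below illustrates this with an exact conjugate Normal-Normal posterior (no Pyro involved): `scale` is an assumed multiplier on every log-likelihood term, not a Pyro parameter, and the observations are made up. With plenty of data, scaling the likelihood barely moves the posterior mean but shrinks the posterior standard deviation by roughly `1 / sqrt(scale)`.

```python
import math

def posterior_normal(xs, prior_sd, noise_sd, scale=1.0):
    # Exact posterior for mu under mu ~ Normal(0, prior_sd) and
    # x_i ~ Normal(mu, noise_sd), with each log-likelihood term multiplied by `scale`
    # (a hypothetical weighting used only for illustration).
    n = len(xs)
    precision = 1.0 / prior_sd**2 + scale * n / noise_sd**2
    mean = (scale * sum(xs) / noise_sd**2) / precision
    return mean, 1.0 / math.sqrt(precision)

xs = [0.9, 1.1, 1.3, 0.7, 1.0] * 20  # 100 made-up observations with mean 1.0

m1, s1 = posterior_normal(xs, prior_sd=10.0, noise_sd=1.0, scale=1.0)
m4, s4 = posterior_normal(xs, prior_sd=10.0, noise_sd=1.0, scale=4.0)

# Both means are close to 1.0, while s4 is about half of s1.
print(m1, s1)
print(m4, s4)
```

This only shows that a scaling mismatch is consistent with the symptoms described; it does not establish that the vectorized model actually has one.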
It's of course very plausible that there's a bug in my implementation of the vectorized model!