Replace simplex transform with one based on the ILR transform #3171
Conversation
This reverts commit 3e0ce29.
A quick validation, using the code and data below:
functions {
real multi_logit_normal_cholesky_lpdf(vector theta, vector mu,
matrix L_Sigma) {
int N = rows(theta);
vector[N] log_theta = log(theta);
return multi_normal_cholesky_lpdf(log_theta[1 : N - 1] - log_theta[N] | mu, L_Sigma)
- sum(log_theta);
}
}
data {
int<lower=1> N;
vector[N - 1] mu;
matrix[N - 1, N - 1] L_Sigma;
}
parameters {
simplex[N] x;
}
model {
target += multi_logit_normal_cholesky_lpdf(x | mu, L_Sigma);
}

Data:

{
"N": 5,
"mu": [-0.0104, -0.006, -0.0062, -0.0081],
"L_Sigma": [
[20.8371, 0.0, 0.0, 0.0],
[65.0306, 58.6011, 0.0, 0.0],
[-63.5698, 25.1499, 30.1392, 0.0],
[32.0142, 22.2009, 69.8059, 21.1121]
]
}
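Written out, the density the multi_logit_normal_cholesky_lpdf function above computes is the multivariate logistic-normal on the simplex,

$$
\log p(\theta \mid \mu, L_\Sigma)
  = \log \mathcal{N}\!\left(\log\theta_{1:N-1} - \log\theta_N \,\middle|\, \mu,\ L_\Sigma L_\Sigma^\top\right)
  - \sum_{i=1}^{N} \log\theta_i ,
$$

so both the current transform and this PR are being checked against a target whose simplex-space density is known in closed form.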
Here are the stansummarys of the current transform (…) and this PR (…).
@WardBrian there's a bug in the logic for the sampler divergence output calculation. It says 154% above.
Yeah, it looks like the denominator never changes as you add chains. I'll take a look up in the cmdstan repo.
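(To illustrate the failure mode with hypothetical numbers: if 4 chains of 1000 post-warmup iterations produce 1540 divergent transitions in total, dividing by a single chain's 1000 iterations reports 154%, while dividing by all 4000 draws gives 38.5%.)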
SteveBronder left a comment:
Good with a few small suggestions.
Regarding:

for (int i = 0; i <= N; ++i) {
  z.coeffRef(i) = exp(z.coeff(i) - max_val) / d;
}

Suggested change:

z.array() = (z.array() - max_val).exp() / d;
Regarding:

const auto& x_val = to_ref(arena_x.val_op());
const auto& x_adj = to_ref(arena_x.adj_op());

arena_x should not be an expression, so you don't need to_ref here.

Suggested change:

auto&& x_val = arena_x.val_op();
auto&& x_adj = arena_x.adj_op();
Regarding:

for (Eigen::Index i = 0; i < M; ++i) {
  // backprop for softmax
  Eigen::VectorXd x_pre_softmax_adj
      = -x_val.col(i) * x_adj.col(i).dot(x_val.col(i))
        + x_val.col(i).cwiseProduct(x_adj.col(i));

You can allocate the memory for x_pre_softmax_adj just once.

Suggested change:

Eigen::VectorXd x_pre_softmax_adj(x_val.rows());
for (Eigen::Index i = 0; i < M; ++i) {
  // backprop for softmax
  x_pre_softmax_adj.noalias()
      = -x_val.col(i) * x_adj.col(i).dot(x_val.col(i))
        + x_val.col(i).cwiseProduct(x_adj.col(i));
You could also change sum_to_zero_vector_backprop to just take in a matrix
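A rough sketch of what that could look like, assuming the matrix version simply delegates column-wise to the existing vector overload shown below (the overload name and structure here are hypothetical; a real implementation would probably fuse the per-column work instead):

template <typename T>
void sum_to_zero_vector_backprop(T&& y_adj, const Eigen::MatrixXd& z_adj) {
  // hypothetical matrix overload: apply the existing vector backprop to each column
  for (Eigen::Index i = 0; i < z_adj.cols(); ++i) {
    Eigen::VectorXd z_col = z_adj.col(i);
    sum_to_zero_vector_backprop(y_adj.col(i), z_col);  // existing vector overload
  }
}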
Regarding:

template <typename T>
void sum_to_zero_vector_backprop(T&& y_adj, const Eigen::VectorXd& z_adj) {
  const auto N = y_adj.size();

I'd leave a small comment so people know z_adj is y_adj.size() + 1.
I added @param entries to the docstring and put the sizes there.
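Something along these lines (hypothetical wording, not the exact docstring in the PR):

/**
 * Backpropagate adjoints from the zero-sum vector into the unconstrained vector.
 *
 * @tparam T type of the unconstrained adjoint vector
 * @param[in,out] y_adj adjoints of the unconstrained vector, of size N
 * @param[in] z_adj adjoints of the zero-sum vector, of size N + 1
 */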
Regarding:

    += arena_z_adj * arena_z.coeff(k) * (1.0 - arena_z.coeff(k));
}
reverse_pass_callback([arena_y, arena_x]() mutable {
  const auto& res_val = to_ref(arena_x.val());

You should not need to_ref on any of the arena_x vals.
Regarding:

const auto& res_val = to_ref(arena_x.val());
…
// backprop for log jacobian contribution to log density
arena_x.adj().array() += lp.adj() / res_val.array();

This is kind of confusing, why are we accumulating into the return value's adjoints here?
EDIT: nvm, I see this is the RHS of the Jacobian adjustment. Though I can't think of anywhere else where we change the output's adjoints in the reverse pass of its function. I need to think about this.
We can avoid it by making a copy of the adjoints if we want, though I'm pretty sure that nothing else in the AD graph can depend on the adjoints of this by this point
Can you just use auto lp_adj = … so that it's just an expression and you don't need any memory?
Can you elaborate?
I mean
auto lp_adj = lp.adj() / res_val.array();
and then below you can do res_val * (arena_x.adj() + lp_adj).dot(rev_val)
arena_x.adj() is used in two places, so you'd be duplicating the work, no?
I believe this is resolved; the code no longer needs to modify the adjoint of the return.
Is this waiting for my review? I can do it next week; otherwise, what else is needed?
@spinkney this is waiting on a final review from @SteveBronder, and on us deciding whether we're okay with not preserving the existing code; otherwise it should be good.
Ok, I still think it's worth opening another issue on preserving different transforms and working out how a user would specify that. For this PR I vote that we just remove the existing code.
Summary
Closes #3170.
This updates the simplex transform and the stochastic matrix transforms to use the ILR transform rather than stick breaking.
For now, the stochastic matrix transforms mostly delegate to the simplex functions; they could be rewritten for efficiency later.
Q: This completely replaces the existing functions, but we may want to preserve them under a different name. If we're fine replacing them, this also closes #3168.
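As a rough illustration of the idea (a standalone numeric sketch with an explicit basis matrix; the actual implementation in this PR works on autodiff types, handles the log-Jacobian adjustment, and need not form a basis matrix at all):

#include <Eigen/Dense>
#include <cmath>

// Sketch only: map unconstrained y in R^(N-1) to a zero-sum vector z in R^N
// using a Helmert-style orthonormal basis (one common ILR basis choice),
// then apply softmax to land on the simplex.
Eigen::VectorXd simplex_constrain_ilr_sketch(const Eigen::VectorXd& y) {
  const Eigen::Index N = y.size() + 1;
  Eigen::MatrixXd V = Eigen::MatrixXd::Zero(N, N - 1);
  for (Eigen::Index k = 0; k < N - 1; ++k) {
    const double s = 1.0 / std::sqrt((k + 1.0) * (k + 2.0));
    V.col(k).head(k + 1).setConstant(s);  // k + 1 entries of 1 / sqrt((k+1)(k+2))
    V(k + 1, k) = -(k + 1.0) * s;         // one entry of -(k+1) / sqrt((k+1)(k+2))
  }
  Eigen::VectorXd z = V * y;  // each column of V sums to zero, so z does too
  Eigen::ArrayXd e = (z.array() - z.maxCoeff()).exp();  // numerically stable softmax
  return (e / e.sum()).matrix();
}

Going the other way, the unconstraining map is y = Vᵀ log x, and the log-Jacobian adjustment of the constraining direction reduces (up to a constant) to a sum of log xᵢ terms, which appears to be what the lp.adj() / res_val term discussed in the review above corresponds to.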
Tests
Existing simplex tests pass, and I added one more to test some edge behavior
Side Effects
Release notes
The simplex transform is now defined in terms of the isometric log ratio transform instead of stick breaking.
Checklist
Copyright holder: Simons Foundation
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- the basic tests are passing
  - unit tests pass (to run, use: ./runTests.py test/unit)
  - header checks pass (make test-headers)
  - dependencies checks pass (make test-math-dependencies)
  - docs build (make doxygen)
  - code passes the built-in C++ standards checks (make cpplint)
- the code is written in idiomatic C++ and changes are documented in the doxygen
- the new changes are tested