Support for adjacency matrices in autoregressive models#54
Support for adjacency matrices in autoregressive models#54francois-rozet merged 24 commits intoprobabilists:masterfrom
Conversation
|
Hello @adrianjav, thank you for this PR. I quickly went through the modifications which looked good to me. I'll do a proper review this week. I just have a question: Enabling to construct a |
|
Hi @francois-rozet, thanks for taking a look already to the PR!
While we chose to have a single flow layer, we consider these types of architectures in our article (Table 1), and the idea is that you have a finer access to the way the information flows through the network. I implemented this way so that it was direct to use the adjacency matrix on all the derivatives flows (i.e., MAF, NSF, ...) without having to add any more code, but I agree that it can be seen as rather arbitrary. I would propose two alternatives:
What do you think? |
|
Hello @adrianjav, The current philosophy of Zuko regarding pre-built flows (MAF, NSF, NAF, ...) is to make them very simple and avoid overloading them with many features. We don't want to confuse beginners. If the user needs more control, they are expected to build their own custom architecture using lower-level components such as transformations and distributions. In this case, we would expect the user to instantiate the list of flow = zuko.flows.Flow(
transform=MaskedAutoregressiveTransform(
features=5,
context=8,
adjacency=adjacency, # 5x5 matrix
),
base=base,
)Therefore, I think that this PR should only modify the
|
|
Hi, Yeah, I fully understand and respect that philosophy. I will revert the changes I made to MAF and keep them local to
I will add the necessary checks for the adjacency matrix. PS: By the way, what is the policy with respect to adding dependencies? I am thinking that using https://networkx.org/ for the checks would make the code much simpler and reliable. |
This reverts commit 04018ee686e1a5001c21a603b787af5b54153678.
|
I have added the necessary changes and reverted those involving MAF. As a summary, now it checks whether the matrix has cycles, and infers the order (for the string representation) as well as the number of passes (diameter of the graph). There is no need to check invertibility, since we assume ones in the diagonal and it has no cycles. For the algorithm, I have simplified the code from the networkx package to deal with our specific case. Let me know if I missed something. |
francois-rozet
left a comment
There was a problem hiding this comment.
Hello @adrianjav, thank you for implementing the adjacency checks. This is surprisingly simple (and efficient)! I read the PR carefully and left a few comments.
zuko/flows/autoregressive.py
Outdated
| order: LongTensor = None, | ||
| univariate: Callable[..., Transform] = MonotonicAffineTransform, | ||
| shapes: Sequence[Size] = ((), ()), | ||
| adjacency: BoolTensor = None, |
There was a problem hiding this comment.
| order: LongTensor = None, | |
| univariate: Callable[..., Transform] = MonotonicAffineTransform, | |
| shapes: Sequence[Size] = ((), ()), | |
| adjacency: BoolTensor = None, | |
| order: LongTensor = None, | |
| adjacency: BoolTensor = None, | |
| univariate: Callable[..., Transform] = MonotonicAffineTransform, | |
| shapes: Sequence[Size] = ((), ()), |
Move adjacency just below order. (also in the docstring)
zuko/flows/autoregressive.py
Outdated
| assert (order is None) or ( | ||
| adjacency is None | ||
| ), "Parameters `order` and `adjacency_matrix` are mutually exclusive." | ||
| assert (passes == features) or ( | ||
| adjacency is None | ||
| ), "When using `adjacency_matrix`, `passes` has to be the number of features." |
There was a problem hiding this comment.
Does not match the docstring.
There was a problem hiding this comment.
Could you expand on this? I am not sure I understand what you mean
There was a problem hiding this comment.
In the docstring, it is said that passes and order are ignored when adjacency is provided. This is not the case here.
There was a problem hiding this comment.
Oh yes, I see now, thanks!
zuko/flows/autoregressive.py
Outdated
| # Hyper network | ||
| self.hyper = MaskedMLP(adjacency, **kwargs) | ||
|
|
||
| def _check_adjacency(self, adjacency: BoolTensor) -> None: |
There was a problem hiding this comment.
Could you add a small description of what this function does? (e.g. checks that the graph is acyclic and returns the diameter + link to networkx).
zuko/flows/autoregressive.py
Outdated
| self.passes = len(all_generations) # Graph diameter | ||
| self.order = torch.tensor(reduce(list.__add__, all_generations)) |
There was a problem hiding this comment.
This function should return the diameter instead of modifying it in-place.
Also, currently, the order argument is a topological order, but the attribute self.order is not. It represents the "computation" order, or in this code the "generation" of each feature. So I think we should not assign the topological order to self.order, even though it is an interesting quantity.
I think simply setting self.order = None when adjacency is provided is good.
There was a problem hiding this comment.
The adjacency matrix is describing the DAG that the data-generating process follows, so any of its topological orders are indeed the generation order (i.e., if you group those variables that are generated simultaneously), isn't it? In the case of providing the ordering to the constructor, there is no grouping to be made (the adj. matrix is a full lower triangular matrix), so there is only one topological order. Or did I get something wrong?
There was a problem hiding this comment.
I think you misunderstood my comment. Currently, the attribute self.order is not a topological order so you should not assign a topological order to it.
For example, when passes=2 and features=5, self.order is [0,0,0,1,1], meaning that the 3 first features can be computed together (they are in the same generation).
There was a problem hiding this comment.
Yes, I did misunderstand it, my bad.
If that's the case, we could also compute the order in a pythonic way by just checking each variable in which generation level is:
self.order = torch.tensor(
[i for j in range(adjacency.size(0)) for i, x in enumerate(all_generations) if j in x]
)But of course it is your call, I have no strong opinions on this matter.
There was a problem hiding this comment.
I would prefer setting self.order to None. It is not used in the class anyway.
zuko/flows/autoregressive.py
Outdated
| adjacency = out_order[:, None] > in_order | ||
| else: | ||
| order = torch.as_tensor(order) | ||
| adjacency.mul_( # Remove the diagonal (if it is non-zero) |
There was a problem hiding this comment.
The class should raise an error is the diagonal is non-zero, not silently "fix" it.
There was a problem hiding this comment.
I opted for doing it silently since "in reality" the diagonal is non-zero (because of the transformer), and it needs to be made zero for the conditioner. I figured it out that fixing it would be less confusing, but I can also just raise an error.
There was a problem hiding this comment.
I see what you mean, but then we should impose that the matrix has ones on the diagonal, and raise a error if it has not.
The reason why I prefer raising an error is that if a user makes a mistake (e.g. they permute the rows but they forget to permute the columns), they might miss it if we "fix" it silently.
There was a problem hiding this comment.
That is actually a pretty good reason to checking it. I'll raise an error then 👍🏼
zuko/flows/autoregressive.py
Outdated
| del indegree_map[child] | ||
| all_generations.append(this_generation) | ||
|
|
||
| assert not indegree_map, "The adjacency matrix contains cycles" |
There was a problem hiding this comment.
| assert not indegree_map, "The adjacency matrix contains cycles" | |
| assert all(d == 0 for d in indegree), "The graph contains cycles." |
zuko/flows/autoregressive.py
Outdated
| def extra_repr(self) -> str: | ||
| base = self.univariate(*map(torch.randn, self.shapes)) | ||
| order = self.order.tolist() | ||
| order = self.order |
There was a problem hiding this comment.
| order = self.order |
zuko/flows/autoregressive.py
Outdated
| adjacency: Adjacency matrix that all layers of the flow should follow. | ||
| If different from :py:`None`, then `randperm` should be :py:`False`. |
There was a problem hiding this comment.
Remove adjacency in MAF.
zuko/flows/autoregressive.py
Outdated
| return '\n'.join([ | ||
| f'(base): {base}', | ||
| f'(order): {order}', | ||
| f'(adjacency): {adjacency}', |
There was a problem hiding this comment.
I think we should either show order or adjacency, not both.
|
Thanks a lot for the review. I think all changes are more or less clear and I will start implementing them. I actually would appreciate if, after the next changes, you could put the final touches (e.g., let the PS: I am going on vacation tomorrow for a week, so I will be a bit less active. |
|
Sure, I'll finish the PR! Tell me when I can take over. |
|
I just made most of the changes suggested in the review. As I said, I'd be grateful if you can take over and implement the string representation at your liking. Another comment, with regards to the diagonal of |
|
Hello @adrianjav, I have made a few modifications to For the I also really liked the way you added columns to the adjacency for the context and duplicated rows for the output so I made this part common for both If everything is good for you, I am happy to merge the PR! |
|
In the end I have reverted to ones on the diagonal. As you said, I think it is easier to understand that I will now merge this PR. Thank you for this contribution! |
|
Glad to contribute! |
This is an implementation of #53, in which I add support to provide an adjacency matrix to the autoregressive models, instead of using the ordering between variables.
I have added some asserts to make sure I cover all corner cases (e.g., providing both an ordering and an adjacency matrix), and added tests to make sure that the zero-entries of the adjacency matrix are indeed zero when computing the Jacobian matrix.
I might have missed something, and I am happy to fix it if that's the case.
PS: I read the policy about commit messages a bit too late, we can merge all commits into a single one and write an appropiate commit message.