Support for adjacency matrices in autoregressive models #54

Merged
francois-rozet merged 24 commits into probabilists:master from adrianjav:adj-matrix
Jul 30, 2024
Conversation

@adrianjav
Contributor

@adrianjav adrianjav commented Jul 19, 2024

This is an implementation of #53, in which I add support for providing an adjacency matrix to the autoregressive models, instead of relying on the ordering between variables.

I have added some asserts to cover all corner cases (e.g., providing both an ordering and an adjacency matrix), and added tests to verify that the zero entries of the adjacency matrix indeed correspond to zero entries in the Jacobian matrix.

I might have missed something, and I am happy to fix it if that's the case.

PS: I read the policy about commit messages a bit too late; we can squash all commits into a single one and write an appropriate commit message.

@francois-rozet
Member

francois-rozet commented Jul 22, 2024

Hello @adrianjav, thank you for this PR. I quickly went through the modifications which looked good to me. I'll do a proper review this week. I just have a question:

Enabling to construct a MaskedAutoregressiveTransform directly from an adjacency matrix makes perfect sense. However, I am not sure if a series of masked autoregressive transformations (i.e. a MAF) with the same adjacency matrix makes sense. Could you explain in what case having several transformations with the same adjacency would be used?

@adrianjav
Contributor Author

Hi @francois-rozet, thanks for already taking a look at the PR!

Could you explain in what case having several transformations with the same adjacency would be used?

While we chose to have a single flow layer, we consider these types of architectures in our article (Table 1); the idea is that you get finer control over the way information flows through the network.

I implemented it this way so that the adjacency matrix could be used directly in all the derived flows (i.e., MAF, NSF, ...) without having to add any more code, but I agree that it can be seen as rather arbitrary. I would propose two alternatives:

  1. Add a check so that if an adjacency matrix is used, then the number of layers needs to be 1.
  2. Allow specifying either a single adjacency matrix or a list of them (one per layer), so that the user has even finer control over the way the flows are structured.

What do you think?

@francois-rozet
Member

francois-rozet commented Jul 24, 2024

Hello @adrianjav,

The current philosophy of Zuko regarding pre-built flows (MAF, NSF, NAF, ...) is to make them very simple and avoid overloading them with many features. We don't want to confuse beginners. If the user needs more control, they are expected to build their own custom architecture using lower-level components such as transformations and distributions.

In this case, we would expect the user to instantiate the list of MaskedAutoregressiveTransform themselves given the adjacency matrices and then combine the transformations as a Flow. Note that this would be necessary anyway to build a generative (instead of abductive) flow. For example, for a single-layer abductive flow:

flow = zuko.flows.Flow(
    transform=MaskedAutoregressiveTransform(
        features=5,
        context=8,
        adjacency=adjacency,  # 5x5 matrix
    ),
    base=base,
)

Therefore, I think that this PR should only modify the MaskedAutoregressiveTransform class. In addition,

  • There are constraints on the adjacency $A$ for the transformation to be invertible. I think checking that the adjacency satisfies these constraints would be great. For example, $A$ should be lower-triangular (with zeros on the diagonal) with respect to the variable ordering.

  • The number of "passes" for the inverse transformation can be inferred from the adjacency $A$. If there is a computationally efficient way to do it, it would also be a nice addition.
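As an illustration of the second point, one computationally simple way to infer the number of passes is to take powers of the adjacency: the length of the longest dependency chain is the number of sequential passes the inverse needs. This is only a sketch I am adding for clarity, not code from the PR, and it assumes a strictly lower-triangular (i.e., acyclic, zero-diagonal) boolean adjacency:

```python
import torch

def num_passes(adjacency: torch.BoolTensor) -> int:
    """Number of sequential passes needed to invert the transform, i.e. the
    length of the longest dependency chain in the adjacency.

    Assumes the graph is acyclic (the loop would not terminate otherwise).
    """
    A = adjacency.long()
    P = A.clone()
    passes = 1
    # (A^k)[i, j] != 0 iff there is a dependency path of length k from i to j,
    # so we keep multiplying until all paths are exhausted.
    while P.any():
        P = (P @ A).clamp(max=1)
        passes += 1
    return passes
```

For a full lower-triangular adjacency over n features this returns n (fully sequential inversion), while an all-zero adjacency returns 1 (all features invertible in parallel).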

@adrianjav
Contributor Author

adrianjav commented Jul 26, 2024

Hi,

Yeah, I fully understand and respect that philosophy. I will revert the changes I made to MAF and keep them local to MaskedAutoregressiveTransform. As for the other two points you suggested:

  • The matrix $A + I$ should be invertible, not $A$, right? (assuming a zeroed diagonal).
  • If I understand correctly, the number of passes is just the diameter of the graph defined by $A$, right? I can add code to compute that, but it only saves unnecessary computations, doesn't it?

I will add the necessary checks for the adjacency matrix.

PS: By the way, what is the policy with respect to adding dependencies? I am thinking that using https://networkx.org/ for the checks would make the code much simpler and reliable.

@adrianjav
Contributor Author

I have added the necessary changes and reverted those involving MAF.

As a summary, it now checks whether the matrix has cycles, and infers the order (for the string representation) as well as the number of passes (the diameter of the graph). There is no need to check invertibility, since we assume ones on the diagonal and no cycles.

For the algorithm, I have simplified the code from the networkx package to handle our specific case.

Let me know if I missed something.
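The generations-based check described above could be sketched roughly as a Kahn-style pass, similar to networkx's topological_generations (this is an illustrative simplification, not the PR's actual code):

```python
def topological_generations(adjacency):
    """Groups features into 'generations': each generation depends only on
    features from earlier generations. `adjacency[i][j]` is truthy if feature
    i depends on feature j. Raises if the graph has a cycle."""
    n = len(adjacency)
    indegree_map = {i: sum(map(bool, adjacency[i])) for i in range(n)}
    this_generation = [i for i, d in indegree_map.items() if d == 0]
    for i in this_generation:
        del indegree_map[i]
    all_generations = []
    while this_generation:
        all_generations.append(this_generation)
        next_generation = []
        for child in list(indegree_map):
            # Count dependencies of `child` resolved by the current generation.
            resolved = sum(bool(adjacency[child][parent]) for parent in this_generation)
            indegree_map[child] -= resolved
            if indegree_map[child] == 0:
                next_generation.append(child)
                del indegree_map[child]
        this_generation = next_generation
    assert not indegree_map, "The adjacency matrix contains cycles"
    return all_generations
```

The number of generations is the number of passes, and concatenating the generations yields a topological order.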

Member

@francois-rozet francois-rozet left a comment


Hello @adrianjav, thank you for implementing the adjacency checks. This is surprisingly simple (and efficient)! I read the PR carefully and left a few comments.

Comment on lines +92 to +95
order: LongTensor = None,
univariate: Callable[..., Transform] = MonotonicAffineTransform,
shapes: Sequence[Size] = ((), ()),
adjacency: BoolTensor = None,
Member


Suggested change
order: LongTensor = None,
univariate: Callable[..., Transform] = MonotonicAffineTransform,
shapes: Sequence[Size] = ((), ()),
adjacency: BoolTensor = None,
order: LongTensor = None,
adjacency: BoolTensor = None,
univariate: Callable[..., Transform] = MonotonicAffineTransform,
shapes: Sequence[Size] = ((), ()),

Move adjacency just below order. (also in the docstring)

Comment on lines +112 to +117
assert (order is None) or (
adjacency is None
), "Parameters `order` and `adjacency_matrix` are mutually exclusive."
assert (passes == features) or (
adjacency is None
), "When using `adjacency_matrix`, `passes` has to be the number of features."
Member


Does not match the docstring.

Contributor Author


Could you expand on this? I am not sure I understand what you mean

Member


The docstring says that passes and order are ignored when adjacency is provided. That is not what happens here.

Contributor Author


Oh yes, I see now, thanks!

# Hyper network
self.hyper = MaskedMLP(adjacency, **kwargs)

def _check_adjacency(self, adjacency: BoolTensor) -> None:
Member

@francois-rozet francois-rozet Jul 26, 2024


Could you add a small description of what this function does? (e.g. checks that the graph is acyclic and returns the diameter + link to networkx).

Comment on lines +171 to +172
self.passes = len(all_generations) # Graph diameter
self.order = torch.tensor(reduce(list.__add__, all_generations))
Member

@francois-rozet francois-rozet Jul 26, 2024


This function should return the diameter instead of modifying it in-place.

Also, currently, the order argument is a topological order, but the attribute self.order is not. It represents the "computation" order, or in this code the "generation" of each feature. So I think we should not assign the topological order to self.order, even though it is an interesting quantity.

I think simply setting self.order = None when adjacency is provided is good.

Contributor Author


The adjacency matrix describes the DAG that the data-generating process follows, so any of its topological orders is indeed a generation order (i.e., if you group the variables that are generated simultaneously), isn't it? When the ordering is provided to the constructor, there is no grouping to be made (the adjacency matrix is a full lower-triangular matrix), so there is only one topological order. Or did I get something wrong?

Member


I think you misunderstood my comment. Currently, the attribute self.order is not a topological order so you should not assign a topological order to it.

For example, when passes=2 and features=5, self.order is [0,0,0,1,1], meaning that the 3 first features can be computed together (they are in the same generation).
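To make this semantics concrete, here is a small illustrative sketch (not from the PR) that builds such a computation-order vector from a list of generations:

```python
import torch

# Hypothetical generations for 5 features and 2 passes: features 0, 1, 2 are
# independent (first generation), features 3 and 4 depend on them (second).
all_generations = [[0, 1, 2], [3, 4]]

# order[j] is the generation level of feature j.
order = torch.zeros(5, dtype=torch.long)
for level, generation in enumerate(all_generations):
    order[generation] = level

print(order.tolist())  # [0, 0, 0, 1, 1]
```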

Contributor Author


Yes, I did misunderstand it, my bad.
If that's the case, we could also compute the order in a Pythonic way by just checking which generation level each variable is in:

self.order = torch.tensor(
    [i for j in range(adjacency.size(0)) for i, x in enumerate(all_generations) if j in x]
)

But of course it is your call, I have no strong opinions on this matter.

Member

@francois-rozet francois-rozet Jul 26, 2024


I would prefer setting self.order to None. It is not used in the class anyway.

adjacency = out_order[:, None] > in_order
else:
order = torch.as_tensor(order)
adjacency.mul_( # Remove the diagonal (if it is non-zero)
Member


The class should raise an error if the diagonal is non-zero, not silently "fix" it.

Contributor Author


I opted for doing it silently since "in reality" the diagonal is non-zero (because of the transformer), and it needs to be zeroed for the conditioner. I figured that fixing it would be less confusing, but I can also just raise an error.

Member

@francois-rozet francois-rozet Jul 26, 2024


I see what you mean, but then we should impose that the matrix has ones on the diagonal, and raise an error if it does not.

The reason why I prefer raising an error is that if a user makes a mistake (e.g. they permute the rows but they forget to permute the columns), they might miss it if we "fix" it silently.

Contributor Author


That is actually a pretty good reason to check it. I'll raise an error then 👍🏼

del indegree_map[child]
all_generations.append(this_generation)

assert not indegree_map, "The adjacency matrix contains cycles"
Member


Suggested change
assert not indegree_map, "The adjacency matrix contains cycles"
assert all(d == 0 for d in indegree), "The graph contains cycles."

def extra_repr(self) -> str:
base = self.univariate(*map(torch.randn, self.shapes))
order = self.order.tolist()
order = self.order
Member


Suggested change
order = self.order

Comment on lines +221 to +222
adjacency: Adjacency matrix that all layers of the flow should follow.
If different from :py:`None`, then `randperm` should be :py:`False`.
Member


Remove adjacency in MAF.

return '\n'.join([
f'(base): {base}',
f'(order): {order}',
f'(adjacency): {adjacency}',
Member

@francois-rozet francois-rozet Jul 26, 2024


I think we should either show order or adjacency, not both.

@adrianjav
Contributor Author

Thanks a lot for the review. I think all changes are more or less clear and I will start implementing them. I would actually appreciate it if, after the next changes, you could put the final touches (e.g., adjust the __repr__ function to your liking), as it may be quicker that way.

PS: I am going on vacation tomorrow for a week, so I will be a bit less active.

@francois-rozet
Member

Sure, I'll finish the PR! Tell me when I can take over.

@adrianjav
Contributor Author

adrianjav commented Jul 27, 2024

I just made most of the changes suggested in the review. As I said, I'd be grateful if you could take over and implement the string representation to your liking.

Another comment, regarding the diagonal of adjacency: it should be zero for the code to work, but I think the reason you gave is a very valid one, so I have opted to force the user to fill it with ones, which I then replace with zeros internally. Feel free to change it if you think another way is better.

@francois-rozet
Member

francois-rozet commented Jul 29, 2024

Hello @adrianjav, I have made a few modifications to __init__, extra_repr and the tests. For the tests, I replaced the fixed adjacency matrices with random ones, which can help to detect more issues. For the representation, I chose to display the number of passes when the adjacency is provided.

For the __init__, I chose to revert to zeros on the diagonal as this is the convention in your paper and in Wehenkel et al. (2021). However, we still test that the diagonal is full of zeros, to prevent user mistakes.

I also really liked the way you added columns to the adjacency for the context and duplicated rows for the output so I made this part common for both order and adjacency.
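The extension mentioned here could look roughly like this. To be clear, this is a hedged sketch of the idea, not the PR's actual code; the shapes, the `total` outputs-per-feature parameter, and the use of repeat_interleave are my assumptions:

```python
import torch

features, context, total = 3, 2, 2  # `total`: conditioner outputs per feature, e.g. shift and scale

# Feature-to-feature dependencies, here a full lower-triangular matrix with
# ones on the diagonal (the convention settled on above).
adjacency = torch.tril(torch.ones(features, features, dtype=torch.bool))

# Every feature may depend on every context variable: append `context` columns of ones.
adjacency = torch.cat((adjacency, torch.ones(features, context, dtype=torch.bool)), dim=1)

# Each feature has `total` conditioner outputs: duplicate each row accordingly.
adjacency = torch.repeat_interleave(adjacency, total, dim=0)

print(adjacency.shape)  # torch.Size([6, 5])
```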

If everything is good for you, I am happy to merge the PR!

@francois-rozet
Member

francois-rozet commented Jul 30, 2024

In the end I have reverted to ones on the diagonal. As you said, I think it is easier to understand that adjacency describes the transformation graph rather than the conditioning graph.

I will now merge this PR. Thank you for this contribution!

@francois-rozet francois-rozet merged commit 146bb4b into probabilists:master Jul 30, 2024
@adrianjav
Contributor Author

Glad to contribute! ☺️

Development

Successfully merging this pull request may close these issues.

Add support for custom adjacency matrices in autoregressive models