Improving Hypernetwork initialization

This question was first raised in the following comment:

> I have a question, why the hypernetwork is just two linear without activation?

_Originally posted by @vexilligera in https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2284#discussioncomment-3854768_

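I asked myself the same question today while reviewing the code, and the structure does not make much sense. With no activation between them, two stacked linear layers compose into a single linear (affine) map, so the second layer adds no expressive power. I get the same or similar results using just one linear layer, and the .pt files are 4x smaller.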
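The 4x figure follows from counting weights. The current layout stores a dim × 2·dim matrix and a 2·dim × dim matrix (4·dim² weights in total), while a single layer stores one dim × dim matrix (dim² weights). A quick check over a few widths, chosen here only for illustration:

```python
import torch

def count_params(module: torch.nn.Module) -> int:
    """Total number of trainable parameters (weights + biases)."""
    return sum(p.numel() for p in module.parameters())

def two_layer(dim: int) -> torch.nn.Module:
    # Current layout: dim -> 2*dim -> dim, no activation in between
    return torch.nn.Sequential(torch.nn.Linear(dim, dim * 2), torch.nn.Linear(dim * 2, dim))

def one_layer(dim: int) -> torch.nn.Module:
    # Proposed layout: a single dim -> dim linear map
    return torch.nn.Linear(dim, dim)

for dim in (320, 640, 768, 1280):
    a, b = count_params(two_layer(dim)), count_params(one_layer(dim))
    print(f"dim={dim}: two layers {a}, one layer {b}, ratio {a / b:.2f}")
```

The ratio is slightly below 4 only because of the bias vectors.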

Currently, it is implemented like this:
```
self.linear1 = torch.nn.Linear(dim, dim * 2)
self.linear2 = torch.nn.Linear(dim * 2, dim)
```
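Because there is no nonlinearity between the two layers, their composition is itself one affine map: linear2(linear1(x)) = (W2 W1) x + (W2 b1 + b2). The sketch below uses a small width chosen only for the demo, folds both layers into a single `torch.nn.Linear`, and checks that the outputs match:

```python
import torch

torch.manual_seed(0)
dim = 64  # small width for the demo; the identity holds for any dim

linear1 = torch.nn.Linear(dim, dim * 2)
linear2 = torch.nn.Linear(dim * 2, dim)

# linear2(linear1(x)) = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
merged = torch.nn.Linear(dim, dim)
with torch.no_grad():
    merged.weight.copy_(linear2.weight @ linear1.weight)
    merged.bias.copy_(linear2.weight @ linear1.bias + linear2.bias)

x = torch.randn(8, dim)
with torch.no_grad():
    print(torch.allclose(linear2(linear1(x)), merged(x), atol=1e-5))
```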

If we want to keep the two-layer structure, we could instead use it to reduce the optimization space by making the hidden layer narrower than the input:
```
self.linear1 = torch.nn.Linear(dim, dim // n)
self.linear2 = torch.nn.Linear(dim // n, dim)
```

where n >= 2.

This should also make it easier to combine hypernetworks, since each one would only be working in a (dim // n)-dimensional subspace of the context.
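Because every path from input to output passes through `dim // n` hidden units, the weight part of the composed map has rank at most `dim // n`. Apart from linear2's constant bias, each update therefore lies in that subspace. A small numerical check, with `dim = 64` and `n = 4` chosen only for illustration:

```python
import torch

torch.manual_seed(0)
dim, n = 64, 4  # illustrative sizes; n >= 2 sets the bottleneck width dim // n

linear1 = torch.nn.Linear(dim, dim // n)
linear2 = torch.nn.Linear(dim // n, dim)

# Effective weight of the composed map x -> linear2(linear1(x))
with torch.no_grad():
    W = linear2.weight @ linear1.weight  # shape (dim, dim)
    rank = torch.linalg.matrix_rank(W).item()

params = sum(p.numel() for p in (*linear1.parameters(), *linear2.parameters()))
print(f"rank {rank} <= {dim // n}; params {params} vs {dim * dim + dim} for one full layer")
```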

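While we are at it, I would also suggest a scaled Xavier (Glorot) normal initialization. Xavier sets std = sqrt(2 / (fan_in + fan_out)), which for a square dim × dim layer equals 1 / sqrt(dim); scaling it by p = 0.01 gives std = 0.01 / sqrt(dim).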
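PyTorch already provides this initialization as `torch.nn.init.xavier_normal_`, whose `gain` argument multiplies the Xavier std. The sketch below assumes `dim = 768` for illustration and compares the manual `normal_` call with the built-in using `gain = p`. Both sample standard deviations should come out close to 0.01 / sqrt(dim):

```python
import math
import torch

torch.manual_seed(0)
dim, p = 768, 0.01

# Manual version from the proposal: std = p / sqrt(dim)
w_manual = torch.empty(dim, dim).normal_(mean=0.0, std=p / math.sqrt(dim))

# Built-in Xavier normal: std = gain * sqrt(2 / (fan_in + fan_out)) = gain / sqrt(dim) here
w_xavier = torch.empty(dim, dim)
torch.nn.init.xavier_normal_(w_xavier, gain=p)

target = p / math.sqrt(dim)
ratio_manual = w_manual.std().item() / target
ratio_xavier = w_xavier.std().item() / target
print(ratio_manual, ratio_xavier)
```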

Specifically, I suggest either dropping the second linear layer and using a single one:
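```
import math
import torch

p = 0.01  # overall scale of the initial update
self.linear = torch.nn.Linear(dim, dim)
self.linear.weight.data.normal_(mean=0.0, std=p / math.sqrt(dim))  # Xavier std 1/sqrt(dim), scaled by p
```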
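For context, here is a minimal sketch of the single-layer variant as a full module. The class name `SingleLinearHypernetworkModule` is hypothetical. Two details are assumptions rather than part of the proposal: the bias starts at zero, and the output is applied residually (x + f(x)). Under these assumptions, the module changes its input by only about p, in relative terms, at initialization:

```python
import math
import torch

class SingleLinearHypernetworkModule(torch.nn.Module):
    """Hypothetical single-layer variant of the hypernetwork module (sketch)."""

    def __init__(self, dim: int, p: float = 0.01):
        super().__init__()
        self.linear = torch.nn.Linear(dim, dim)
        # Scaled Xavier normal init: std = p * sqrt(2 / (dim + dim)) = p / sqrt(dim)
        self.linear.weight.data.normal_(mean=0.0, std=p / math.sqrt(dim))
        # Assumption: zero bias so the module starts close to the identity
        self.linear.bias.data.zero_()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: the learned correction is added residually to the context
        return x + self.linear(x)

torch.manual_seed(0)
module = SingleLinearHypernetworkModule(768)
x = torch.randn(2, 77, 768)
with torch.no_grad():
    rel_change = ((module(x) - x).norm() / x.norm()).item()
print(f"relative change at init: {rel_change:.4f}")
```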

or keeping two layers with a bottleneck of width dim // n, for n >= 2:

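```
import math
import torch

p = 0.01  # overall scale of the initial update
self.linear1 = torch.nn.Linear(dim, dim // n, bias=False)  # project down to the subspace
self.linear2 = torch.nn.Linear(dim // n, dim)  # project back up to dim

# Both layers have fan_in + fan_out = dim + dim/n, so they share the Xavier std;
# the sqrt(p) factor per layer makes the composed map scale with p
std = math.sqrt(p) * math.sqrt(2 / (dim + dim / n))
self.linear1.weight.data.normal_(mean=0.0, std=std)
self.linear2.weight.data.normal_(mean=0.0, std=std)
```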
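Here is a corresponding sketch of the bottleneck variant. The class name `BottleneckHypernetworkModule` is hypothetical, and the sketch makes the same assumptions of a zero-initialized bias and residual application. It uses the integer width `dim // n` in the Xavier formula, which matches `dim / n` whenever `n` divides `dim`:

```python
import math
import torch

class BottleneckHypernetworkModule(torch.nn.Module):
    """Hypothetical two-layer variant with a dim // n bottleneck (sketch)."""

    def __init__(self, dim: int, n: int = 4, p: float = 0.01):
        super().__init__()
        if n < 2:
            raise ValueError("n must be >= 2")
        hidden = dim // n
        self.linear1 = torch.nn.Linear(dim, hidden, bias=False)
        self.linear2 = torch.nn.Linear(hidden, dim)
        # Both layers share fan_in + fan_out = dim + hidden, so one Xavier std fits both;
        # the sqrt(p) factor per layer makes the composed map scale roughly with p
        std = math.sqrt(p) * math.sqrt(2 / (dim + hidden))
        self.linear1.weight.data.normal_(mean=0.0, std=std)
        self.linear2.weight.data.normal_(mean=0.0, std=std)
        # Assumption: zero bias so the module starts close to the identity
        self.linear2.bias.data.zero_()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: the learned correction is added residually to the context
        return x + self.linear2(self.linear1(x))

torch.manual_seed(0)
module = BottleneckHypernetworkModule(768, n=4)
x = torch.randn(2, 77, 768)
with torch.no_grad():
    rel_change = ((module(x) - x).norm() / x.norm()).item()
n_params = sum(t.numel() for t in module.parameters())
print(f"params: {n_params}, relative change at init: {rel_change:.4f}")
```

With dim = 768 and n = 4, this module stores 295,680 parameters. That is roughly half of a single dim × dim layer (590,592) and roughly an eighth of the current two-layer module (2,361,600).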

Improving Hypernetwork initialization #2740
