Conversation
sgugger left a comment:

Thanks for your PR!

There are still a lot of things to clean up in the modeling file. Make sure to follow the usual style guidelines we have in place.
(force-pushed from 2225dc1 to 8dbfda7)
FrancescoSaverioZuppichini left a comment:

Thanks for working on it, it looks like a very cool model. There are some naming conventions that must be applied:
- `*Block/s` -> `*Layer/s`
- use `*Stage` when we stack multiple `*Layer`s together
- when possible use `hidden_size/s` instead of `channels`
Some of the code could use refactoring to increase readability.
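As a rough sketch of the suggested convention (the class names and the `hidden_size` argument here are illustrative assumptions, not the actual DPT code): a `*Layer` is a single building block, and a `*Stage` stacks several `*Layer`s.

```python
import torch
from torch import nn

# Illustrative sketch of the naming convention suggested in the review:
# a *Layer is one building block, and a *Stage stacks multiple *Layers.
# These are example classes, not the real DPT modules.

class ExampleLayer(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.conv = nn.Conv2d(hidden_size, hidden_size, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, hidden_state):
        return self.act(self.conv(hidden_state))

class ExampleStage(nn.Module):
    """A Stage stacks multiple Layers, per the convention above."""
    def __init__(self, hidden_size, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(ExampleLayer(hidden_size) for _ in range(num_layers))

    def forward(self, hidden_state):
        for layer in self.layers:
            hidden_state = layer(hidden_state)
        return hidden_state

stage = ExampleStage(hidden_size=4)
out = stage(torch.randn(1, 4, 8, 8))
print(out.shape)  # torch.Size([1, 4, 8, 8])
```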
```python
        self.use_batch_norm = config.use_batch_norm
        self.act1 = nn.ReLU()
        self.conv1 = nn.Conv2d(
            config.channels,
            config.channels,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=not self.use_batch_norm,
        )

        self.act2 = nn.ReLU()
        self.conv2 = nn.Conv2d(
            config.channels,
            config.channels,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=not self.use_batch_norm,
        )

        if self.use_batch_norm:
            self.batch_norm1 = nn.BatchNorm2d(config.channels)
            self.batch_norm2 = nn.BatchNorm2d(config.channels)
```
That could work, but keep in mind I am not sure if we always prefer named layers vs. unnamed ones (like stacking them in a sequential layer), so take my design comments with a grain of salt.
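As a sketch of the alternative the reviewer alludes to, each act/conv/norm group could be stacked in an `nn.Sequential` instead of individually named attributes. This is a simplified stand-in (the helper name, the `ExampleConvModule` class, and the hard-coded channel count are assumptions, not the PR code):

```python
import torch
from torch import nn

# Sketch: the act1/conv1/batch_norm1 and act2/conv2/batch_norm2 groups from
# the snippet above expressed as unnamed layers inside nn.Sequential blocks.
# Simplified illustration, not the actual DPT implementation.

def make_block(channels, use_batch_norm):
    layers = [
        nn.ReLU(),
        nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1,
                  bias=not use_batch_norm),
    ]
    if use_batch_norm:
        layers.append(nn.BatchNorm2d(channels))
    return nn.Sequential(*layers)

class ExampleConvModule(nn.Module):
    def __init__(self, channels=4, use_batch_norm=True):
        super().__init__()
        self.block1 = make_block(channels, use_batch_norm)
        self.block2 = make_block(channels, use_batch_norm)

    def forward(self, hidden_state):
        return self.block2(self.block1(hidden_state))

out = ExampleConvModule()(torch.randn(2, 4, 8, 8))
print(out.shape)  # torch.Size([2, 4, 8, 8])
```

The trade-off: `nn.Sequential` is shorter and harder to get out of sync, while named attributes keep checkpoint keys explicit.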
```python
        self.embeddings = DPTViTEmbeddings(config)
        self.encoder = DPTViTEncoder(config)
```
Okay, why is it not called DPTViTModel then?
FrancescoSaverioZuppichini left a comment:

Thanks! A couple of comments about single variables and naming.
Thanks for your reviews, I addressed most comments. The main thing to update is: ideally we have names that are going to be used by all vision models.
(force-pushed from d723c89 to 8323505)
FrancescoSaverioZuppichini left a comment:

Thanks! Approved; there are still some minor comments about variable names.
sgugger left a comment:

There are still a few comments pending from the first review; I highlighted them.
```
    The number of output channels for each of the four feature maps of the backbone.
channels (`int`, *optional*, defaults to 256):
    The number of channels before fusion.
in_index (`int`, *optional*, defaults to -1):
```
I think you agreed to use `head_in_indices` here?

This has not been addressed.
(force-pushed from 040c65c to 3603e21)

(force-pushed from 2e84347 to a1b9b2e)
stop with all the emails I’m getting mad

Sent from my iPhone

On Mar 11, 2022, at 8:37 AM, NielsRogge ***@***.***> wrote:
@NielsRogge commented on this pull request.
In src/transformers/models/dpt/modeling_dpt.py:
```diff
+        if output_attentions:
+            all_self_attentions = all_self_attentions + (layer_outputs[1],)
+
+        if output_hidden_states:
+            all_hidden_states = all_hidden_states + (hidden_states,)
+
+        if not return_dict:
+            return tuple(v for v in [hidden_states, all_hidden_states, all_self_attentions] if v is not None)
+        return BaseModelOutput(
+            last_hidden_state=hidden_states,
+            hidden_states=all_hidden_states,
+            attentions=all_self_attentions,
+        )
+
+
+class DPTReassembleBlocks(nn.Module):
```
Renamed to DPTReassembleStage
What does this PR do?
This PR adds DPT, Dense Prediction Transformers, to the library. It's some very nice work from Intel Labs that applies Transformers for dense prediction tasks such as semantic segmentation and depth estimation.
Feel free to play around with the notebook here.
I've defined 3 models:

- DPTModel
- DPTForDepthEstimation
- DPTForSemanticSegmentation

DPTModel is the backbone only (ViT in this case). The head models use a neck (DPTNeck) combined with a task-specific head (either DPTDepthEstimationHead or DPTSemanticSegmentationHead).

Important here: … logits.

To do:

- DPTFeatureExtractor
- `out_indices` and `in_index` names
- `Intel` organization
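The backbone/neck/head split described in the PR summary can be sketched roughly as follows. Everything here is an illustrative stand-in (tiny toy modules, made-up shapes), not the actual DPT classes:

```python
import torch
from torch import nn

# Toy sketch of the backbone -> neck -> task head composition described
# above. These modules are simplified stand-ins, NOT the real DPT code.

class TinyBackbone(nn.Module):
    """Stand-in for the backbone (DPTModel): returns a list of feature maps."""
    def __init__(self, hidden_size=8):
        super().__init__()
        self.conv = nn.Conv2d(3, hidden_size, kernel_size=3, padding=1)

    def forward(self, pixel_values):
        features = self.conv(pixel_values)
        # Pretend we collect feature maps from several stages.
        return [features, features]

class TinyNeck(nn.Module):
    """Stand-in for the neck (DPTNeck): fuses the backbone feature maps."""
    def forward(self, feature_maps):
        return sum(feature_maps)

class TinySegmentationHead(nn.Module):
    """Stand-in for a task head: produces per-pixel logits."""
    def __init__(self, hidden_size=8, num_labels=2):
        super().__init__()
        self.classifier = nn.Conv2d(hidden_size, num_labels, kernel_size=1)

    def forward(self, fused):
        return self.classifier(fused)

backbone, neck, head = TinyBackbone(), TinyNeck(), TinySegmentationHead()
pixel_values = torch.randn(1, 3, 16, 16)
logits = head(neck(backbone(pixel_values)))
print(logits.shape)  # torch.Size([1, 2, 16, 16])
```

The point of the split is that the same backbone and neck can feed either a depth-estimation head or a semantic-segmentation head.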