Skip to content

Refactor S3D model#135

Closed
sophiazhi wants to merge 2 commits intofacebookresearch:mainfrom
sophiazhi:szhi-s3d_refactor
Closed

Refactor S3D model#135
sophiazhi wants to merge 2 commits intofacebookresearch:mainfrom
sophiazhi:szhi-s3d_refactor

Conversation

@sophiazhi
Copy link
Copy Markdown
Contributor

@sophiazhi sophiazhi commented Jul 6, 2022

Summary:
Followup to #104: Refactor S3D model to reduce code repetition

Test plan:
python -m pytest examples/mugen/test/retrieval/test_s3d.py -rP
Coverage:

examples/mugen/test/retrieval/test_s3d.py::TestS3D::test_forward PASSED                     [ 25%]
examples/mugen/test/retrieval/test_s3d.py::TestS3D::test_basicconv3d PASSED                 [ 50%]
examples/mugen/test/retrieval/test_s3d.py::TestS3D::test_sepconv3d PASSED                   [ 75%]
examples/mugen/test/retrieval/test_s3d.py::TestS3D::test_mixedconvsblock PASSED             [100%]

Name                                     Stmts   Miss  Cover
examples/mugen/retrieval/s3d.py             66      0   100%
examples/mugen/retrieval/video_clip.py      28     28     0%
TOTAL                                       94     28    70%


======================================== 4 passed in 5.03s ========================================

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 6, 2022
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jul 6, 2022

Codecov Report

Merging #135 (2a27e71) into main (855c9ed) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #135   +/-   ##
=======================================
  Coverage   90.67%   90.67%           
=======================================
  Files          46       46           
  Lines        2403     2403           
=======================================
  Hits         2179     2179           
  Misses        224      224           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 855c9ed...2a27e71. Read the comment docs.

@sophiazhi sophiazhi requested a review from ebsmothers July 6, 2022 03:35
@sophiazhi sophiazhi marked this pull request as ready for review July 6, 2022 03:36
Copy link
Copy Markdown
Contributor

@ebsmothers ebsmothers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for refactoring these classes, the way you're doing it now is really nice. Left a few nits but otherwise looks good.

BasicConv3d(480, 96, kernel_size=1, stride=1),
SepConv3d(96, 208, kernel_size=3, stride=1, padding=1),
class Branch1or2(nn.Sequential):
def __init__(self, in_planes, mid_planes, out_planes):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add type info (here and in other classes). Mypy won't complain about examples/ so it's not a hard requirement, but still will make the code clearer.

@sophiazhi sophiazhi requested a review from YosuaMichael July 6, 2022 18:17
@langong347
Copy link
Copy Markdown

@YosuaMichael Would love to get your eyes on this. This way of implementing inception blocks is different from torchvision. This version of using S3D as the backbone may as well be a good addition to TorchVision' inception class in the future.

Copy link
Copy Markdown
Contributor

@YosuaMichael YosuaMichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sophiazhi , thanks the PR! I have a few comments on making more general class for S3D and some suggestions for the implementation and naming.
Also similar with @ebsmothers suggestion, it would be great to have type info for public class constructor and forward.

@@ -30,17 +30,17 @@ def __init__(self, num_class):
BasicConv3d(64, 64, kernel_size=1, stride=1),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make a more generic class that able to handle both S3D and I3D (see https://arxiv.org/pdf/1705.07750.pdf). The idea (just to illustrate, the naming might not be good) is to enable user to choose the class for the conv_block (S3D: SepConv3D, I3D: Conv3D), max_pool_block, and inception_block (S3D: SepInceptionModule3D, I3D: InceptionModule3D) (examples: here or here).

It is also good to let the parameters also be configurable on the generic class, this will be useful if there will be a new variant with different parameters in the future. This is not a must, however since there are a lot of parameters, it might be a good idea to create a config class to store the parameters (example).

Once we have the generic module, we can have a builder to build the s3d model (example)

@YosuaMichael
Copy link
Copy Markdown
Contributor

YosuaMichael commented Jul 8, 2022

I also want to suggest to reuse Conv3dNormActivation from torchvision. I think this can be used to replace BasicConv3d by using

Conv3dNormActivation(64, 64, kernel_size=1, stride=1, norm_layer=partial(nn.BatchNorm3d, eps=1e-3, momentum=1e-3))

Also SepConv3d can also be simplified:

class SepConv3d(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size, stride, padding=0):
        super(SepConv3d, self).__init__(
            Conv3dNormActivation(
                in_planes,
                out_planes,
                kernel_size=(1, kernel_size, kernel_size),
                stride=(1, stride, stride),
                padding=(0, padding, padding),
                bias=False,
            ),
            Conv3dNormActivation(
                out_planes,
                out_planes,
                kernel_size=(kernel_size, 1, 1),
                stride=(stride, 1, 1),
                padding=(padding, 0, 0),
                bias=False,
            ),
        )

Another thing is that the padding is usually constructor with common formula of (kernel_size - 1) // 2 * dilation and I observe that it is the case here too. In this case, you can also not passing the padding manually :)

Take note that I have just updated Conv3dNormActivation so it can accept a tuple kernel_size. So in case you want to use it, please either use install torchvision from source or wait for tomorrow nightly version.

Copy link
Copy Markdown

@datumbox datumbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sophiazhi In continuation to our discussion at pytorch/vision#6402, I think this implementation is quite good and worth upstreaming to TorchVision.

I've added a few comments to indicate parts that would probably need to change to make it easier to upstream. My comments are minor and don't really affect your implementation; rather just suggest specific idioms that TorchVision uses.

There is a bit of boilerplate code that would also be required for creating model builder methods and weights (we will need to port the paper weights) but again nothing too much. We can guide you through the process during the PR. My recommendation is to open one on the repo and I can guide you from there.

class Branch0(nn.Sequential):
def __init__(self, in_planes, out_planes):
super(Branch0, self).__init__(
BasicConv3d(in_planes, out_planes, kernel_size=1, stride=1)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TorchVision has a standard block for BasicConv3d. It's called Conv3dNormActivation.

SepConv3d can also be rewritten based on it. Not sure if we should call them separable conv because there are normalization and activation layers in between them but we can review naming later.

Comment on lines +140 to +143
self.branch0 = Branch0(in_planes, b0_out)
self.branch1 = Branch1or2(in_planes, b1_mid, b1_out)
self.branch2 = Branch1or2(in_planes, b2_mid, b2_out)
self.branch3 = Branch3(in_planes, b3_out)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid the introduction of the Branch* classes? From what I understand, we can just use Sequential directly and simplify the code structure.

Mixed5b(),
Mixed5c(),
MixedConvsBlock(832, 256, 160, 320, 32, 128, 128),
MixedConvsBlock(832, 384, 192, 384, 48, 128, 128),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry putting a comment on a random position because Github doesn't let me comment on the constructor of the S3D class. We need to align the arguments you pass to make them similar to what TorchVision is currently using (see Video Resnet class, nothing to different).

@sophiazhi
Copy link
Copy Markdown
Contributor Author

Closing due to pytorch/vision#6412

@sophiazhi sophiazhi closed this Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants