
Support S3D-backed video encoder #104

Closed
sophiazhi wants to merge 13 commits into facebookresearch:main from sophiazhi:szhi-s3d

Conversation

@sophiazhi
Contributor

@sophiazhi sophiazhi commented Jun 20, 2022

Summary:
For the S3D-backed video encoder used by MUGEN video-text retrieval, added:

  • examples/mugen/retrieval/s3d.py: S3D model
  • examples/mugen/test/retrieval/test_s3d.py: S3D model unit tests
  • examples/mugen/retrieval/video_clip.py: video encoder
  • examples/mugen/test/retrieval/test_video_clip.py: video encoder tests

Test plan:
For a coverage report, temporarily remove the "examples/" lines from `.coveragerc`, then run `python -m pytest --cov=examples/mugen/retrieval examples/mugen/test/retrieval/test_s3d.py -vv`. Otherwise, run `python -m pytest examples/mugen/test/retrieval/test_s3d.py::TestS3D -rP`.
Similarly, for the video encoder tests:
`python -m pytest examples/mugen/test/retrieval/test_video_clip.py::TestVideoCLIP -rP`
[Two screenshots of test output attached (Jun 27, 2022)]

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 20, 2022
self.preprocess_std = preprocess_std
self.model = S3D(400)  # S3D constructed with a 400-class classification head
self.embedding_dim = list(self.model.fc.children())[0].in_channels
self.model.fc = nn.Identity()  # strip the head so forward() returns features


Consider breaking the S3D class into a trunk and a classification head for better composability, to avoid model surgery like this. We can discuss this design with TorchVision to see if they could incorporate it.
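A rough sketch of what that trunk/head split could look like. The class names (`S3DTrunk`, `S3DClassifier`) are illustrative, and the trunk body here is a stand-in for the real S3D conv stack, not the PR's actual code:

```python
import torch
from torch import nn


class S3DTrunk(nn.Module):
    """Feature extractor: video in, pooled embedding out."""

    def __init__(self, embedding_dim: int = 1024):
        super().__init__()
        self.embedding_dim = embedding_dim
        # Stand-in for the real conv stack; real S3D features are (B, C, T, H, W)
        self.features = nn.Conv3d(3, embedding_dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        # Global average pool over time and space -> (B, embedding_dim)
        return x.mean(dim=(2, 3, 4))


class S3DClassifier(nn.Module):
    """Optional classification head composed on top of the trunk."""

    def __init__(self, trunk: S3DTrunk, num_classes: int = 400):
        super().__init__()
        self.trunk = trunk
        self.head = nn.Linear(trunk.embedding_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.trunk(x))
```

With this split, a video encoder could take `S3DTrunk` directly and the `self.model.fc = nn.Identity()` surgery would no longer be needed.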

@codecov-commenter

codecov-commenter commented Jun 22, 2022

Codecov Report

Merging #104 (3716aaf) into main (b2e654e) will decrease coverage by 0.64%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #104      +/-   ##
==========================================
- Coverage   88.39%   87.75%   -0.65%     
==========================================
  Files          35       36       +1     
  Lines        1853     1911      +58     
==========================================
+ Hits         1638     1677      +39     
- Misses        215      234      +19     
Impacted Files Coverage Δ
...multimodal/modules/encoders/mdetr_image_encoder.py 67.24% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2e654e...3716aaf. Read the comment docs.

return x


class Mixed3b(nn.Module):
Contributor


Do you know the reason for the "Mixed" names? Although I can't think of a clearer name for these...

Contributor Author


Because they use a mix of 3D and 2D convolutions! (See page 3 of the paper.)

Contributor


Maybe you could rename to `MixedConvs3b`? Your call though.



torchvision has similar classes called "Inception{A}{B}...", but over there they assume the conv sub-unit is uniform, i.e., 3D or 2D or 1D. Here we use a mixture of 2D and 1D conv sub-units in the same module. Eventually, this script should be upstreamed to TorchVision after generalizing their current implementation:

https://github.com/pytorch/vision/blob/main/torchvision/models/inception.py#L176
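For context, the mixed sub-unit described above can be sketched as a factorized convolution: a 1xKxK spatial conv followed by a Kx1x1 temporal conv, both expressed as `Conv3d`. This is a generic illustration of the idea, not the PR's exact implementation:

```python
import torch
from torch import nn


class SeparableConv3d(nn.Module):
    """Factorized 3D conv: 2D spatial pass, then 1D temporal pass."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # 1xKxK conv mixes spatial information only
        self.spatial = nn.Conv3d(
            in_ch, out_ch, (1, k, k), padding=(0, k // 2, k // 2)
        )
        # Kx1x1 conv mixes temporal information only
        self.temporal = nn.Conv3d(
            out_ch, out_ch, (k, 1, 1), padding=(k // 2, 0, 0)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.temporal(self.spatial(x))
```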

Contributor


Possibly a noob suggestion here, but all these Mixed* classes have very similar structure. Why not create a class per branch (except maybe branch0, since it's just a single op) that you can pass the primitive params (in/out dims, kernel_size, stride) to, then write a single class to handle the torch.cat portion? Then you don't have to write the same four Sequentials a ton of times.
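The suggested refactor could look roughly like this. It is a sketch under assumptions: `ConvBranch` and `MixedBlock` are made-up names, the channel counts are illustrative, and the real Mixed* blocks also include a max-pool branch that this omits:

```python
import torch
from torch import nn


class ConvBranch(nn.Sequential):
    """One branch built from a list of (in_ch, out_ch, kernel_size) specs."""

    def __init__(self, specs):
        layers = []
        for in_ch, out_ch, k in specs:
            layers += [
                nn.Conv3d(in_ch, out_ch, k, padding=k // 2),
                nn.BatchNorm3d(out_ch),
                nn.ReLU(inplace=True),
            ]
        super().__init__(*layers)


class MixedBlock(nn.Module):
    """Runs each branch on the input and concatenates along channels."""

    def __init__(self, branches):
        super().__init__()
        self.branches = nn.ModuleList(branches)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([b(x) for b in self.branches], dim=1)


# A Mixed3b-like block then becomes a parameter declaration instead of
# four hand-written Sequentials (channel numbers here are illustrative):
mixed3b_like = MixedBlock([
    ConvBranch([(192, 64, 1)]),
    ConvBranch([(192, 96, 1), (96, 128, 3)]),
    ConvBranch([(192, 16, 1), (16, 32, 3)]),
])
```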

Contributor Author


Lan and I discussed not generalizing the S3D architecture yet, since that requires more time to understand the existing torchvision VideoResNet code and generalize S3D in a consistent way (reusing torchvision classes, generalizing them, following a similar submodule structure). I like your suggestions, but for now we didn't want to generalize this S3D code if it has to be changed later anyway to a possibly different structure.

Contributor


I am not necessarily thinking about how to generalize the S3D architecture as a whole, more about how to reduce the copypasta in all the Mixed* classes. Seems they could all come from a common class with a different set of params passed, no?

Contributor Author


Sure, I'll try that out in a new commit.

@sophiazhi sophiazhi marked this pull request as ready for review June 27, 2022 15:03
@facebook-github-bot
Contributor

@sophiazhi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@sophiazhi sophiazhi mentioned this pull request Jul 6, 2022
