Added implementation for the LEAF audio frontend #1364
mravanelli merged 15 commits into speechbrain:develop from SarthakYadav:leaf
Conversation

@TParcollet Take a look.

Thank you so much!

@TParcollet I just added a fix for the pre-commit fail.

@SarthakYadav

@anautsch Done; had to take upstream changes from

Hi @SarthakYadav, sorry for the delay on my end. I needed to start the checks manually, and there are errors with the doctest. The testing on git runs these scripts, which you can also try on your machine to check all doctests. Crossing fingers it's a little bug only!

Hi @anautsch
No worries!

Hi @anautsch. I just resolved a tiny merge conflict. Can you give the workflows approval again?

Hi @SarthakYadav, yes, no worries. Your PR LGTM. @TParcollet suggested that we run the code on our side once more and then merge.

Great, sounds good!

@anautsch, any news on that?

@anautsch and I reviewed it. Now, I must find the time to test it ...

I will certainly have to review it again ...
| Setup | Accuracy |
| ----- | -------- |
| xvector + augment v12 | 98.14% |
| xvector + augment v35 | 97.43% |
| xvector + augment + LEAF v35 | 96.79% |
Yes. That's what I got in the first and only experiment. LEAF was evaluated on the EfficientNetB0 and CNN14 architectures, so I have no known xvector baselines to go by.
speechbrain/nnet/CNN.py
Outdated
| return denominator * sinusoid * gaussian |
| def gabor_impulse_response_legacy_complex(t, center, fwhm): |
Yes. LEAF internally has some complex-dtype operations, and I used to face problems with these operations in prior versions of torch (as well as in torch-xla, which to the best of my knowledge still doesn't support gradients for those ops on TPUs). _legacy_complex basically performs these operations on two float tensors instead of a complex-dtype tensor.
This is also explained in the docs for LEAF/GaborConv1d.
They can be removed if you like. But some people who need to use a previous torch version (say <=1.9) for various reasons might find this extremely helpful, and it's controlled here with a simple boolean flag. Your call!
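As a rough illustration of the "two float tensors instead of a complex dtype" idea (plain Python here, not SpeechBrain's actual tensor code; the helper names are made up):

```python
import math

# Sketch of the _legacy_complex trick: carry a complex value z = a + b*i
# as the real pair (a, b), so no complex dtype is ever materialized.
# Useful where complex autograd is unsupported (older torch, torch-xla).

def complex_mul(a_re, a_im, b_re, b_im):
    # (a_re + i*a_im) * (b_re + i*b_im), returned as (real, imag)
    return a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re

def complex_exp(theta):
    # exp(i*theta) via Euler's formula, returned as (real, imag)
    return math.cos(theta), math.sin(theta)
```

Expanding every complex op into its real/imaginary parts like this sidesteps complex autograd entirely, at the cost of some bookkeeping.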
Any updates on this?
speechbrain/nnet/CNN.py
Outdated
| return in_channels |
| class Leaf(nn.Module): |
Hi @SarthakYadav, shouldn't this be a lobe instead? I see it as a "complex" composition rather than a building block.
Sure, makes sense. I simply followed SincNet (which is a Module). I'll make it a lobe and move it to speechbrain.lobes.features.
speechbrain/nnet/CNN.py
Outdated
| return int(padding) |
| def gabor_impulse_response(t, center, fwhm): |
Wondering if these functions shouldn't go somewhere else, as they are not related to NN stuff but more to DSP? What about speechbrain.processing.features or speechbrain.processing.signal_processing?
Sure. I'll move them to speechbrain.processing.signal_processing.
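For context, the Gabor impulse response these helpers compute is a Gaussian-windowed complex sinusoid. A scalar sketch (not SpeechBrain's exact vectorized implementation; the normalization convention is an assumption) looks like:

```python
import cmath
import math

def gabor_impulse_response_sketch(t, center, fwhm):
    # Gaussian envelope centered at t = 0, width controlled by fwhm
    gaussian = math.exp(-(t ** 2) / (2.0 * fwhm ** 2))
    # Complex sinusoid oscillating at the (angular) center frequency
    sinusoid = cmath.exp(1j * center * t)
    # Normalization factor for the Gaussian (assumed convention)
    denominator = 1.0 / (math.sqrt(2.0 * math.pi) * fwhm)
    return denominator * sinusoid * gaussian
```

The final line mirrors the `return denominator * sinusoid * gaussian` visible in the diff excerpt earlier in this thread.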
speechbrain/nnet/ema.py
Outdated
| from torch import nn |
| class ExponentialMovingAverage(nn.Module): |
Shouldn't this go to speechbrain.nnet.normalization? That is literally a question ahah. The idea is always to reduce the number of files.
Well, the idea was that EMA might find other use cases. But I'll move it to speechbrain.nnet.normalization; it fits well there too.
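For reference, the smoothing an EMA performs is a one-pole recursion along the time axis. A minimal sketch (illustrative names, not SpeechBrain's actual ExponentialMovingAverage API):

```python
def ema_smooth(values, coeff):
    # One-pole smoother: s[t] = coeff * x[t] + (1 - coeff) * s[t-1],
    # seeded with the first input. In LEAF-style frontends this kind of
    # smoothing feeds the PCEN-like per-channel normalization.
    state = values[0]
    out = [state]
    for x in values[1:]:
        state = coeff * x + (1.0 - coeff) * state
        out.append(state)
    return out
```

For example, `ema_smooth([1.0, 0.0, 0.0], 0.5)` decays geometrically toward zero.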

Once the comments have been addressed, I'll merge. I tested it, and it works :-) Thanks for the huge work.

Also @SarthakYadav, could you please merge the latest version of develop here? We recently added many consistency tests that help make sure the code is fine.

Sure @mravanelli, will do.

The latest commit incorporates all the suggestions. I have also updated the sample recipe, and training is working. @TParcollet Please take a look.

@SarthakYadav some tests are failing; I will let you fix that and then we merge! I am fine with the code now :-)

It was a failed recipe consistency test due to my .yaml not being in

@mravanelli thanks for fixing the recipe tests. I was about to post asking how to do that. It seems to me that it's failing due to missing documentation for the forward methods in the modules I wrote? I'll fix that soon.

ok, that's a minor fix. I will wait for your commit then!
I finally wrote the missing docstrings (we are accelerating a bit because we will release the new version of speechbrain soon). If all the tests pass, I think we can merge it!

Thank you @SarthakYadav for this great job! Of course, you are welcome to keep contributing to speechbrain if you want.

Thanks a lot @mravanelli!
This PR adds an implementation for the LEAF [1] audio frontend. Following is a summary of changes:
Changes touch speechbrain.nnet.CNN.py, speechbrain.nnet.pooling.py, and speechbrain.nnet.normalisation.py, and include the dependency ExponentialMovingAverage in speechbrain.nnet.ema.py. The LEAF frontend itself is added in speechbrain.nnet.CNN.py.

References
[1] Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry & Marco Tagliasacchi, "LEAF: A Learnable Frontend for Audio Classification", in Proc. of ICLR 2021 (online)
[2] Yuxuan Wang, Pascal Getreuer, Thad Hughes, Richard F. Lyon, Rif A. Saurous, "Trainable Frontend for Robust and Far-Field Keyword Spotting", in Proc. of ICASSP 2017 (online)