Added implementation for the LEAF audio frontend #1364
mravanelli merged 15 commits into speechbrain:develop from SarthakYadav:leaf
Conversation

@TParcollet Take a look.

Thank you so much!

@TParcollet I just added a fix for the pre-commit fail.

@SarthakYadav

@anautsch Done; had to take upstream changes from

Hi @SarthakYadav, sorry for the delay on my end. I needed to start the checks manually, and there are errors with the doctest. The testing on git runs these scripts, which you can also try on your machine to check all doctests. Crossing fingers it's a little bug only!

Hi @anautsch
No worries!

Hi @anautsch. I just resolved a tiny merge conflict. Can you give the workflows approval again?

Hi @SarthakYadav, yes, no worries. Your PR LGTM. @TParcollet suggested that we run the code on our side once more and then merge.

Great, sounds good!

@anautsch, any news on that?

@anautsch and I reviewed it. Now, I must find the time to test it ...

I will certainly have to review it again ...
| Setup | Accuracy |
| ----- | -------- |
| xvector + augment v12 | 98.14% |
| xvector + augment v35 | 97.43% |
| xvector + augment + LEAF v35 | 96.79% |
Yes. That's what I got in the first and only experiment. LEAF was evaluated on the EfficientNetB0 and CNN14 architectures, so I have no known xvector baselines to go by.
speechbrain/nnet/CNN.py
Outdated
| return denominator * sinusoid * gaussian |
| def gabor_impulse_response_legacy_complex(t, center, fwhm): |
Yes. LEAF internally has some complex-dtype operations, and I used to face problems with these operations in prior versions of torch (as well as in torch-xla, which to the best of my knowledge still doesn't support gradients for those ops on TPUs). _legacy_complex basically performs these operations on two float tensors instead of a complex-dtype tensor.
This is also explained in the docs for LEAF/GaborConv1d.
They can be removed if you like. But some people who need to use a previous torch version (say <=1.9) for various reasons might find this extremely helpful, and it's controlled here with a simple boolean flag. Your call!
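As a rough illustration of the "two float tensors instead of a complex dtype" idea (plain Python here, not SpeechBrain's actual tensor code; the helper names are made up):

```python
import math

# Sketch of the _legacy_complex trick: carry a complex value z = a + b*i
# as the real pair (a, b), so no complex dtype is ever materialized.
# Useful where complex autograd is unsupported (older torch, torch-xla).

def complex_mul(a_re, a_im, b_re, b_im):
    # (a_re + i*a_im) * (b_re + i*b_im), returned as (real, imag)
    return a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re

def complex_exp(theta):
    # exp(i*theta) via Euler's formula, returned as (real, imag)
    return math.cos(theta), math.sin(theta)
```

Expanding every complex op into its real/imaginary parts like this sidesteps complex autograd entirely, at the cost of some bookkeeping.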
Any updates on this?
speechbrain/nnet/CNN.py
Outdated
| return in_channels |
| class Leaf(nn.Module): |
Hi @SarthakYadav, shouldn't this be a lobe instead? I see it as a "complex" composition rather than a building block.
Sure, makes sense. I simply followed SincNet (which is a Module). I'll make it a lobe and move it to speechbrain.lobes.features.
speechbrain/nnet/CNN.py
Outdated
| return int(padding) |
| def gabor_impulse_response(t, center, fwhm): |
Wondering if these functions shouldn't go somewhere else, as they are not related to NN stuff but more to DSP? What about speechbrain.processing.features or speechbrain.processing.signal_processing?
Sure. I'll move them to speechbrain.processing.signal_processing.
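For context, the Gabor impulse response these helpers compute is a Gaussian-windowed complex sinusoid. A scalar sketch (not SpeechBrain's exact vectorized implementation; the normalization convention is an assumption) looks like:

```python
import cmath
import math

def gabor_impulse_response_sketch(t, center, fwhm):
    # Gaussian envelope centered at t = 0, width controlled by fwhm
    gaussian = math.exp(-(t ** 2) / (2.0 * fwhm ** 2))
    # Complex sinusoid oscillating at the (angular) center frequency
    sinusoid = cmath.exp(1j * center * t)
    # Normalization factor for the Gaussian (assumed convention)
    denominator = 1.0 / (math.sqrt(2.0 * math.pi) * fwhm)
    return denominator * sinusoid * gaussian
```

The final line mirrors the `return denominator * sinusoid * gaussian` visible in the diff excerpt earlier in this thread.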
speechbrain/nnet/ema.py
Outdated
| from torch import nn |
| class ExponentialMovingAverage(nn.Module): |
Shouldn't this go to speechbrain.nnet.normalization? That is literally a question ahah. The idea is always to reduce the number of files.
Well, the idea was that EMA might find other use cases. But I'll move it to speechbrain.nnet.normalization; it fits well there too.
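For reference, the smoothing an EMA performs is a one-pole recursion along the time axis. A minimal sketch (illustrative names, not SpeechBrain's actual ExponentialMovingAverage API):

```python
def ema_smooth(values, coeff):
    # One-pole smoother: s[t] = coeff * x[t] + (1 - coeff) * s[t-1],
    # seeded with the first input. In LEAF-style frontends this kind of
    # smoothing feeds the PCEN-like per-channel normalization.
    state = values[0]
    out = [state]
    for x in values[1:]:
        state = coeff * x + (1.0 - coeff) * state
        out.append(state)
    return out
```

For example, `ema_smooth([1.0, 0.0, 0.0], 0.5)` decays geometrically toward zero.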

Once the comments have been addressed, I'll merge. I tested it, and it works :-) Thanks for the huge work.

Also @SarthakYadav, could you please merge the latest version of develop here? We recently added many consistency tests that help make sure the code is fine.

Sure @mravanelli, will do.

The latest commit incorporates all the suggestions. I have also updated the sample recipe, and training is working. @TParcollet Please take a look.

@SarthakYadav some tests are failing; I will let you fix that and then we merge! I am fine with the code now :-)

It was a failed recipe consistency test due to my .yaml not being in

@mravanelli thanks for fixing the recipe tests. I was about to post asking how to do that. It seems to me that it's failing due to missing documentation for the forward methods in the modules I wrote? I'll fix that soon.

ok, that's a minor fix. I will wait for your commit then!
I finally wrote the missing docstrings (we are accelerating a bit because we will release the new version of speechbrain soon). If all the tests pass, I think we can merge it!

Thank you @SarthakYadav for this great job! Of course, you are welcome to keep contributing to speechbrain if you want.

Thanks a lot @mravanelli!
This PR adds an implementation for the LEAF [1] audio frontend. Following is a summary of changes:
Changes touch speechbrain.nnet.CNN.py, speechbrain.nnet.pooling.py, and speechbrain.nnet.normalisation.py, and include the dependency ExponentialMovingAverage in speechbrain.nnet.ema.py. The LEAF frontend itself is added in speechbrain.nnet.CNN.py.

References
[1] Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry & Marco Tagliasacchi, "LEAF: A Learnable Frontend for Audio Classification", in Proc. of ICLR 2021 (online)
[2] Yuxuan Wang, Pascal Getreuer, Thad Hughes, Richard F. Lyon, Rif A. Saurous, "Trainable Frontend for Robust and Far-Field Keyword Spotting", in Proc. of ICASSP 2017 (online)