[Quicktour Audio] Improve && remove ffmpeg dependency by patrickvonplaten · Pull Request #16723 · huggingface/transformers

patrickvonplaten · 2022-04-12T11:14:08Z

What does this PR do?

As discussed in #16563, it's not good if the official quicktour example depends on ffmpeg. Let's rather let datasets handle the audio loading and resampling here. IMO, it's also important to directly showcase here how to resample the audio.
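For readers who want a concrete picture, here is a minimal sketch of letting datasets do the loading and resampling. It is an illustration under assumptions, not the exact diff of this PR: the helper name `load_resampled_minds14` and its defaults are made up for this example, and the behavior described matches the `datasets` API around the time of this PR (newer releases may decode audio differently). The 16 kHz default is the rate `facebook/wav2vec2-base-960h` was trained on.

```python
def load_resampled_minds14(sampling_rate=16_000, n_samples=4):
    """Return the first `n_samples` MInDS-14 waveforms, resampled to `sampling_rate`.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so that defining this helper does not require `datasets`.
    from datasets import Audio, load_dataset

    dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")
    # Casting the column asks `datasets` to decode and resample audio on access.
    dataset = dataset.cast_column("audio", Audio(sampling_rate=sampling_rate))
    # Slice rows first, then pick the column, so only `n_samples` files are decoded.
    return [sample["array"] for sample in dataset[:n_samples]["audio"]]
```

Calling the helper downloads the dataset, so it needs network access and an audio backend that `datasets` can use.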

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

patrickvonplaten · 2022-04-12T11:18:35Z

docs/source/es/quicktour.mdx

 >>> dataset = datasets.load_dataset("PolyAI/minds14", name="en-US", split="train")  # doctest: +IGNORE_RESULT
 ```

+Debemos asegurarnos de que la frecuencia de muestreo del conjunto de datos coincide con la frecuencia de muestreo con la que se entrenó `facebook/wav2vec2-base-960h`.


@osanseviero could you take a look at my amazing Spanish skills here? haha

Yes, your Spanish was flawless 🚀 nice!

DeepL really is not bad then 😅

HuggingFaceDocBuilderDev · 2022-04-12T11:37:04Z

The documentation is not available anymore as the PR was closed or merged.

sgugger

Thanks for working on this!

sgugger · 2022-04-12T13:03:09Z

docs/source/en/quicktour.mdx

- {'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"}, 
- {'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"}, 
- {'text': 'HOW DO I FURN A JOINA COUT'}]
+>>> raw_audio_waveforms = [d["array"] for d in dataset[:4]["audio"]]


Can we select the first four indices and then still pass a dataset? Using a list like this contradicts the text above (you can pass a whole dataset to a pipeline).

I think the blocker for this is that the pipeline expects

A raw ndarray (here d["array"])

A dict of {"raw": ndarray, "sampling_rate": 16_000} but datasets creates {"array": ndarray, "sampling_rate": 16_000}

speech_recognizer(dataset["audio"]["array"]) ....

Should work no ? (since the sampling rate is already taken care of beforehand)

dataset["audio"] loads + resamples the whole dataset. For audio it's important to first slice the list and then call the columns.
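To make the ordering point concrete, here is a toy sketch. `ToyAudioDataset` is a made-up stand-in that only counts decode calls; it is not how `datasets` is implemented, it just mimics the "decode on access" behavior described above.

```python
class ToyAudioDataset:
    """Toy stand-in for a dataset whose audio column is decoded lazily."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.decode_calls = 0  # how many files were "decoded" so far

    def _decode(self, path):
        # Pretend to load and resample one file.
        self.decode_calls += 1
        return {"path": path, "array": [0.0], "sampling_rate": 16_000}

    def __getitem__(self, key):
        if isinstance(key, str):
            # Column access: every row has to be decoded.
            return [self._decode(p) for p in self.paths]
        if isinstance(key, slice):
            # Row slicing: only the selected rows are decoded.
            return {"audio": [self._decode(p) for p in self.paths[key]]}
        raise TypeError(f"unsupported key: {key!r}")


ds = ToyAudioDataset(f"clip_{i}.wav" for i in range(100))

first_four = ds[:4]["audio"]          # slice rows, then take the column
calls_slice_first = ds.decode_calls   # only the 4 selected files

ds.decode_calls = 0
also_four = ds["audio"][:4]           # take the column, then slice
calls_column_first = ds.decode_calls  # all 100 files
```

Both expressions end up with four samples, but the second one pays for decoding the whole column first.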

We could use dataset.select(...) to remedy this but it'd still not work, e.g.:

    from datasets import load_dataset

    ds = load_dataset("PolyAI/minds14", "en-US", split="train")
    ds = ds.select(range(4))
    ds["audio"]["array"]  # <- gives TypeError: list indices must be integers or slices, not str

Think the best I can do here is to rewrite the text a bit, no?

cc @lhoestq @albertovilla @polinaeterna @anton-l here as well - think that's a common problem we have with ds["audio"]["array"]

Ok for me to rewrite the text and say it can take list/array/dataset. But the previous example looked simpler :-p

Agree - @lhoestq @albertovilla do you have a nifty trick here by any chance to allow one to pass ds[:4]["audio"]["array"] ?

You can't do ds[:4]["audio"]["array"]. Indeed in the general case, the sampling rate may not be the same for all samples: sampling_rate is an optional parameter of Audio. So the format of ds[:4]["audio"] is a list of dicts {"array":..., "sampling_rate":..., "path":...}
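A small sketch of that format, using a hand-written list in place of a real `ds[:2]["audio"]` result (the paths, values and sampling rates are invented for illustration):

```python
# Stand-in for what a sliced audio column looks like: one dict per sample.
batch = [
    {"path": "a.wav", "array": [0.1, 0.2], "sampling_rate": 8_000},
    {"path": "b.wav", "array": [0.3, 0.4, 0.5], "sampling_rate": 16_000},
]

try:
    batch["array"]  # a list cannot be indexed with a string
    error = None
except TypeError as exc:
    error = str(exc)

# Each sample has to be unpacked individually instead.
arrays = [sample["array"] for sample in batch]
# Nothing forces the samples to share one sampling rate.
sampling_rates = {sample["sampling_rate"] for sample in batch}
```

Because each sample carries its own `sampling_rate`, there is no single array-plus-rate pair that `["array"]` on the list could sensibly return.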

We can update the pipeline to accept both raw and array. Seems like the path of least resistance here, wdyt ?
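One way such a change could look, as a rough sketch only: `normalize_audio_input` is a hypothetical helper written for this comment, not code from the pipeline, and it uses plain lists where real inputs would be ndarrays.

```python
def normalize_audio_input(inputs):
    """Return a dict with "raw" and "sampling_rate" keys (hypothetical helper)."""
    if not isinstance(inputs, dict):
        # Bare waveform: no sampling rate is known.
        return {"raw": inputs, "sampling_rate": None}
    if "raw" in inputs:
        waveform = inputs["raw"]
    elif "array" in inputs:
        waveform = inputs["array"]  # key produced by `datasets`
    else:
        raise ValueError('expected a "raw" or "array" key')
    return {"raw": waveform, "sampling_rate": inputs.get("sampling_rate")}


from_datasets = normalize_audio_input(
    {"array": [0.1, 0.2], "sampling_rate": 16_000, "path": "a.wav"}
)
from_raw_dict = normalize_audio_input({"raw": [0.1, 0.2], "sampling_rate": 16_000})
from_waveform = normalize_audio_input([0.1, 0.2])
```

With something like this in place, a sample taken straight from a datasets audio column could be passed on without renaming its keys.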

Changing the text and merging for now though

sgugger · 2022-04-12T13:03:30Z

docs/source/es/quicktour.mdx

-[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'}, 
- {'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"}, 
- {'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"},
+>>> raw_audio_waveforms = [d["array"] for d in dataset[:4]["audio"]]


stevhliu

Thanks for improving Patrick!

[Quicktour Audio] Improve && remove ffmpeg dependency

50f4a6c

patrickvonplaten mentioned this pull request Apr 12, 2022

Error when running "Quick Tour" code snippets #16563

Closed

4 tasks

patrickvonplaten requested review from Narsil, sgugger and stevhliu April 12, 2022 11:17

patrickvonplaten commented Apr 12, 2022

final fix

b370bbb

sgugger approved these changes Apr 12, 2022

stevhliu approved these changes Apr 12, 2022

final touches

482ed02

patrickvonplaten merged commit 9a2995e into huggingface:main Apr 18, 2022

patrickvonplaten deleted the remove_ffmpeg_quigkhouc_ branch April 18, 2022 14:50

elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022

[Quicktour Audio] Improve && remove ffmpeg dependency (huggingface#16723)

cd5bd02


7 participants
