[Quicktour Audio] Improve && remove ffmpeg dependency #16723
patrickvonplaten merged 3 commits into huggingface:main
>>> dataset = datasets.load_dataset("PolyAI/minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT
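Debemos asegurarnos de que la frecuencia de muestreo del conjunto de datos coincide con la frecuencia de muestreo con la que se entrenó `facebook/wav2vec2-base-960h`. (English: We must make sure that the sampling rate of the dataset matches the sampling rate that `facebook/wav2vec2-base-960h` was trained with.)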
@osanseviero could you take a look at my amazing Spanish skills here? haha
Yes, your Spanish was flawless 🚀 nice!
DeepL really is not bad then 😅
The documentation is not available anymore as the PR was closed or merged.
sgugger left a comment
Thanks for working on this!
{'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"},
{'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"},
{'text': 'HOW DO I FURN A JOINA COUT'}]
>>> raw_audio_waveforms = [d["array"] for d in dataset[:4]["audio"]]
Can we select the first four indices and then still pass a dataset? Using a list like this contradicts the text above (you can pass a whole dataset to a pipeline).
I think the blocker for this is that the pipeline expects
- A raw ndarray (here d["array"])
- A dict of {"raw": ndarray, "sampling_rate": 16_000}, but datasets creates {"array": ndarray, "sampling_rate": 16_000}
speech_recognizer(dataset["audio"]["array"])
....Should work no ? (since the sampling rate is already taken care of beforehand)
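The key mismatch described above can be bridged on the caller side. The following is a minimal sketch, not the pipeline's own code: `to_pipeline_input` is a hypothetical helper, and a plain Python list stands in for the ndarray so the snippet runs without any dependencies.

```python
def to_pipeline_input(audio):
    """Rename the waveform key from "array" (datasets) to "raw" (pipeline).

    Hypothetical helper for illustration; not part of transformers or datasets.
    Extra keys such as "path" are dropped.
    """
    return {"raw": audio["array"], "sampling_rate": audio["sampling_rate"]}


# Stand-in for one decoded dataset sample; a real one would hold a NumPy array.
sample = {"array": [0.0, 0.1, -0.1], "sampling_rate": 16_000, "path": "a.wav"}
pipeline_input = to_pipeline_input(sample)
```

The helper only renames a key; it assumes the audio has already been resampled to the rate the model expects.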
dataset["audio"] loads + resamples the whole dataset. For audio it's important to first slice the list and then call the columns.
We could use dataset.select(...) to remedy this but it'd still not work, e.g.:
from datasets import load_dataset
ds = load_dataset("PolyAI/minds14", "en-US", split="train")
ds = ds.select(range(4))
ds["audio"]["array"] # <- gives TypeError: list indices must be integers or slices, not str
Think the best I can do here is to rewrite the text a bit, no?
cc @lhoestq @albertovilla @polinaeterna @anton-l here as well - think that's a common problem we have with ds["audio"]["array"]
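Because the audio column of a slice comes back as a list of per-sample dicts, the waveforms have to be pulled out one sample at a time. A minimal sketch of that access pattern, using a hypothetical in-memory stand-in for ds[:4]["audio"] (plain lists instead of decoded arrays, made-up paths) so that it runs without downloading anything:

```python
# Hypothetical stand-in for ds[:4]["audio"]: one dict per sample.
audio_column = [
    {"array": [0.0, 0.1], "sampling_rate": 16_000, "path": "0.wav"},
    {"array": [0.2, 0.3], "sampling_rate": 16_000, "path": "1.wav"},
    {"array": [0.4], "sampling_rate": 16_000, "path": "2.wav"},
    {"array": [0.5, 0.6, 0.7], "sampling_rate": 16_000, "path": "3.wav"},
]

# Indexing the list with a string key fails, which is the TypeError quoted above.
error_message = None
try:
    audio_column["array"]
except TypeError as err:
    error_message = str(err)

# Extracting the waveform from each sample dict works.
raw_audio_waveforms = [d["array"] for d in audio_column]
```

With a real dataset, slicing first (ds[:4]) keeps decoding and resampling limited to the selected rows, which is the point made above.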
Ok for me to rewrite the text and say it can take list/array/dataset. But the previous example looked simpler :-p
Agree - @lhoestq @albertovilla do you have a nifty trick here by any chance to allow one to pass ds[:4]["audio"]["array"] ?
You can't do ds[:4]["audio"]["array"]. Indeed in the general case, the sampling rate may not be the same for all samples: sampling_rate is an optional parameter of Audio. So the format of ds[:4]["audio"] is a list of dicts {"array":..., "sampling_rate":..., "path":...}
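Since each sample carries its own sampling_rate, code that wants a single rate for a whole batch has to check for it explicitly. A rough sketch, assuming the list-of-dicts format described above; `common_sampling_rate` is a hypothetical helper, not a datasets API, and plain lists stand in for the arrays:

```python
def common_sampling_rate(samples):
    """Return the sampling rate shared by all samples, or raise if they differ.

    Hypothetical helper for illustration only.
    """
    rates = {s["sampling_rate"] for s in samples}
    if len(rates) != 1:
        raise ValueError(f"expected exactly one sampling rate, found {sorted(rates)}")
    return rates.pop()


# Stand-in batch in the format of ds[:2]["audio"].
batch = [
    {"array": [0.0, 0.1], "sampling_rate": 16_000, "path": "0.wav"},
    {"array": [0.2], "sampling_rate": 16_000, "path": "1.wav"},
]
rate = common_sampling_rate(batch)
```

If the check fails, the samples need to be resampled to a common rate before they can be treated as one batch.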
We can update the pipeline to accept both raw and array. Seems like the path of least resistance here, wdyt ?
Changing the text and merging for now though
[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'},
{'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"},
{'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"},
>>> raw_audio_waveforms = [d["array"] for d in dataset[:4]["audio"]]
stevhliu left a comment
Thanks for improving Patrick!
What does this PR do?
Fixes #16563
As discussed in #16563, it's not good if the official quicktour example depends on ffmpeg. Let's rather let datasets handle the audio loading and resampling here. IMO, it's also important to directly showcase here how to resample the audio.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.