This transformer converts audio files to WAV format, with control over Audio Channels (AC) and Audio Rate (AR). It is based on NeMo's Speech Data Processor (SDP) Toolkit.
To transform your audio files using this ETL, follow these steps:
- Navigate to the Directory

  Go to the directory where the specification (pod.yaml) file exists.

      cd ais-etl/transformers/NeMo/FFmpeg/

- Configure AIStore Endpoint

  Ensure your `AIS_ENDPOINT` is pointed to the correct AIStore cluster.

- Edit Configuration

  Edit the `AR` (Audio Rate) and `AC` (Audio Channels) values in the `pod.yaml` file to match your desired output settings.

- Initialize the ETL

  Run the following command to create the ETL in the AIStore cluster:

      ais etl init spec --from-file etl_spec.yaml
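To illustrate what the `AR` and `AC` settings control, here is a minimal Python sketch that reads both values and assembles an FFmpeg command line. The function name `build_ffmpeg_cmd`, the fallback values, and the idea that the values arrive as environment variables are assumptions made for illustration, not taken from the transformer's source. The FFmpeg flags mirror the CLI command used in the benchmark comparison later in this document.

```python
import os


def build_ffmpeg_cmd(src, dst, env=None):
    """Assemble an FFmpeg command that converts `src` to 16-bit PCM WAV.

    AC and AR are looked up in `env` (os.environ when omitted). The
    fallbacks 2 and 44100 are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    channels = str(int(env.get("AC", "2")))    # int() rejects non-numeric values
    rate = str(int(env.get("AR", "44100")))
    return [
        "ffmpeg", "-i", src,
        "-map", "0:a",         # keep audio streams only
        "-ac", channels,       # number of output channels
        "-ar", rate,           # output sample rate in Hz
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        dst,
    ]


cmd = build_ffmpeg_cmd("input.m4a", "output.wav", {"AC": "1", "AR": "16000"})
print(" ".join(cmd))
```

Returning the command as a list rather than a single string means it can be handed to `subprocess.run` without shell quoting concerns.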
There are two ways to transform data using this ETL:
Transform a single object and save the output to a file:

    ais etl object <etl-name> <bucket-name>/<object-name> <output-file>.wav

- `<etl-name>`: Name of the ETL you initialized.
- `<bucket-name>`: Name of the bucket containing your audio file.
- `<object-name>`: Name of the audio file to transform.
- `<output-file>.wav`: Filename for the transformed WAV file.
This command transforms the specified object and saves it as a WAV file.
Transform multiple objects in parallel and save them to another bucket. This method is faster and leverages AIStore's parallelization capabilities.
To see all options for the bucket command, run:

    ais etl bucket -h

Example:

    ais etl bucket <etl-name> <source-bucket> <destination-bucket> \
      --cont-on-err \
      --num-workers 500 \
      --ext "{wav:wav,opus:wav,m4a:wav}" \
      --prefix=<virtual-sub-directory> \
      --prepend="transformed/"

- `<etl-name>`: Name of the ETL you initialized.
- `<source-bucket>`: Bucket containing the original audio files.
- `<destination-bucket>`: Bucket where transformed files will be saved.
- `--cont-on-err`: Continue processing even if errors occur.
- `--ext "{wav:wav,opus:wav,m4a:wav}"`: Specify input and output file extensions. If you don't specify this, the transformed objects will have the same name and extension.
- `--num-workers 500`: (Optional) Number of parallel workers (adjust as needed).
- `--prefix=<virtual-sub-directory>`: (Optional) Process only files within this sub-directory.
- `--prepend="transformed/"`: (Optional) Prepend this path to the destination objects.
This command transforms all data in the <source-bucket> (optionally within the specified virtual sub-directory) and saves it to the <destination-bucket>, optionally under the transformed/ sub-directory.
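The combined effect of `--ext` and `--prepend` on object names can be sketched in a few lines of Python. This only illustrates the naming described above; it is not AIStore's implementation, and the helper names `parse_ext_map` and `destination_name` are hypothetical.

```python
import posixpath


def parse_ext_map(spec):
    """Parse a string such as "{wav:wav,opus:wav}" into a dict."""
    pairs = spec.strip().strip("{}").split(",")
    return dict(pair.strip().split(":", 1) for pair in pairs if pair.strip())


def destination_name(name, ext_map, prepend=""):
    """Swap the extension when it is listed in ext_map, then add the prefix."""
    root, ext = posixpath.splitext(name)
    new_ext = ext_map.get(ext.lstrip("."))
    if new_ext is not None:
        name = f"{root}.{new_ext}"
    return prepend + name


ext_map = parse_ext_map("{wav:wav,opus:wav,m4a:wav}")
for obj in ["speech/a.opus", "speech/b.m4a", "notes.txt"]:
    print(obj, "->", destination_name(obj, ext_map, "transformed/"))
```

Objects whose extension is not in the map keep their original name and extension, which matches the behavior described for an omitted `--ext`.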
For best performance with large audio files, enable direct file access so that FFmpeg reads directly from the target’s mountpath instead of receiving bytes over the network:
Set `ETL_DIRECT_FQN=true` in the `etl_spec.yaml` environment variables (already enabled by default in the provided spec). This also requires `argument: fqn` to be set.
With direct file access:
- The target passes the local file path to the ETL pod
- FFmpeg reads the file directly from disk via `-i /path/to/file`
- No data is loaded into Python’s memory (zero-copy input)
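The difference between the two input modes can be sketched as follows. This is an assumption-laden illustration rather than the transformer's actual code: the function `plan_input` and its parameters are hypothetical, and `pipe:0` is FFmpeg's standard way of reading input from stdin.

```python
def plan_input(direct_fqn, fqn=None, payload=None):
    """Return (ffmpeg input args, bytes to send on stdin or None)."""
    if direct_fqn:
        if not fqn:
            raise ValueError("direct file access needs a local file path")
        # FFmpeg opens the file itself; Python never holds the audio bytes.
        return ["-i", fqn], None
    if payload is None:
        raise ValueError("byte mode needs the object's content")
    # The object content is held in memory and streamed to FFmpeg's stdin.
    return ["-i", "pipe:0"], payload


direct_args, direct_stdin = plan_input(True, fqn="/path/to/file")
piped_args, piped_stdin = plan_input(False, payload=b"raw audio bytes")
print(direct_args, direct_stdin)
print(piped_args, len(piped_stdin))
```

In a real handler, the returned arguments would be spliced into the full FFmpeg command and the second value passed as `input=` to `subprocess.run`.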
This transformer achieves significantly better performance than traditional FFmpeg methods by leveraging AIStore’s parallelization across multiple nodes and ETL communication mechanisms.
- Up to 5x faster than sequential FFmpeg processing (see the benchmark results below).
- Performance scales linearly with the number of AIStore targets, as objects are distributed across more transformation pods.
We benchmarked the transformation of 300 audio files (each 10 MiB) using different ETL communication mechanisms:
| ETL Mode | Time Taken |
|---|---|
| hpull | 46 sec |
| hpush | 48 sec |
| hpull with FQN | 50 sec |
For comparison, we tested against:
- Python-based FFmpeg script (`benchmark.py`, which is based on NeMo's Speech Data Processor):
  - Time Taken: 4 min 2.78 sec (Sequential processing)
- FFmpeg Linux CLI Utility:

      time bash -c '
      mkdir -p tmp1
      for i in {0..299}; do
        ffmpeg -i audio000.m4a -map 0:a -ac 2 -ar 44100 -c:a pcm_s16le "tmp1/output_audio$i.wav"
      done
      '

  - Time Taken: 3 min 23.71 sec
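For reference, the speedup factors implied by these measurements can be derived with a short Python calculation. The timings are copied from the results above; the variable names are only for illustration.

```python
def to_seconds(minutes=0, seconds=0.0):
    """Convert a 'min / sec' timing to seconds."""
    return minutes * 60 + seconds


# Timings reported above, in seconds.
etl_times = {"hpull": 46, "hpush": 48, "hpull with FQN": 50}
baselines = {
    "benchmark.py": to_seconds(4, 2.78),
    "ffmpeg CLI loop": to_seconds(3, 23.71),
}

# Speedup of each ETL mode relative to each sequential baseline.
speedups = {
    (mode, base): round(base_time / etl_time, 2)
    for mode, etl_time in etl_times.items()
    for base, base_time in baselines.items()
}

# Aggregate throughput of the fastest ETL run: 300 files of 10 MiB each.
throughput_mib_s = round(300 * 10 / min(etl_times.values()), 2)

for (mode, base), factor in speedups.items():
    print(f"{mode} vs {base}: {factor}x")
print(f"best ETL throughput: {throughput_mib_s} MiB/s")
```

The fastest mode (hpull) comes out a little over 5x faster than the Python script and roughly 4.4x faster than the CLI loop, which is consistent with the "up to 5x" figure.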