🐛 Bug
In a project that I am working on, I need to keep the spectrogram and the signal aligned. I have provided a small script below that reproduces the problem.
The issues that I am running into:
- At the moment, the length variable in istft has no constraints; therefore, it can extend into the padding. This has the potential for silent errors.
- The istft function errors with custom padding.
To Reproduce
import torch
import torchaudio
import numpy.testing
n_fft = 16
hop_length = 4
win_length = n_fft
num_frames = 32
signal = torch.randn(hop_length * num_frames)
# Create a spectrogram that aligns with the signal
spectrogram = torch.stft(
    torch.nn.functional.pad(signal, ((n_fft - hop_length) // 2, (n_fft - hop_length) // 2)),
    n_fft=n_fft,
    hop_length=hop_length,
    win_length=n_fft,
    window=torch.hann_window(n_fft),
    center=False)
assert spectrogram.shape[1] == num_frames
# Reconstruct the signal and ensure it matches the original signal
reconstructed_signal = torchaudio.functional.istft(
    spectrogram,
    n_fft=n_fft,
    hop_length=hop_length,
    win_length=n_fft,
    window=torch.hann_window(n_fft),
    center=False)
assert reconstructed_signal.shape[0] == num_frames * hop_length
numpy.testing.assert_almost_equal(reconstructed_signal.numpy(), signal.numpy(), decimal=6)
Traceback (most recent call last):
  File "boo.py", line 25, in <module>
    center=False)
  File "/Users/michaelp/Code/Text-to-Speech/venv/lib/python3.7/site-packages/torchaudio/functional.py", line 189, in istft
    assert window_envelop_lowest > 1e-11, "window overlap add min: %f" % (window_envelop_lowest)
AssertionError: window overlap add min: 0.000000
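The alignment that the script relies on can be checked with plain arithmetic. The following is a minimal sketch, assuming the usual frame-count formula for an STFT with center=False (one frame per hop once a full window fits). The helper name num_stft_frames is hypothetical and is not part of torch or torchaudio.

```python
def num_stft_frames(signal_length: int, n_fft: int, hop_length: int, pad: int) -> int:
    """Frames produced by a non-centered STFT after padding `pad` samples per side.

    Hypothetical helper for illustration only.
    """
    padded_length = signal_length + 2 * pad
    return (padded_length - n_fft) // hop_length + 1


n_fft = 16
hop_length = 4
num_frames = 32
signal_length = hop_length * num_frames  # 128 samples, as in the script

# Padding used in the script: (n_fft - hop_length) // 2 samples on each side.
custom_pad = (n_fft - hop_length) // 2
frames_custom = num_stft_frames(signal_length, n_fft, hop_length, custom_pad)

# For comparison: n_fft // 2 samples on each side, as center=True would pad.
frames_centered = num_stft_frames(signal_length, n_fft, hop_length, n_fft // 2)

print(frames_custom, frames_centered)
```

Under these assumptions the custom padding gives exactly num_frames frames, so every frame maps to hop_length samples of the original signal, while the centered padding gives one frame more.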
Potential Solution
I can get this script to work if I change half_n_fft = n_fft // 2 to half_n_fft = (n_fft - hop_length) // 2. This suggests that it might help for istft to accept a parameter indicating how much padding was applied.
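To illustrate why the amount of trimming matters, here is a rough pure-Python sketch of the overlap-add window envelope for the settings in the script. It assumes the envelope is the overlap-added squared window, which is a common ISTFT normalization; it is an illustration, not torchaudio's actual code, and all variable names are chosen for this example.

```python
import math

n_fft = 16
hop_length = 4
num_frames = 32

# Periodic Hann window, matching the default of torch.hann_window(n_fft).
window = [0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_fft)) for n in range(n_fft)]

# Overlap-add the squared window at every frame position.
total_length = n_fft + hop_length * (num_frames - 1)
envelope = [0.0] * total_length
for frame in range(num_frames):
    start = frame * hop_length
    for n in range(n_fft):
        envelope[start + n] += window[n] ** 2

# Trim the custom padding, (n_fft - hop_length) // 2 samples per side.
pad = (n_fft - hop_length) // 2
trimmed = envelope[pad:total_length - pad]

print(len(envelope), min(envelope))
print(len(trimmed), min(trimmed))
```

In this sketch the untrimmed envelope has a minimum of zero, because the first sample is covered only by the first frame and the periodic Hann window is zero there. After trimming the custom padding, the remaining samples line up with the original signal length and the envelope stays strictly positive.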
Is that something you'd be interested in?