added 'norm' kwarg to make Audio scaling normalization optional#11161
added 'norm' kwarg to make Audio scaling normalization optional#11161drscotthawley wants to merge 7 commits intoipython:masterfrom
Conversation
|
BTW, there's no need to make a new PR if you spot another thing you need to change. Just push to the same branch, and the existing PR is updated. This is how we work if a reviewer asks for a change as well. |
|
Looks good; can I ask you to add a whatsnew snippet for it? |
|
@takluyver Sure thing! Just did. |
|
Hmm, the thing about clipping makes me wonder if this is actually a good API. E.g. if you pass in data as 16-bit integers, which appears to be the native data type of .wav files, it will clip them to +/- 1 and you won't hear anything useful. Maybe if you don't want the data normalised by max absolute value, you should pass in a number to use instead? I.e. in your case, you would pass in I don't know what to call that parameter, or if that even makes any sense. |
|
Oh, ok, interesting. I've only ever used floating-point -1 to 1. I looked at the code for scipy.io.wavfile.write, and it looks as though one could clip or scale by the appropriate maximum-possible value(s), based on detecting the datatype of the input signal. I could work on that in the next day or two. |
|
I took out the clipping, added some dtype checking for float vs. integer type. Note that in Python 3, there technically is no maximum value for ints, although 9223372036854775807 is what is returned by Because there is now no clipping, it is possible to generate runtime errors if data exceeds the appropriate range. Users will need to clip on their own before calling Audio object. I could put the clipping back in, at least to cover floats and int16's. ...? or add another boolean kwarg called "clip" to turn clipping on or off? |
|
I was just about to look into this myself, when I saw this PR. Would love to have this merged in if possible. |
|
Sorry for the delay. I'm happy with the API, but I'm not actually sure that this would raise an error with out-of-range values - it looks like numpy's type conversion accepts out-of-range values and wraps around: Presumably this would result in distorted audio or noise depending on how far beyond the range the numbers are. I think throwing an error on unexpected input would be preferable. I can't see any functionality in numpy to cast and check for overflows, so we may have to construct the array first, check it for overflows, and then cast to int16. |
|
And I think an error is preferable to clipping. If you're outside either range (+-1 or int16), there's a good chance that you're on totally the wrong scale, not just a little bit off, so the resulting audio will be terrible. It's much easier to figure out the problem from an error message than from mangled audio. |
|
Hmm.. I personally think that if norm=False, internally, clipping is the right thing to do, like with One way around this is to have the user specify the max value through the API. So instead of |
|
I was just digging to see if it's possible to disable normalization, so 👍 for this whole thread! |
|
I would like to help with this, but unsure how. Thanks @drscotthawley for your previous efforts! My understanding of the current state:
Regarding clipping, I think that if the boundaries are made explicit then an error makes sense. @takluyver would you accept a PR for option 2 or 3? |
|
Or if you prefer I can just continue @drscotthawley's work and fix as per your comments for out-of-range values (option 1). |
|
Hey folks, I'm ok with whatever you decide to do. For my own work I've moved on, and I just always use a modified version of the Audio class where I've disabled the normalization. I understand that may not fit in the grand scheme of what all other users need. |
|
FWIW, my usage case is indeed floating-point [-1, 1] for norm=False case, so I would be happy with option 2. But I can see option 3 being more general. If we do go with that, I think wav_format=float by default is a good idea.
Also, I am not a fan of raising an error if values are out of bound. I would advocate clipping to the min/max values in that case. This is what some audio drivers would do anyway.
Eran
… On Mar 6, 2019, at 6:48 PM, Matan Gover ***@***.***> wrote:
I would like to help with this, but unsure how. Thanks @drscotthawley <https://github.com/drscotthawley> for your previous efforts! My understanding of the current state:
Right now Audio supports arrays (or lists) of several data types (floating-point, signed ints), because it always normalizes by the largest input sample. Unsigned ints (PCM8) are not supported. See common data formats <https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.write.html#id1>.
Adding norm=False requires Audio to know the input data boundaries (min/max sample value, or only max if supporting only signed data types). Possible solutions:
Guess the data boundaries based on the numpy array dtype. This is what scipy.io.wavfile.write does. Cons: implicit and might be confusing. Requires detailed documentation. Also, for list input we can either guess based on the list values (as @drscotthawley <https://github.com/drscotthawley> did), which is even more implicit, or throw an error for norm=False.
Support only floating-point audio in range [-1, 1] for norm=False. Easiest to document and implement. Very straightforward for users. Seems like this is what most users want, although I didn't conduct a survey :)
Require user to specify data format ('Explicit is better than implicit'...). This is what PySoundFile does (subtype is set manually <https://pysoundfile.readthedocs.io/en/0.9.0/#soundfile.read>). @takluyver <https://github.com/takluyver> and @eranegozy <https://github.com/eranegozy> have suggested adding a maxvalue parameter. I suggest instead adding a wav_format parameter that takes values 'float', 'PCM_16', 'PCM_24', 'PCM_32', 'PCM_U8' (values from soundfile.available_subtypes('wav')). Pros: more readable than 'maxvalue', supports all common formats including unsigned PCM8 (maxvalue=1337 is possible but definitely not needed). wav_format could be set to float by default.
Regarding clipping, I think that if the boundaries are made explicit then an error makes sense.
@takluyver <https://github.com/takluyver> would you accept a PR for option 2 or 3?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#11161 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AI2rTqXKUJ1S5UqhW3FscjYr_ZsGM-2Dks5vUFPegaJpZM4UOqGV>.
|
|
I'm fine with option 2 (only support floats -1 to +1). Option 3 strikes me as offputting complexity, as I don't know about the details of the wav format - but if that's preferred by others, I'm happy to review a PR. I still maintain that out-of-range values should cause an error rather than be clipped. As above, if your data goes beyond +- 1, it may well be on the wrong scale completely, not just slightly out. |
|
On Mar 12, 2019, at 4:38 PM, Thomas Kluyver ***@***.***> wrote:
I'm fine with option 2 (only support floats -1 to +1). Option 3 strikes me as offputting complexity, as I don't know about the details of the wav format - but if that's preferred by others, I'm happy to review a PR.
I agree with this.
I still maintain that out-of-range values should cause an error rather than be clipped. As above, if your data goes beyond +- 1, it may well be on the wrong scale completely, not just slightly out.
That’s fine. That argument makes sense.
-Eran
|
|
OK, great. Thanks for your input everyone. Working on option 2. |
|
@drscotthawley feel free to close this one as #11650 has been merged 🎉 |
|
@matangover ok, done! |
|
This is still not in the docs, and I get an error using the most up-to-date version that "init() got an unexpected keyword argument 'norm'". Am I missing something? |
|
Hi @musicnerd, the argument name is |
|
Thanks @matangover I was on an older version of the docs. Oops! Thank you :) |
Following up on Issue #8608