Skip to content

added 'norm' kwarg to make Audio scaling normalization optional#11161

Closed
drscotthawley wants to merge 7 commits intoipython:masterfrom
drscotthawley:master
Closed

added 'norm' kwarg to make Audio scaling normalization optional#11161
drscotthawley wants to merge 7 commits intoipython:masterfrom
drscotthawley:master

Conversation

@drscotthawley
Copy link

Following up on Issue #8608

@takluyver
Copy link
Member

BTW, there's no need to make a new PR if you spot another thing you need to change. Just push to the same branch, and the existing PR is updated. This is how we work if a reviewer asks for a change as well.

@takluyver
Copy link
Member

Looks good; can I ask you to add a whatsnew snippet for it?

@drscotthawley
Copy link
Author

@takluyver Sure thing! Just did.
Also, am now clipping the non-normed audio just to be on the safe side; otherwise you get runtime errors if the abs() value of the audio exceeds 1.

@takluyver
Copy link
Member

Hmm, the thing about clipping makes me wonder if this is actually a good API. E.g. if you pass in data as 16-bit integers, which appears to be the native data type of .wav files, it will clip them to +/- 1 and you won't hear anything useful.

Maybe if you don't want the data normalised by max absolute value, you should pass in a number to use instead? I.e. in your case, you would pass in 1.0, to say that your waveform is within the range +/- 1. If your data goes outside that range, it would then throw an error.

I don't know what to call that parameter, or if that even makes any sense.

@drscotthawley
Copy link
Author

Oh, ok, interesting. I've only ever used floating-point -1 to 1. I looked at the code for scipy.io.wavfile.write, and it looks as though one could clip or scale by the appropriate maximum-possible value(s), based on detecting the datatype of the input signal. I could work on that in the next day or two.

@drscotthawley
Copy link
Author

drscotthawley commented May 28, 2018

I took out the clipping, added some dtype checking for float vs. integer type.

Note that in Python 3, there technically is no maximum value for ints, although 9223372036854775807 is what is returned by np.iinfo(np.int).max.
Assuming majority of use cases for norm=False will either have data float or as (if it were) int16, the code now accepts this.

Because there is now no clipping, it is possible to generate runtime errors if data exceeds the appropriate range. Users will need to clip on their own before calling Audio object.

I could put the clipping back in, at least to cover floats and int16's. ...? or add another boolean kwarg called "clip" to turn clipping on or off?

@eranegozy
Copy link

I was just about to look into this myself, when I saw this PR. Would love to have this merged in if possible.

@takluyver
Copy link
Member

Sorry for the delay. I'm happy with the API, but I'm not actually sure that this would raise an error with out-of-range values - it looks like numpy's type conversion accepts out-of-range values and wraps around:

In [15]: np.int16([2**18], wrap=False)                                          
Out[15]: array([0], dtype=int16)

Presumably this would result in distorted audio or noise depending on how far beyond the range the numbers are. I think throwing an error on unexpected input would be preferable. I can't see any functionality in numpy to cast and check for overflows, so we may have to construct the array first, check it for overflows, and then cast to int16.

@takluyver
Copy link
Member

And I think an error is preferable to clipping. If you're outside either range (+-1 or int16), there's a good chance that you're on totally the wrong scale, not just a little bit off, so the resulting audio will be terrible. It's much easier to figure out the problem from an error message than from mangled audio.

@eranegozy
Copy link

Hmm.. I personally think that if norm=False, internally, clipping is the right thing to do, like with
np.clip(x, -1, 1)
But really, if a user sets norm=False, they are then responsible for understanding their data's range, and what the API expects. I don't think that the current proposed implementation, of accepting values either (-1, 1) for float or (-2^15, +2^15) for non-float is quite right.

One way around this is to have the user specify the max value through the API. So instead of norm=<True | False>, pass in max-value. So max-value=None would be the current default, meaning "Audio calculates the max value from the data and normalizes by that" and max-value=<number> would mean use this number as the max-value.

@astrocox
Copy link

I was just digging to see if it's possible to disable normalization, so 👍 for this whole thread!
I agree with @eranegozy that clipping would be preferable to an error if a user explicitly sets norm=False, but all these options seem reasonable and it would be awesome to see any of them merged 😄

@matangover
Copy link
Contributor

matangover commented Mar 6, 2019

I would like to help with this, but unsure how. Thanks @drscotthawley for your previous efforts! My understanding of the current state:

  • Right now Audio supports arrays (or lists) of several data types (floating-point, signed ints), because it always normalizes by the largest input sample. Unsigned ints (PCM8) are not supported. See common data formats.

  • Adding norm=False requires Audio to know the input data boundaries (min/max sample value, or only max if supporting only signed data types). Possible solutions:

    1. Guess the data boundaries based on the numpy array dtype. This is what scipy.io.wavfile.write does. Cons: implicit and might be confusing. Requires detailed documentation. Also, for list input we can either guess based on the list values (as @drscotthawley did), which is even more implicit, or throw an error for norm=False.

    2. Support only floating-point audio in range [-1, 1] for norm=False. Easiest to document and implement. Very straightforward for users. Seems like this is what most users want, although I didn't conduct a survey. :) Informative error can be thrown in case data is out of range, very easy for users to normalize their data if needed.

    3. Require user to specify data format ('Explicit is better than implicit'...). This is what PySoundFile does (subtype is set manually). @takluyver and @eranegozy have suggested adding a maxvalue parameter. I suggest instead adding a wav_format parameter that takes values 'float', 'PCM_16', 'PCM_24', 'PCM_32', 'PCM_U8' (values from soundfile.available_subtypes('wav')). Pros: more readable than 'maxvalue', supports all common formats including unsigned PCM8 (maxvalue=1337 is possible but definitely not needed). wav_format could be set to float by default.

Regarding clipping, I think that if the boundaries are made explicit then an error makes sense.

@takluyver would you accept a PR for option 2 or 3?

@matangover
Copy link
Contributor

Or if you prefer I can just continue @drscotthawley's work and fix as per your comments for out-of-range values (option 1).

@drscotthawley
Copy link
Author

Hey folks, I'm ok with whatever you decide to do. For my own work I've moved on, and I just always use a modified version of the Audio class where I've disabled the normalization. I understand that may not fit in the grand scheme of what all other users need.
If you want to delete/close this PR and submit a different one, that's fine with me, or keep this going... Your call.

@eranegozy
Copy link

eranegozy commented Mar 8, 2019 via email

@takluyver
Copy link
Member

I'm fine with option 2 (only support floats -1 to +1). Option 3 strikes me as offputting complexity, as I don't know about the details of the wav format - but if that's preferred by others, I'm happy to review a PR.

I still maintain that out-of-range values should cause an error rather than be clipped. As above, if your data goes beyond +- 1, it may well be on the wrong scale completely, not just slightly out.

@eranegozy
Copy link

eranegozy commented Mar 13, 2019 via email

@matangover
Copy link
Contributor

OK, great. Thanks for your input everyone. Working on option 2.

@matangover
Copy link
Contributor

@drscotthawley feel free to close this one as #11650 has been merged 🎉

@drscotthawley
Copy link
Author

@matangover ok, done!

@Carreau Carreau added this to the no action milestone Mar 21, 2019
@musicnerd
Copy link

This is still not in the docs, and I get an error using the most up-to-date version that "init() got an unexpected keyword argument 'norm'". Am I missing something?

@matangover
Copy link
Contributor

Hi @musicnerd, the argument name is normalize. I can see it here in the docs. Does this work?

@musicnerd
Copy link

Thanks @matangover I was on an older version of the docs. Oops! Thank you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants