added 'norm' kwarg to make Audio scaling normalization optional by drscotthawley · Pull Request #11161 · ipython/ipython

drscotthawley · 2018-05-25T22:45:24Z

Following up on Issue #8608

takluyver · 2018-05-27T10:19:11Z

BTW, there's no need to make a new PR if you spot another thing you need to change. Just push to the same branch, and the existing PR is updated. This is how we work if a reviewer asks for a change as well.

takluyver · 2018-05-27T10:24:41Z

Looks good; can I ask you to add a whatsnew snippet for it?

drscotthawley · 2018-05-28T00:50:12Z

@takluyver Sure thing! Just did.
Also, am now clipping the non-normed audio just to be on the safe side; otherwise you get runtime errors if the abs() value of the audio exceeds 1.

takluyver · 2018-05-28T13:05:18Z

Hmm, the thing about clipping makes me wonder if this is actually a good API. E.g. if you pass in data as 16-bit integers, which appears to be the native data type of .wav files, it will clip them to +/- 1 and you won't hear anything useful.

Maybe if you don't want the data normalised by max absolute value, you should pass in a number to use instead? I.e. in your case, you would pass in 1.0, to say that your waveform is within the range +/- 1. If your data goes outside that range, it would then throw an error.

I don't know what to call that parameter, or if that even makes any sense.

drscotthawley · 2018-05-28T17:15:29Z

Oh, ok, interesting. I've only ever used floating-point -1 to 1. I looked at the code for scipy.io.wavfile.write, and it looks as though one could clip or scale by the appropriate maximum-possible value(s), based on detecting the datatype of the input signal. I could work on that in the next day or two.

drscotthawley · 2018-05-28T21:25:29Z

I took out the clipping, added some dtype checking for float vs. integer type.

Note that in Python 3, there technically is no maximum value for ints, although 9223372036854775807 is what is returned by np.iinfo(np.int).max.
Assuming majority of use cases for norm=False will either have data float or as (if it were) int16, the code now accepts this.

Because there is now no clipping, it is possible to generate runtime errors if data exceeds the appropriate range. Users will need to clip on their own before calling Audio object.

I could put the clipping back in, at least to cover floats and int16's. ...? or add another boolean kwarg called "clip" to turn clipping on or off?

eranegozy · 2018-08-09T15:10:40Z

I was just about to look into this myself, when I saw this PR. Would love to have this merged in if possible.

takluyver · 2018-08-09T20:07:39Z

Sorry for the delay. I'm happy with the API, but I'm not actually sure that this would raise an error with out-of-range values - it looks like numpy's type conversion accepts out-of-range values and wraps around:

In [15]: np.int16([2**18], wrap=False)                                          
Out[15]: array([0], dtype=int16)

Presumably this would result in distorted audio or noise depending on how far beyond the range the numbers are. I think throwing an error on unexpected input would be preferable. I can't see any functionality in numpy to cast and check for overflows, so we may have to construct the array first, check it for overflows, and then cast to int16.

takluyver · 2018-08-09T20:12:45Z

And I think an error is preferable to clipping. If you're outside either range (+-1 or int16), there's a good chance that you're on totally the wrong scale, not just a little bit off, so the resulting audio will be terrible. It's much easier to figure out the problem from an error message than from mangled audio.

eranegozy · 2018-08-10T14:41:09Z

Hmm.. I personally think that if norm=False, internally, clipping is the right thing to do, like with
np.clip(x, -1, 1)
But really, if a user sets norm=False, they are then responsible for understanding their data's range, and what the API expects. I don't think that the current proposed implementation, of accepting values either (-1, 1) for float or (-2^15, +2^15) for non-float is quite right.

One way around this is to have the user specify the max value through the API. So instead of norm=<True | False>, pass in max-value. So max-value=None would be the current default, meaning "Audio calculates the max value from the data and normalizes by that" and max-value=<number> would mean use this number as the max-value.

astrocox · 2018-08-20T21:08:25Z

I was just digging to see if it's possible to disable normalization, so 👍 for this whole thread!
I agree with @eranegozy that clipping would be preferable to an error if a user explicitly sets norm=False, but all these options seem reasonable and it would be awesome to see any of them merged 😄

matangover · 2019-03-06T23:48:24Z

I would like to help with this, but unsure how. Thanks @drscotthawley for your previous efforts! My understanding of the current state:

Right now Audio supports arrays (or lists) of several data types (floating-point, signed ints), because it always normalizes by the largest input sample. Unsigned ints (PCM8) are not supported. See common data formats.
Adding norm=False requires Audio to know the input data boundaries (min/max sample value, or only max if supporting only signed data types). Possible solutions:
1. Guess the data boundaries based on the numpy array dtype. This is what scipy.io.wavfile.write does. Cons: implicit and might be confusing. Requires detailed documentation. Also, for list input we can either guess based on the list values (as @drscotthawley did), which is even more implicit, or throw an error for norm=False.
2. Support only floating-point audio in range [-1, 1] for norm=False. Easiest to document and implement. Very straightforward for users. Seems like this is what most users want, although I didn't conduct a survey. :) Informative error can be thrown in case data is out of range, very easy for users to normalize their data if needed.
3. Require user to specify data format ('Explicit is better than implicit'...). This is what PySoundFile does (subtype is set manually). @takluyver and @eranegozy have suggested adding a maxvalue parameter. I suggest instead adding a wav_format parameter that takes values 'float', 'PCM_16', 'PCM_24', 'PCM_32', 'PCM_U8' (values from soundfile.available_subtypes('wav')). Pros: more readable than 'maxvalue', supports all common formats including unsigned PCM8 (maxvalue=1337 is possible but definitely not needed). wav_format could be set to float by default.

Regarding clipping, I think that if the boundaries are made explicit then an error makes sense.

@takluyver would you accept a PR for option 2 or 3?

matangover · 2019-03-07T18:04:49Z

Or if you prefer I can just continue @drscotthawley's work and fix as per your comments for out-of-range values (option 1).

drscotthawley · 2019-03-07T18:10:47Z

Hey folks, I'm ok with whatever you decide to do. For my own work I've moved on, and I just always use a modified version of the Audio class where I've disabled the normalization. I understand that may not fit in the grand scheme of what all other users need.
If you want to delete/close this PR and submit a different one, that's fine with me, or keep this going... Your call.

eranegozy · 2019-03-08T02:41:04Z

FWIW, my usage case is indeed floating-point [-1, 1] for norm=False case, so I would be happy with option 2. But I can see option 3 being more general. If we do go with that, I think wav_format=float by default is a good idea. Also, I am not a fan of raising an error if values are out of bound. I would advocate clipping to the min/max values in that case. This is what some audio drivers would do anyway. Eran

…

On Mar 6, 2019, at 6:48 PM, Matan Gover ***@***.***> wrote: I would like to help with this, but unsure how. Thanks @drscotthawley <https://github.com/drscotthawley> for your previous efforts! My understanding of the current state: Right now Audio supports arrays (or lists) of several data types (floating-point, signed ints), because it always normalizes by the largest input sample. Unsigned ints (PCM8) are not supported. See common data formats <https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.write.html#id1>. Adding norm=False requires Audio to know the input data boundaries (min/max sample value, or only max if supporting only signed data types). Possible solutions: Guess the data boundaries based on the numpy array dtype. This is what scipy.io.wavfile.write does. Cons: implicit and might be confusing. Requires detailed documentation. Also, for list input we can either guess based on the list values (as @drscotthawley <https://github.com/drscotthawley> did), which is even more implicit, or throw an error for norm=False. Support only floating-point audio in range [-1, 1] for norm=False. Easiest to document and implement. Very straightforward for users. Seems like this is what most users want, although I didn't conduct a survey :) Require user to specify data format ('Explicit is better than implicit'...). This is what PySoundFile does (subtype is set manually <https://pysoundfile.readthedocs.io/en/0.9.0/#soundfile.read>). @takluyver <https://github.com/takluyver> and @eranegozy <https://github.com/eranegozy> have suggested adding a maxvalue parameter. I suggest instead adding a wav_format parameter that takes values 'float', 'PCM_16', 'PCM_24', 'PCM_32', 'PCM_U8' (values from soundfile.available_subtypes('wav')). Pros: more readable than 'maxvalue', supports all common formats including unsigned PCM8 (maxvalue=1337 is possible but definitely not needed). wav_format could be set to float by default. Regarding clipping, I think that if the boundaries are made explicit then an error makes sense. @takluyver <https://github.com/takluyver> would you accept a PR for option 2 or 3? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#11161 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AI2rTqXKUJ1S5UqhW3FscjYr_ZsGM-2Dks5vUFPegaJpZM4UOqGV>.

takluyver · 2019-03-12T20:38:09Z

I'm fine with option 2 (only support floats -1 to +1). Option 3 strikes me as offputting complexity, as I don't know about the details of the wav format - but if that's preferred by others, I'm happy to review a PR.

I still maintain that out-of-range values should cause an error rather than be clipped. As above, if your data goes beyond +- 1, it may well be on the wrong scale completely, not just slightly out.

eranegozy · 2019-03-13T14:22:20Z

On Mar 12, 2019, at 4:38 PM, Thomas Kluyver ***@***.***> wrote: I'm fine with option 2 (only support floats -1 to +1). Option 3 strikes me as offputting complexity, as I don't know about the details of the wav format - but if that's preferred by others, I'm happy to review a PR.

I agree with this.

I still maintain that out-of-range values should cause an error rather than be clipped. As above, if your data goes beyond +- 1, it may well be on the wrong scale completely, not just slightly out.

That’s fine. That argument makes sense. -Eran

matangover · 2019-03-14T15:09:08Z

OK, great. Thanks for your input everyone. Working on option 2.

matangover · 2019-03-19T21:26:47Z

@drscotthawley feel free to close this one as #11650 has been merged 🎉

drscotthawley · 2019-03-20T07:11:31Z

@matangover ok, done!

musicnerd · 2020-01-20T15:33:54Z

This is still not in the docs, and I get an error using the most up-to-date version that "init() got an unexpected keyword argument 'norm'". Am I missing something?

matangover · 2020-01-20T17:03:57Z

Hi @musicnerd, the argument name is normalize. I can see it here in the docs. Does this work?

musicnerd · 2020-01-21T17:17:31Z

Thanks @matangover I was on an older version of the docs. Oops! Thank you :)

drscotthawley added 2 commits May 25, 2018 16:21

added 'norm' kwarg for normalization, default=True

815b666

Other location of 'scaled' endowed with norm kwarg

92cd190

drscotthawley mentioned this pull request May 25, 2018

added 'norm' kwarg for normalization, default=True #11160

Closed

drscotthawley added 2 commits May 27, 2018 19:35

clip -1,-1 for non-normed audio, avoid RunTime err

dc49fb4

Added PR .rst descripting norm kward to Audio obj

369c16a

drscotthawley added 2 commits May 27, 2018 19:52

Added PR .rst descripting norm kwarg to Audio obj

80c3c74

Merge branch 'master' of github.com:drscotthawley/ipython

b317a6e

For norm=False, check dtype for max val; no clip

de7d8ee

robclewley mentioned this pull request Aug 7, 2018

Add audio normalization parameter when available robclewley/TqdmAudioRicker#2

Open

matangover mentioned this pull request Mar 6, 2019

Volume normalization in IPython.display.Audio should be optional #8608

Closed

matangover mentioned this pull request Mar 15, 2019

Add 'normalize' argument to Audio to make normalization optional #11650

Merged

drscotthawley closed this Mar 20, 2019

Carreau added this to the no action milestone Mar 21, 2019

Uh oh!

Conversation

drscotthawley commented May 25, 2018

Uh oh!

takluyver commented May 27, 2018

Uh oh!

takluyver commented May 27, 2018

Uh oh!

drscotthawley commented May 28, 2018

Uh oh!

takluyver commented May 28, 2018

Uh oh!

drscotthawley commented May 28, 2018

Uh oh!

drscotthawley commented May 28, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eranegozy commented Aug 9, 2018

Uh oh!

takluyver commented Aug 9, 2018

Uh oh!

takluyver commented Aug 9, 2018

Uh oh!

eranegozy commented Aug 10, 2018

Uh oh!

astrocox commented Aug 20, 2018

Uh oh!

matangover commented Mar 6, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

matangover commented Mar 7, 2019

Uh oh!

drscotthawley commented Mar 7, 2019

Uh oh!

eranegozy commented Mar 8, 2019 via email

Uh oh!

takluyver commented Mar 12, 2019

Uh oh!

eranegozy commented Mar 13, 2019 via email

Uh oh!

matangover commented Mar 14, 2019

Uh oh!

matangover commented Mar 19, 2019

Uh oh!

drscotthawley commented Mar 20, 2019

Uh oh!

musicnerd commented Jan 20, 2020

Uh oh!

matangover commented Jan 20, 2020

Uh oh!

musicnerd commented Jan 21, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

drscotthawley commented May 28, 2018 •

edited

Loading

matangover commented Mar 6, 2019 •

edited

Loading