Fix WMFDecoder delivering audio data as float (#1315)#1316
Fix WMFDecoder delivering audio data as float (#1315)#1316benmoran56 merged 7 commits intopyglet:masterfrom
Conversation
|
It actually looks like this should work, I think the issue is this: pyglet/pyglet/media/drivers/xaudio2/interface.py Lines 29 to 37 in 9bf8939 I am forcing it to If I hardcode |
|
@caffeinepills Could we do a small mapper to find the right values? Something like? wfx.wFormatTag = {32: WAVE_FORMAT_IEEE_FLOAT, 16: WAVE_FORMAT_PCM}[audio_format.sample_size]A simple lookup table to cover 8, 16, 24, and 32 of course. If you think it's worth supporting 32bit audio. |
Doing that instructs the voice to play float samples, which causes voice initialization to fail when playing the "healthy" sound, as it only has 16 bits/sample |
pyglet atm is hardwired to a somewhat common standard (At least going by XAudio2 and OpenAL) of playing back samples in u8 or s16 format, only 1 or 2 channels supported. Testing for a sample size of 32 is technically not enough to be sure we are dealing with float samples (There appear to be more formats that require the WAVEFORMATEXTENSIBLE structure because they only use 20 or 24 of a given 32 bits) |
|
For reference: #1315 (comment) mf_mediatype.SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio)
mf_mediatype.SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM)does not specify the sample size (which can probably be done through |
|
Thanks for the info, that makes sense. I vaguely remember running into what you are describing a while ago, with trying to support 24bit Wav files. |
|
The built-in wav decoder did take precedence on both Windows and Linux, but failed since it can't handle format "3", which i guess is exactly the same format in the |
|
What about trying to convert to float, via: If that fails, fallback to PCM, if both fail, we just mark as unsupported. We would probably have to expand our audio format to include the format type such as float or uint, that way the WAVEFORMATEX knows which to use for the source voice. |
|
Unless updated too, it would render OpenAL inoperable on Windows (which admittably is very unlikely to be used there) and amend pyglet's decoder contract by "On Windows, decoders may return float samples", which i guess is also fine for the WMFDecoder. |
We could query OpenAL for the support. However, in these cases, perhaps we need to have the drivers define which data types it supports. That way it can use the In the worst case (for WMF anyways), we can always try to convert Float to PCM. If we drop the However I am unsure about this, as this silently lowers their audio quality. (On the other hand, at least it's not crashing their app or preventing them from using their audio), so perhaps a warning should be placed when it downsamples? |
|
Created a few more files that should cover most of the wav formats and current behavior:
¹ Since Python 3.12. This is a funny edge case, as, outside of Windows, s24 or s32 files that previously may have been decoded by FFmpeg (albeit without an AudioFormat) will now produce an AudioFormat and then an error during OpenAL format selection. Not too interesting, nothing is getting played either way. The FFmpeg decoder handles a more limited amount of sample sizes and simply provides an In reference to ², this PR breaks playability of Integer PCM files that would actually play before, if the sample size is >16 bit. Officially this working is not guaranteed, but i can undo the AudioFormat guards. I also noticed that floats appear to be the go-to sample format in WebAudio, which is relevant for pyglet 3.0 and worth considering. The only problem with supporting it is that OpenAL can't be entirely guaranteed to have the
Pyglet states (although in a poor place, the My primary intention with this PR was to fix the very loud and distorted result from interpreting float audio as integer PCM in Xaudio2. But seeing that it also breaks some other formats as visible in the table, now i'm not sure what to do:
|
- Be explicit about 16bit
…cifying target format
…ove duplicate channel count readout
|
After a discussion with @benmoran56, what i'm trying to do now is to have decoders be mindful of the data they return, keeping it u8, s16 and at 1 or 2 channels. Supposedly s16 is fine for normal audio playback, anything higher is audible only in specific conditions: 1, 2 These commits make the WMFDecoder very likely to fall into the resampling/decoding branch (as the specific conditions it tests for are covered by the standard wave decoder and i'm not sure another format exists that produces the The FFMpegDecoder would always apply a resampler anyways and now always resamples to s16, unless the decoded data is u8. It also had some limits on formats and sample sizes, which have been removed. The wave decoder now raises an exception when encountering a channel count |
There was the idea that we can add maximum bitrate (and potential other) arguments to the MediaDecoder class inits. This can be a solution for users that know their audio driver (backend) will support Source data in that format. It will also require a user to specifically make an instance of the Decoder, so would apply to people who are targeting a specific platform and formats. IMO, lets leave this to a next PR so as not to over complicate things. |
|
@benmoran56 @caffeinepills @Square789 Mmh, I think there is a better way of handling this. Pyglet and all of its supported audio drivers are perfectly capable of playing 32 bit int and float formats (well, OpenAL needs extensions for this, but at least the float32 extension is usually present, including on macos). Note that 24-bit audio is most often put into a 32bit int container with the least significant bits being padded with zeros. So this is handled by 32bit int support as well. What is needed is indeed a way to differentiate between 32 int and 32 float within Pyglet. One simple way to do this is to give the I have implemented all of this (but only for FFmpeg) in my Zipped Album Player, where I (1) detect which of Pyglet's drivers are available on a given system, (2) monkey patch all of them to support 32bit int and float (if available), (3) make each driver know the difference between int float representation, and (4) subclass FFmpegSource to deal with all of this correctly. I also gave FFmpegSource a class variable that can be set from the outside to always force a certain output format irrespective of the input (like you do now forcing everything to 16bit) if someone wishes to do that. Have a look at https://github.com/zipped-album/zap/blob/main/zap/player.py for details. If this is something that you are interested in having in Pyglet natively, I am happy to send a pull request. Just let me know. |
|
Hi @fladd, I pinged you on the Discord server, but we can continue the discussion here if you prefer. |
WAVEFORMATEXstructs (this might cause breakage in isolated cases, but worth it to prevent further ear-splitting audio incidents).