Skip to content

Fix WMFDecoder delivering audio data as float (#1315)#1316

Merged
benmoran56 merged 7 commits intopyglet:masterfrom
Square789:master
Jun 7, 2025
Merged

Fix WMFDecoder delivering audio data as float (#1315)#1316
benmoran56 merged 7 commits intopyglet:masterfrom
Square789:master

Conversation

@Square789
Copy link
Copy Markdown
Contributor

@Square789 Square789 commented May 9, 2025

  • Do not consider float data to be uncompressed [EDIT: bit misleading, of course it's still "uncompressed", but indigestable by pyglet], change the debug messages accordingly
  • Additionally safeguard WAVEFORMATEX structs (this might cause breakage in isolated cases, but worth it to prevent further ear-splitting audio incidents).

@caffeinepills
Copy link
Copy Markdown
Collaborator

caffeinepills commented May 9, 2025

It actually looks like this should work, I think the issue is this:

def create_xa2_waveformat(audio_format):
wfx = lib.WAVEFORMATEX()
wfx.wFormatTag = lib.WAVE_FORMAT_PCM
wfx.nChannels = audio_format.channels
wfx.nSamplesPerSec = audio_format.sample_rate
wfx.wBitsPerSample = audio_format.sample_size
wfx.nBlockAlign = wfx.wBitsPerSample * wfx.nChannels // 8
wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign
return wfx

I am forcing it to WAVE_FORMAT_PCM when the format is not guaranteed to be PCM.

If I hardcode wfx.wFormatTag = lib.WAVE_FORMAT_PCM to wfx.wFormatTag = 3 (which is WAVE_FORMAT_IEEE_FLOAT, from the source WAVEFORMATEX) then it does not distort.

@benmoran56
Copy link
Copy Markdown
Member

benmoran56 commented May 10, 2025

@caffeinepills Could we do a small mapper to find the right values? Something like?

wfx.wFormatTag = {32: WAVE_FORMAT_IEEE_FLOAT, 16: WAVE_FORMAT_PCM}[audio_format.sample_size]

A simple lookup table to cover 8, 16, 24, and 32 of course. If you think it's worth supporting 32bit audio.
I think it's pretty rare, just due to how huge those files can be.

@Square789
Copy link
Copy Markdown
Contributor Author

Square789 commented May 10, 2025

It actually looks like this should work, I think the issue is this:

def create_xa2_waveformat(audio_format):
wfx = lib.WAVEFORMATEX()
wfx.wFormatTag = lib.WAVE_FORMAT_PCM
wfx.nChannels = audio_format.channels
wfx.nSamplesPerSec = audio_format.sample_rate
wfx.wBitsPerSample = audio_format.sample_size
wfx.nBlockAlign = wfx.wBitsPerSample * wfx.nChannels // 8
wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign
return wfx

I am forcing it to WAVE_FORMAT_PCM when the format is not guaranteed to be PCM.

If I hardcode wfx.wFormatTag = lib.WAVE_FORMAT_PCM to wfx.wFormatTag = 3 (which is WAVE_FORMAT_IEEE_FLOAT, from the source WAVEFORMATEX) then it does not distort.

Doing that instructs the voice to play float samples, which causes voice initialization to fail when playing the "healthy" sound, as it only has 16 bits/sample

@Square789
Copy link
Copy Markdown
Contributor Author

Square789 commented May 10, 2025

@caffeinepills Could we do a small mapper to find the right values? Something like?

wfx.wFormatTag = {32: WAVE_FORMAT_IEEE_FLOAT, 16: WAVE_FORMAT_PCM}[audio_format.sample_size]

A simple lookup table to cover 8, 16, 24, and 32 of course. If you think it's worth supporting 32bit audio. I think it's pretty rare, just due to how huge those files can be.

pyglet atm is hardwired to a somewhat common standard (At least going by XAudio2 and OpenAL) of playing back samples in u8 or s16 format, only 1 or 2 channels supported. Testing for a sample size of 32 is technically not enough to be sure we are dealing with float samples (There appear to be more formats that require the WAVEFORMATEXTENSIBLE structure because they only use 20 or 24 of a given 32 bits) (Additionally, there seem to be very rare cases where 16bit floats are used, so allowing both PCM and Float might cause confusion there) [EDIT: 16bit float wav files do not seem to be a thing]. 32bit audio probably delivers higher quality, so i can understand why it should be supported, but to be safe i would want to add another attribute to AudioFormat specifying whether we are dealing with PCM or Float, or outright create generalized formats that can be handled by all audio backends.

@Square789
Copy link
Copy Markdown
Contributor Author

Square789 commented May 10, 2025

For reference: #1315 (comment)
My intention with this PR is to make the WMFDecoder similar to the FFMpegDecoder, both resampling f32 to u16.
Although it does happen implicitly, as

mf_mediatype.SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio)
mf_mediatype.SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM)

does not specify the sample size (which can probably be done through MF_MT_AUDIO_BITS_PER_SAMPLE as per https://learn.microsoft.com/en-us/windows/win32/medfound/uncompressed-audio-media-types). 16 bits apparently just happens to be the default.

@benmoran56
Copy link
Copy Markdown
Member

Thanks for the info, that makes sense. I vaguely remember running into what you are describing a while ago, with trying to support 24bit Wav files.
We should probably just be defaulting to the built-in/standard library decoder on Windows for .wav files as well (fraction less resource usage). That would also help enforce the baseline.

@Square789
Copy link
Copy Markdown
Contributor Author

The built-in wav decoder did take precedence on both Windows and Linux, but failed since it can't handle format "3", which i guess is exactly the same format in the WAVEFORMATEX header, so WAVE_FORMAT_IEEE_FLOAT.

@caffeinepills
Copy link
Copy Markdown
Collaborator

caffeinepills commented May 10, 2025

What about trying to convert to float, via:

mf_mediatype.SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio)
mf_mediatype.SetGUID(MF_MT_SUBTYPE, MFAudioFormat_Float)

If that fails, fallback to PCM, if both fail, we just mark as unsupported.

We would probably have to expand our audio format to include the format type such as float or uint, that way the WAVEFORMATEX knows which to use for the source voice.

@Square789
Copy link
Copy Markdown
Contributor Author

Unless updated too, it would render OpenAL inoperable on Windows (which admittably is very unlikely to be used there) and amend pyglet's decoder contract by "On Windows, decoders may return float samples", which i guess is also fine for the WMFDecoder.
Still, creating a platform discrepancy like that and assuming that 32bit <=> float is flaky. According to the XAudio2 docs i linked, 32bit size PCM samples also exist.
A good solution would be to create more certainty on the data by including sample types in the audio format that can then with certainly be resolved to e.g. PA_SAMPLE_S16LE, AL_FORMAT_{STEREO|MONO}16 and WAVE_FORMAT_PCM + 16.
Before something like that, imo it's safest to stick to "8 bits means u8" and "16 bits means s16" and enforce that in the decoders.
For OpenAL playing back floats requires an extension and may not be entirely guaranteed. I am not sure just how commonplace openal-soft is.

@caffeinepills
Copy link
Copy Markdown
Collaborator

caffeinepills commented May 14, 2025

Unless updated too, it would render OpenAL inoperable on Windows (which admittably is very unlikely to be used there) and amend pyglet's decoder contract by "On Windows, decoders may return float samples", which i guess is also fine for the WMFDecoder. Still, creating a platform discrepancy like that and assuming that 32bit <=> float is flaky. According to the XAudio2 docs i linked, 32bit size PCM samples also exist. A good solution would be to create more certainty on the data by including sample types in the audio format that can then with certainly be resolved to e.g. PA_SAMPLE_S16LE, AL_FORMAT_{STEREO|MONO}16 and WAVE_FORMAT_PCM + 16. Before something like that, imo it's safest to stick to "8 bits means u8" and "16 bits means s16" and enforce that in the decoders. For OpenAL playing back floats requires an extension and may not be entirely guaranteed. I am not sure just how commonplace openal-soft is.

We could query OpenAL for the support. However, in these cases, perhaps we need to have the drivers define which data types it supports. That way it can use the AudioFormat effectively (or even ask the decoder to give it a supported format?).

In the worst case (for WMF anyways), we can always try to convert Float to PCM. If we drop the or guid_compressed == MFAudioFormat_Float, it should put WMF on to the path to convert the source to PCM.
With testing his example, the WAVEFORMATEX after does convert down: WAVEFORMATEX(wFormatTag=1, nChannels=2, nSamplesPerSec=44100, nAvgBytesPersec=176400, nBlockAlign=4, wBitsPerSample=16, cbSize=0). The wFormatTag became 1. If you want to add an exception to ensure the format is always PCM after converting, then that's quick fix.

However I am unsure about this, as this silently lowers their audio quality. (On the other hand, at least it's not crashing their app or preventing them from using their audio), so perhaps a warning should be placed when it downsamples?

@Square789
Copy link
Copy Markdown
Contributor Author

Created a few more files that should cover most of the wav formats and current behavior:

PCM Integer, 8 bit: u8 PCM Integer, 16 bit: s16 PCM Integer, 24 bit: s24 PCM Integer, 32 bit: s32 PCM Integer, 64 bit: s64 IEEE754 Float, 32 bit: f32 IEEE754 Float, 64 bit: f64
Used decoder wave wave wave¹ wave¹ wave¹ WMF/FFmpeg WMF/FFmpeg
Pre-PR, XAudio2 Ok Ok Ok² Ok² WinError when creating Voice Painful sound WinError when creating Voice
Pre-PR, OpenAL Ok Ok MediaException, previously no AudioFormat MediaException, previously no AudioFormat MediaException, previously no AudioFormat Downsampled to s16 MediaException, previously no AudioFormat
With this PR, XAudio2 Ok Ok MediaException MediaException MediaException Downsampled to s16, implicitly Downsampled to s16, implicitly

¹ Since Python 3.12. This is a funny edge case, as, outside of Windows, s24 or s32 files that previously may have been decoded by FFmpeg (albeit without an AudioFormat) will now produce an AudioFormat and then an error during OpenAL format selection. Not too interesting, nothing is getting played either way.
² We only use the WAVEFORMATEX struct (with the format tag WAVE_FORMAT_PCM). The Xaudio2 docs state: "If wFormatTag is WAVE_FORMAT_PCM, then wBitsPerSample should be equal to 8 or 16.". This type of data should not be returned from the decoder and works with XAudio2 unofficially.

The FFmpeg decoder handles a more limited amount of sample sizes and simply provides an AudioFormat of None if it encounters an unsupported format. It should be easy to alter it to also downsample 24bit, integer 32bit and 64bit audio.

In reference to ², this PR breaks playability of Integer PCM files that would actually play before, if the sample size is >16 bit. Officially this working is not guaranteed, but i can undo the AudioFormat guards.

I also noticed that floats appear to be the go-to sample format in WebAudio, which is relevant for pyglet 3.0 and worth considering. The only problem with supporting it is that OpenAL can't be entirely guaranteed to have the AL_EXT_float32 extension, so if it were to be supported, pyglet's requirements need to be updated to mention that the extension is required. That change is probably not a good idea inbetween minor versions.

so perhaps a warning should be placed when it downsamples

Pyglet states (although in a poor place, the AudioFormat docstring only) and assumes that it only ever handles samples in the u8 and s16 formats, a limit primarily given by the extensionless OpenAL support range. The downsampling not happening is, in my eyes, an oversight in both the wave and WMF decoder.

My primary intention with this PR was to fix the very loud and distorted result from interpreting float audio as integer PCM in Xaudio2. But seeing that it also breaks some other formats as visible in the table, now i'm not sure what to do:

  1. Decide that raising the MediaExceptions is correct and accept the PR as-is (fixing the decoders seperately and keeping the formats to u8, s16 for now.)
  2. Downsample float audio to s16 only and keep the wave+WMF grey zones intact (undo the XAudio2+DirectSound format checks)
  3. Downsample float audio to s16, change the WaveDecoder to raise a MediaError when >16bit, so the FFmpegDecoder takes over again
  4. Change the WMFDecoder so it will downsample all Integer PCM formats >16bit to 16bit
  5. Expand the FFmpegDecoder to downsample s24, s32, f64 and s64
  6. Modify all audio backends to support at least f32 right now (maybe s24, s32 as well), taking OpenAL extension presence for granted. f64 and s64 can not be played back by any backend (and are also complete overkill), so will still need to be downsampled or otherwise protected against.

@Square789
Copy link
Copy Markdown
Contributor Author

After a discussion with @benmoran56, what i'm trying to do now is to have decoders be mindful of the data they return, keeping it u8, s16 and at 1 or 2 channels. Supposedly s16 is fine for normal audio playback, anything higher is audible only in specific conditions: 1, 2
So far, only the wave, FFmpeg and WMF decoders check for both channel count and sample size.
There was another idea on making AudioDrivers report formats they are capable of playing back, which decoders can then respect, but so far that is not implemented. Should it be implemented in this PR?

These commits make the WMFDecoder very likely to fall into the resampling/decoding branch (as the specific conditions it tests for are covered by the standard wave decoder and i'm not sure another format exists that produces the MFAudioFormat_PCM subtype.)
The WMFDecoder is now probably too verbose in its target format description. @caffeinepills let me know whether this is fine.

The FFMpegDecoder would always apply a resampler anyways and now always resamples to s16, unless the decoded data is u8. It also had some limits on formats and sample sizes, which have been removed.

The wave decoder now raises an exception when encountering a channel count not in (1, 2) or a sample width (in bytes) not in (1, 2), to have a decoder with resampling capabilities take over.

@benmoran56
Copy link
Copy Markdown
Member

There was another idea on making AudioDrivers report formats they are capable of playing back, which decoders can then respect, but so far that is not implemented. Should it be implemented in this PR?

There was the idea that we can add maximum bitrate (and potential other) arguments to the MediaDecoder class inits. This can be a solution for users that know their audio driver (backend) will support Source data in that format. It will also require a user to specifically make an instance of the Decoder, so would apply to people who are targeting a specific platform and formats.

IMO, lets leave this to a next PR so as not to over complicate things.

@benmoran56 benmoran56 merged commit 47e315f into pyglet:master Jun 7, 2025
16 checks passed
@fladd
Copy link
Copy Markdown

fladd commented Mar 9, 2026

@benmoran56 @caffeinepills @Square789

Mmh, I think there is a better way of handling this. Pyglet and all of its supported audio drivers are perfectly capable of playing 32 bit int and float formats (well, OpenAL needs extensions for this, but at least the float32 extension is usually present, including on macos). Note that 24-bit audio is most often put into a 32bit int container with the least significant bits being padded with zeros. So this is handled by 32bit int support as well. What is needed is indeed a way to differentiate between 32 int and 32 float within Pyglet. One simple way to do this is to give the audio_format definition a property called sample_type and then include this in the logic to select the right output format given the input format. In the case one of the 32bit format is not available (usually int for OpenAL), one can convert to the other (usually float for OpenAL) without quality loss. Only in case both are not available (probably OpenAL on Windows), one should convert down to 16bit, but this should optimally involve dithering with noise-shaping in order to retain most of the dynamic range of 24/32 bit.

I have implemented all of this (but only for FFmpeg) in my Zipped Album Player, where I (1) detect which of Pyglet's drivers are available on a given system, (2) monkey patch all of them to support 32bit int and float (if available), (3) make each driver know the difference between int float representation, and (4) subclass FFmpegSource to deal with all of this correctly. I also gave FFmpegSource a class variable that can be set from the outside to always force a certain output format irrespective of the input (like you do now forcing everything to 16bit) if someone wishes to do that.

Have a look at https://github.com/zipped-album/zap/blob/main/zap/player.py for details. If this is something that you are interested in having in Pyglet natively, I am happy to send a pull request. Just let me know.

@benmoran56
Copy link
Copy Markdown
Member

Hi @fladd, I pinged you on the Discord server, but we can continue the discussion here if you prefer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants