Fix WMFDecoder delivering audio data as float (#1315) by Square789 · Pull Request #1316 · pyglet/pyglet

Square789 · 2025-05-09T23:00:43Z

Do not consider float data to be uncompressed [EDIT: bit misleading, of course it's still "uncompressed", but indigestable by pyglet], change the debug messages accordingly
Additionally safeguard WAVEFORMATEX structs (this might cause breakage in isolated cases, but worth it to prevent further ear-splitting audio incidents).

caffeinepills · 2025-05-09T23:46:30Z

It actually looks like this should work, I think the issue is this:

pyglet/pyglet/media/drivers/xaudio2/interface.py

Lines 29 to 37 in 9bf8939

    
           def create_xa2_waveformat(audio_format): 
        
               wfx = lib.WAVEFORMATEX() 
        
               wfx.wFormatTag = lib.WAVE_FORMAT_PCM 
        
               wfx.nChannels = audio_format.channels 
        
               wfx.nSamplesPerSec = audio_format.sample_rate 
        
               wfx.wBitsPerSample = audio_format.sample_size 
        
               wfx.nBlockAlign = wfx.wBitsPerSample * wfx.nChannels // 8 
        
               wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign 
        
               return wfx

I am forcing it to WAVE_FORMAT_PCM when the format is not guaranteed to be PCM.

If I hardcode wfx.wFormatTag = lib.WAVE_FORMAT_PCM to wfx.wFormatTag = 3 (which is WAVE_FORMAT_IEEE_FLOAT, from the source WAVEFORMATEX) then it does not distort.

benmoran56 · 2025-05-10T00:19:43Z

@caffeinepills Could we do a small mapper to find the right values? Something like?

wfx.wFormatTag = {32: WAVE_FORMAT_IEEE_FLOAT, 16: WAVE_FORMAT_PCM}[audio_format.sample_size]

A simple lookup table to cover 8, 16, 24, and 32 of course. If you think it's worth supporting 32bit audio.
I think it's pretty rare, just due to how huge those files can be.

Square789 · 2025-05-10T00:23:08Z

It actually looks like this should work, I think the issue is this:

pyglet/pyglet/media/drivers/xaudio2/interface.py

Lines 29 to 37 in 9bf8939

def create_xa2_waveformat(audio_format):

wfx = lib.WAVEFORMATEX()

wfx.wFormatTag = lib.WAVE_FORMAT_PCM

wfx.nChannels = audio_format.channels

wfx.nSamplesPerSec = audio_format.sample_rate

wfx.wBitsPerSample = audio_format.sample_size

wfx.nBlockAlign = wfx.wBitsPerSample * wfx.nChannels // 8

wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign

return wfx

I am forcing it to WAVE_FORMAT_PCM when the format is not guaranteed to be PCM.

If I hardcode wfx.wFormatTag = lib.WAVE_FORMAT_PCM to wfx.wFormatTag = 3 (which is WAVE_FORMAT_IEEE_FLOAT, from the source WAVEFORMATEX) then it does not distort.

Doing that instructs the voice to play float samples, which causes voice initialization to fail when playing the "healthy" sound, as it only has 16 bits/sample

Square789 · 2025-05-10T00:36:14Z

@caffeinepills Could we do a small mapper to find the right values? Something like?
wfx.wFormatTag = {32: WAVE_FORMAT_IEEE_FLOAT, 16: WAVE_FORMAT_PCM}[audio_format.sample_size]
A simple lookup table to cover 8, 16, 24, and 32 of course. If you think it's worth supporting 32bit audio. I think it's pretty rare, just due to how huge those files can be.

pyglet atm is hardwired to a somewhat common standard (At least going by XAudio2 and OpenAL) of playing back samples in u8 or s16 format, only 1 or 2 channels supported. Testing for a sample size of 32 is technically not enough to be sure we are dealing with float samples (There appear to be more formats that require the WAVEFORMATEXTENSIBLE structure because they only use 20 or 24 of a given 32 bits) ~~(Additionally, there seem to be very rare cases where 16bit floats are used, so allowing both PCM and Float might cause confusion there)~~ [EDIT: 16bit float wav files do not seem to be a thing]. 32bit audio probably delivers higher quality, so i can understand why it should be supported, but to be safe i would want to add another attribute to AudioFormat specifying whether we are dealing with PCM or Float, or outright create generalized formats that can be handled by all audio backends.

Square789 · 2025-05-10T00:46:12Z

For reference: #1315 (comment)
My intention with this PR is to make the WMFDecoder similar to the FFMpegDecoder, both resampling f32 to u16.
Although it does happen implicitly, as

mf_mediatype.SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio)
mf_mediatype.SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM)

does not specify the sample size (which can probably be done through MF_MT_AUDIO_BITS_PER_SAMPLE as per https://learn.microsoft.com/en-us/windows/win32/medfound/uncompressed-audio-media-types). 16 bits apparently just happens to be the default.

benmoran56 · 2025-05-10T00:49:19Z

Thanks for the info, that makes sense. I vaguely remember running into what you are describing a while ago, with trying to support 24bit Wav files.
We should probably just be defaulting to the built-in/standard library decoder on Windows for .wav files as well (fraction less resource usage). That would also help enforce the baseline.

Square789 · 2025-05-10T01:04:12Z

The built-in wav decoder did take precedence on both Windows and Linux, but failed since it can't handle format "3", which i guess is exactly the same format in the WAVEFORMATEX header, so WAVE_FORMAT_IEEE_FLOAT.

caffeinepills · 2025-05-10T02:46:49Z

What about trying to convert to float, via:

mf_mediatype.SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio)
mf_mediatype.SetGUID(MF_MT_SUBTYPE, MFAudioFormat_Float)

If that fails, fallback to PCM, if both fail, we just mark as unsupported.

We would probably have to expand our audio format to include the format type such as float or uint, that way the WAVEFORMATEX knows which to use for the source voice.

Square789 · 2025-05-10T03:21:03Z

Unless updated too, it would render OpenAL inoperable on Windows (which admittably is very unlikely to be used there) and amend pyglet's decoder contract by "On Windows, decoders may return float samples", which i guess is also fine for the WMFDecoder.
Still, creating a platform discrepancy like that and assuming that 32bit <=> float is flaky. According to the XAudio2 docs i linked, 32bit size PCM samples also exist.
A good solution would be to create more certainty on the data by including sample types in the audio format that can then with certainly be resolved to e.g. PA_SAMPLE_S16LE, AL_FORMAT_{STEREO|MONO}16 and WAVE_FORMAT_PCM + 16.
Before something like that, imo it's safest to stick to "8 bits means u8" and "16 bits means s16" and enforce that in the decoders.
For OpenAL playing back floats requires an extension and may not be entirely guaranteed. I am not sure just how commonplace openal-soft is.

caffeinepills · 2025-05-14T04:03:56Z

Unless updated too, it would render OpenAL inoperable on Windows (which admittably is very unlikely to be used there) and amend pyglet's decoder contract by "On Windows, decoders may return float samples", which i guess is also fine for the WMFDecoder. Still, creating a platform discrepancy like that and assuming that 32bit <=> float is flaky. According to the XAudio2 docs i linked, 32bit size PCM samples also exist. A good solution would be to create more certainty on the data by including sample types in the audio format that can then with certainly be resolved to e.g. PA_SAMPLE_S16LE, AL_FORMAT_{STEREO|MONO}16 and WAVE_FORMAT_PCM + 16. Before something like that, imo it's safest to stick to "8 bits means u8" and "16 bits means s16" and enforce that in the decoders. For OpenAL playing back floats requires an extension and may not be entirely guaranteed. I am not sure just how commonplace openal-soft is.

We could query OpenAL for the support. However, in these cases, perhaps we need to have the drivers define which data types it supports. That way it can use the AudioFormat effectively (or even ask the decoder to give it a supported format?).

In the worst case (for WMF anyways), we can always try to convert Float to PCM. If we drop the or guid_compressed == MFAudioFormat_Float, it should put WMF on to the path to convert the source to PCM.
With testing his example, the WAVEFORMATEX after does convert down: WAVEFORMATEX(wFormatTag=1, nChannels=2, nSamplesPerSec=44100, nAvgBytesPersec=176400, nBlockAlign=4, wBitsPerSample=16, cbSize=0). The wFormatTag became 1. If you want to add an exception to ensure the format is always PCM after converting, then that's quick fix.

However I am unsure about this, as this silently lowers their audio quality. (On the other hand, at least it's not crashing their app or preventing them from using their audio), so perhaps a warning should be placed when it downsamples?

Square789 · 2025-05-15T01:12:29Z

Created a few more files that should cover most of the wav formats and current behavior:

	PCM Integer, 8 bit: u8	PCM Integer, 16 bit: s16	PCM Integer, 24 bit: s24	PCM Integer, 32 bit: s32	PCM Integer, 64 bit: s64	IEEE754 Float, 32 bit: f32	IEEE754 Float, 64 bit: f64
Used decoder	wave	wave	wave¹	wave¹	wave¹	WMF/FFmpeg	WMF/FFmpeg
Pre-PR, XAudio2	Ok	Ok	Ok²	Ok²	WinError when creating Voice	Painful sound	WinError when creating Voice
Pre-PR, OpenAL	Ok	Ok	MediaException, previously no AudioFormat	MediaException, previously no AudioFormat	MediaException, previously no AudioFormat	Downsampled to s16	MediaException, previously no AudioFormat
With this PR, XAudio2	Ok	Ok	MediaException	MediaException	MediaException	Downsampled to s16, implicitly	Downsampled to s16, implicitly

¹ Since Python 3.12. This is a funny edge case, as, outside of Windows, s24 or s32 files that previously may have been decoded by FFmpeg (albeit without an AudioFormat) will now produce an AudioFormat and then an error during OpenAL format selection. Not too interesting, nothing is getting played either way.
² We only use the WAVEFORMATEX struct (with the format tag WAVE_FORMAT_PCM). The Xaudio2 docs state: "If wFormatTag is WAVE_FORMAT_PCM, then wBitsPerSample should be equal to 8 or 16.". This type of data should not be returned from the decoder and works with XAudio2 unofficially.

The FFmpeg decoder handles a more limited amount of sample sizes and simply provides an AudioFormat of None if it encounters an unsupported format. It should be easy to alter it to also downsample 24bit, integer 32bit and 64bit audio.

In reference to ², this PR breaks playability of Integer PCM files that would actually play before, if the sample size is >16 bit. Officially this working is not guaranteed, but i can undo the AudioFormat guards.

I also noticed that floats appear to be the go-to sample format in WebAudio, which is relevant for pyglet 3.0 and worth considering. The only problem with supporting it is that OpenAL can't be entirely guaranteed to have the AL_EXT_float32 extension, so if it were to be supported, pyglet's requirements need to be updated to mention that the extension is required. That change is probably not a good idea inbetween minor versions.

so perhaps a warning should be placed when it downsamples

Pyglet states (although in a poor place, the AudioFormat docstring only) and assumes that it only ever handles samples in the u8 and s16 formats, a limit primarily given by the extensionless OpenAL support range. The downsampling not happening is, in my eyes, an oversight in both the wave and WMF decoder.

My primary intention with this PR was to fix the very loud and distorted result from interpreting float audio as integer PCM in Xaudio2. But seeing that it also breaks some other formats as visible in the table, now i'm not sure what to do:

Decide that raising the MediaExceptions is correct and accept the PR as-is (fixing the decoders seperately and keeping the formats to u8, s16 for now.)
Downsample float audio to s16 only and keep the wave+WMF grey zones intact (undo the XAudio2+DirectSound format checks)
Downsample float audio to s16, change the WaveDecoder to raise a MediaError when >16bit, so the FFmpegDecoder takes over again
Change the WMFDecoder so it will downsample all Integer PCM formats >16bit to 16bit
Expand the FFmpegDecoder to downsample s24, s32, f64 and s64
Modify all audio backends to support at least f32 right now (maybe s24, s32 as well), taking OpenAL extension presence for granted. f64 and s64 can not be played back by any backend (and are also complete overkill), so will still need to be downsampled or otherwise protected against.

- Be explicit about 16bit

…cifying target format

…ove duplicate channel count readout

Square789 · 2025-06-05T01:55:08Z

After a discussion with @benmoran56, what i'm trying to do now is to have decoders be mindful of the data they return, keeping it u8, s16 and at 1 or 2 channels. Supposedly s16 is fine for normal audio playback, anything higher is audible only in specific conditions: 1, 2
So far, only the wave, FFmpeg and WMF decoders check for both channel count and sample size.
There was another idea on making AudioDrivers report formats they are capable of playing back, which decoders can then respect, but so far that is not implemented. Should it be implemented in this PR?

These commits make the WMFDecoder very likely to fall into the resampling/decoding branch (as the specific conditions it tests for are covered by the standard wave decoder and i'm not sure another format exists that produces the MFAudioFormat_PCM subtype.)
The WMFDecoder is now probably too verbose in its target format description. @caffeinepills let me know whether this is fine.

The FFMpegDecoder would always apply a resampler anyways and now always resamples to s16, unless the decoded data is u8. It also had some limits on formats and sample sizes, which have been removed.

The wave decoder now raises an exception when encountering a channel count not in (1, 2) or a sample width (in bytes) not in (1, 2), to have a decoder with resampling capabilities take over.

benmoran56 · 2025-06-05T02:01:27Z

There was another idea on making AudioDrivers report formats they are capable of playing back, which decoders can then respect, but so far that is not implemented. Should it be implemented in this PR?

There was the idea that we can add maximum bitrate (and potential other) arguments to the MediaDecoder class inits. This can be a solution for users that know their audio driver (backend) will support Source data in that format. It will also require a user to specifically make an instance of the Decoder, so would apply to people who are targeting a specific platform and formats.

IMO, lets leave this to a next PR so as not to over complicate things.

fladd · 2026-03-09T00:29:21Z

@benmoran56 @caffeinepills @Square789

Mmh, I think there is a better way of handling this. Pyglet and all of its supported audio drivers are perfectly capable of playing 32 bit int and float formats (well, OpenAL needs extensions for this, but at least the float32 extension is usually present, including on macos). Note that 24-bit audio is most often put into a 32bit int container with the least significant bits being padded with zeros. So this is handled by 32bit int support as well. What is needed is indeed a way to differentiate between 32 int and 32 float within Pyglet. One simple way to do this is to give the audio_format definition a property called sample_type and then include this in the logic to select the right output format given the input format. In the case one of the 32bit format is not available (usually int for OpenAL), one can convert to the other (usually float for OpenAL) without quality loss. Only in case both are not available (probably OpenAL on Windows), one should convert down to 16bit, but this should optimally involve dithering with noise-shaping in order to retain most of the dynamic range of 24/32 bit.

I have implemented all of this (but only for FFmpeg) in my Zipped Album Player, where I (1) detect which of Pyglet's drivers are available on a given system, (2) monkey patch all of them to support 32bit int and float (if available), (3) make each driver know the difference between int float representation, and (4) subclass FFmpegSource to deal with all of this correctly. I also gave FFmpegSource a class variable that can be set from the outside to always force a certain output format irrespective of the input (like you do now forcing everything to 16bit) if someone wishes to do that.

Have a look at https://github.com/zipped-album/zap/blob/main/zap/player.py for details. If this is something that you are interested in having in Pyglet natively, I am happy to send a pull request. Just let me know.

benmoran56 · 2026-03-09T08:00:07Z

Hi @fladd, I pinged you on the Discord server, but we can continue the discussion here if you prefer.

Fix WMFDecoder delivering data as float (pyglet#1315)

577cae9

Square789 added 6 commits June 5, 2025 00:07

Have wave decoder fail on sample widths that are not 8 or 16

6444edb

WMF Decoder: test for sample size and resample non-8/16bit to 16 bit

445e316

- Be explicit about 16bit

FFmpeg decoder: Resample everything to S16

4381b0e

wave decoder: Raise error on bad channel count

3d662fb

WMFDecoder: Check for channel count mismatch and be more explicit spe…

9e6ffb0

…cifying target format

WMFDecoder: Protect against missing BITS_PER_SAMPLE attribute and rem…

cb3d450

…ove duplicate channel count readout

benmoran56 approved these changes Jun 7, 2025

View reviewed changes

benmoran56 merged commit 47e315f into pyglet:master Jun 7, 2025
16 checks passed

fladd mentioned this pull request Mar 11, 2026

Better 32-bit audio support #1415

Open

Conversation

Square789 commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

caffeinepills commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

benmoran56 commented May 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Square789 commented May 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Square789 commented May 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Square789 commented May 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

benmoran56 commented May 10, 2025

Uh oh!

Square789 commented May 10, 2025

Uh oh!

caffeinepills commented May 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Square789 commented May 10, 2025

Uh oh!

caffeinepills commented May 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Square789 commented May 15, 2025

Uh oh!

Square789 commented Jun 5, 2025

Uh oh!

benmoran56 commented Jun 5, 2025

Uh oh!

Uh oh!

fladd commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

benmoran56 commented Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Square789 commented May 9, 2025 •

edited

Loading

caffeinepills commented May 9, 2025 •

edited

Loading

benmoran56 commented May 10, 2025 •

edited

Loading

Square789 commented May 10, 2025 •

edited

Loading

Square789 commented May 10, 2025 •

edited

Loading

Square789 commented May 10, 2025 •

edited

Loading

caffeinepills commented May 10, 2025 •

edited

Loading

caffeinepills commented May 14, 2025 •

edited

Loading

fladd commented Mar 9, 2026 •

edited

Loading