@iamgreaser iamgreaser commented Dec 6, 2023

Summary

This reworks the renderer to be internally chunky rather than planar. In doing so, the lookup tables I needed boiled down to a single 32-bit integer that gets shifted around.

I focused entirely on the "turn 32 bits of VRAM data into 4 or 8 pixels" part. There shouldn't be any regression in functionality.
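For readers unfamiliar with the "32 bits in, 4 or 8 pixels out" step, here is a minimal reference sketch of what that conversion computes. It is not the code from this PR and does not use the shifted 32-bit table described above; it is a plain bit-by-bit version. The function names and the byte layout (plane 0, or the first pixel, in the lowest byte of the word) are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the PR.
 * Assumption: plane p occupies bits 8*p .. 8*p+7 of vram_word, and bit 7 of
 * each plane byte belongs to the leftmost pixel.
 * Produces 8 pixels of 4 bits each (4bpp planar -> chunky). */
static void planar_word_to_pixels(uint32_t vram_word, uint8_t out[8])
{
    for (int x = 0; x < 8; x++) {
        uint8_t pix = 0;
        for (int p = 0; p < 4; p++) {
            uint8_t plane_byte = (uint8_t)((vram_word >> (8 * p)) & 0xFF);
            /* Take this pixel's bit from plane p and place it at bit p. */
            pix |= (uint8_t)(((plane_byte >> (7 - x)) & 1) << p);
        }
        out[x] = pix;
    }
}

/* Hypothetical helper, not taken from the PR.
 * Assumption: in 8bpp mode each byte of the word is one pixel, lowest byte
 * first. Produces 4 pixels of 8 bits each. */
static void chunky_word_to_pixels(uint32_t vram_word, uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)((vram_word >> (8 * i)) & 0xFF);
    }
}
```

Under these assumptions, a word with plane 0 set to 0x80 and plane 3 set to 0x01 gives a leftmost pixel with only bit 0 set and a rightmost pixel with only bit 3 set. A table-driven or shift-based version would replace the inner loop.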

I isolated this implementation and the one it replaces so I could run multiple iterations of a 320x200 output as fast as possible. My results from running this under the prestigious compatibility standard known as My Machine™ indicate that this implementation is almost twice as fast as what it replaces, except in the case of high-resolution 8bpp, where for some reason it renders at about 7x the speed...?! [EDIT: This was using -Os optimisation (optimised for size). On -Ofast the gap was about 5x.]

I'm not sure why there's such a horrible slowdown for hires 8bpp in the current renderer, but the new one seems to take almost twice as long for hires 8bpp as for lores 8bpp, so at least that's somewhat sane...
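A rough sketch of the kind of isolated timing loop described above, for anyone who wants to reproduce the comparison. This is not the harness used for the numbers in this PR: `render_fn`, `render_8bpp`, `time_renderer`, the buffer sizes, and the byte order are all hypothetical, and `clock()` measures CPU time at coarse resolution, so it needs a large iteration count to give meaningful results.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define FRAME_W 320
#define FRAME_H 200

/* Hypothetical renderer signature: converts `words` 32-bit VRAM words into
 * pixels written to `out`. */
typedef void (*render_fn)(const uint32_t *vram, uint8_t *out, size_t words);

/* Stand-in renderer for the sketch: 8bpp, 4 pixels per word, lowest byte
 * first (an assumption, not the PR's code). */
static void render_8bpp(const uint32_t *vram, uint8_t *out, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        uint32_t w = vram[i];
        out[4 * i + 0] = (uint8_t)(w & 0xFF);
        out[4 * i + 1] = (uint8_t)((w >> 8) & 0xFF);
        out[4 * i + 2] = (uint8_t)((w >> 16) & 0xFF);
        out[4 * i + 3] = (uint8_t)((w >> 24) & 0xFF);
    }
}

/* Renders a 320x200 frame `iterations` times and returns elapsed CPU
 * seconds as reported by clock(). */
static double time_renderer(render_fn fn, int iterations)
{
    static uint32_t vram[FRAME_W * FRAME_H / 4];
    static uint8_t frame[FRAME_W * FRAME_H];
    const size_t words = sizeof vram / sizeof vram[0];
    volatile uint8_t sink = 0; /* keeps the compiler from dropping the work */

    /* Fill VRAM with arbitrary, non-constant data. */
    for (size_t i = 0; i < words; i++) {
        vram[i] = (uint32_t)i * 2654435761u;
    }

    clock_t start = clock();
    for (int it = 0; it < iterations; it++) {
        fn(vram, frame, words);
        sink = (uint8_t)(sink ^ frame[(size_t)it % sizeof frame]);
    }
    clock_t end = clock();

    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_renderer` once with the old renderer and once with the new one, at the same optimisation level, gives the two figures to compare. As the -Os versus -Ofast edit above shows, the compiler flags change the ratio noticeably.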

Checklist

  • I have discussed this with core contributors already

References

I referred to the shifter definitions in the 82C451 VGA clone datasheet.

@OBattler OBattler merged commit cd3d268 into 86Box:master Dec 6, 2023