rebase to master #4
Merged
PC Enterprises GameMaster, Resound jr (adlib) MS Booster/PC Enterprises jrBus-Mouse, PC Enterprises GameMaster (bus mouse) Various (generic) RTC Corel LS2000 SCSI
Some improvements to IBM PCjr
akmed772 pushed a commit that referenced this pull request Mar 18, 2026
* Phase 4: Color/alpha combine pipeline for ARM64 Voodoo JIT

  Implement the complete color and alpha combine stages, translating x86-64 codegen lines ~1689-2228 to ARM64/NEON instructions.

  Color select pipeline:
  - CC_LOCALSELECT_ITER_RGB, CC_LOCALSELECT_TEX, CC_LOCALSELECT_COLOR1
  - Local color select override via tex_a bit 7 (TBZ/TBNZ branching)

  Chroma key test:
  - Compare selected RGB source against params->chromaKey (24-bit mask)
  - Skip pixel on match using CBZ forward branch

  Alpha pipeline:
  - Alpha select: A_SEL_ITER_A (with CLAMP), A_SEL_TEX, A_SEL_COLOR1
  - CCA local select: ITER_A, COLOR0, ITER_Z (with CLAMP)
  - Alpha mask test via TBZ on bit 0
  - Full CCA combine: zero_other, sub_clocal, mselect (ZERO/ALOCAL/AOTHER/ALOCAL2/TEX), reverse_blend, multiply+shift, add, clamp, invert_output

  Color combine pipeline:
  - cc_zero_other, cc_sub_clocal using NEON 4x16 arithmetic
  - cc_mselect: ZERO, CLOCAL, AOTHER, ALOCAL, TEX, TEXRGB
  - Reverse blend (XOR with 0xFF + add 1)
  - Signed multiply via SMULL+SSHR+SQXTN (3 insns vs 5 on SSE2)
  - cc_add (add clocal back)
  - SQXTUN pack + cc_invert_output
  - Result saved to v13 for fog stage

  Fix skip position patching: chroma uses CBZ (PATCH_FORWARD_CBxZ), alpha mask uses TBZ (PATCH_FORWARD_TBxZ).

* Update checklist: mark Phase 4 color/alpha combine items complete

* Add JIT validation logging for ARM64 Voodoo codegen

  Add rate-limited diagnostic logging at three critical JIT pipeline points to verify code generation and execution during testing:
  1. Cache HIT (first 20 occurrences) - logs block reuse with mode params
  2. Code GENERATE (unlimited) - logs every JIT compilation with full config
  3. Code EXECUTE (first 50 occurrences) - logs JIT dispatch with coordinates

  All logging is gated behind VOODOO_JIT_DEBUG / VOODOO_JIT_DEBUG_EXEC defines (set to 1) and uses pclog() for output to the 86Box log file. Set to 0 to disable before release.
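The "first N occurrences" rate limiting described for the JIT validation logging can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `log_line` is a stand-in for `pclog()` that writes to stderr, and `cache_hit_logged`, `CACHE_HIT_LOG_LIMIT`, `log_cache_hit`, and the `fbz_mode` parameter are hypothetical names chosen for the example. Only the `VOODOO_JIT_DEBUG` define and the limit of 20 come from the commit message.

```c
#include <stdarg.h>
#include <stdio.h>

/* Compile-time gate, named after the define mentioned in the commit message. */
#ifndef VOODOO_JIT_DEBUG
#define VOODOO_JIT_DEBUG 1
#endif

/* "first 20 occurrences" for cache hits, per the commit message. */
#define CACHE_HIT_LOG_LIMIT 20

/* Hypothetical counter; counts how many cache-hit lines were emitted so far. */
static int cache_hit_logged = 0;

/* Stand-in for 86Box's pclog(): here it simply writes to stderr. */
static void log_line(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/* Log a cache hit unless the limit was reached.
 * Returns 1 if a line was emitted, 0 if it was suppressed. */
static int log_cache_hit(unsigned fbz_mode)
{
#if VOODOO_JIT_DEBUG
    if (cache_hit_logged < CACHE_HIT_LOG_LIMIT) {
        cache_hit_logged++;
        log_line("JIT cache HIT: fbzMode=%08x\n", fbz_mode);
        return 1;
    }
#else
    (void) fbz_mode; /* logging compiled out */
#endif
    return 0;
}
```

Calling `log_cache_hit()` in a hot loop then emits at most 20 lines no matter how many hits occur, which keeps the log readable while still confirming that block reuse happens. Building with `-DVOODOO_JIT_DEBUG=0` compiles the logging out entirely.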
* Fix Phase 1 stack frame size and comment

  Validation found that the prologue comment claimed d14/d15 were saved at SP-144, but the actual code only saves d8-d13 (3 NEON register pairs). Since v14/v15 are never used in the generated code, this reduces the frame size from 144 to 128 bytes, saving 16 bytes per JIT call.

  Changes:
  - Remove misleading "SP-144: d14, d15" comment
  - Reduce frame size from 144 to 128 bytes (prologue and epilogue)
  - Stack remains 16-byte aligned (128 = 8 × 16)

  Addresses finding from voodoo-arch validation of Phase 1.

* Update changelog: Phase 4 validation and frame size fix

* Update checklist: mark validation complete for phases 1-4

  All 4 phases validated against official 3dfx specifications:
  - Phase 1: ARM64 ABI compliance verified, frame size optimized
  - Phase 2: All depth test modes validated
  - Phase 3: Texture pipeline validated, upstream bug fix verified
  - Phase 4: Color/alpha combine validated, ARM64 improvements noted
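The frame size arithmetic in the Phase 1 fix can be checked mechanically. The sketch below is illustrative only: the constant and function names (`JIT_FRAME_OLD`, `JIT_FRAME_NEW`, `is_sp_aligned`) are hypothetical and not taken from the 86Box sources. It relies on the AArch64 rule that SP must be 16-byte aligned when used as a base address, so any frame size subtracted from SP has to be a multiple of 16.

```c
/* Hypothetical names for the two frame sizes quoted in the commit message. */
enum {
    JIT_FRAME_OLD = 144, /* frame size before the fix */
    JIT_FRAME_NEW = 128  /* frame size after dropping the unused d14/d15 slot */
};

/* AArch64 requires SP to be 16-byte aligned when used as a base address,
 * so a valid frame size must be a multiple of 16. */
static int is_sp_aligned(unsigned frame_bytes)
{
    return (frame_bytes & 15u) == 0;
}

/* Compile-time checks (C11): both sizes keep SP aligned, and the fix
 * saves exactly one 16-byte register-pair slot. */
_Static_assert((JIT_FRAME_NEW % 16) == 0, "new frame must keep SP aligned");
_Static_assert((JIT_FRAME_OLD % 16) == 0, "old frame was aligned as well");
_Static_assert(JIT_FRAME_OLD - JIT_FRAME_NEW == 16, "one d-register pair removed");
```

Both 144 (9 × 16) and 128 (8 × 16) satisfy the alignment rule, so the change is purely a size reduction and does not affect alignment.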
akmed772 pushed a commit that referenced this pull request Mar 18, 2026
…+R2-08)

R2-07: Eliminate ebp_store memory round-trip in bilinear path. Hold bilinear lookup index in w17 (IP1 scratch) instead of STR/LDR through STATE_ebp_store. Saves 1 STR + 1 LDR per bilinear-textured pixel.

R2-12: Replace 8 FMOV+DUP_V4H_LANE(x,x,0) pairs with single DUP_V4H_GPR. All sites broadcast values in 0-255 range (alpha, LOD frac, detail blend), so 16-bit GPR-to-vector broadcast is semantically identical. Sites: TC_MSELECT_DETAIL (TMU0/1), TC_MSELECT_LOD_FRAC (TMU0/1), CC_MSELECT_AOTHER, CC_MSELECT_ALOCAL (both paths), CC_MSELECT_TEX.

R2-08: Cache original LOD in w11 before ADD w6,w6,#4 in point-sample path. Eliminates LDR w11,[x0,#STATE_lod] reload in S clamp/wrap section.
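The equivalence claimed in R2-12 can be illustrated with a small scalar model. This is a sketch under stated assumptions, not the JIT code: the `v4h` type and both function names are hypothetical, and the functions only model what the instruction sequences do to the low 64 bits of a NEON register (FMOV Sd, Wn followed by DUP Vd.4H, Vn.H[0], versus a single DUP Vd.4H, Wn).

```c
#include <stdint.h>

/* Models the low 64 bits of a NEON register as four 16-bit lanes. */
typedef struct {
    uint16_t lane[4];
} v4h;

/* Old two-instruction sequence: FMOV Sd, Wn then DUP Vd.4H, Vn.H[0].
 * FMOV places the 32-bit word in the low bytes of the register
 * (little-endian lane layout); DUP then replicates 16-bit lane 0. */
static v4h broadcast_via_fmov_dup(uint32_t w)
{
    uint8_t reg[4];
    reg[0] = (uint8_t) (w & 0xFFu);
    reg[1] = (uint8_t) ((w >> 8) & 0xFFu);
    reg[2] = (uint8_t) ((w >> 16) & 0xFFu);
    reg[3] = (uint8_t) ((w >> 24) & 0xFFu);

    uint16_t h0 = (uint16_t) (reg[0] | ((uint16_t) reg[1] << 8));
    v4h out = { { h0, h0, h0, h0 } };
    return out;
}

/* New single instruction: DUP Vd.4H, Wn replicates the low 16 bits
 * of the general-purpose register into every lane. */
static v4h broadcast_via_dup_gpr(uint32_t w)
{
    uint16_t h = (uint16_t) (w & 0xFFFFu);
    v4h out = { { h, h, h, h } };
    return out;
}

/* Lane-by-lane comparison helper. */
static int v4h_equal(v4h a, v4h b)
{
    for (int i = 0; i < 4; i++) {
        if (a.lane[i] != b.lane[i])
            return 0;
    }
    return 1;
}
```

In this model both paths depend only on the low 16 bits of the source register, so for the 0-255 values named in the commit (alpha, LOD fraction, detail blend) the results match lane for lane.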
No description provided.