
rebase to master #4

Merged
akmed772 merged 4 commits into akmed772:master from 86Box:master
Feb 22, 2025

Conversation

@akmed772

Owner

No description provided.

jriwanek and others added 4 commits February 21, 2025 16:41
PC Enterprises GameMaster, Resound jr (adlib)

MS Booster/PC Enterprises jrBus-Mouse, PC Enterprises GameMaster (bus mouse)

Various (generic) RTC

Corel LS2000 SCSI
@akmed772 akmed772 merged commit 23045cc into akmed772:master Feb 22, 2025
akmed772 pushed a commit that referenced this pull request Mar 18, 2026
* Phase 4: Color/alpha combine pipeline for ARM64 Voodoo JIT

Implement the complete color and alpha combine stages translating
x86-64 codegen lines ~1689-2228 to ARM64/NEON instructions.

Color select pipeline:
- CC_LOCALSELECT_ITER_RGB, CC_LOCALSELECT_TEX, CC_LOCALSELECT_COLOR1
- Local color select override via tex_a bit 7 (TBZ/TBNZ branching)

Chroma key test:
- Compare selected RGB source against params->chromaKey (24-bit mask)
- Skip pixel on match using CBZ forward branch
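A minimal scalar sketch of the chroma key test described above, assuming the key and the selected source are both packed as 32-bit values with RGB in the low 24 bits. The helper name `chroma_key_matches` is hypothetical and not taken from the 86Box source. XOR followed by a mask yields zero exactly on a match, which is the kind of result a compare-and-branch-on-zero instruction can consume; the actual instruction sequence emitted by the JIT is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the pixel should be skipped.
 * Only the low 24 bits (RGB) take part in the comparison. */
static bool chroma_key_matches(uint32_t src_rgb, uint32_t chroma_key)
{
    /* XOR leaves a zero bit wherever the two values agree; masking to
     * 24 bits discards the top byte, so a zero result means RGB matched. */
    return ((src_rgb ^ chroma_key) & 0x00FFFFFFu) == 0;
}
```

With this model, a differing top byte does not prevent a match, while any difference in the RGB bytes does.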

Alpha pipeline:
- Alpha select: A_SEL_ITER_A (with CLAMP), A_SEL_TEX, A_SEL_COLOR1
- CCA local select: ITER_A, COLOR0, ITER_Z (with CLAMP)
- Alpha mask test via TBZ on bit 0
- Full CCA combine: zero_other, sub_clocal, mselect (ZERO/ALOCAL/
  AOTHER/ALOCAL2/TEX), reverse_blend, multiply+shift, add, clamp,
  invert_output

Color combine pipeline:
- cc_zero_other, cc_sub_clocal using NEON 4x16 arithmetic
- cc_mselect: ZERO, CLOCAL, AOTHER, ALOCAL, TEX, TEXRGB
- Reverse blend (XOR with 0xFF + add 1)
- Signed multiply via SMULL+SSHR+SQXTN (3 insns vs 5 on SSE2)
- cc_add (add clocal back)
- SQXTUN pack + cc_invert_output
- Result saved to v13 for fog stage
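The order of the color combine stages listed above can be sketched as a scalar, per-channel reference in C. This is an illustration under stated assumptions, not the JIT's code: the struct, field, and function names are hypothetical, the shift amount of 8 is assumed, and the flag `invert_factor` is named neutrally because the commit message does not state the polarity of the hardware reverse-blend bit.

```c
#include <stdbool.h>

/* Hypothetical configuration; field names are illustrative only. */
typedef struct {
    bool zero_other;    /* replace the "other" color with zero      */
    bool sub_clocal;    /* subtract the local color                 */
    bool invert_factor; /* XOR the blend factor with 0xFF           */
    bool add_clocal;    /* add the local color back after multiply  */
    bool invert_output; /* XOR the clamped result with 0xFF         */
} combine_cfg_t;

/* Floor division by 256 that does not rely on right-shifting a
 * negative int (implementation-defined in C). */
static int floor_shr8(int p)
{
    return p >= 0 ? p >> 8 : -((-p + 255) >> 8);
}

static int clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* One 8-bit channel through the combine stages, in the listed order.
 * factor is the mselect result (0..255). */
static int combine_channel(const combine_cfg_t *cfg,
                           int cother, int clocal, int factor)
{
    int v = cfg->zero_other ? 0 : cother;
    if (cfg->sub_clocal)
        v -= clocal;

    /* Optional XOR with 0xFF, then add 1 so the factor spans 1..256. */
    int f = (cfg->invert_factor ? (factor ^ 0xFF) : factor) + 1;

    v = floor_shr8(v * f);   /* signed multiply, then shift (assumed 8) */

    if (cfg->add_clocal)
        v += clocal;

    v = clamp_u8(v);         /* saturate to 0..255 */
    if (cfg->invert_output)
        v ^= 0xFF;
    return v;
}
```

For example, with `cother = 200`, `clocal = 50`, `factor = 127`, and both `sub_clocal` and `add_clocal` set, the intermediate value is `(150 * 128) >> 8 = 75` and the result is `125`. Intermediate values can go negative after the subtraction, which is why the multiply has to be signed and the clamp comes last.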

Fix skip position patching: chroma uses CBZ (PATCH_FORWARD_CBxZ),
alpha mask uses TBZ (PATCH_FORWARD_TBxZ).


* Update checklist: mark Phase 4 color/alpha combine items complete


* Add JIT validation logging for ARM64 Voodoo codegen

Add rate-limited diagnostic logging at three critical JIT pipeline
points to verify code generation and execution during testing:

1. Cache HIT (first 20 occurrences) - logs block reuse with mode params
2. Code GENERATE (unlimited) - logs every JIT compilation with full config
3. Code EXECUTE (first 50 occurrences) - logs JIT dispatch with coordinates

All logging is gated behind the VOODOO_JIT_DEBUG and VOODOO_JIT_DEBUG_EXEC
defines (currently set to 1) and uses pclog() for output to the 86Box log
file. Set both to 0 to disable logging before release.
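One way to get "first N occurrences" logging behind a compile-time define is a macro with a per-call-site counter. The sketch below is an assumption about the general shape, not the code from this commit: `JIT_LOG_LIMITED`, `demo_log`, `demo_log_calls`, and `on_cache_hit` are hypothetical names, and `demo_log` stands in for pclog() so the example builds outside 86Box.

```c
#include <stdarg.h>
#include <stdio.h>

#define VOODOO_JIT_DEBUG 1   /* set to 0 to compile the logging out */

static int demo_log_calls = 0;   /* counts emitted lines, for the demo only */

/* Stand-in for the emulator's logging function (assumption). */
static void demo_log(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vprintf(fmt, ap);
    va_end(ap);
    demo_log_calls++;
}

#if VOODOO_JIT_DEBUG
/* Each expansion gets its own static counter, so every call site is
 * limited independently of the others. */
#define JIT_LOG_LIMITED(limit, ...)          \
    do {                                     \
        static int count_ = 0;               \
        if (count_ < (limit)) {              \
            count_++;                        \
            demo_log(__VA_ARGS__);           \
        }                                    \
    } while (0)
#else
#define JIT_LOG_LIMITED(limit, ...) do { } while (0)
#endif

/* Hypothetical call site: log only the first 20 cache hits. */
static void on_cache_hit(int block_id)
{
    JIT_LOG_LIMITED(20, "JIT cache HIT block=%d\n", block_id);
}
```

Calling `on_cache_hit` any number of times emits at most 20 lines. The counter is not thread-safe; if the call site can be reached from several render threads, the limit becomes approximate.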


* Fix Phase 1 stack frame size and comment

Validation found that the prologue comment claimed d14/d15 were saved
at SP-144, but the actual code only saves d8-d13 (3 NEON register pairs).
Since v14/v15 are never used in the generated code, the unused slot is
dropped, reducing the frame size from 144 to 128 bytes and saving 16 bytes
of stack per JIT call.

Changes:
- Remove misleading "SP-144: d14, d15" comment
- Reduce frame size from 144 to 128 bytes (prologue and epilogue)
- Stack remains 16-byte aligned (128 = 8 × 16)
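The arithmetic behind the frame size change can be checked at compile time. The sketch below uses hypothetical macro names and covers only the figures stated above (144, 128, and one pair of 8-byte D registers); it does not describe the rest of the frame layout.

```c
/* Frame sizes quoted in the commit message. */
#define OLD_FRAME_SIZE   144
#define NEW_FRAME_SIZE   128
#define NEON_D_REG_BYTES 8     /* a D register is 64 bits wide */

/* Dropping the d14/d15 pair removes exactly one 16-byte slot. */
_Static_assert(OLD_FRAME_SIZE - NEW_FRAME_SIZE == 2 * NEON_D_REG_BYTES,
               "removed slot must equal one D-register pair");

/* AArch64 requires SP to stay 16-byte aligned at public interfaces. */
_Static_assert(NEW_FRAME_SIZE % 16 == 0,
               "frame size must keep SP 16-byte aligned");

/* Runtime form of the same alignment check (hypothetical helper). */
static int frame_is_aligned(int bytes)
{
    return (bytes % 16) == 0;
}
```

Both 144 and 128 are multiples of 16, so the change affects only the amount of stack used, not alignment.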

Addresses finding from voodoo-arch validation of Phase 1.


* Update changelog: Phase 4 validation and frame size fix


* Update checklist: mark validation complete for phases 1-4

All 4 phases validated against official 3dfx specifications:
- Phase 1: ARM64 ABI compliance verified, frame size optimized
- Phase 2: All depth test modes validated
- Phase 3: Texture pipeline validated, upstream bug fix verified
- Phase 4: Color/alpha combine validated, ARM64 improvements noted
akmed772 pushed a commit that referenced this pull request Mar 18, 2026
…+R2-08)

R2-07: Eliminate ebp_store memory round-trip in bilinear path. Hold bilinear
lookup index in w17 (IP1 scratch) instead of STR/LDR through STATE_ebp_store.
Saves 1 STR + 1 LDR per bilinear-textured pixel.

R2-12: Replace 8 FMOV+DUP_V4H_LANE(x,x,0) pairs with single DUP_V4H_GPR.
All sites broadcast values in 0-255 range (alpha, LOD frac, detail blend),
so 16-bit GPR-to-vector broadcast is semantically identical. Sites:
TC_MSELECT_DETAIL (TMU0/1), TC_MSELECT_LOD_FRAC (TMU0/1),
CC_MSELECT_AOTHER, CC_MSELECT_ALOCAL (both paths), CC_MSELECT_TEX.

R2-08: Cache original LOD in w11 before ADD w6,w6,#4 in point-sample path.
Eliminates LDR w11,[x0,#STATE_lod] reload in S clamp/wrap section.

3 participants