Skip to content

Fix multi-resolution rendering: key+value sort, sub-frame split, image-relative fade#243

Merged
slimbuck merged 19 commits into
playcanvas:mainfrom
slimbuck:raster-rework
May 21, 2026
Merged

Fix multi-resolution rendering: key+value sort, sub-frame split, image-relative fade#243
slimbuck merged 19 commits into
playcanvas:mainfrom
slimbuck:raster-rework

Conversation

@slimbuck

Copy link
Copy Markdown
Member

Summary

Renderer changes to make output correct and consistent across all resolutions. The pre-fix code worked at 1080p but silently broke at 1920×1200, 1440p, 4K, and 8K — and even where it produced output, the same world-space scene rendered very differently at different resolutions.

What was broken

  1. Silent key overflow above 8192 tiles. The pair-sort key packed (tileIdx << 19) | splatIdx into a single u32. At any image with more than 8192 total tiles (≥ 1920×1200), tileIdx << 19 overflowed u32 and aliased tile indices — silent corruption with no error.
  2. Hard tile-edge artifacts at higher resolutions. Splat pixel radius scales with image height (focal length × world-radius / depth). At 8K a splat that fit comfortably in the per-splat tile-coverage cap at 1080p projects to 16× more tiles. The cap was previously enforced by clamping coverage and dropping tail tiles, producing visible hard square edges throughout the image.
  3. Resolution-dependent radius fade. RADIUS_FADE_START_PX = 1024 / RADIUS_FADE_END_PX = 2048 were absolute pixel values. World-space splats whose 1080p pixel radius was 256-512 px (well below 1080p's 1024-px fade-start) projected to 1024-2048 px at 8K and were silently faded out. Result: large patches of grass, sky, and foreground content present at 1080p went missing at 8K.

What changed

  • Key + value radix sort. Switched from packed-key u32 sort to ComputeRadixSort.sortIndirect's key+value mode (initialValues param). tileKeysBuffer holds tile indices, splatValuesBuffer holds splat indices; both get full 32-bit range. Deletes SPLAT_IDX_BITS/SPLAT_IDX_MASK from config and shrinks sort passes at 1080p from 8 to ~4 (numBits auto-derived from numTiles).
  • Sub-frame split with no truncation. Images larger than 120×68 tiles (= 1080p) are partitioned into a grid of ≤ 1080p sub-frames. Each sub-frame renders independently into a sub-frame-sized running state buffer, then is stitched into the final image. The global CPU depth sort is shared across sub-frames so there are no inter-sub-frame seams. maxCoveragePerSplat is set to the sub-frame's full tile area (rounded up to a power of two), which geometrically bounds every splat's bbox-in-sub-frame — so the project shader's coverage clamp never bites and emit-pairs writes every splat's full bbox. No hard tile edges anywhere.
  • Image-height-relative radius fade. Replaced RADIUS_FADE_START_PX / RADIUS_FADE_END_PX with RADIUS_FADE_START_FRAC / RADIUS_FADE_END_FRAC. The project shader computes the per-render threshold as frac × imageHeight, so the SAME world-space splats fade at every render resolution. After this fix the 8K-downsampled-to-1080p image matches 1080p direct at SSIM 0.93.
  • Right-sized running-state and output buffers. Buffers are sized on groupTilesX × groupTilesY × TILE_SIZE² instead of groupSizeTiles² (the bounding-square of the larger axis). Saves ~25 MB at 1080p, more at non-square aspect ratios and at higher resolutions.
  • Dropped multi-slot plumbing. numSlots / Slot[] / per-slot shader names removed; rasterizer holds a single PipelineBuffers. The "group" abstraction is retained — it's the hook the sub-frame split uses (beginGroup / dispatchChunk / finishGroup with a group origin and tile rectangle, the project shader's group-AABB cull, and group-pixel-origin uniforms).
  • Indirect-dispatch slot pool bumped to 256 K. PlayCanvas resets the device's indirect-dispatch slot counter at frame-end only; in an offline pipeline the counter monotonically grows across (chunk × sub-frame) iterations and at 8K can hit ~70 K slots. Each slot is 12 bytes so 256 K = 3 MB, cheap.
  • Centralised render tunables. AA_DILATION_COV, SIGMA_CUTOFF, DISCRIMINANT_FLOOR, OPACITY_CAP, MIN_ALPHA, MIN_TRANSMITTANCE, JACOBIAN_LIMIT_FACTOR, TILE_SIZE, the radius-fade fractions, the pair-buffer memory budget, and the MAX_COVERAGE_PER_SPLAT_AT_1080P baseline now all live in src/lib/render/config.ts with documentation. gaussian-aabb.ts reads SIGMA_CUTOFF from there too so the BVH extent and the projected splat radius stay in sync.
  • Golden-image regression tests. Added test/render-golden.test.mjs with tiny (320×240) and mid (640×360) fixtures shared between the runner and the regenerator via test/render-golden.cases.mjs. Byte-exact compares against checked-in webp goldens.

Verification

  • npm run build clean, no TypeScript warnings.
  • node --test test/render-golden.test.mjs — both fixtures pass byte-exact (tiny and mid).
  • npm run lint — 22 errors, all pre-existing on main; no new errors introduced.
  • Windmill scene (17.9 M splats) rendered at 1080p, 1440p, 4K, and 8K. All produce visually correct output with no tile artifacts, no seams, no missing content. The 8K-downsampled-to-1080p matches the 1080p direct render at SSIM 0.93 (the remaining ~7 % gap is residual SSAA sharpness from AA_DILATION_COV / DISCRIMINANT_FLOOR still being in absolute pixel units, which is reference-3DGS behaviour).

Performance and memory

  • 1080p windmill: 1.08 s → 5.9 s render. Slower because chunkCap is forced down by maxStorageBufferBindingSize once maxCov covers the whole 1080p sub-frame area (16 384 tiles). Peak GPU memory unchanged at 2.20 GB.
  • 8K windmill: 54 s render; previously this resolution silently produced corrupt or content-missing output. Peak GPU memory 3.31 GB (dominated by the image-wide 530 MB running state, 130 MB output, and per-chunk pair buffers).

Caveats / known follow-ups

  • AA_DILATION_COV and DISCRIMINANT_FLOOR are still absolute pixel units. Making them resolution-invariant would close the residual 7 % SSAA gap but is an "exceed reference 3DGS" decision and a separate change.
  • Project shader re-runs per sub-frame. Sharing projection across sub-frames within a chunk (project once, bin per sub-frame) would cut 8K render time by roughly 4–8× — a sizeable refactor, deferred.
  • Prefix-sum is single-workgroup. Per-thread element budget scales linearly with chunkCap; current scan is the placeholder from the rework plan's Open Question 1. Two-level Blelloch is the natural follow-up if chunkCap ever grows past ~1 M.
  • No seam-N regression fixture. The rework plan called for a synthetic scene engineered to span the original 128-px tile-group boundaries; not added. The current tiny + mid fixtures plus byte-exact passing under forced sub-frame split (verified locally with 7×5 split — still byte-exact) cover the boundary case adequately for now.
  • FAR_PLANE_NEAR_FACTOR is unused after the BVH removal; left in config for now.

Full reasoning trail and design alternatives considered are in the saved review at ~/.claude/plans/review-the-changes-committed-mutable-tiger.md.

@slimbuck slimbuck requested a review from Copilot May 21, 2026 13:21
@slimbuck slimbuck self-assigned this May 21, 2026
@slimbuck slimbuck added the bug Something isn't working label May 21, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reworks the WebGPU splat renderer to produce correct, resolution-consistent output across large images (multi-resolution correctness), addressing prior key-overflow corruption, tile-edge truncation artifacts, and absolute-pixel fade thresholds.

Changes:

  • Replaces the old tile-stream/BVH-driven group pipeline with a new raster pass that globally depth-sorts once, then renders via sub-frame partitioning.
  • Updates the GPU rasterizer to a fully GPU-resident tile-binning pipeline (coverage → prefix-sum → emit-pairs → indirect radix sort (key+value) → boundary finding → binned raster).
  • Adds golden-image regression tests + a regenerator script, and adds a pretest hook to ensure dist/ exists for tests.

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/render-golden.test.mjs Adds byte-exact golden WebP renderer regression tests driven by shared cases.
test/render-golden.regenerate.mjs Adds a script to regenerate checked-in golden WebP outputs from current CLI behavior.
test/render-golden.cases.mjs Centralizes golden test case definitions shared by test + regenerator.
src/lib/render/tile-stream.ts Removes the prior BVH + per-tile-group pipelined renderer orchestrator.
src/lib/render/raster-pass.ts Adds the new multi-resolution raster orchestrator (global sort + sub-frame split + stitching).
src/lib/render/index.ts Switches renderSplats export to the new raster pass implementation.
src/lib/render/config.ts Centralizes render constants (sigma cutoff, fade thresholds, tile size, pair-buffer budget, etc.).
src/lib/render/bvh-query.ts Removes BVH frustum-plane query helper (no longer used).
src/lib/gpu/index.ts Stops exporting TILE_SIZE from GPU module (now sourced from render config).
src/lib/gpu/gpu-splat-rasterizer.ts Major rewrite: adds GPU tile-binning pipeline, indirect dispatch handling, key+value radix sorting, and binned rasterization.
src/lib/data-table/gaussian-aabb.ts Uses SIGMA_CUTOFF from centralized render config to keep extents consistent with raster radius.
package.json Adds pretest to build before running tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/lib/gpu/gpu-splat-rasterizer.ts Outdated
Comment thread src/lib/gpu/gpu-splat-rasterizer.ts
Comment thread src/lib/render/raster-pass.ts

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 14 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/lib/gpu/gpu-splat-rasterizer.ts:8

  • BUFFERUSAGE_INDIRECT is imported but not used in this file. Please remove the unused import to keep the module clean and avoid introducing new lint noise.
import {
    BUFFERUSAGE_COPY_DST,
    BUFFERUSAGE_COPY_SRC,
    BUFFERUSAGE_INDIRECT,
    SHADERLANGUAGE_WGSL,
    SHADERSTAGE_COMPUTE,
    UNIFORMTYPE_FLOAT,
    UNIFORMTYPE_UINT,

Comment thread src/lib/render/raster-pass.ts Outdated
Comment thread src/lib/gpu/gpu-splat-rasterizer.ts Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 14 changed files in this pull request and generated 1 comment.

Comment thread src/lib/gpu/gpu-splat-rasterizer.ts Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@slimbuck slimbuck marked this pull request as ready for review May 21, 2026 14:18
@slimbuck slimbuck requested a review from a team May 21, 2026 14:18
@slimbuck slimbuck merged commit cfee373 into playcanvas:main May 21, 2026
3 checks passed
@slimbuck slimbuck deleted the raster-rework branch May 21, 2026 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants