Skip to content

BlasEntry::m_linkedInstances: std::vector → std::unordered_set for O(1) unlink#142

Open
BinqAdams wants to merge 1 commit into
NVIDIAGameWorks:mainfrom
BinqAdams:fix/m_linkedInstances-unordered-set
Open

BlasEntry::m_linkedInstances: std::vector → std::unordered_set for O(1) unlink#142
BinqAdams wants to merge 1 commit into
NVIDIAGameWorks:mainfrom
BinqAdams:fix/m_linkedInstances-unordered-set

Conversation

@BinqAdams
Copy link
Copy Markdown
Contributor

Summary

  • BlasEntry::unlinkInstance() was O(N) (std::find + swap-and-pop on a vector). With many same-geometry transient instances sharing one BlasEntry, end-of-frame GC ran N unlinks at O(N) each — O(N²) per frame.
  • Switching m_linkedInstances to std::unordered_set<RtInstance*> makes unlink O(1) average. linkInstance becomes .insert(); unlinkInstance becomes .erase(). No public-API behavior change.
  • Present since 98314db1 (Aug 2022). Doesn't bite typical NV target games (Portal RTX, HL2 RTX) because meshes there generally have unique vertex content and don't collide into one BlasEntry via exactMatch. Surfaces in games with many same-geometry single-bone draws — Painkiller's FFP indexed-blend skinning is one such pattern, where hundreds of particle billboards emitted from skinned joints share geometry+material and collapse onto one BlasEntry.

Compatibility

  • getLinkedInstances() previously returned const std::vector<RtInstance*>&. It now returns const std::unordered_set<RtInstance*>&.
  • All in-tree consumers only use .size() (rtx_accel_manager.cpp dynamic-BLAS heuristic, BlasEntry debug print) and .empty() (rtx_scene_manager.cpp BVH cache GC predicate). Both work identically on either container.
  • No consumer iterates m_linkedInstances or depends on insertion order.

Test plan

  • Builds clean on top of latest main (REMIX-5504 included).
  • Verified in Painkiller (custom title using dxvk-remix): pre-fix end-of-frame GC scaled quadratically with particle count and caused visible FPS drops in particle-heavy scenes; post-fix the cost is flat with N.
  • CI tests.

Note on attribution

This O(N²) pattern wasn't introduced by recent commits — it has been latent since the first RTX Remix commit. The first-commit author left a "Swap & pop - faster than 'erase'" comment, indicating awareness of the pop-side cost but not concern about the linear std::find. This PR addresses that.

…1) unlink

The container stores every RtInstance currently bound to a BlasEntry,
and BlasEntry::unlinkInstance() called std::find on it (linear scan)
followed by swap-and-pop. With many same-geometry transient instances
sharing one BlasEntry — for example hundreds of single-bone particle
billboards emitting from a monster joint, all sharing geometry and
material so they collapse onto one BlasEntry through exactMatch — the
vector grew to N entries and end-of-frame GC ran N unlinks at O(N)
each, for O(N²) total per frame.

Switching the container to std::unordered_set makes unlinkInstance
O(1) average. linkInstance is now insert() instead of push_back();
getLinkedInstances() returns a set reference. No current consumer
iterates the container or depends on insertion order — they only call
.size() (accel_manager dynamic-BLAS heuristic, BlasEntry debug print)
and .empty() (BVH cache GC predicate in scene_manager), which work
identically on either container.

This was present since the first RTX Remix commit (98314db, Aug 2022)
but doesn't bite in typical NV target games where most meshes have
unique vertex content and end up in their own buckets. Painkiller's
FFP indexed-blend skinning produces many same-geometry single-bone
draws that share a BlasEntry, exposing the quadratic scaling.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant