BlasEntry::m_linkedInstances: std::vector → std::unordered_set for O(1) unlink #142
Open
BinqAdams wants to merge 1 commit into
…1) unlink

The container stores every RtInstance currently bound to a BlasEntry, and BlasEntry::unlinkInstance() called std::find on it (a linear scan) followed by swap-and-pop. With many same-geometry transient instances sharing one BlasEntry (for example, hundreds of single-bone particle billboards emitted from a monster joint, all sharing geometry and material so they collapse onto one BlasEntry through exactMatch), the vector grew to N entries and end-of-frame GC ran N unlinks at O(N) each, for O(N²) total per frame.

Switching the container to std::unordered_set makes unlinkInstance O(1) on average. linkInstance now calls insert() instead of push_back(); getLinkedInstances() returns a set reference. No current consumer iterates the container or depends on insertion order: they only call .size() (accel_manager dynamic-BLAS heuristic, BlasEntry debug print) and .empty() (BVH cache GC predicate in scene_manager), which work identically on either container.

This pattern has been present since the first RTX Remix commit (98314db, Aug 2022) but doesn't bite in typical NV target games, where most meshes have unique vertex content and end up in their own buckets. Painkiller's FFP indexed-blend skinning produces many same-geometry single-bone draws that share a BlasEntry, exposing the quadratic scaling.
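The difference between the two unlink paths can be sketched in isolation. This is a minimal illustration, not the actual BlasEntry code: RtInstance here is an empty stand-in, and VectorLinks, SetLinks, and linkThenUnlinkAll are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real RtInstance; only its address matters here.
struct RtInstance {};

// Before: vector-backed links. unlink() is O(N) because of the std::find scan.
struct VectorLinks {
  std::vector<RtInstance*> linked;

  void link(RtInstance* inst) { linked.push_back(inst); }

  void unlink(RtInstance* inst) {
    auto it = std::find(linked.begin(), linked.end(), inst);  // linear scan
    if (it != linked.end()) {
      std::swap(*it, linked.back());  // swap & pop: O(1) removal once found
      linked.pop_back();
    }
  }
};

// After: hash-set-backed links. insert() and erase() are O(1) on average.
struct SetLinks {
  std::unordered_set<RtInstance*> linked;

  void link(RtInstance* inst) { linked.insert(inst); }
  void unlink(RtInstance* inst) { linked.erase(inst); }
};

// Links n instances, then unlinks each one, mimicking an end-of-frame GC pass.
// Returns the number of links left afterwards (expected to be 0).
template <typename Links>
std::size_t linkThenUnlinkAll(std::size_t n) {
  std::vector<RtInstance> instances(n);
  Links links;
  for (auto& inst : instances) links.link(&inst);
  assert(links.linked.size() == n);
  for (auto& inst : instances) links.unlink(&inst);
  return links.linked.size();
}
```

Calling linkThenUnlinkAll<VectorLinks>(n) performs n unlinks that each may scan up to the current vector length, while linkThenUnlinkAll<SetLinks>(n) performs n average-constant-time erases. Both leave the container empty.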
Summary
- BlasEntry::unlinkInstance() was O(N) (std::find + swap-and-pop on a vector). With many same-geometry transient instances sharing one BlasEntry, end-of-frame GC ran N unlinks at O(N) each, for O(N²) per frame.
- Switching m_linkedInstances to std::unordered_set<RtInstance*> makes unlink O(1) average. linkInstance becomes .insert(); unlinkInstance becomes .erase(). No public-API behavior change.
- Present since 98314db1 (Aug 2022). Doesn't bite typical NV target games (Portal RTX, HL2 RTX) because meshes there generally have unique vertex content and don't collide into one BlasEntry via exactMatch. Surfaces in games with many same-geometry single-bone draws. Painkiller's FFP indexed-blend skinning is one such pattern, where hundreds of particle billboards emitted from skinned joints share geometry+material and collapse onto one BlasEntry.

Compatibility
- getLinkedInstances() previously returned const std::vector<RtInstance*>&. It now returns const std::unordered_set<RtInstance*>&.
- Existing consumers only call .size() (rtx_accel_manager.cpp dynamic-BLAS heuristic, BlasEntry debug print) and .empty() (rtx_scene_manager.cpp BVH cache GC predicate). Both work identically on either container.
- No current consumer iterates m_linkedInstances or depends on insertion order.

Test plan
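As a rough illustration of the compatibility points above, code that touches only .size() and .empty() compiles and behaves the same against either container type. The helpers below are hypothetical stand-ins written for this sketch; they are not the actual call sites in rtx_accel_manager.cpp or rtx_scene_manager.cpp, and the threshold parameter is an assumption.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the real RtInstance type.
struct RtInstance {};

// Hypothetical consumer in the spirit of a size-based heuristic:
// it only needs Container::size().
template <typename Container>
bool hasMoreLinksThan(const Container& linked, std::size_t threshold) {
  return linked.size() > threshold;
}

// Hypothetical consumer in the spirit of a GC predicate:
// it only needs Container::empty().
template <typename Container>
bool hasNoLinks(const Container& linked) {
  return linked.empty();
}
```

Because neither helper iterates or indexes the container, passing a std::vector<RtInstance*> or a std::unordered_set<RtInstance*> holding the same distinct pointers gives the same answers.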
Note on attribution
This O(N²) pattern wasn't introduced by recent commits; it has been latent since the first RTX Remix commit. The first-commit author left a "Swap & pop - faster than 'erase'" comment, indicating awareness of the pop-side cost but not concern about the linear std::find. This PR addresses that.