High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell (RTX 5090/5080/5070 Ti, RTX PRO 6000; sm_120). Native NVFP4 and GGUF support; 270 tok/s decode on Qwen3-Coder-30B (MoE). Written entirely by Claude Code.
Lna-Lab production pipeline: GGUF -> modelopt-format NVFP4 + working MTP head for vLLM on RTX PRO 6000 Blackwell (SM120). Stages 2 (NVFP4) and 3 (MTP graft) are Lna-Lab originals; stage 1 (GGUF->bf16) reuses li-yifei/gguf-to-nvfp4.
Rust-native MoE inference runtime with custom CUDA kernels for Blackwell GPUs. Includes DFlash speculative decoding, multi-tier Engram memory, and entropy-adaptive routing. Targets Qwen3.5-35B-A3B on a single RTX 5060 Ti 16GB.
Optimized vLLM deployment for NVIDIA Blackwell (RTX 5090) on Linux kernel 6.14. Resolves SM_120 CUDA kernel incompatibilities, P2P deadlocks, and memory fragmentation for high-performance LLM inference.
Den experimental kernel forge: a raw inline-PTX tensor-core path for Blackwell SM120, with OMMA.SF.16864 cubins, SASS verification, and fragment mapping. Kernels are proven here before promotion to den-nv.