sm120

Here are 11 public repositories matching this topic...

kekzl / imp

High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell (RTX 5090/5080/5070 Ti, RTX PRO 6000; sm_120). Native NVFP4/GGUF, 270 tok/s decode on Qwen3-Coder-30B MoE. Written entirely by Claude Code.

Updated Jun 2, 2026
Cuda

lna-lab / blackwell-geforce-nvfp4-gemm

NVFP4 inference on Blackwell GeForce (RTX 5090/5080/5070 Ti/RTX PRO 6000) — SM120 patches for vLLM + FlashInfer + CUTLASS. 175 tok/s on Qwen3.6-35B MoE.

gpu-computing quantization cutlass gemm geforce blackwell vllm llm-inference flashinfer rtx-5090 sm120 nvfp4

Updated Apr 27, 2026
Python

lna-lab / GGUF-to-NVFP4-SM120

Lna-Lab production pipeline: GGUF -> modelopt-format NVFP4 + working MTP head for vLLM on RTX PRO 6000 Blackwell (SM120). Stages 2 (NVFP4) and 3 (MTP graft) are Lna-Lab originals; stage 1 (GGUF->bf16) reuses li-yifei/gguf-to-nvfp4.

quantization mtp blackwell vllm gguf qwen3 sm120 nvfp4 modelopt

Updated Apr 27, 2026
Python

AIdevsmartdata / chimere

Rust-native MoE inference runtime with custom CUDA kernels for Blackwell GPUs. Includes DFlash speculative decoding, multi-tier Engram memory, and entropy-adaptive routing. Targets Qwen3.5-35B-A3B on a single RTX 5060 Ti 16GB.

rust ffi cuda inference moe quantization mamba state-space-models deltanet blackwell engram llm qwen speculative-decoding sm120 mamba2 nemotron-h hybrid-ssm

Updated Apr 25, 2026
Rust

informatico-madrid / blackwell-linux-infra-optimizer

Optimized vLLM deployment for NVIDIA Blackwell (RTX 5090) on Linux Kernel 6.14. Resolves SM_120 kernel incompatibilities, P2P deadlocks, and memory fragmentation for high-performance LLM inference.

infrastructure linux-kernel cuda blackwell llm vllm deepseek rtx5090 sm120

Updated Jan 17, 2026
Dockerfile

Yyyzk123 / torch-cu128-sm120

VPN-free prebuilt PyTorch 2.9.0 for CUDA 12.8 + sm_120 (RTX 5080)

linux deep-learning cuda torch rtx5080 sm120

Updated Aug 5, 2025
Shell

craftogrammer / llama.cpp-adaptive-turboquant

Downstream llama.cpp TurboQuant CUDA fork with adaptive KV layout selection for long-context inference on consumer Blackwell GPUs.

cuda inference moe quantization blackwell kv-cache long-context llama-cpp local-llm rtx-5080 sm120 turboquant

Updated May 1, 2026
C++

Natfii / onnxruntime-gpu-blackwell

Pre-built onnxruntime-gpu 1.24.1 with Blackwell sm_120 CUDA kernels (RTX 5090/5080/5070)

machine-learning gpu cuda nvidia python-wheel onnxruntime blackwell rtx-5070 rtx-5090 rtx-5080 sm120

Updated Feb 14, 2026

AIdevsmartdata / ik_llama.cpp

llama.cpp fork with additional SOTA quants and improved performance

cuda inference llama cuda-kernels quantization ssm mamba state-space-models blackwell llama-cpp gguf sm120 mamba2 nemotron-h hybrid-ssm

Updated Apr 26, 2026
C++

webportalim / ComfyUI-Hunyuan3DWrapper-RTX-50xx-Blackwell-Installation-Guide

Complete installation guide for ComfyUI-Hunyuan3DWrapper on NVIDIA Blackwell GPUs (RTX 5070 Ti, 5080, 5090). Covers manual compilation of custom_rasterizer for the sm_120 / compute_120 architecture.

windows cuda cuda-toolkit cuda-support windows-11 windows-package blackwell comfyui hunyuan3d hunyuan3dwrapper blackwell-gpu sm120 rtx-5070-ti

Updated Apr 12, 2026

RentedNoodle / llama.den

Den experimental kernel forge — raw inline PTX tensor core path for Blackwell SM120. OMMA.SF.16864 cubins, SASS verification, fragment mapping. Where kernels are proven before promotion to den-nv.

cuda llama blackwell kv-cache llm-inference fp4 omma sm120 nvfp4 rtx-5070-ti

Updated May 29, 2026
C++
