Quant optimized for quality and speed on a Strix Halo 128 GiB system. Possibly also beneficial on DGX Spark and similar systems.

The TL;DR is that this quant achieves both superior quality and superior speed compared to a homogeneous Q6_K.

See the GLM version for more details on theory and comparisons.
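As a rough sketch of how the quant could be used, the snippet below downloads one GGUF file from this repo with huggingface_hub and loads it through llama-cpp-python. The GGUF filename is a placeholder, and the llama.cpp parameters (GPU layer count, context size) are assumptions that should be adjusted to the actual files in the repo and to your hardware; whether your installed llama.cpp build supports this architecture is also not guaranteed.

```python
# Sketch: fetch a GGUF file from this repo and load it with llama-cpp-python.
# The filename below is hypothetical -- check the repo's file list for the real name.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

repo_id = "Beinsezii/Nemotron-3-Super-120B-A12B-GGUF-HALO"
filename = "nemotron-3-super-120b-a12b-halo.gguf"  # placeholder filename

model_path = hf_hub_download(repo_id=repo_id, filename=filename)

# Offload all layers to the GPU; on a 128 GiB unified-memory system such as
# Strix Halo the whole quant should fit. Tune n_ctx to your workload.
llm = Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=8192)

out = llm("Briefly explain mixed-precision GGUF quantization.", max_tokens=128)
print(out["choices"][0]["text"])
```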

Model details: GGUF format, 121B params, nemotron_h_moe architecture.