Quant optimized for quality and speed on a Strix Halo 128 GiB system. Possibly also beneficial on DGX Spark and similar systems.

The TL;DR is that this quant achieves both superior quality and superior speed compared to a homogeneous Q6_K.

See the GLM version for more details on theory and comparisons.
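As a rough sketch of how the quant could be used, the snippet below downloads one GGUF file from this repo with huggingface_hub and loads it through llama-cpp-python. The GGUF filename is a placeholder, and the llama.cpp parameters (GPU layer count, context size) are assumptions that should be adjusted to the actual files in the repo and to your hardware; whether your installed llama.cpp build supports this architecture is also not guaranteed.

```python
# Sketch: fetch a GGUF file from this repo and load it with llama-cpp-python.
# The filename below is hypothetical -- check the repo's file list for the real name.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

repo_id = "Beinsezii/Nemotron-3-Super-120B-A12B-GGUF-HALO"
filename = "nemotron-3-super-120b-a12b-halo.gguf"  # placeholder filename

model_path = hf_hub_download(repo_id=repo_id, filename=filename)

# Offload all layers to the GPU; on a 128 GiB unified-memory system such as
# Strix Halo the whole quant should fit. Tune n_ctx to your workload.
llm = Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=8192)

out = llm("Briefly explain mixed-precision GGUF quantization.", max_tokens=128)
print(out["choices"][0]["text"])
```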

Model details: GGUF format, 121B params, nemotron_h_moe architecture.