Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-pass2

4-bit MLX VLM release of a Qwen3.5-122B-A10B abliterated checkpoint, with an additional broad cs2764/mlx-abliteration pass applied on the MLX model.

A newer, more speed-balanced variant (the final repo compared below) is also available.

Summary

  • Architecture: qwen3_5_moe / Qwen3_5MoeForConditionalGeneration
  • Modality: vision-language model
  • Quantization: 4-bit MLX (group_size=64, mode=affine)
  • Model size on disk: about 65.1 GB
  • Weight shards: 14
  • Toolkit used for the extra ablation pass: cs2764/mlx-abliteration
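The quoted on-disk size can be sanity-checked with simple arithmetic. The sketch below is an estimate, not a byte-accurate accounting: it assumes each 64-weight group stores a 16-bit scale and a 16-bit bias alongside its 4-bit values, and that all 122B parameters are quantized that way. The real checkpoint lands a bit under this figure because not every tensor uses exactly this layout.

```python
# Back-of-envelope size estimate for 4-bit affine quantization with
# group_size=64: each group of 64 weights carries a 16-bit scale and a
# 16-bit bias in addition to the 4-bit quantized values.
def bits_per_weight(bits=4, group_size=64, scale_bits=16, bias_bits=16):
    return bits + (scale_bits + bias_bits) / group_size

bpw = bits_per_weight()            # 4.5 bits per quantized weight
est_gb = 122e9 * bpw / 8 / 1e9     # ~68.6 GB if every tensor were quantized
print(f"{bpw} bits/weight, roughly {est_gb:.1f} GB upper estimate")
```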

When to Use This pass2 Repo vs. the Newer final Repo

| Aspect | This pass2 repo | Newer final repo |
|---|---|---|
| Extra ablation scope | broader pass over layers 0..47 | targeted hot-path pass on layers 36..47 |
| Goal | maximize refusal reduction | better speed/behavior tradeoff |
| Local behavior spot checks | stronger refusal weakening | milder than pass2, but still weaker refusal than the direct 4-bit base |
| Local short-generation speed | about 12.6 tok/s | about 24.0 tok/s |
Recommendation:
  • Use this pass2 repo if you want the more aggressive refusal-removal behavior.
  • Use the newer final repo if you want a faster model with a narrower targeted extra pass.
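The tok/s figures above came from local short-generation spot checks. A minimal, model-agnostic timing harness for reproducing that kind of number might look like the sketch below; `tokens_per_second` and the callable wrapper are illustrative names, and wiring mlx_vlm's `generate` into the callable (returning the token count it produced) is left to the caller.

```python
import time

def tokens_per_second(generate_fn, max_tokens=64):
    """Time one short generation and return tokens/sec.

    generate_fn(max_tokens) must run a generation and return the number
    of tokens actually produced."""
    start = time.perf_counter()
    n_tokens = generate_fn(max_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```

Because short runs are dominated by prompt processing and warm-up, averaging a few repeats after a warm-up call gives a steadier figure.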

Lineage

The release chain for this repository is:

  1. Foundation model: Qwen/Qwen3.5-122B-A10B
  2. Input checkpoint: a user-supplied abliterated Qwen3.5-122B-A10B VLM checkpoint
  3. MLX VLM conversion: quantized to 4-bit MLX while preserving vision_config, preprocessor_config.json, processor_config.json, and video_preprocessor_config.json
  4. Extra MLX abliteration pass: run with cs2764/mlx-abliteration on the converted MLX model

What Is In This Repo

This repository contains a complete MLX VLM checkpoint:

  • config.json with vision_config
  • model.safetensors.index.json
  • model-00001-of-00014.safetensors through model-00014-of-00014.safetensors
  • tokenizer.json, tokenizer_config.json, vocab.json
  • preprocessor_config.json, processor_config.json, video_preprocessor_config.json
  • abliteration_log.json for the pass-2 MLX abliteration run

Note that ablation_meta.json is inherited from the input checkpoint. For the extra MLX pass in this repository, use abliteration_log.json and this model card as the authoritative record.

Pass-2 Abliteration Configuration

The additional MLX abliteration pass was run with the following settings:

| Parameter | Value |
|---|---|
| Toolkit | cs2764/mlx-abliteration |
| Refusal vector policy | per-layer |
| Ablation vector source | per-layer |
| Ablation strength | 2.0 |
| Refusal direction method | projected |
| Probed layers | 0..47 |
| Adaptive search | False |
| Attention only | False |
| MoE safe mode | True |
| Timestamp | 2026-03-07T02:40:35Z |

This run was done on the MLX model, not on a raw Transformers checkpoint.
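A "projected" refusal direction method with a strength parameter corresponds, in the common directional-ablation formulation, to removing each weight matrix's output component along a refusal direction; at strength 1.0 the component is zeroed, and at strength 2.0 (as configured here) it is reflected. The NumPy sketch below illustrates that general formulation only; it is not the toolkit's actual implementation.

```python
import numpy as np

def ablate(W, v, strength=2.0):
    """Project the refusal direction v out of weight matrix W.

    W maps inputs to the residual stream, so its output rows live in the
    same space as v. strength=1.0 removes the component along v;
    strength=2.0 reflects it."""
    v = v / np.linalg.norm(v)           # unit refusal direction
    return W - strength * np.outer(v, v) @ W
```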

Compatibility

This repository is for MLX / Apple Silicon usage. It is not a standard Transformers-only release.

Verified locally:

  • mlx_vlm.load(..., lazy=True) loads successfully
  • processor_class = Qwen3VLProcessor
  • has_vision_tower = True
  • config.json retains vision_config

Usage

Python with mlx-vlm

```python
from mlx_vlm import load, generate

model, processor = load(
    "vanch007/Qwen3.5-122B-A10B-abliterated-4bit-vlm-mlx-cs2764-pass2",
    lazy=True,
)

result = generate(
    model,
    processor,
    prompt="Describe the image briefly.",
    image="/absolute/path/to/example.jpg",
    max_tokens=128,
    verbose=False,
)

print(result.text)
```

Safety Notice

This repository contains a model with reduced refusal behavior. It may produce harmful, offensive, or unsafe content.

Do not use it in consumer-facing or safety-sensitive systems without independent safety controls.

You are responsible for ensuring compliance with applicable law, policy, and platform rules.
