Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated-mlx-bf16
MLX-VLM bf16 export of huihui-ai/Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated for Apple Silicon workflows, including LM Studio and local mlx_vlm usage.
Overview
- Variant: bf16
- Repository payload at upload time: 9.1 GB
- Repository file count: 10
- Format: mlx-vlm model package
Compatibility
- Uses the corrected Qwen VL chat template with image token placeholders.
- Uses <|im_end|>-compatible stop token settings for cleaner chat termination in MLX/LM Studio.
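As a quick local sanity check of the stop-token claim above, one can inspect the exported tokenizer config. The snippet below is a minimal sketch that assumes standard Hugging Face-style `tokenizer_config.json` keys (`eos_token`, `added_tokens_decoder`); the exact keys in a given export may differ.

```python
def declares_im_end(tokenizer_config: dict) -> bool:
    """Return True if <|im_end|> is the eos token or appears among the added tokens."""
    eos = tokenizer_config.get("eos_token")
    if isinstance(eos, dict):  # some exports wrap the token in an object
        eos = eos.get("content")
    if eos == "<|im_end|>":
        return True
    added = tokenizer_config.get("added_tokens_decoder", {})
    return any(tok.get("content") == "<|im_end|>" for tok in added.values())

# A real check would json.load the tokenizer_config.json from the model directory
# and pass the resulting dict to declares_im_end.
```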
Validation
- Local text generation smoke test: passed
- Local image-input smoke test: passed
- Local black-box abliterated check: 6/6 non-refused
- Refusal rate: 0.0
- Actionable non-refused cases: 6
- Median cleaned response length: 559 chars
Behavior Notes
- This variant preserved the abliterated behavior on the local 6-case regression set used during conversion validation.
- These checks are behavioral acceptance tests, not a formal guarantee of identical outputs to the source checkpoint.
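A black-box refusal check of this kind can be approximated with simple phrase matching. The sketch below is illustrative only: the refusal marker list, the cleaning step, and the 6-case prompt set actually used during conversion validation are not published in this card, so every specific below is an assumption.

```python
import statistics

# Hypothetical refusal markers; the real phrase list used in validation is not published.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "as an ai")

def clean(text: str) -> str:
    # Strip a leading <think>...</think> block so scoring uses the final answer,
    # not the thinking trace (see the --processor-kwargs note in Usage).
    if "</think>" in text:
        text = text.split("</think>", 1)[1]
    return text.strip()

def score(responses):
    cleaned = [clean(r) for r in responses]
    refused = [r for r in cleaned if any(m in r.lower() for m in REFUSAL_MARKERS)]
    return {
        "refusal_rate": len(refused) / len(cleaned),
        "non_refused": len(cleaned) - len(refused),
        "median_len": statistics.median(len(r) for r in cleaned),
    }
```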
Usage
```shell
python -m mlx_vlm.generate \
  --model /path/to/Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated-mlx-bf16 \
  --prompt "你好" \
  --max-tokens 256
```
For behavior-focused checks, it is safer to disable thinking output so refusal scoring is based on the final answer instead of the thinking trace.
```shell
python -m mlx_vlm.generate \
  --model /path/to/Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated-mlx-bf16 \
  --prompt "你好" \
  --max-tokens 256 \
  --processor-kwargs '{"enable_thinking": false}'
```
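For scripting repeated smoke tests, the invocations above can be assembled programmatically. `build_generate_cmd` below is a hypothetical helper (not part of mlx-vlm) that only constructs the argument list shown above; pass the result to `subprocess.run` to execute it.

```python
import json

def build_generate_cmd(model_path, prompt, max_tokens=256, enable_thinking=None):
    # Assemble the python -m mlx_vlm.generate invocation used above.
    cmd = [
        "python", "-m", "mlx_vlm.generate",
        "--model", model_path,
        "--prompt", prompt,
        "--max-tokens", str(max_tokens),
    ]
    if enable_thinking is not None:
        # Matches the --processor-kwargs JSON form shown above.
        cmd += ["--processor-kwargs", json.dumps({"enable_thinking": enable_thinking})]
    return cmd
```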
LM Studio
- If LM Studio has an older cached copy, refresh or re-download the repository so the latest chat template and config are picked up.
- These repositories are meant for mlx-vlm / Apple MLX runtimes rather than Transformers CPU inference.
Model tree for vanch007/Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated-mlx-bf16
- Base model: Qwen/Qwen3.5-4B-Base
- Finetuned: Qwen/Qwen3.5-4B