Recommended sampler?
There is no mention of a sampler in the model card or in generation_config.json, apart from temp 0.1 in the example snippets, which seems very low.
Without explicit settings, I think samplers will default to temp 1 and top-k 40 in vLLM and SGLang.
But in llama.cpp it will be:
- temperature: 0.8
- top k: 40
- top p: 0.9
- min p: 0.1
potentially leading to a non-ideal experience for people
The default llama.cpp sampling parameters are, in this order:
- top-k 40
- top-p 0.95
- min-p 0.05
- temperature 0.80
I agree it would be nice to know the recommended samplers for this model.
It is not just parameters, but also the sequence of application that matters. No one ever specifies that.
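To make the ordering point concrete, here is a minimal NumPy sketch of a sampler chain applied in the order listed above (top-k → top-p → min-p → temperature). The exact filtering rules in llama.cpp differ in detail, so treat this as an illustration of why order matters, not a reimplementation:

```python
import numpy as np

def sample_chain(logits, top_k=40, top_p=0.95, min_p=0.05, temperature=0.8, seed=0):
    """Apply top-k, then top-p, then min-p, then temperature, in that order."""
    logits = np.asarray(logits, dtype=np.float64)
    order = np.argsort(-logits)  # token ids, most to least likely

    # 1. top-k: keep only the k highest-logit tokens
    keep = order[:top_k]

    # 2. top-p: keep the smallest prefix whose cumulative probability >= top_p
    p = np.exp(logits[keep] - logits[keep].max())
    p /= p.sum()
    cutoff = int(np.searchsorted(np.cumsum(p), top_p)) + 1
    keep, p = keep[:cutoff], p[:cutoff]

    # 3. min-p: drop tokens whose prob is below min_p * (prob of best token)
    p /= p.sum()
    keep = keep[p >= min_p * p.max()]

    # 4. temperature: rescale the surviving logits, renormalize, sample
    z = logits[keep] / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(np.random.default_rng(seed).choice(keep, p=p))
```

Swapping, say, min-p before top-p changes which tokens survive, which is exactly why the application order should be documented alongside the values.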
0.1 is the temperature we recommend for an Instruct checkpoint. If you'd like more creative responses, and maybe want to leave more room for the model's thinking trace, you can try increasing it.
Depending on your use case it can be worth trying different values, but starting low is most likely the most appropriate.
@patrickvonplaten updated the model card with this:
Recommended Settings
- Reasoning Effort:
  - `'none'`: Do not use reasoning
  - `'high'`: Use reasoning (recommended for complex prompts)

  Use `reasoning_effort="high"` for complex tasks.
- Temperature: 0.7 for `reasoning_effort="high"`. Temperature between 0.0 and 0.7 for `reasoning_effort="none"`, depending on the task.
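If the model is served behind an OpenAI-compatible endpoint (e.g. via `vllm serve`), both knobs can be set per request. The URL and model name below are placeholders, and `reasoning_effort` is only honored by servers that implement that field of the OpenAI Chat Completions API:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MODEL_NAME",
    "messages": [{"role": "user", "content": "Your complex prompt here"}],
    "temperature": 0.7,
    "reasoning_effort": "high"
  }'
```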
Hope it can help!