March 5 updates compared to Feb ones?

#11
by tnuvkeg - opened

I see you guys are re-uploading all quants.
I'm wondering about the UD-Q5_K_XL size, which is currently about 7 GB bigger than the February one. Why is that? Should I update? (Although I will probably need to go for Q4 instead.)

edit: thank you for all the quants and updates!!!

I updated yesterday but could no longer fit it myself. After some testing I ended up on Q5_K_S, which at least doesn't perform worse than the previous Q5_K_XL and leaves a bit more VRAM free, so I went with that. But I was wondering the same.

Unsloth AI org

Hey, yes - we were going to release benchmarks today, so hopefully soon!

I downloaded UD-Q5_K_XL yesterday for my 128GB system and I really like it so far. I have no experience with the old files though, first time I use it

Yesterday there were more quants. I downloaded iq3_s (also a fresh upload), which works nicely on my 64 GB setup. Are they going to be re-uploaded?

> I downloaded UD-Q5_K_XL yesterday for my 128GB system and I really like it so far. I have no experience with the old files though, first time I use it

that's actually one of the "old" ones, since the current ones are only 2 hours old :D

Unsloth AI org

Yes, sorry, they'll be back up - we had some uploading issues.

I had some issues with the Q4_xx quants and ik_llama.cpp, where the model became corrupt after some usage.
Then I downloaded the UD-Q5_K_XL, and I have not had any issues whatsoever.
Has anyone else seen the same on an RTX6000 with 96 GB of VRAM?
Are there any changes in the new Q4_xxx quants that might change things?

Unsloth AI org

The new ones will be the final ones for this line of updates, so they should all be available now - sorry for the issues and the delay!

Actually, the one I downloaded yesterday was q3_k_s, and it had an iq3_s/iq3_xxs mix for the experts. I chose this mix based on your research showing that iq3_xxs provides a large benefit for expert weights over q2. Restoring it would be good if possible. It was 52 GB in size (which leaves a good margin for running the OS on 64 GB).

Update: I can see it was restored - many thanks!

> > I downloaded UD-Q5_K_XL yesterday for my 128GB system and I really like it so far. I have no experience with the old files though, first time I use it
>
> that's actually one of the "old" ones, since the current ones are only 2 hours old :D

I just verified that the files from yesterday have the same sha256 hashes that are in the commit, so I believe it's ok! :)
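For anyone who wants to run the same kind of check, here is a minimal sketch in Python, assuming you have copied the expected SHA-256 value from the file's page in the repo. The function names `sha256_of_file` and `matches_expected` are just illustrative helpers, not part of any library.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks.

    Chunked reading keeps memory use flat even for multi-GB GGUF files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a local file's digest with an expected hex string.

    The expected value is an assumption: paste it from wherever the
    repo publishes the hash for that file.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

If `matches_expected` returns `True` for each shard, the local copy is byte-for-byte the same as the file the hash was published for.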

An update on the repetition issue: it turns out to be an issue in ik_llama.cpp. After switching to llama.cpp this morning (once they fixed a problem with VRAM allocation), everything works VERY well!
It is stable, and the Q5_K_M fits very nicely on my RTX6000 PRO at about 75 tokens/second!

Is there any way I can get the pre-March 5 quants?
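One possible route, assuming the older commit is still present in the repo's history: Hugging Face serves files at a specific revision through its `resolve` URL, so you can point a download at a commit hash instead of `main`. The sketch below only builds that URL. The repo id, file name, and commit hash are hypothetical placeholders, and `resolve_url` is an illustrative helper, not a library function.

```python
from urllib.parse import quote


def resolve_url(repo_id, filename, revision):
    """Build a Hugging Face download URL pinned to a revision.

    `revision` can be a branch name, a tag, or a full commit hash
    taken from the repo's commit history page.
    """
    # Encode the revision fully; keep "/" in the file path for subfolders.
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


# Hypothetical placeholders: substitute the real repo, file, and commit.
url = resolve_url("some-org/some-model-GGUF", "UD-Q5_K_XL/model.gguf", "abc123")
print(url)
```

The same idea applies to `hf_hub_download` from `huggingface_hub`, which accepts a `revision` argument alongside `repo_id` and `filename`.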

Good job unsloth devs. You can hide in my basement.
