GLM-4.5-Air is the lightweight variant of our latest flagship model family, likewise purpose-built for agent-centric applications. Like GLM-4.5, it adopts a Mixture-of-Experts (MoE) architecture, but with a more compact parameter count. GLM-4.5-Air also supports hybrid inference, offering a “thinking mode” for advanced reasoning and tool use and a “non-thinking mode” for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs.
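As a rough illustration of the boolean described above, here is a minimal sketch of a chat-completions-style request body that toggles the reasoning behaviour. The exact field names, endpoint, and model identifier vary by provider, so treat the payload shape (`reasoning.enabled`, `"glm-4.5-air"`) as an assumption rather than a documented schema:

```python
import json

def build_request(prompt: str, reasoning_enabled: bool) -> str:
    """Build a chat-completions-style request body for GLM-4.5-Air.

    NOTE: the payload shape here is an assumption for illustration;
    check your provider's docs for the exact field names.
    """
    payload = {
        "model": "glm-4.5-air",
        "messages": [{"role": "user", "content": prompt}],
        # True -> thinking mode (reasoning / tool use),
        # False -> non-thinking mode (low-latency interaction)
        "reasoning": {"enabled": reasoning_enabled},
    }
    return json.dumps(payload)

# Thinking mode on for a multi-step task, off for quick chat turns:
print(build_request("Plan a three-step refactor of this module.", True))
print(build_request("Hi there!", False))
```

In practice you would send this body to the provider's chat-completions endpoint; the toggle lets you trade latency for reasoning depth per request.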
Blog post: https://z.ai/blog/glm-4.5
Hugging Face:
I’ve moved to using RamaLama, mainly because it promises to probe your hardware and pick the best acceleration available for whatever model you launch.
It looks like it just chooses a llama.cpp backend to compile against, so technically you’re leaving a fair bit of performance (and usable model size) on the table if you already know your GPU and which backend to choose.
All of this is poorly documented, though.