Going into the rabbit hole of testing local LLMs right now.

Category: Uncategorized
Tags: huggingface, selfhost, localai, ollama
9 Posts, 3 Posters
#1 tomgag@infosec.exchange wrote:

    Going into the rabbit hole of testing local LLMs right now. I don't have a dedicated GPU, but 32 GiB of RAM should be enough for anyone.

    #ai #huggingface #selfhost #localai #ollama #heretic #qwen #mistral
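    As a back-of-the-envelope check (my own sketch, not from the thread): a quantized model's weight footprint is roughly parameter count × bits per weight / 8, plus some KV-cache and runtime overhead. Assuming a ~24B-parameter model (Mistral Small's class) at 4-bit quantization, the weights alone come to about 11 GiB, which fits comfortably in 32 GiB of RAM:

    ```python
    def weight_footprint_gib(params_billion: float, bits_per_weight: float) -> float:
        """Approximate RAM needed for model weights alone, in GiB.

        Ignores KV cache, activations, and runtime overhead, so treat
        the result as a lower bound.
        """
        bytes_total = params_billion * 1e9 * bits_per_weight / 8
        return bytes_total / 2**30

    # ~24B parameters at 4-bit quantization:
    print(round(weight_footprint_gib(24, 4), 1))  # ~11.2 GiB
    ```

    The same helper shows why an unquantized fp16 model of that size (~44.7 GiB) would not fit.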


#2 tomgag@infosec.exchange wrote:

      Heretic-quantized versions of Qwen 3.5 have just been released, but even the base Qwen 3.5 model seems to have issues with Ollama currently, and I don't have the bandwidth to do a manual patch right now. Trying Mistral 3.2.


#3 tomgag@infosec.exchange wrote:

        First impressions of Mistral Small 3.2: seems pretty solid; it answers "uncomfortable" political questions quite neutrally.

        I don't understand why #confer and #euria by #infomaniak are not based on this.


#4 sealjay@fosstodon.org wrote:

          @tomgag how fast does it feel? I tried using Foundry Local and Ollama, but at the time I felt slowed down. I'd be keen to swap back to a local model given how the large providers are slowly clamping down on subscription token limits.


#5 tomgag@infosec.exchange wrote:

            @sealjay well, I'm running on local CPU with 32 GiB of RAM, so I wouldn't call it "fast". 3-5 tokens per second maybe? I guess it's OK if you give it a task and then go to grab a coffee 😅
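            If you want an exact figure rather than a guess (my sketch, not from the thread): Ollama's `/api/generate` response reports `eval_count` (tokens generated) and `eval_duration` (in nanoseconds), so generation speed can be computed directly from those fields instead of timing by hand:

            ```python
            def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
                """Generation speed from Ollama's /api/generate response fields.

                eval_count: number of tokens generated
                eval_duration_ns: time spent generating, in nanoseconds
                """
                return eval_count / (eval_duration_ns / 1e9)

            # Illustrative figures in the ballpark reported above:
            # 120 tokens over 30 seconds -> 4.0 tok/s
            print(tokens_per_second(120, 30_000_000_000))  # 4.0
            ```

            The same two fields appear in the final streamed chunk when `"stream": true`, so this works for both streaming and non-streaming requests.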


#6 sealjay@fosstodon.org wrote:

              @tomgag maybe I’ll check I’m running on renewable energy before I leave a machine running over the weekend then 🤣

#7 tomgag@infosec.exchange wrote:

                @1ad6e959c292f74de615d4c6e5ec43d0b7ec4908a55de93aa2527c46a8bd1d5b I'm not sure, I don't have any beefy GPU 😅 you should ask this in the Ollama Reddit community (or similar).


#8 tomgag@infosec.exchange wrote:

                  Interesting, it seems that Qwen 2.5 Coder is actually less aggressive than Qwen 3.5 in rejecting sensitive topics.


#9 blingblingmk@dresden.network wrote:

                    @tomgag
                    Good question! Why is #infomaniak not part of the fediverse?!

relay@relay.infosec.exchange shared this topic.