gemma 4 e4b isn't half shabby, but i didn't think it would run in llama.cpp-vulkan in ubuntu on this lenovo yoga laptop with an AMD Radeon 860M GPU.

dunkelstern@corteximplant.com

@lritter i am with you with executing or modifying stuff but tool calling for RAG/information retrieval is useful in my opinion. I mean if you have a choice of actually getting valid information via tool call or hallucination i select tool call. Another valid option is context exhaustion, save some summary to a memory file and retrieve it if you exhaust your context window. Agentic behaviour is just marketing BS, cannot really work in my opinion because it is not intelligent, just fancy text completion…

lritter@mastodon.gamedev.place

@dunkelstern alright. yeah. i guess that might make sense.

lritter@mastodon.gamedev.place

i thought i'd let gemma itself explain to me how to set up a tool client, and it assures me that LangChain is the *shudder* "industry standard" here.

the script it generates for me already doesn't run because the LangChain bros have completely overhauled the API since gemma was trained. yep, models like it conservative. never change a taught system

so we have a bit of a egg-hen problem here because gemma can't curl anything yet. %)

#s0up

lritter@mastodon.gamedev.place

aha. the missing module is still available (you also need to install langchain-community and langchain-openai).

then i had to replace `langchain.agents` with `langchain_classic.agents` in the script and now the loop is closed.

still, it can't see the tool functions i defined for it, and apparently the problem is that the script gemma made must use "create_openai_tools_agent" rather than "create_openai_functions_agent" - which gemma itself tells me later. hilarious!

#s0up

lritter@mastodon.gamedev.place

and now it can use the functions provided - this is also entirely fuzzy. there is no API protocol here. it goes entirely by docstring. spooky!

i had all tool functions generated as well because Why Do Anything Myself Ever Again?

i guess i now have my very own alibaba, knock-off, shein, wish-ordered sloppy Jarvis from iron man

#s0up

lritter@mastodon.gamedev.place

after some confusing back and forth i realize this agent doesn't keep track of our session, and every query is a new one.

i complained and it told me how to change the script. so now that works.

complaining has become my ultimate superweapon. i feel like a newborn with graying hair and back aches!

also i see how this is turning into a game where you mainly use jarvis to fix jarvis. very computer, to have it solve problems you wouldn't have without it.

#s0up

lritter@mastodon.gamedev.place

jarvis, err i mean gemma can now do the original example i proposed.

i added tools to:
* get date and time
* write to file in a special bucket dir
* append to file in the bucket dir
* read files (completely)
* change directory
* list directory

it was pretty useless in understanding my language projects. i asked it to write a tutorial for nudl and despite seeing several examples, it used tokens from C++ and python.

the future - today!

#s0up

lritter@mastodon.gamedev.place

my impression so far is that a lot of infrastructurd is being built on top the assumption that transformer llm's will eventually be replaced by something that actually works and learns. all of this has tech demo quality. i feel sorry for everyone forced by their boss to argue with the machine like they are in a douglas adams novel.

#s0up

allo@chaos.social

@lritter
If you'd like some hints:
- Gemma 4 support was broken some time. Use latest llama.cpp and redownload the quants if they are older than this week.
- Don't use vibe tools (just my personal opinion) but IDE integration like kilocode
- In my experience Qwen3.5 still beats Gemma for coding tasks. Probably depends on the programming language.
- The E4B model is strong for everyday tasks (Simple problems, translation from/to good supported languages, grammar checking)

kitten_tech@fosstodon.org

@lritter I gather the LLM companies are begging for investment on the basis that they're close to building that thing, then spending the money on LLMing harder / buying all the GPUs so their competitors can't LLM as hard / offering services at a loss so they have lots of "users" to impress investors with; they have no idea how to actually produce a more functional AI so just LLM harder and get incremental gains for exponentially rising costs.

lritter@mastodon.gamedev.place

@allo

- i'm aware. this is all new. new llama, new files. i use the exact temperature, top k etc. config as suggested by the vendor. examples in this thread were all 26b based. 34b is too slow for tools.

- i would rather have my fingernails pulled out than put this in a IDE and compromise integrity & copyright. this is strictly entertainment.

- i doubt the speed is the same. i'm going to try a qwen 3.5 35B A3B, let's see if it can understand my work. i doubt it.

- agree on e4b.

neo@soc.psynet.me

@lritter I'm honestly surprised that you even came that far with such a tiny model and probably a tiny context window as well. And yes, everything below the big frontier models still feels very much like a tech demo. Impressive, but not really useful. Even the smaller Claude models (Sonnet, Haiku) are relatively shit when used for anything more complex.

lritter@mastodon.gamedev.place

@neo it's the 26b model, not that tiny. 128k context window. google calls the 34b version a "frontier model".

neo@soc.psynet.me

@lritter That is tiny. The big ones are in the range of > 1 trillion parameters (not all activated at once) and up to 1m context window, and it shows.

lritter@mastodon.gamedev.place

@neo you should know it's not the size that matters.

neo@soc.psynet.me

@lritter Yeah yeah, just use your VRAM smarter. That's what Nvidia said when they released another 8 GB card.

lritter@mastodon.gamedev.place

@allo i set up the qwen model i mentioned with the settings recommended for coding work. it is slower but not impossibly slow. 12t/s

i had it examine the nudl directory, read the sx docs, etc.

tutorial is also full-blown wrong.

(fun fact: when i scolded gemma for the bad quality of it earlier, it wrote it again, and this time, more things were correct.)

but this is a joke. i expect one shot perfection.

lritter@mastodon.gamedev.place

@allo i also told qwen it did a bad job and now it wants to know what it did wrong? if i could only explain, it would understand.

goes to show: these models can only help you when you're not doing anything interesting.

lritter@mastodon.gamedev.place

@allo qwen 3.5

allo@chaos.social

@lritter I am not sure what frontend you are using there. I think one of the advantages of kilocode (or roo) is that it provides good tools for dissecting the source and thought out system prompts. A one-shot in the web interface doesn't do the same than a command in kilocode.

Yeah, 27B/34B dense are too slow for me, too, but the MoE work for me. I need to reevaluate Gemma 4 after the latest fixes, it may now perform better.

And I guess having AI work with a novel programming language is hard.

CIRCLE WITH A DOT

gemma 4 e4b isn't half shabby, but i didn't think it would run in llama.cpp-vulkan in ubuntu on this lenovo yoga laptop with an AMD Radeon 860M GPU.