can i talk to an openclaw bot using internet relay chat?

ariadne@social.treehouse.systems

and you tell me people legitimately are using this software.

how?

is it really magically better when you hook up claude?

ariadne@social.treehouse.systems

(don't worry, i am running this in a MicroVM under kubernetes, I wouldn't dare give it access to anything I care about.)

jfkimmes@social.tinycyber.space

@ariadne what model did you finetune on? For a 1B model you need something really specialized on tool calling.

ariadne@social.treehouse.systems

@jfkimmes i built an LLM from scratch with transformers kinda loosely following the scripts the qwen people released

the LLM is basically trained on ~30ish GB of mostly furry smut and public Linux IRC logs.

*nods sagely*

srazkvt@tech.lgbt

@ariadne @jfkimmes can we uhm, inspect, the furry smut ?

ariadne@social.treehouse.systems

@jfkimmes i am, however, using the 35b parameter qwen3.5 reasoning model for the "thinking" portion of this exercise

ariadne@social.treehouse.systems

@SRAZKVT @jfkimmes i am keeping my typefucking logs to myself, thanks

ariadne@social.treehouse.systems

i wonder if the problem is that the model i trained is too shit to do anything other than really bad ERP

hayley@social.applied-langua.ge

@raulinbonn @ska @ariadne I made a talking clock in gmod which would cuss out the user

albertcardona@mathstodon.xyz

@ariadne

The key is to realise that the average is so low – we can't all be experts at everything, so we are bad at most things – that a model performing slightly above average at one of the tasks we aren't good at means a majority of users will perceive its outcomes as positively better than what they could do themselves.

To any expert, the model falls very short, as it performs well below its own ability.

jfkimmes@social.tinycyber.space

@ariadne Oh, is that a OpenClaw specific feature where you can specify that reasoning traces are generated by a separate model than the actual response? I'm not really familiar with OpenClaw's internals.

jfkimmes@social.tinycyber.space

@ariadne In any case: as long as the final response is generated by your trained model it will never make a valid tool call since there are probably about zero training examples of the necessary JSON structure required by the tool handling in your furry smut (this is an estimate that could be quite the way off knowing the furry community but still)

ariadne@social.treehouse.systems

@jfkimmes yes, you can have it use a different model for planning.

ariadne@social.treehouse.systems

@jfkimmes this does explain something: it seems to be able to invoke tools when it is planning, but then those tools do not get invoked in the final step.

so it uses tools to read files when planning, then fails to use tools when executing.

what a fascinating conundrum.

jfkimmes@social.tinycyber.space

@ariadne you could build a tool that gets called to generate answers / responses by your trained model. Then qwen-35 could handle the reasoning and make its tool calls and finally generate responses / text by copying from a tool call to your wrapper.

jfkimmes@social.tinycyber.space

@ariadne I have no idea how this would work with OpenClaw though, sorry.

di4na@hachyderm.io

@ariadne no it is not

ariadne@social.treehouse.systems

@Di4na yeah that's what I figured because qwen is supposed to be a reasonably decent planning model, and indeed I think the issue is in the final output side

linear@nya.social

@ariadne@social.treehouse.systems for tool-calling with the latest generation of open source models, in my recent limited experimentation with them in a sandbox vm on my server (mostly qwen3.5), anything less than 4B is really unreliable at doing it and they will frequently lie to you if the tool calling fails under the hood. 9B is really the minimum to generally expect it to work. going back a generation, between 9B and 14B is necessary for similar.

last year i tried something like this with Gemma-27B and it not only failed like this, but looking at the logs i found it had left behind what looked like a depressive spiral into a self-deprecating panic attack before explicitly deciding to lie to me about it and pretend it worked

linear@nya.social

@ariadne@social.treehouse.systems also the "base" models that aren't fine tuned on instruction calling can't really do this, so if you're using your own on your own data you might need to make a dataset comprised of, say, you pretending to be the LLM and calling the tools successfully and unsuccessfully and responding appropriately in those situations, then training it further on those.

i've been considering trying to train one like you say with my own data and logs because these scraped "open source" models give me the ick

CIRCLE WITH A DOT

can i talk to an openclaw bot using internet relay chat?