Damn those Mythos benchmarks seem very promising
-
@justin Degradation of creativity is a real problem, yes, but "why are you painting a picture of me when you can just take a photo" is nothing new
@justin Idk this argument has been had like a million times on here and at this point it's getting tiring. It's useful in some contexts. Can be the opposite of that in others. It's being used by more and more projects and people every day with pretty good success lately.
-
Qwen 3.6 is essentially the same as Opus 4.6 now so I guess we'll see how the new generation stacks up?
@pojntfx have you actually seen qwen perform this well? or are you basing that comment on benchmarks?
i think the mythos benchmarks only have to be "some amount better" at finding 0days than the current public models to justify them waiting on ga... quite a few maintainers are already swamped.
-
@deobald Yup, I used Qwen 3.6 with Nanobot via OpenRouter, Alibaba was providing it for free for testing until yesterday. Switched to GLM 5.1 earlier - same thing, beats Opus. GLM's weights are even MIT-licensed
-
@deobald And yeah, re: Mythos, I'll believe it when I see it, but getting current-gen models, except free, is already massive value IMHO. Sonnet etc. are still very useful despite the other models existing
-
@deobald I'm pretty happy about mostly working with higher-level, memory-safe languages
-
@deobald If you'd like to try for yourself, I've documented it here: https://gist.github.com/pojntfx/5916ceb7ec35eb010010400447e9c034
-
@pojntfx are you using nanobot for hacking or were you just pointing me to the provider section?
-
@pojntfx nod. it does have me thinking hard about other forms of baked-in safety. i'll admit this is the first point in my career where i've ever taken elixir seriously.
(well, ok, not really... @abnv ran a team at nilenso that did some amazing work with it for a quiz app that ran in parallel to a tv show. but i've never previously been tempted to learn it.)
-
@justin The fix isn't to not use useful tools; it's to a) deregulate clean energy infrastructure so that we can expand it China-style and b) make sure that the models are open so you can run them on clean energy right now
This is the same argument as with EVs: "but the grid is dirty". Like, yes. Fix that. Don't be anti-EV because of it
What's the fix for the people behind it explicitly having the goal of replacing the human mind as a tool of thought?
CC: @justin@toot.io