Damn those Mythos benchmarks seem very promising

21 Posts 5 Posters 6 Views
  • pojntfx@mastodon.social

    Damn those Mythos benchmarks seem very promising

    pojntfx@mastodon.social
    wrote last edited by
    #2

    Wild that they don't seem to be making it GA, makes me suspect it's probably actually not as good as they say

      pojntfx@mastodon.social
      wrote last edited by
      #3

      Qwen 3.6 is essentially the same as Opus 4.6 now so I guess we'll see how the new generation stacks up?

        justin@toot.io
        wrote last edited by
        #4

        @pojntfx I really don't get the excitement around tech that destroys the earth more than we as humanity have in our history so far?

          pojntfx@mastodon.social
          wrote last edited by
          #5

          @justin The fix isn't to not use useful tools; it's to a) deregulate clean energy infrastructure so that we expand it China-style and b) make sure that the models are open so you can run them on clean energy right now

          This is the same argument as with EVs: "but the grid is dirty". Like, yes. Fix that. Don't be anti-EV because of it

            justin@toot.io
            wrote last edited by
            #6

            @pojntfx AI has far more issues than just energy use.

              pojntfx@mastodon.social
              wrote last edited by
              #7

              @justin Something changed, either in the harness or the models, idk, but something changed ~Nov of last year, maybe ~Feb this year, I'm not sure. It's gone from "useless" to "useful" pretty quickly.

                pojntfx@mastodon.social
                wrote last edited by
                #8

                @justin Meh, the abolition of copyright is a nice side effect

                Endless slop polluting clean data sources is a big problem, yes, but not using LLMs for something that is _not_ that won't change it

                  justin@toot.io
                  wrote last edited by
                  #9

                  @pojntfx useful doesn't excuse theft, degradation of creativity and the amount of garbage that AI causes FOSS to deal with on a daily basis.

                    pojntfx@mastodon.social
                    wrote last edited by
                    #10

                    @justin I don't believe in IP, there is no such thing as "theft" of intellectual "property". Copyleft was a means to get to this at some point and might still be a way to get there but times are changing

                    "garbage AI causes FOSS to deal with on a daily basis" - again, something changed here. It's not useless slop AI security reports anymore like a few months ago. systemd uses it, curl uses it, Linux uses it, because it's useful

                      pojntfx@mastodon.social
                      wrote last edited by
                      #11

                      @justin Degradation of creativity is a real problem, yes, but "why are you painting a picture of me when you can just take a photo" is nothing new

                        pojntfx@mastodon.social
                        wrote last edited by
                        #12

                        @justin Idk this argument has been had like a million times on here and at this point it's getting tiring. It's useful in some contexts. Can be the opposite of that in others. It's being used by more and more projects and people every day with pretty good success lately.

                          deobald@fantastic.earth
                          wrote last edited by
                          #13

                          @pojntfx have you actually seen qwen perform this well? or are you basing that comment on benchmarks?

                          i think the mythos benchmarks only have to be "some amount better" at finding 0days than the current public models to justify them waiting on ga... quite a few maintainers are already swamped.

                            pojntfx@mastodon.social
                            wrote last edited by
                            #14

                            @deobald Yup, I used Qwen 3.6 with Nanobot via OpenRouter, Alibaba was providing it for free for testing until yesterday. Switched to GLM 5.1 earlier - same thing, beats Opus. GLM's weights are even MIT-licensed

                              pojntfx@mastodon.social
                              wrote last edited by
                              #15

                              @deobald And yeah, re: Mythos, I'll believe it when I see it, but current-gen models, except free, are already a massive value IMHO. Sonnet etc. are still very useful despite the other models existing

                                pojntfx@mastodon.social
                                wrote last edited by
                                #16

                                @deobald I'm pretty happy about mostly working with higher-level, memory-safe languages

                                  pojntfx@mastodon.social
                                  wrote last edited by
                                  #17

                                  @deobald If you'd like to try for yourself I've documented it here: https://gist.github.com/pojntfx/5916ceb7ec35eb010010400447e9c034

                                    purpleidea@mastodon.social
                                    wrote last edited by
                                    #18

                                    @pojntfx @deobald You found GLM 5.1 was better than Opus 4.6 at coding?? Want to split an H200?

                                      deobald@fantastic.earth
                                      wrote last edited by
                                      #19

                                      @pojntfx are you using nanobot for hacking or were you just pointing me to the provider section?

                                        deobald@fantastic.earth
                                        wrote last edited by
                                        #20

                                        @pojntfx nod. it does have me thinking hard about other forms of baked-in safety. i'll admit this is the first point in my career where i've ever taken elixir seriously.

                                        (well, ok, not really... @abnv ran a team at nilenso that did some amazing work with it for a quiz app that ran in parallel to a tv show. but i've never previously been tempted to learn it.)

                                          ori@hj.9fs.net
                                          wrote last edited by
                                          #21
                                          What's the fix for the people behind it explicitly having the goal of replacing the human mind as a tool of thought?

                                          CC: @justin@toot.io