gemma 4 e4b isn't half shabby, but i didn't think it would run in llama.cpp-vulkan in ubuntu on this lenovo yoga laptop with an AMD Radeon 860M GPU.

Uncategorized · s0up · 56 Posts, 7 Posters
  • lritter@mastodon.gamedev.place

    gemma 4 e4b isn't half shabby, but i didn't think it would run in llama.cpp-vulkan in ubuntu on this lenovo yoga laptop with an AMD Radeon 860M GPU.

    Q8_0 (~8GB), 17 tokens/sec. real nice.

    i need to read google's paper on this and the novel compression method they used.

    maybe these new datacenters can eventually go fuck themselves after all

    #s0up
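Those two figures are roughly self-consistent: single-token decode on an iGPU with shared memory is bandwidth bound, since every generated token streams essentially the whole weight file. A quick sanity check (only the ~8 GB and 17 tok/s numbers come from the post above; the rest is arithmetic):

```python
# Decode reads ~all weights once per generated token, so
# tokens/sec is capped near effective_bandwidth / model_bytes.
model_gb = 8.0     # Q8_0 weight file size reported above
tok_per_s = 17.0   # observed throughput

implied_bw_gbs = model_gb * tok_per_s
print(implied_bw_gbs)  # 136.0 GB/s of sustained memory traffic
```

~136 GB/s is on the order of what shared LPDDR5X delivers on current laptop APUs, which is why the result needs no exotic optimization to be believable.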

  • allo@chaos.social (#17)

    @lritter
    If you'd like some hints:
    - Gemma 4 support was broken for some time. Use the latest llama.cpp and redownload the quants if they are older than this week.
    - Don't use vibe tools (just my personal opinion) but IDE integration like kilocode
    - In my experience Qwen3.5 still beats Gemma for coding tasks. Probably depends on the programming language.
    - The E4B model is strong for everyday tasks (Simple problems, translation from/to good supported languages, grammar checking)

  • lritter@mastodon.gamedev.place

      my impression so far is that a lot of infrastructure is being built on top of the assumption that transformer llms will eventually be replaced by something that actually works and learns. all of this has tech demo quality. i feel sorry for everyone forced by their boss to argue with the machine like they are in a douglas adams novel.

      #s0up

  • kitten_tech@fosstodon.org (#18)

      @lritter I gather the LLM companies are begging for investment on the basis that they're close to building that thing, then spending the money on LLMing harder / buying all the GPUs so their competitors can't LLM as hard / offering services at a loss so they have lots of "users" to impress investors with; they have no idea how to actually produce a more functional AI so just LLM harder and get incremental gains for exponentially rising costs.

  • lritter@mastodon.gamedev.place (#19)

        @allo

        - i'm aware. this is all new. new llama, new files. i use the exact temperature, top k etc. config as suggested by the vendor. examples in this thread were all 26b based. 34b is too slow for tools.

    - i would rather have my fingernails pulled out than put this in an IDE and compromise integrity & copyright. this is strictly entertainment.

        - i doubt the speed is the same. i'm going to try a qwen 3.5 35B A3B, let's see if it can understand my work. i doubt it.

        - agree on e4b.

  • neo@soc.psynet.me (#20)

          @lritter I'm honestly surprised that you even came that far with such a tiny model and probably a tiny context window as well. And yes, everything below the big frontier models still feels very much like a tech demo. Impressive, but not really useful. Even the smaller Claude models (Sonnet, Haiku) are relatively shit when used for anything more complex.

  • lritter@mastodon.gamedev.place (#21)

            @neo it's the 26b model, not that tiny. 128k context window. google calls the 34b version a "frontier model".

  • neo@soc.psynet.me (#22)

              @lritter That is tiny. πŸ˜‰ The big ones are in the range of > 1 trillion parameters (not all activated at once) and up to 1m context window, and it shows.
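For scale, a naive full-attention f16 KV cache grows linearly with context length; a sketch with purely illustrative architecture numbers (the layer/head counts below are hypothetical, not any specific model's config):

```python
# KV cache = 2 (K and V) * layers * kv_heads * head_dim * context * bytes.
n_layers, n_kv_heads, head_dim = 48, 8, 128   # hypothetical values
ctx_tokens, bytes_per_elem = 128_000, 2       # 128k context, f16 cache

kv_gib = 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem / 2**30
print(kv_gib)  # 23.4375 GiB at 128k -- and ~8x that at a 1M-token window
```

(Production models cut this down with sliding-window attention, grouped-query attention, and cache quantization, but the linear growth is part of why million-token windows remain a datacenter feature.)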

  • lritter@mastodon.gamedev.place (#23)

                @neo you should know it's not the size that matters. 😏

  • neo@soc.psynet.me (#24)

                  @lritter Yeah yeah, just use your VRAM smarter. That's what Nvidia said when they released another 8 GB card. 😜

  • lritter@mastodon.gamedev.place (#25)

                    @allo i set up the qwen model i mentioned with the settings recommended for coding work. it is slower but not impossibly slow. 12 t/s

                    i had it examine the nudl directory, read the sx docs, etc.

                    tutorial is also full-blown wrong.

                    (fun fact: when i scolded gemma for the bad quality of it earlier, it wrote it again, and this time, more things were correct.)

                    but this is a joke. i expect one shot perfection.

  • lritter@mastodon.gamedev.place (#26)

                      @allo i also told qwen it did a bad job and now it wants to know what it did wrong? if i could only explain, it would understand.

                      goes to show: these models can only help you when you're not doing anything interesting.

  • lritter@mastodon.gamedev.place (#27)

                        @allo qwen 3.5

  • allo@chaos.social (#28)

                          @lritter I am not sure what frontend you are using there. I think one of the advantages of kilocode (or roo) is that it provides good tools for dissecting the source and well-thought-out system prompts. A one-shot in the web interface doesn't do the same as a command in kilocode.

                          Yeah, 27B/34B dense models are too slow for me, too, but the MoEs work for me. I need to reevaluate Gemma 4 after the latest fixes; it may now perform better.

                          And I guess having AI work with a novel programming language is hard.
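The dense-vs-MoE speed gap mentioned here comes down to active parameters: per-token decode cost scales with the weights actually touched, not the total count. Using the sizes named in this thread (reading "A3B" as roughly 3B parameters activated per token):

```python
# Per-token work scales with ACTIVE parameters, not total parameters.
dense_active = 27e9   # a 27B dense model touches every weight each token
moe_active = 3e9      # a "35B-A3B" MoE activates only ~3B per token

print(dense_active / moe_active)  # 9.0 -- rough upper bound on the speedup
```

(Real-world gains are smaller, since attention, routing, and KV-cache traffic don't shrink proportionally.)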

  • allo@chaos.social (#29)

                            @lritter For the rest: I know you are not too fond of LLMs or AI, and I guess we don't need to discuss this in detail. But for me, they do well within the range that one can expect of them, even for one-shotting medium sized scripts.

                            My take is that these things won't go away, so one should take what's useful and leave the rest. And don't fall for the hyped things like Openclaw.

  • lritter@mastodon.gamedev.place (#30)

                              @allo it's because they are not really good at doing mental transfer work themselves. they are not intelligent in any meaningful way. they just know what fits best. for many tasks, that is exactly what you want. but when it comes to what *feels* best... they're just like high functioning autists doing a hell of a masking job.

  • allo@chaos.social (#31)

                                 @lritter I once read that they are a multiplier: making the dumb people dumber and the clever people more clever.

                                 Like, you can outsource things and blindly believe the output and fail hard, or you know exactly how to use them and speed up your work a lot.

                                 Another interesting aspect: the first people have reported burnout from using LLMs, because being much more productive led to doing much more in a day than they would have done by hand, while the work is still mentally straining.

  • lritter@mastodon.gamedev.place (#32)

                                  @allo i know of that aspect.

                                  > Making the dumb people dumber and the clever people more clever.

                                  yes but which of the two am i!

  • allo@chaos.social (#33)

                                    @lritter
                                     The AI-assisted 10x engineer, I guess.

  • lritter@mastodon.gamedev.place (#34)

                                      @allo all this sounds more like mythbuilding to me than truth.

  • allo@chaos.social (#35)

                                        @lritter
                                        Be the zero, it's not affected by multipliers! πŸ™‚

  • allo@chaos.social (#36)

                                          @lritter
                                           No idea, but I think it is plausible that doing more, even with a tool, is more stressful than doing less by hand. I think it was particularly about coding work.
