My lab mate, Jackson Dean, has been doing some really fun research into image generation.

ngaylinn@tech.lgbt:

LLM image generators can make a picture of anything you ask for. The results often look pretty good at first glance. They're generic and the details are usually off, but folks overlook that easily.

This hides something important: these models can't see like we do. The main limitation is how they're trained. We show them millions of pictures, paired with text descriptions.

The problem is, humans don't describe images literally. We might say "a picture of a dog playing frisbee" without mentioning the setting, the composition, or the squirrel in the background.

Most of what's there visually is unsaid. The model sees those pixels, but they're just "stuff that goes along" with the text. Dogs play in parks, so the AI learns that dogs have green backgrounds.

This is why it's so hard to control an image generator. It isn't intentionally placing all those objects and choosing their attributes; it's just extra fluff that seems to "go with" what you asked for.

(2/3)
#science #ai #generativeart

kevinrns@mstdn.social (#4):

@ngaylinn

Remember they are scraping alt text on Mastodon images, so poison your descriptions in weird ways humans see but AI can't, because it's stupid.

ngaylinn@tech.lgbt (#5):

@kevinrns Just don't poison alt text for humans! It serves an important purpose.

ngaylinn@tech.lgbt:

Jackson's research is great for exploring this, because we get to see abstract synthetic images that very strongly stimulate the AI to see... whatever it "wants" to see.

Often, the results are recognizable. The image with oddly shaped pink blobs does sorta resemble flamingos. But there are also many examples where the AI fixates on some small detail of color or texture and becomes convinced it's seeing something totally implausible.

This relates to "adversarial examples", another great way to see the same effect.

With real images, it seems like the AI "sees" like we do. But as soon as we venture beyond its training data, the illusion is broken, and it feels a bit like a parlor trick. Clearly AI doesn't see like we do.

This is a great practice for AI generally: seek out the edge cases where the model fails. This breaks the spell of "general intelligence" and gives us a clearer idea of what's actually happening inside the black box.

(3/3)
#science #ai #generativeart

lilacperegrine@clockwork.monster (#6):

@ngaylinn Are there images I can look at? There were some in the article, but they appeared to be referencing previous work.

ngaylinn@tech.lgbt (#7):

@lilacperegrine Alas, this is still very early work in progress! I'll share it once Jackson does! If you look him up on Google Scholar, though, you can see some of his other image generation projects, like this one: https://direct.mit.edu/isal/proceedings/isal2024/36/86/123507

ngaylinn@tech.lgbt:

My lab mate, Jackson Dean, has been doing some really fun research into image generation.

Unlike the common AI-generated images that mash together stolen artwork to make something sorta photorealistic, he's producing abstract art that's entirely novel. The general idea (inspired by innovation engines) is to generate an image from scratch, then ask a vision/language model what it sees. He generates lots of images with different descriptions, and refines those images to more closely resemble their descriptions.

Not only is he making some really cool generative art, but he's learning something about what "novelty" is and how to produce it in a computer.

Beyond that, though, I'm fascinated because it gives a window into the strange way computers "see" images.

(1/3)
#science #ai #generativeart
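
[Editor's sketch] That generate-describe-refine loop can be shown in miniature. Everything below is a hypothetical stand-in (a toy renderer, a toy "vision model" that scores simple statistics, plain hill climbing), not Jackson's actual method:

```python
import random

# Toy stand-ins, all hypothetical: a "genome" is a parameter vector
# rendered to an "image" (a list of pixel intensities), and the
# "vision model" scores confidence per label. The real project renders
# with a CPPN and queries a real vision/language model.
LABELS = ["flamingo", "starfish", "bird"]

def render(genome):
    """Map genome parameters to pixel intensities in [0, 1)."""
    return [abs(g) % 1.0 for g in genome]

def vision_model(image):
    """Return a confidence score per label (toy statistics, not real CV)."""
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return {"flamingo": mean, "starfish": 1.0 - mean, "bird": spread}

def mutate(genome):
    child = list(genome)
    child[random.randrange(len(child))] += random.uniform(-0.3, 0.3)
    return child

def generate_and_refine(steps=300, size=16):
    # 1. Generate a "random" image from scratch.
    genome = [random.uniform(-1.0, 1.0) for _ in range(size)]
    # 2. Ask the model what it sees; that answer becomes the target.
    scores = vision_model(render(genome))
    label = max(scores, key=scores.get)
    best = scores[label]
    # 3. Refine: keep a mutation only when the model grows more confident.
    for _ in range(steps):
        candidate = mutate(genome)
        score = vision_model(render(candidate))[label]
        if score > best:
            genome, best = candidate, score
    return label, best, genome

label, confidence, genome = generate_and_refine()
```

The structural point: the target label isn't chosen by a human. It's whatever the model reported seeing in the random starting image.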

fishface@ioc.exchange (#8):

@ngaylinn the approach of "generate an image, see what the CV thinks it is, and iterate" is, at that high level, just like how diffusion models work.

kevinrns@mstdn.social (#9):

@ngaylinn

But always poison.

ngaylinn@tech.lgbt (#10):

@FishFace That's true! What's different here, though, is that the generation procedure isn't attempting to sample from the distribution of "all natural images" learned from its training data. Instead, a CPPN is used to generate a "random" image with spatially coherent structure from scratch.

This is nice, because it means the images are novel, not remixes of stolen data. Also, it allows us to explore the limitations of computer vision, since we're straying far from the distribution of images the model was trained on.
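
[Editor's sketch] A minimal CPPN, assuming the usual setup: a small random network mapping pixel coordinates (plus distance from center) through periodic and Gaussian activations, which is what gives the output its organic, spatially coherent look. Real CPPN work (NEAT-style) typically evolves the network topology too; here the topology is fixed for brevity:

```python
import math
import random

def make_cppn(hidden=8, seed=0):
    """Build a tiny fixed-topology CPPN mapping (x, y, r) -> intensity."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(hidden)]
    w_out = [rng.uniform(-2, 2) for _ in range(hidden)]
    # Periodic/Gaussian activations produce coherent organic patterns,
    # unlike the per-pixel noise a purely random image would give.
    acts = [rng.choice([math.sin, math.cos, math.tanh,
                        lambda v: math.exp(-v * v)]) for _ in range(hidden)]

    def render(size=32):
        img = []
        for py in range(size):
            row = []
            for px in range(size):
                x = 2 * px / (size - 1) - 1   # map pixel to [-1, 1]
                y = 2 * py / (size - 1) - 1
                r = math.sqrt(x * x + y * y)  # distance from center
                h = [acts[i](sum(w * v for w, v in zip(w_in[i], (x, y, r))))
                     for i in range(hidden)]
                out = math.tanh(sum(w * v for w, v in zip(w_out, h)))
                row.append((out + 1) / 2)     # intensity in [0, 1]
            img.append(row)
        return img

    return render

image = make_cppn(seed=42)(size=32)
```

Because every pixel is a smooth function of its coordinates, neighboring pixels get similar values, which is exactly the "spatially coherent structure" described above.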

ngaylinn@tech.lgbt (#11):

@FishFace Also, the model isn't guided towards any particular prompt. The prompts are discovered through random search, then used to refine those starting points.

fishface@ioc.exchange (#12):

@ngaylinn Doesn't the CV model's training dataset have the same issues, though? Whether the model has learnt "denoising" or "image to text", it still has to contain a hell of a lot of information about images, right?

ngaylinn@tech.lgbt (#13):

@FishFace Yes, and it is a subtle difference. I wish I could share the images, since I think that would make it more apparent. 🙂

In a diffusion model, you iteratively tweak an image of static until the result is statistically similar to the images used in training.

In this experiment, you generate "random" images, but with the unique bias of CPPN networks, so they look more like "organic shapes" than static. You treat them like Rorschach tests, asking the CV model what it sees. Then, for each different answer, you iterate on the image so the CV model is even more confident. Except you aren't tweaking pixels to approach a target distribution; you're just giving hot/cold feedback to an evolutionary search.

The resulting images are far outside the distribution of the original dataset and look like abstract art, but still stimulate the CV model to be very confident about what it's seeing.
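
[Editor's sketch] That hot/cold loop might look roughly like a truncation-selection evolutionary search, where the only feedback is a scalar score. `toy_confidence` below is a stand-in for a real CV model's confidence in a label; nothing here is from the actual project:

```python
import random

def evolve(fitness, genome_size=8, pop_size=20, generations=40, seed=1):
    """Hot/cold evolutionary search: no gradients and no pixel-level
    target -- the only signal is whether a mutated genome scores
    hotter or colder under the fitness function."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(genome_size)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 4]   # keep the "hottest" quarter
        children = []
        while len(parents) + len(children) < pop_size:
            child = list(rng.choice(parents))
            child[rng.randrange(genome_size)] += rng.gauss(0.0, 0.2)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Hypothetical fitness stand-in: in the real setup this would be the
# CV model's confidence that the rendered genome depicts the chosen
# label; here it just rewards genomes whose parameters approach 1.
def toy_confidence(genome):
    return 1.0 - sum((g - 1.0) ** 2 for g in genome) / len(genome)

best = evolve(toy_confidence)
```

Note that the search mutates genomes, not pixels: the fitness score says only "hotter" or "colder", never which pixel to change.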

ngaylinn@tech.lgbt (#14):

@FishFace Another way of looking at this: a diffusion model is trying to make an image that resembles known images for a prompt. That's its loss function: minimize deviation from the target image distribution.

In this experiment, we're just asking "what would you call this thing?" without concern for how much it resembles other images with the same description. The fitness function is to get a confident response. You're evolving a Rorschach test where the model always sees a bird, even though it looks nothing like a picture of a bird.
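
[Editor's sketch] Loosely, the two objectives could be written as follows; the notation is mine, a rough contrast rather than anything from the thread or the paper:

```latex
% Diffusion training: learn to denoise so samples land in the training
% distribution p_{\text{data}} -- the standard simplified DDPM loss:
\mathcal{L}_{\text{diff}}(\theta)
  = \mathbb{E}_{x_0 \sim p_{\text{data}},\, t,\, \epsilon}
    \big\| \epsilon - \epsilon_\theta(x_t, t) \big\|^2

% This experiment: evolve a CPPN genome g so a frozen vision model f
% is maximally confident in a label c it chose itself from the random
% starting genome g_0:
\max_{g} \; f_c\big(\mathrm{render}(g)\big),
\qquad c = \arg\max_{c'} f_{c'}\big(\mathrm{render}(g_0)\big)
```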

fishface@ioc.exchange (#15):

@ngaylinn Ah, thanks for the explanation.
