My lab mate, Jackson Dean, has been doing some really fun research into image generation.
-
Jackson's research is great for exploring this, because we get to see abstract synthetic images that very strongly stimulate the AI to see... whatever it "wants" to see.
Often, the results are recognizable. The image with oddly shaped pink blobs does sorta resemble flamingos. But there are also many examples where the AI fixates on some small detail of color or texture, and becomes convinced it's seeing something totally implausible.
This relates to "adversarial examples", another great way to see this effect.
With real images, it seems like the AI "sees" like we do. But as soon as we venture beyond its training data, the illusion is broken, and it feels a bit like a parlor trick. Clearly AI doesn't see like we do.
This is a great practice for AI generally: seek out the edge cases where the model fails. This breaks the spell of "general intelligence" and gives us a clearer idea of what's actually happening inside the black box.
(3/3)
#science #ai #generativeart
-
@ngaylinn Are there images I can look at? There were some in the article, but they appeared to be referencing previous work.
@lilacperegrine Alas, this is still very early work in progress! I'll share the images once Jackson does! If you look him up on Google Scholar, though, you can see some of his other image generation projects, like this one: https://direct.mit.edu/isal/proceedings/isal2024/36/86/123507
-
My lab mate, Jackson Dean, has been doing some really fun research into image generation.
Unlike the common AI-generated images that mash together stolen artwork to make something sorta photorealistic, he's producing abstract art that's entirely novel. The general idea (inspired by innovation engines) is to generate an image from scratch, then ask a vision / language model what it sees. He generates lots of images with different descriptions, and refines those images to more closely resemble their descriptions.
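Roughly, that loop could be sketched like this (toy stand-ins of my own, not Jackson's actual code: an "image" is just a list of numbers, and the "vision model" scores it against fixed label prototypes):

```python
import random

# Toy sketch of the generate-and-describe loop. The real system renders
# CPPN images and queries a vision / language model; everything here is
# illustrative.
LABELS = {"flamingo": [1.0, 0.2, 0.0],
          "coral":    [0.1, 0.9, 0.3],
          "lichen":   [0.3, 0.1, 0.8]}

def describe(image):
    """Ask the toy model what it sees: return (best label, confidence)."""
    scores = {k: sum(a * b for a, b in zip(image, p))
              for k, p in LABELS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

rng = random.Random(0)
archive = {}  # description -> (confidence, image), one champion per niche
for _ in range(500):
    image = [rng.uniform(0, 1) for _ in range(3)]
    label, conf = describe(image)
    if label not in archive or conf > archive[label][0]:
        archive[label] = (conf, image)  # keep the most convincing example
```

This mirrors the innovation-engine idea: keep one best image per discovered description, then refine each of those starting points further.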
Not only is he making some really cool generative art, but he's learning something about what "novelty" is and how to produce it in a computer.
Beyond that, though, I'm fascinated because it gives a window into the strange way computers "see" images.
(1/3)
#science #ai #generativeart
-
@kevinrns Just don't poison alt text for humans! It serves an important purpose.
But always poison.
-
@ngaylinn the approach of "generate an image, see what the CV thinks it is, and iterate" is, at that high level, just like how diffusion models work.
@FishFace That's true! What's different here, though, is that the generation procedure isn't attempting to sample from the distribution of "all natural images" learned from its training data. Instead, a CPPN is used to generate a "random" image with spatially coherent structure from scratch.
This is nice, because it means the images are novel, not remixes of stolen data. Also, it allows us to explore the limitations of computer vision, since we're straying far from the distribution of images the model was trained on.
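To give a feel for why CPPN output looks "organic" rather than like static: a CPPN samples one smooth function at every pixel coordinate, so nearby pixels get similar values. A minimal sketch (weights and activations here are my own illustrative choices, not the actual architecture):

```python
import math
import random

# Minimal CPPN-style sketch: a tiny fixed-topology network maps each pixel
# coordinate (x, y) in [-1, 1] to a brightness in [-1, 1]. Because the same
# smooth function is sampled everywhere, the result has spatially coherent
# structure instead of looking like noise.
def make_cppn(rng):
    w = [rng.uniform(-2, 2) for _ in range(6)]  # random weights
    def cppn(x, y):
        r = math.sqrt(x * x + y * y)             # distance from center
        h1 = math.sin(w[0] * x + w[1] * y)       # periodic stripe pattern
        h2 = math.tanh(w[2] * r + w[3])          # radial gradient
        return math.tanh(w[4] * h1 + w[5] * h2)  # combine into [-1, 1]
    return cppn

rng = random.Random(42)
net = make_cppn(rng)
size = 16
image = [[net(2 * i / size - 1, 2 * j / size - 1) for j in range(size)]
         for i in range(size)]
```

Mutating the weights deforms the whole pattern smoothly, which is what makes these networks a nice substrate for evolutionary search.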
-
@FishFace Also, the model isn't guided towards any particular prompt. The prompts are discovered through random search, then used to refine those starting points.
-
@ngaylinn doesn't the CV model's training dataset have the same issues though? Whether the model has learnt "denoising" or "image to text", it still has to contain a hell of a lot of information about images, right?
-
@FishFace Yes, and it is a subtle difference. I wish I could share the images, since I think that would make it more apparent.

In a diffusion model, you iteratively tweak an image of pure static until the result is statistically similar to the images used in training.
In this experiment, you generate "random" images, but with the unique bias of CPPN networks, so they look more like "organic shapes" than static. You treat them like Rorschach tests, asking the CV model what it sees. Then, for each different answer, you iterate the image so the CV model is even more confident. Except, you aren't tweaking pixels to approach the target distribution, you're just giving hot / cold feedback to an evolutionary search.
The resulting images are far outside the distribution of the original dataset and look like abstract art, but still stimulate the CV model to be very confident about what it's seeing.
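The "hot / cold" feedback could be sketched like this (names and the confidence function are mine, purely for illustration): the vision model is a black box, so instead of following gradients you mutate the generator's parameters and keep a change only when the model's confidence rises.

```python
import random

# Sketch of hot / cold evolutionary refinement. `confidence` is a toy
# stand-in for asking the CV model how strongly it sees the chosen
# description; the search never looks inside it, only at the scalar score.
def confidence(params):
    target = [0.6, -0.3, 0.9]  # hidden optimum, for demonstration only
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(params, steps=300, seed=0):
    rng = random.Random(seed)
    score = confidence(params)
    for _ in range(steps):
        child = [p + rng.gauss(0, 0.05) for p in params]  # small mutation
        child_score = confidence(child)
        if child_score > score:  # "warmer": keep the child
            params, score = child, child_score
    return params, score

final, final_score = evolve([0.0, 0.0, 0.0])
```

Since only improvements are accepted, the score never moves "colder", but nothing ever pulls the image toward the training distribution, which is why the results stay so far outside it.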
-
@FishFace Another way of looking at this: a diffusion model is trying to make an image that resembles known images for a prompt. That's its loss function: minimize deviation from the target image distribution.
In this experiment, we're just asking "what would you call this thing?" without concern for how much it resembles other images with the same description. The fitness function is to get a confident response. You're evolving a Rorschach test where the model always sees a bird, even though it looks nothing like a picture of a bird.
-
@ngaylinn ah thanks for the explanation
-