My lab mate, Jackson Dean, has been doing some really fun research into image generation.

ngaylinn@tech.lgbt:

LLM image generators can make a picture of anything you ask for. The results often look pretty good at first glance. They're generic and the details are usually off, but folks overlook that easily.

This hides something important: these models can't see like we do. The main limitation is how they're trained. We show them millions of pictures, paired with text descriptions.

The problem is, humans don't describe images literally. We might say "a picture of a dog playing frisbee" without mentioning the setting, the composition, or the squirrel in the background.

Most of what's there visually is unsaid. The model sees those pixels, but they're just "stuff that goes along" with the text. Dogs play in parks, so the AI learns that dogs have green backgrounds.

This is why it's so hard to control an image generator. It isn't intentionally placing all those objects and choosing their attributes; it's just extra fluff that seems to "go with" what you asked for.

(2/3)
#science #ai #generativeart

kevinrns@mstdn.social (#4):

@ngaylinn

Remember they are scraping alt text on Mastodon images, so poison your descriptions in weird ways humans see but AI can't, because it's stupid.

ngaylinn@tech.lgbt (#5):

@kevinrns Just don't poison alt text for humans! It serves an important purpose.

ngaylinn@tech.lgbt:

Jackson's research is great for exploring this, because we get to see abstract synthetic images that very strongly stimulate the AI to see... whatever it "wants" to see.

Often, the results are recognizable. The image with oddly shaped pink blobs does sorta resemble flamingos. But there are also many examples where the AI fixates on some small detail of color or texture and becomes convinced it's seeing something totally implausible.

This relates to "adversarial examples", another great way to see the same effect.

With real images, it seems like the AI "sees" like we do. But as soon as we venture beyond its training data, the illusion is broken, and it feels a bit like a parlor trick. Clearly AI doesn't see like we do.

This is a great practice for AI generally: seek out the edge cases where the model fails. This breaks the spell of "general intelligence" and gives us a clearer idea of what's actually happening inside the black box.

(3/3)
#science #ai #generativeart

lilacperegrine@clockwork.monster (#6):

@ngaylinn Are there images I can look at? There were some in the article, but they appeared to be referencing previous work.

ngaylinn@tech.lgbt (#7):

@lilacperegrine Alas, this is still very early work in progress! I'll share it once Jackson does! If you look him up on Google Scholar, though, you can see some of his other image generation projects, like this one: https://direct.mit.edu/isal/proceedings/isal2024/36/86/123507

ngaylinn@tech.lgbt:

My lab mate, Jackson Dean, has been doing some really fun research into image generation.

Unlike the common AI-generated images that mash together stolen artwork to make something sorta photorealistic, he's producing abstract art that's entirely novel. The general idea (inspired by innovation engines) is to generate an image from scratch, then ask a vision/language model what it sees. He generates lots of images with different descriptions, and refines those images to more closely resemble their descriptions.

Not only is he making some really cool generative art, but he's learning something about what "novelty" is and how to produce it in a computer.

Beyond that, though, I'm fascinated because it gives a window into the strange way computers "see" images.

(1/3)
#science #ai #generativeart
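
[Editor's sketch] That generate-describe-refine loop can be shown in miniature. Everything below is a hypothetical stand-in (a toy renderer, a toy "vision model" that scores simple statistics, plain hill climbing), not Jackson's actual method:

```python
import random

# Toy stand-ins, all hypothetical: a "genome" is a parameter vector
# rendered to an "image" (a list of pixel intensities), and the
# "vision model" scores confidence per label. The real project renders
# with a CPPN and queries a real vision/language model.
LABELS = ["flamingo", "starfish", "bird"]

def render(genome):
    """Map genome parameters to pixel intensities in [0, 1)."""
    return [abs(g) % 1.0 for g in genome]

def vision_model(image):
    """Return a confidence score per label (toy statistics, not real CV)."""
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return {"flamingo": mean, "starfish": 1.0 - mean, "bird": spread}

def mutate(genome):
    child = list(genome)
    child[random.randrange(len(child))] += random.uniform(-0.3, 0.3)
    return child

def generate_and_refine(steps=300, size=16):
    # 1. Generate a "random" image from scratch.
    genome = [random.uniform(-1.0, 1.0) for _ in range(size)]
    # 2. Ask the model what it sees; that answer becomes the target.
    scores = vision_model(render(genome))
    label = max(scores, key=scores.get)
    best = scores[label]
    # 3. Refine: keep a mutation only when the model grows more confident.
    for _ in range(steps):
        candidate = mutate(genome)
        score = vision_model(render(candidate))[label]
        if score > best:
            genome, best = candidate, score
    return label, best, genome

label, confidence, genome = generate_and_refine()
```

The structural point: the target label isn't chosen by a human. It's whatever the model reported seeing in the random starting image.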

fishface@ioc.exchange (#8):

@ngaylinn the approach of "generate an image, see what the CV thinks it is, and iterate" is, at that high level, just like how diffusion models work.

kevinrns@mstdn.social (#9):

@ngaylinn

But always poison.

ngaylinn@tech.lgbt (#10):

@FishFace That's true! What's different here, though, is that the generation procedure isn't attempting to sample from the distribution of "all natural images" learned from its training data. Instead, a CPPN is used to generate a "random" image with spatially coherent structure from scratch.

This is nice, because it means the images are novel, not remixes of stolen data. Also, it allows us to explore the limitations of computer vision, since we're straying far from the distribution of images the model was trained on.
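
[Editor's sketch] A minimal CPPN, assuming the usual setup: a small random network mapping pixel coordinates (plus distance from center) through periodic and Gaussian activations, which is what gives the output its organic, spatially coherent look. Real CPPN work (NEAT-style) typically evolves the network topology too; here the topology is fixed for brevity:

```python
import math
import random

def make_cppn(hidden=8, seed=0):
    """Build a tiny fixed-topology CPPN mapping (x, y, r) -> intensity."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(hidden)]
    w_out = [rng.uniform(-2, 2) for _ in range(hidden)]
    # Periodic/Gaussian activations produce coherent organic patterns,
    # unlike the per-pixel noise a purely random image would give.
    acts = [rng.choice([math.sin, math.cos, math.tanh,
                        lambda v: math.exp(-v * v)]) for _ in range(hidden)]

    def render(size=32):
        img = []
        for py in range(size):
            row = []
            for px in range(size):
                x = 2 * px / (size - 1) - 1   # map pixel to [-1, 1]
                y = 2 * py / (size - 1) - 1
                r = math.sqrt(x * x + y * y)  # distance from center
                h = [acts[i](sum(w * v for w, v in zip(w_in[i], (x, y, r))))
                     for i in range(hidden)]
                out = math.tanh(sum(w * v for w, v in zip(w_out, h)))
                row.append((out + 1) / 2)     # intensity in [0, 1]
            img.append(row)
        return img

    return render

image = make_cppn(seed=42)(size=32)
```

Because every pixel is a smooth function of its coordinates, neighboring pixels get similar values, which is exactly the "spatially coherent structure" described above.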

ngaylinn@tech.lgbt (#11):

@FishFace Also, the model isn't guided towards any particular prompt. The prompts are discovered through random search, then used to refine those starting points.

fishface@ioc.exchange (#12):

@ngaylinn Doesn't the CV model's training dataset have the same issues, though? Whether the model has learnt "denoising" or "image to text", it still has to contain a hell of a lot of information about images, right?

ngaylinn@tech.lgbt (#13):

@FishFace Yes, and it is a subtle difference. I wish I could share the images, since I think that would make it more apparent. 🙂

In a diffusion model, you iteratively tweak an image of static until the result is statistically similar to the images used in training.

In this experiment, you generate "random" images, but with the unique bias of CPPN networks, so they look more like "organic shapes" than static. You treat them like Rorschach tests, asking the CV model what it sees. Then, for each different answer, you iterate on the image so the CV model is even more confident. Except you aren't tweaking pixels to approach a target distribution; you're just giving hot/cold feedback to an evolutionary search.

The resulting images are far outside the distribution of the original dataset and look like abstract art, but still stimulate the CV model to be very confident about what it's seeing.
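
[Editor's sketch] That hot/cold loop might look roughly like a truncation-selection evolutionary search, where the only feedback is a scalar score. `toy_confidence` below is a stand-in for a real CV model's confidence in a label; nothing here is from the actual project:

```python
import random

def evolve(fitness, genome_size=8, pop_size=20, generations=40, seed=1):
    """Hot/cold evolutionary search: no gradients and no pixel-level
    target -- the only signal is whether a mutated genome scores
    hotter or colder under the fitness function."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(genome_size)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 4]   # keep the "hottest" quarter
        children = []
        while len(parents) + len(children) < pop_size:
            child = list(rng.choice(parents))
            child[rng.randrange(genome_size)] += rng.gauss(0.0, 0.2)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Hypothetical fitness stand-in: in the real setup this would be the
# CV model's confidence that the rendered genome depicts the chosen
# label; here it just rewards genomes whose parameters approach 1.
def toy_confidence(genome):
    return 1.0 - sum((g - 1.0) ** 2 for g in genome) / len(genome)

best = evolve(toy_confidence)
```

Note that the search mutates genomes, not pixels: the fitness score says only "hotter" or "colder", never which pixel to change.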

ngaylinn@tech.lgbt (#14):

@FishFace Another way of looking at this: a diffusion model is trying to make an image that resembles known images for a prompt. That's its loss function: minimize deviation from the target image distribution.

In this experiment, we're just asking "what would you call this thing?" without concern for how much it resembles other images with the same description. The fitness function is to get a confident response. You're evolving a Rorschach test where the model always sees a bird, even though it looks nothing like a picture of a bird.
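
[Editor's sketch] Loosely, the two objectives could be written as follows; the notation is mine, a rough contrast rather than anything from the thread or the paper:

```latex
% Diffusion training: learn to denoise so samples land in the training
% distribution p_{\text{data}} -- the standard simplified DDPM loss:
\mathcal{L}_{\text{diff}}(\theta)
  = \mathbb{E}_{x_0 \sim p_{\text{data}},\, t,\, \epsilon}
    \big\| \epsilon - \epsilon_\theta(x_t, t) \big\|^2

% This experiment: evolve a CPPN genome g so a frozen vision model f
% is maximally confident in a label c it chose itself from the random
% starting genome g_0:
\max_{g} \; f_c\big(\mathrm{render}(g)\big),
\qquad c = \arg\max_{c'} f_{c'}\big(\mathrm{render}(g_0)\big)
```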

fishface@ioc.exchange (#15):

@ngaylinn Ah, thanks for the explanation.
